XQuery: limiting the result set

Hello,

I have quite a big collection of cases stored in Tamino (250,000 entries), and this
number will grow to 1,000,000. I am trying to implement some business logic
from my app server, and I do not know how to solve this with an XQuery.
(I am quite new to XQuery…)
My schema looks like:

Collection Cases

1. Is it possible to limit the result set returned by Tamino
(something like “return the first 5 records matching the condition”)?
My experience is that count() is extremely slow, so I cannot use it on my data,
whereas a search for an exact value is very fast if the result set is not
too big.

2. Is it possible to build a “structured index” to get better performance
(in my case this would be court+caseYear+caseNumber)?

All the tags I am using as search criteria are indexed.

Maybe I will have to solve this using Tamino API functions, but I prefer
using XQuery. Any comments?

Thanks, Pavel

Dear Pavel,
Regarding the size of the result returned, please have a look at the cursoring concept of Tamino. The cursoring concept is API independent.

Structured, or rather composite, indexes are not available yet. To work around this, your application has to create the combined value itself and put it into the XML document before it is stored in Tamino.
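
As an illustration of creating the composed value in the application, here is a minimal sketch in Python (chosen only for illustration, it is not tied to any Tamino API). The function name `compose_key`, the separator and the padding widths are assumptions, not part of the schema in this thread. The point is that zero-padding makes plain string comparison agree with the numeric order of year and case number.

```python
# Hypothetical helper: name, separator and padding widths are assumptions.
def compose_key(court: str, year: int, case_number: int) -> str:
    """Build one sortable string from court code, year and case number."""
    # Zero-padding keeps string order consistent with numeric order.
    return f"{court}-{year:04d}-{case_number:07d}"


key = compose_key("xyz", 2003, 123)
print(key)  # xyz-2003-0000123
```

The resulting string would be stored in its own element of each document, and that element would then get an ordinary index.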

regards

Michael

Hi Pavel,

ad 1.
You should use the position() function, which performs quite well from Tamino 4.1.4.n onwards.
Below is a real-world example for an “MSOutlook contacts scenario” which looks like yours.

for $r in
(
(for $a in input()/contact sort by (zip)
where $a/street eq "blo blo"
and $a/zip gt 2134
return $a
)[position() > 1100 and position() < 1150]
)
return
{($r/name, $r/street)}


ad 2
Yes and no.
In the current version you can define a “structured index” on the existing structure,
meaning you can create an index on an upper level of an XML structure.
As far as I remember, a freer definition of individual fields in composite indexes is on the roadmap for future versions.
regards
Finn

Hello again,
OK, this works, but performance is still terrible for my collection. I think the problem is that in all cases Tamino builds the full set of documents that match the condition
and only then takes out the first, last, n-th or whatever element. With cursors, the full result set is stored in some area and is accessible through a handle, but this fills up memory with all the data, of which I need only one element. I will reduce this by limiting the result set to the ino:id only and fetching the rest afterwards. It seems there is a need for a much cleverer optimiser, or some way to give hints to the optimiser, as is possible in some RDBMSs.

Any comments? Thanks, Pavel

Hi Pavel,
This doesn’t quite fit with my experience!
Could you post your ill-performing XQuery?
Finn

Hello, what I actually do is a previous/next functionality, where the criteria are composed of 3 different fields (year, court_code, case_number). With a composite index this would be easy, just previous/next by index, but there is no composite index functionality as far as I know. So my solution in XQuery is:
declare namespace xf="W3C XQuery 1.0 and XPath 2.0 Functions and Operators"
declare namespace tf="http://namespaces.softwareag.com/tamino/TaminoFunction"
for $t in ( for $q in input()/Vpisnik
let $vred := $q/Glava/StevilkaZadeve
where ($vred < $st_zadeve) and ($q/Glava/LetoZadeve = $leto) and ($q/Glava/Sodisce/SodiSif=$sodi_sif)
return $q)[last()]
return

<inoid id={tf:getInoId( $t)}>
{$t}
</inoid>

What I do is take the last of all the values smaller than the current one. This performs OK if the case number is up to 5000 or so, but if I run it for a case number ($st_zadeve) of 20,000 or so, the query needs more than a minute (my timeout value) on the server (P 2.8G, 2 GB RAM, SUSE Linux).
All the fields in the criteria are indexed, and I cannot apply a more selective filter.

the implementation of NEXT:
declare namespace xf="W3C XQuery 1.0 and XPath 2.0 Functions and Operators"
declare namespace tf="http://namespaces.softwareag.com/tamino/TaminoFunction"
for $t in ( for $q in input()/Vpisnik
let $vred := $q/Glava/StevilkaZadeve
where ($vred > $st_zadeve) and ($q/Glava/LetoZadeve = $leto) and ($q/Glava/Sodisce/SodiSif=$sodi_sif)
return $q)[1]
return

<inoid id={tf:getInoId( $t)}>
{$t}
</inoid>


vocabulary:
$leto=year
$st_zadeve = case number
$sodi_sif = court code

Well, what I did now is compute the previous/next values directly, and performance is great, but the problem is that the case numbers are not always consecutive (they could be 1,2,3,5,7,8,10…).

Any suggestions?

Pavel Reberc

Hi Pavel,
I’m not deep into the internals of Tamino, but my guess is that Tamino will build up three result lists of ino:ids, one for each of the three keys involved. At least the year and court lists will probably be VERY big.
Until the real composite index comes (I think in the next version, Q2), I would suggest you try the existing restricted composite index function:
Add an extra level in your document:

2003
xyz
123

and then put an index on the new level “year-court-case”
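
On the gaps in the case numbers: once the values for one year and court are available in sorted order, previous/next is a neighbour lookup rather than arithmetic on the number, so gaps do not matter. A minimal sketch of that idea in Python (for illustration only, Tamino is not involved here; the function name `neighbours` and the in-memory list are assumptions):

```python
from bisect import bisect_left, bisect_right


def neighbours(sorted_numbers, current):
    """Return (previous, next) around current; None where there is no neighbour."""
    lo = bisect_left(sorted_numbers, current)   # index of first value >= current
    hi = bisect_right(sorted_numbers, current)  # index of first value > current
    previous = sorted_numbers[lo - 1] if lo > 0 else None
    following = sorted_numbers[hi] if hi < len(sorted_numbers) else None
    return previous, following


# Case numbers with gaps, as in 1,2,3,5,7,8,10
numbers = [1, 2, 3, 5, 7, 8, 10]
print(neighbours(numbers, 5))   # (3, 7)
print(neighbours(numbers, 10))  # (8, None)
```

The lookup also works when the current number itself is missing from the list, e.g. `neighbours(numbers, 6)` gives the values on either side of the gap.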

Hi Pavel,

cursors are the right way to tackle the problem. However, you must use a “vague” cursor rather than an “insensitive” one. Unfortunately, the Java API currently does not allow you to set a cursor to “vague”.
There will be an update of the Java API soon that allows you to make this switch.

In the meantime, you could use X-Query combined with cursors (there, the “insensitive” problem does not exist) to retrieve the ino:ids or all the data. This should be faster than your current solution.

Regards

Harald

Hello,
Thanks for the answers. I will try both solutions, but changing the schema is quite hard once the data and the application business logic are already implemented.
I do not understand, though, why the use of cursors would make any difference to query performance. As far as I understand, the processing of a query has nothing to do with the cursor; a cursor is only a mechanism for fetching values from the result set. Or am I wrong?

Pavel Reberc

Yes and no.

If you fetch the whole result set, the costs with and without cursors are almost the same.
But if you look at a part of the result only, cursors save several (possibly expensive) postprocessing steps (for the parts of the result that you did not fetch).
If I got you right, you are currently running the same query multiple times to do the previous/next logic. This is more expensive than a cursor. However, be aware that the “insensitive” variant
of the XQuery cursor computes the entire result before it gives back any part of it.
This is the reason why you should not use this variant of the cursor.
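
The difference can be pictured with a small analogy in Python (a sketch only; it models the general eager versus lazy idea and is not how Tamino is implemented). The `matches` generator and the list used as a log are assumptions made for the illustration.

```python
def matches(documents, processed):
    """Yield matching documents one at a time, recording each one touched."""
    for doc in documents:
        processed.append(doc)  # stands in for an expensive postprocessing step
        yield doc


docs = range(20000)

# Eager (like the "insensitive" variant): the whole result is built first.
eager_log = []
first_eager = list(matches(docs, eager_log))[0]

# Lazy: only the part that is actually fetched gets processed.
lazy_log = []
first_lazy = next(matches(docs, lazy_log))

print(len(eager_log), len(lazy_log))  # 20000 1
```

Both variants return the same first item; they differ only in how much work is done before it is handed back.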

Regards

Harald