xquery performance

Hi all, we have a collection with 18766 very small and simple documents: <?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:tsd=“http://namespaces.softwareag.com/tamino/TaminoSchemaDefinition” xmlns:xs=“XML Schema”> xs:annotation xs:appinfo <tsd:schemaInfo name=“Index”> <tsd:collection name=“Index”/> <tsd:doctype name=“Index”> tsd:logical tsd:contentclosed</tsd:content> </tsd:logical> </tsd:doctype> tsd:adminInfo tsd:versionTSD4.2</tsd:version> tsd:created2005-07-28T09:15:52.912+01:00</tsd:created> tsd:modified2006-03-24T09:22:38.980+01:00</tsd:modified> </tsd:adminInfo> </tsd:schemaInfo> </xs:appinfo> </xs:annotation> <xs:element name=“Index”> <xs:complexType mixed=“true”> <xs:choice minOccurs=“0” maxOccurs=“unbounded”> <xs:element ref=“subIndex”/> </xs:choice> <xs:attribute name=“id” type=“xs:string”> xs:annotation xs:appinfo tsd:attributeInfo tsd:physical tsd:native tsd:index tsd:standard/ </tsd:index> </tsd:native> </tsd:physical> </tsd:attributeInfo> </xs:appinfo> </xs:annotation> </xs:attribute> <xs:attribute name=“type” type=“xs:string” use=“required”> xs:annotation xs:appinfo tsd:attributeInfo tsd:physical tsd:native tsd:index tsd:standard/ </tsd:index> </tsd:native> </tsd:physical> </tsd:attributeInfo> </xs:appinfo> </xs:annotation> </xs:attribute> <xs:attribute name=“subType” type=“xs:string”/> <xs:attribute name=“order” type=“xs:string” use=“required”> xs:annotation xs:appinfo tsd:attributeInfo tsd:physical tsd:native tsd:index tsd:standard/ </tsd:index> </tsd:native> </tsd:physical> </tsd:attributeInfo> </xs:appinfo> </xs:annotation> </xs:attribute> <xs:attribute name=“available” type=“xs:integer” use=“required”> xs:annotation xs:appinfo tsd:attributeInfo tsd:physical tsd:native tsd:index tsd:standard/ </tsd:index> </tsd:native> </tsd:physical> </tsd:attributeInfo> </xs:appinfo> </xs:annotation> </xs:attribute> <xs:attribute name=“tot” type=“xs:integer” use=“required”> xs:annotation xs:appinfo tsd:attributeInfo tsd:physical tsd:native tsd:index tsd:standard/ </tsd:index> </tsd:native> </tsd:physical> </tsd:attributeInfo> </xs:appinfo> </xs:annotation> </xs:attribute> <xs:attribute name=“note” type=“xs:string”/> <xs:attribute name=“rif” type=“xs:string”/> </xs:complexType> </xs:element> <xs:element name=“subIndex”> xs:complexType xs:simpleContent <xs:extension base=“xs:string”> <xs:attribute name=“id” type=“xs:string”> xs:annotation xs:appinfo tsd:attributeInfo tsd:physical tsd:native tsd:index tsd:standard/ tsd:text/ </tsd:index> </tsd:native> </tsd:physical> </tsd:attributeInfo> </xs:appinfo> </xs:annotation> </xs:attribute> <xs:attribute name=“subType” type=“xs:string”/> <xs:attribute name=“available” type=“xs:integer” use=“required”> xs:annotation xs:appinfo tsd:attributeInfo tsd:physical tsd:native tsd:index tsd:standard/ </tsd:index> </tsd:native> </tsd:physical> </tsd:attributeInfo> </xs:appinfo> </xs:annotation> </xs:attribute> <xs:attribute name=“tot” type=“xs:integer” use=“required”> xs:annotation xs:appinfo tsd:attributeInfo tsd:physical tsd:native tsd:index tsd:standard/ </tsd:index> </tsd:native> </tsd:physical> </tsd:attributeInfo> </xs:appinfo> </xs:annotation> </xs:attribute> <xs:attribute name=“note” type=“xs:string”/> <xs:attribute name=“rif” type=“xs:string”/> </xs:extension> </xs:simpleContent> </xs:complexType> </xs:element> </xs:schema> but this query: declare namespace xs=“XML Schema” declare default collation “collation?language=it;strength=secondary” let $q := for $a in input()/Index[@type=“author” and @tot>0] sort by (./@order ascending) return $a takes about 15 seconds (and involves 6736 documents). This is too long! I guess this is not normal and we missed something! Any kind of tuning suggestion is welcome. Thanks in advance, Pam

Hi Pam, I think the collation declaration is the culprit ! The purpose of having indexes is to optimize search and sort. If you then specify an alternate language and strength from when the index was generated, I guess this will force Tamino NOT to use the know indexes. The solution, I think, would be to remove the collation etc from the query and instead set these special properties in the schema. regards Finn

Hi Finn! I have removed the collation etc from the query and this query: declare namespace xs=“XML Schema” for $a in input()/Index[@type=“author” and @tot>0] sort by (./@order ascending) return $a takes only four seconds! Please can you explain how to set these special properties in the schema? Thanks in advance! Pam

Hi Pam, These properties are in the “collation” properties in the locical properties of each field if you use the schema editor. BTW I’m almost willing to make a bet that the four seconds responsetime is mainly cause by your IE-browser building up a DOM tree with the 6736 docs :wink: So - do you really need the full resultset ? Finn

Hi Finn! We need the full ordered resultset in our application because we have to browse it. Perhaps, in future, we’ll change method! Thank you very much for your precious help! Pam

Why not ask for bits of the result ie. 100 at the time ? Tamino has excellent support for this with the filter [position() >100 and position()<200] Finn

Hi Finn, what do you mean? Our really xquery is like this: declare namespace xs=“XML Schema” let $q := for $a in input()/Index[@type=“author” and @tot>0] sort by (./@order ascending) return $a let $name := “BONIFACIUS” let $offset := 10 let $termPosition := 0 let $pos := xs:integer(count($q[./@order < $name])) + 1 let $delta1 := min(($pos - $termPosition, count($q))) let $startPos := min (( max((1, $delta1)),count($q)-$offset+1 )) let $delta2 := max(($startPos,$pos + $offset -1 - $termPosition )) let $endPos := max( (min((count($q),$delta2)),$offset) ) return $q[position() >= $startPos and position() <= $endPos] It orders all document Index[@type=“author” and @tot>0] and returns 10 of these; the first is that whose @order is the nearer to “BONIFACIUS”. Have you any suggest? Thanks in advance! Pam

OK - I lost that bet :wink: Isn’t there a way of filtering the the “vicinity” of BONIFACIUS (is that a constant???)directly in the query ? I would guess that the post-selection done with the 6700 documents would be relatively timeconsuming ! Another question why do you use the let $q := for $a … instead of just let $q := input()/Index … you don’t appear to be using $a for anything ? Finn

Hi Finn, your observation is right! “BONIFACIUS” is not a costant. Excuse me: do you want suggest that there is another way of filtering “vicinity” of “BONIFACIUS”, and of the other following ordered nine elements, directly in the query? Regards, Pam PS: Can I explain another question? I have set collation directly in the schema but the order of the result is changed!!! For example the terms that begins with "

Hi Pam, In danish the letter "

Hi Finn, you have right, but the result must displayed exactly 10 record so, even if I search for the last term in Index collection, the result must be the last 10 records (or all the record if lower than 10). For example, if I search for “ZY”, the result is: [color=“darkblue”] [size=“9”]Zillio, Francesco, f. 1682-1692 Zmajevic, Andrija, arciv. di Antivari, m. 1694 Zuccoli, Ludovico, sec. XVI-XVII Zuccolo, Vitale, O.S.B., 1556-1630 <Index available=“0” id=“@7665” note=“Conte di Miranda [IAM].” order="ZUNIGA, JUAN DE, VICER

Hi Pam I still believe the sorting can be fixed in the schema - if you take the time to have a deep look in the documentation. Regarding BONAFACIUS I came to think of an example the XQUERY guru Harald gave for finding the hits before and after if there was no exact match. If you search for “hong” but only “hang” and “hung” is present. Perhaps this kind of approach can be used ? Finn let $searchterm := “hong” let $match := input()//a[.= $searchterm] return if ($match) then $match else let $lower := max (input()//a/text()[. < $searchterm]) let $upper := min (input()//a/text()[. > $searchterm]) return input()//a[. =($lower, $upper)]

Hi Finn! I have not been yet able to find in the documentation what you suggest. …I have some problems for the definition of the collation (see my new topic if you want!). Can you help me? Thanks for all, Pam