index-only

Enric · September 29, 2004, 11:15pm

Hi,

I’ve read this post:
http://tamino.forums.softwareag.com/viewtopic.php?p=8947

I have a similar problem: some queries on my database perform poorly, and in the query explain output the element “XqcUnnestStdIdxScan” has the attribute “index-only” set to “false”. I suppose this is because the indexed element does not have multiplicity one. What does the “index-only” attribute mean? What would the performance improvement be if it were set to “true”?

Another question is whether the query response time depends on document size.

I use Tamino XML Server 4.1.4
Thanks in advance.

Thorsten1 · October 1, 2004, 8:56pm

Hi,

I’m not sure what you mean by the “index-only” attribute because, to my knowledge, there is no such attribute in the XQuery explain output. By “index-only” processing I mean that the XQuery processor can evaluate the predicates of a given query or subquery without accessing any XML document. For example, consider the following query:

for $a in input()/bib/book where $a/@year =2000 return $a

The query retrieves all books written in 2000. Assuming a standard index on the “year” attribute, the predicate in the where clause can be evaluated on that index without looking at any documents. Only to return the result does the XQuery processor have to retrieve the XML documents that contain the matching book elements. As an example that can’t be evaluated index-only, consider the following query, which retrieves all books written by Dan Suciu:

for $a in input()/bib/book where $a/author[first = "Dan" and last = "Suciu"] return $a

Usually a book is written by multiple authors, and therefore a book element contains several author elements. Assuming there is an index on the “first” and the “last” element, by accessing these indexes and intersecting the index access results we can determine the documents that hold a matching “first” and a matching “last” element. Unfortunately, this does not mean that the documents also hold a matching “book” element, since we do not know whether the “first” and the “last” element belong to the same “author” element. To verify this we have to scan all the documents we get from the intersection.
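
To make the false-positive problem concrete, here is a minimal Python sketch of the two steps described above. It is only a toy model, not how Tamino is implemented: the `docs` dictionary, the `build_index` helper, and the sample author names other than Dan Suciu are assumptions made up for illustration, and each document is assumed to hold exactly one book.

```python
from collections import defaultdict

# Toy collection: each document is one book, given as a list of (first, last) authors.
docs = {
    1: [("Dan", "Suciu"), ("Serge", "Abiteboul")],
    2: [("Dan", "Brown"), ("Peter", "Suciu")],  # "Dan" and "Suciu" belong to different authors
    3: [("Peter", "Buneman")],
}

def build_index(position):
    """Map each value of one author field (0 = first, 1 = last) to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, authors in docs.items():
        for author in authors:
            index[author[position]].add(doc_id)
    return index

first_index = build_index(0)
last_index = build_index(1)

# Step 1 (index only): intersect the results of the two index lookups.
candidates = first_index["Dan"] & last_index["Suciu"]

# Step 2 (document access): check that both values occur in the same author element.
matches = {doc_id for doc_id in candidates if ("Dan", "Suciu") in docs[doc_id]}

print(sorted(candidates))  # [1, 2]
print(sorted(matches))     # [1]
```

Document 2 survives the index intersection but fails the verification, which is why this kind of query cannot be answered from the indexes alone.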

Since accessing XML documents can be expensive, the performance of a query that can be processed index-only is the best you can get from the Tamino XQuery processor. You can check for “index-only” evaluation by verifying that the XQuery explain output does not contain any “XqcSelect” or “XqcSemiDJoin” element. If you have performance problems caused by the fact that the XQuery processor does not do “index-only” processing, you may have to rewrite your query. To give you some hints on how to do that, I would need your schema and your query. You should also consider moving to Tamino 4.2, since the XQuery processor of this version is far better at detecting queries that can be evaluated “index-only”.
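
The check described above can be automated once the explain output has been saved as XML. The following Python sketch is one possible way to do it, using only the standard library. The function name `is_index_only` and the two sample plans are assumptions for illustration: the samples are heavily simplified stand-ins, not real Tamino explain output, and only the operator element names are taken from this thread.

```python
import xml.etree.ElementTree as ET

# Operator elements that, according to the explanation above, rule out index-only evaluation.
DOCUMENT_ACCESS_OPERATORS = {"XqcSelect", "XqcSemiDJoin"}

def local_name(tag):
    """Strip a '{namespace}' prefix, if present, from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]

def is_index_only(explain_xml):
    """Return True if no element in the explain output is a document-access operator."""
    root = ET.fromstring(explain_xml)
    return all(local_name(el.tag) not in DOCUMENT_ACCESS_OPERATORS for el in root.iter())

# Hypothetical, simplified plans (NOT real Tamino output).
plan_a = "<plan><XqcUnnestStdIdxScan/></plan>"
plan_b = "<plan><XqcSelect><XqcUnnestStdIdxScan/></XqcSelect></plan>"

print(is_index_only(plan_a))  # True
print(is_index_only(plan_b))  # False
```

Comparing local names rather than full tags keeps the check working whether or not the explain output uses an XML namespace.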

You also asked how the result size may influence query performance. Obviously, a query that returns the whole content of a collection holding many documents will be slow compared to a query that returns only the number of documents stored in a collection. On the other hand, a query that returns the number of occurrences of a certain element in a collection is much slower than a query that returns the first document in that collection. So I would say that query performance may depend on the result size, but the result size is far from being the primary factor that determines the performance of a query.

Best regards,

Thorsten

Enric · October 1, 2004, 9:29pm

Thanks for your answer.
When I told you about the “index-only” attribute I meant “count-only”; that was my mistake.
However, your comments have been very helpful.

Is there a relationship between the “count-only” attribute and index-only processing?

Best regards

Thorsten1 · October 1, 2004, 10:24pm

Hi,

“count-only = true” means that the represented scan operator retrieves the number of documents returned by an index access (or stored in a doctype/collection). A prerequisite for a scan operator with “count-only=true” is that the given query can be processed “index-only”. For example, the following query counts the books published in 2000:

count(for $a in input()/bib/book where $a/@year = 2000 return $a)

If there is an index on the year attribute, the query can be processed index-only. If, in addition, each document in the bib doctype contains exactly one book element, we can evaluate the query by just counting the results returned by the index access. In that case the XQuery explain output would contain an “XqcUnnestStdIdxScan” element with the “count-only” attribute set to “true”.
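
The role of the “exactly one book per document” restriction can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not Tamino’s implementation: the index is assumed to map a year to the set of documents containing a matching book, and the names `build_year_index`, `count_from_index`, and `count_books` are invented for this example.

```python
def build_year_index(docs):
    """Map each year to the ids of the documents that contain a book with that year."""
    index = {}
    for doc_id, years in docs.items():
        for year in years:
            index.setdefault(year, set()).add(doc_id)
    return index

def count_from_index(index, year):
    """Count-only style: count the documents reported by the index, without opening them."""
    return len(index.get(year, set()))

def count_books(docs, year):
    """Ground truth: open every document and count the matching book elements."""
    return sum(years.count(year) for years in docs.values())

# Each document maps to the list of @year values of its book elements.
one_book_per_doc = {1: [2000], 2: [2000], 3: [1999]}
several_books_per_doc = {1: [2000, 2000], 2: [1999]}

for docs in (one_book_per_doc, several_books_per_doc):
    index = build_year_index(docs)
    print(count_from_index(index, 2000), count_books(docs, 2000))
# 2 2
# 1 2
```

With one book per document the two counts agree, so counting index results is enough. With several books per document the index count undercounts, so the documents would have to be read.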

Best regards,

Thorsten