Join query performance

Tamino 4.1.4.3, queried through the “Interactive Interface”.

The join query is:

for $a in input()/x,
    $b in (input()/y, input()/z)
where
  $a/v1 = "xx" and
  $a/v2 = $b/v3 and
  tf:containsText($b, "abc")
return $a

The query returns 7 results (the database holds 400 documents in total) and takes 8 seconds to complete.

for $b in (input()/y, input()/z),
    $a in input()/x
where
  $a/v1 = "xx" and
  $a/v2 = $b/v3 and
  tf:containsText($b, "abc")
return $a

Swapping the order of the $a and $b clauses naturally produces the same result. The difference is that this query completes in 1 second.

What causes this eightfold performance difference?
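The thread never establishes why the clause order matters. As a hedged illustration only, the toy Python model below assumes a naive nested-loop evaluator that applies a single-variable predicate as soon as its variable is bound, rescans the inner sequence for every outer item, and caches nothing. Tamino's real evaluator (and its use of the text index) may work quite differently. All class and function names and the sample data are hypothetical. Under that assumption, putting $a outermost re-evaluates the text predicate for every $a that passes the v1 filter, while putting $b outermost evaluates it once per $b.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class XDoc:
    """Hypothetical stand-in for a small "x" document."""
    id: int
    v1: str
    v2: str


@dataclass(frozen=True)
class BDoc:
    """Hypothetical stand-in for a large "y" or "z" document."""
    id: int
    v3: str
    text: str


@dataclass
class Stats:
    contains_calls: int = 0  # how often the "expensive" text predicate ran


def contains_text(b: BDoc, word: str, stats: Stats) -> bool:
    """Toy model of tf:containsText($b, word); counts each evaluation."""
    stats.contains_calls += 1
    return word in b.text


def nested_loop(outer, inner, outer_filter, inner_filter, join) -> Iterator[Tuple]:
    """Toy nested-loop join. ASSUMPTION: a predicate that needs only one
    variable runs as soon as that variable is bound; nothing is cached."""
    for o in outer:
        if not outer_filter(o):
            continue
        for i in inner:  # the inner sequence is rescanned for every outer item
            if inner_filter(i) and join(o, i):
                yield o, i


def a_outer(xs: List[XDoc], bs: List[BDoc], stats: Stats) -> List[int]:
    # for $a in x, $b in (y, z): the text predicate sits in the inner loop
    pairs = nested_loop(xs, bs,
                        lambda a: a.v1 == "xx",
                        lambda b: contains_text(b, "abc", stats),
                        lambda a, b: a.v2 == b.v3)
    return sorted(a.id for a, _ in pairs)


def b_outer(xs: List[XDoc], bs: List[BDoc], stats: Stats) -> List[int]:
    # for $b in (y, z), $a in x: the text predicate runs once per $b
    pairs = nested_loop(bs, xs,
                        lambda b: contains_text(b, "abc", stats),
                        lambda a: a.v1 == "xx",
                        lambda b, a: a.v2 == b.v3)
    return sorted(a.id for _, a in pairs)


def make_sample() -> Tuple[List[XDoc], List[BDoc]]:
    """Invented data: 70 x docs (10 with v1="xx"), 74 y/z docs (7 containing "abc")."""
    xs = [XDoc(i, "xx" if i < 10 else "other", f"k{i}") for i in range(70)]
    bs = [BDoc(j, f"k{j}", "abc manual" if j < 7 else "other text") for j in range(74)]
    return xs, bs


if __name__ == "__main__":
    xs, bs = make_sample()
    s1, s2 = Stats(), Stats()
    r1, r2 = a_outer(xs, bs, s1), b_outer(xs, bs, s2)
    print(r1 == r2, s1.contains_calls, s2.contains_calls)  # True 740 74
```

With the invented data both orders return the same documents, but the $a-outer order makes 740 text-predicate calls versus 74 for the $b-outer order. The numbers only illustrate the mechanism and are not meant to reproduce the measured factor of 8.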

Hello Erwin,

Could you please tell us how many of the 400 documents are “x” documents, how many are “y” documents and how many are “z” documents?

Could you also give us some clues about the schemas used for these documents, in particular which nodes are indexed and with which type of index?

Thanks,
Trevor.

Sorry Trevor, I was just trying to simplify, thinking that the order of $a and $b should not make any difference.

70 (small) documents are “x” (each below 1 KB)
70 documents are “y” (each between 20 and 100 KB)
4 documents are “z” (each between 20 and 100 KB)
Standard indexes on x/v1, x/v2, y/v3 and z/v3
Text indexes on “y” and “z” (top-level element)

The schemas for “y” and “z” are open content, defining only the top-level element and the indexed parts.

In fact, the number of “z” documents does not matter:

for $a in input()/x, $b in input()/y
is fast.

for $a in input()/x, $b in (input()/y, input()/z)
is slow even when there are no “z” documents at all.

Regards,
erwin

Hello Erwin,

I am not sure, but after reading your second posting I wonder whether the slowdown occurs when the combined sequence of “input()/y” and “input()/z” is assigned to the variable $b?

Perhaps the easiest way to proceed is for you to post your schemas and documents, so that I can test with them too. (You could also e-mail them directly to me if you don’t want to post them on the public forum; my e-mail address is in my profile.)

Thanks and regards,
Trevor.

Hello Trevor,

We are not allowed to share customer DTDs, schemas or documents (technical aircraft manuals) with you, and I do not have time to build a test environment for the problem.

But I am quite sure the behaviour is easy to reproduce. We have run into unpredictable performance with (complex) XQueries several times and think there are general problems with how Tamino handles complex XQueries. In addition, we have no good way to analyse our queries: “{?explain?}” does not produce results we can evaluate.

A month ago we had a similar join query performance problem. Together with Software AG Austria we spent two days building a test database and experimenting with the query. NO RESULT.

Thanks,
erwin

The query below is the “real-world” version of the problem query.

declare namespace xf=“W3C XQuery 1.0 and XPath 2.0 Functions and Operators
declare namespace tf="http://namespaces.softwareag.com/tamino/TaminoFunction"

for $a in input()/XCms.XObject/ResourceInfo/MetaData/Data/metaAir/airLine,
    $b in (input()/procedures, input()/systems-description)
where $a/usage/@aircraft = "B737-800" and
      $a/usage/@manual = "OM" and
      $a/usage/@type = "OM Part B" and
      tf:getDocname(xf:root($b)) = $a/…/…/…/Variants/Versions[last()]/DataId and
      tf:containsText($b, "emergency")
return
  <Result>
    { $a/…/…/…/…/ObjectId }
    { $a }
  </Result>


Regards,
erwin