Exponential time for query processing / avoiding Tamino Post

Hi Tamino fans,

we are concerned about a Tamino query, its processing time, and the interpretation of the ino:explain command:

The query expression:
FpML[trade/swap/swapStream/calculationPeriodAmount/calculation
[notionalSchedule/notionalStepSchedule/currency='USD' and
floatingRateCalculation/floatingRateIndex='USD-LIBOR-BBA' and
dayCountFraction='ACT/360']]

When we test the query in steps of 100,000 instances, the processing time is linear for the first 400,000 instances, but from 500,000 to 900,000 it increases exponentially. The results:

instances time [msec]
20000 1309
40000 2445
60000 3579
100000 6541
200000 11793
300000 17906
500000 43124
700000 98195
900000 127128
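To make the change in growth easier to see, the figures above can be normalised to a cost per 1000 instances. This is only a small arithmetic sketch in Python on the measurements listed; the variable names are illustrative and nothing here is Tamino-specific.

```python
# Measurements from the table above: (instances, time in msec).
timings = [
    (20000, 1309),
    (40000, 2445),
    (60000, 3579),
    (100000, 6541),
    (200000, 11793),
    (300000, 17906),
    (500000, 43124),
    (700000, 98195),
    (900000, 127128),
]

# Cost per 1000 instances; a constant value would mean linear scaling.
per_1000 = [(n, ms / n * 1000) for n, ms in timings]

for n, cost in per_1000:
    print(f"{n:>7} instances: {cost:6.1f} ms per 1000 instances")
```

On these figures the cost stays at roughly 59 to 65 ms per 1000 instances up to 300000, is about 86 at 500000, and about 140 at 700000 and 900000.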

All fields we use within the filter expression are indexed with the 'standard' option, but the ino:explain command always returns ino:postprocessing="TRUE".

Does anyone have an idea why Tamino uses postprocessing, or why the processing time increases exponentially?

We are using a Sun Fire 880 with 4 GB RAM and two processors; the buffer pool size is 1 GB.

Enclosed you will find the schema, the result of ino:explain and a sample instance.

Thanks in advance
Michael

Michael Pollecker

SAG Systemhaus GmbH
Niederlassung Darmstadt
Professional Services

Alsfelder Str. 15-19, D-64289 Darmstadt
Telefon +49 (6151) 92 31 28, Fax +49 (6151) 92 31 11
E-Mail: Michael.Pollecker@softwareag.com
Michael.Pollecker@partner.commerzbank.com
ino_explain.zip (7.6 KB)

Standard index means that Tamino remembers in which documents a particular value occurs.

So if your query is something like this:

path[node1='xxx' and node2='yyy']

Tamino finds two lists of document IDs, one for each condition, and returns only the documents that appear in both.

This technique seems to be fast enough (anywhere from O(n) to constant time, depending on the type of index used, which unfortunately I don't know).
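The idea of intersecting two document ID lists can be sketched in a few lines of Python. This is a toy model only: how Tamino actually stores and looks up its standard index is not known to me, and the names build_index and query_and are made up for the illustration.

```python
from collections import defaultdict

def build_index(docs):
    """Map each (node, value) pair to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, pairs in docs.items():
        for node, value in pairs:
            index[(node, value)].add(doc_id)
    return index

def query_and(index, *conditions):
    """Return the IDs of documents that satisfy every (node, value) condition."""
    id_sets = [index.get(cond, set()) for cond in conditions]
    return set.intersection(*id_sets) if id_sets else set()

# Three tiny "documents", each reduced to its indexed node/value pairs.
docs = {
    1: [("node1", "xxx"), ("node2", "yyy")],
    2: [("node1", "xxx"), ("node2", "zzz")],
    3: [("node1", "aaa"), ("node2", "yyy")],
}

index = build_index(docs)
result = query_and(index, ("node1", "xxx"), ("node2", "yyy"))
print(sorted(result))  # [1]
```

Only document 1 appears in both ID lists, so it is the only one returned.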

Your query is a little more difficult because of the nested conditions. Standard indexes cannot be applied directly here, which is why the postprocessor gets involved.

The only advice I can give is to restructure your documents so that the queries you run most often take less time.

Alexander

Hi Alexander,

thanks for your answer! In the meantime we found out why Tamino uses the postprocessor: it is the cardinality > 1 of a node we query. In detail:

A query like
/A/B/C[D/E[F='xxx' and G='yyy']]
causes no (!) postprocessing if the cardinality of the nodes D and/or E is zero or one. If the cardinality of these nodes is > 1 the postprocessor is invoked.

Your suggestion of changing the schema is not useful in our case because the schema is standardized.

regards
Michael

Yes, that is what I wanted to explain. Standard indexes can only answer the question: are there DOCUMENTS that have this value in this node or not? The word "document" is crucial.

Your query is more difficult precisely because of the multiple cardinality of the nodes you mentioned (B and C as well, by the way). Otherwise the query could have been simplified to

/A/B/C[D/E/F='xxx' and D/E/G='yyy']
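Why a document-level index is not enough once E can repeat can be shown with a toy model in Python. It is only an illustration of the reasoning, not Tamino's actual algorithm; the data and the function names index_candidates and postprocess are invented for the example.

```python
# Each document is modelled as a list of E elements with one F and one G value.
docs = {
    1: [{"F": "xxx", "G": "yyy"}],            # one E satisfies both conditions
    2: [{"F": "xxx", "G": "other"},           # F matches in the first E ...
        {"F": "other", "G": "yyy"}],          # ... G matches in a different E
    3: [{"F": "other", "G": "other"}],        # nothing matches
}

def index_candidates(docs, f_value, g_value):
    """Document-level lookup: F=f_value and G=g_value occur somewhere in the document."""
    has_f = {d for d, es in docs.items() if any(e["F"] == f_value for e in es)}
    has_g = {d for d, es in docs.items() if any(e["G"] == g_value for e in es)}
    return has_f & has_g

def postprocess(docs, candidates, f_value, g_value):
    """Keep only candidates where a single E element satisfies both conditions."""
    return {
        d for d in candidates
        if any(e["F"] == f_value and e["G"] == g_value for e in docs[d])
    }

candidates = index_candidates(docs, "xxx", "yyy")
matches = postprocess(docs, candidates, "xxx", "yyy")
print(sorted(candidates))  # [1, 2]
print(sorted(matches))     # [1]
```

Document 2 passes the index lookup but is a false hit for the nested query, so every candidate has to be opened and checked. With cardinality at most one, both result sets would always be equal and the check could be skipped.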

If you are not permitted to change the schema entirely, you could rely on the open schema concept and add a small auxiliary node with a standard index, for example auxiliaryNode under calculation. It would contain a combination of all the values you need in the query.

Now this will be faster:

FpML[trade/swap/swapStream/calculationPeriodAmount/calculation/auxiliaryNode='USD;USD-LIBOR-BBA;ACT/360']

You can add the node when loading documents into Tamino and, if necessary, delete it again when retrieving them.

Of course, this approach is only defensible if you have a very limited number of queries that need to run fast.
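Adding the auxiliary node before loading and stripping it after retrieval could look roughly like the following Python sketch using the standard xml.etree.ElementTree module. It assumes documents without an XML namespace and a single currency, index and day count fraction per calculation element; the sample document is a heavily shortened stand-in, and the function names are made up. Loading into and reading from Tamino itself is not shown.

```python
import xml.etree.ElementTree as ET

AUX_TAG = "auxiliaryNode"  # helper element name from the example query above
CALC_PATH = "trade/swap/swapStream/calculationPeriodAmount/calculation"
PARTS = (
    "notionalSchedule/notionalStepSchedule/currency",
    "floatingRateCalculation/floatingRateIndex",
    "dayCountFraction",
)

def add_auxiliary_nodes(root):
    """Append a combined-key child to every calculation element (before loading)."""
    for calc in root.findall(CALC_PATH):
        # findtext returns the first match only; missing parts become "".
        values = [(calc.findtext(p) or "").strip() for p in PARTS]
        aux = ET.SubElement(calc, AUX_TAG)
        aux.text = ";".join(values)

def strip_auxiliary_nodes(root):
    """Remove the helper children again (after retrieving)."""
    for calc in root.findall(CALC_PATH):
        for aux in calc.findall(AUX_TAG):
            calc.remove(aux)

# Shortened, namespace-free sample; real FpML instances are much larger.
sample = """<FpML><trade><swap><swapStream><calculationPeriodAmount><calculation>
<notionalSchedule><notionalStepSchedule><currency>USD</currency></notionalStepSchedule></notionalSchedule>
<floatingRateCalculation><floatingRateIndex>USD-LIBOR-BBA</floatingRateIndex></floatingRateCalculation>
<dayCountFraction>ACT/360</dayCountFraction>
</calculation></calculationPeriodAmount></swapStream></swap></trade></FpML>"""

root = ET.fromstring(sample)
add_auxiliary_nodes(root)
aux_text = root.find(CALC_PATH + "/" + AUX_TAG).text
print(aux_text)  # USD;USD-LIBOR-BBA;ACT/360

strip_auxiliary_nodes(root)
aux_after_strip = root.find(CALC_PATH + "/" + AUX_TAG)
```

The separator ";" must not occur in any of the combined values, otherwise different combinations could produce the same key.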