Strange duration results, need suggestions

I tested some XMark queries on Tamino 4.1.4.1 (XML Starter Kit) on a P4 1.60 GHz with 256 MB RAM and WinXP Pro SP1.
I found some of the results really strange, and I need someone to tell me whether the fault is on my side or whether this is normal.

I prepared a (not so) large DB, up to the limit of the Starter Kit license: “19.75 MB of total 20.00 MB”.

Then, I tried this query:

for $a in input()/site/closed_auctions/closed_auction
where $a/annotation/description/parlist/listitem/parlist/listitem/text/emph/keyword/text()!=""
return

but the durations for the answers are many times higher than expected:

<ino:time ino:date="27.10.2003" ino:time="23:55:27.640" ino:duration="278270" />
<ino:time ino:date="28.10.2003" ino:time="00:01:09.461" ino:duration="305529" />

Consider that it answered with 46 (yes, forty-six) elements such as this:


that makes 6300+ milliseconds (more than six seconds!) for EACH element to be constructed.


Then I just stripped away the node constructor from the return, like this:

for $a in input()/site/closed_auctions/closed_auction
where $a/annotation/description/parlist/listitem/parlist/listitem/text/emph/keyword/text()!=""
return $a/seller/@person

and I got 46 answers such as this:
<xq:attribute person="person97" />

but this time durations are normal:

<ino:time ino:date="27.10.2003" ino:time="23:45:05.395" ino:duration="11166" />
<ino:time ino:date="27.10.2003" ino:time="23:45:30.601" ino:duration="9083" />

that makes 220 milliseconds per element (still a lot, I think)
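
As a quick sanity check on the per-element figures, here is a small Python sketch. It is not part of the original measurements, and the function name is made up for illustration. It averages the two runs of each query and divides by the 46 results.

```python
def ms_per_element(durations_ms, result_count):
    """Average the run durations (ms) and divide by the number of results."""
    mean_duration = sum(durations_ms) / len(durations_ms)
    return mean_duration / result_count


RESULTS = 46  # both queries returned 46 items

# ino:duration values reported in this post, in milliseconds
with_constructor = ms_per_element([278270, 305529], RESULTS)
without_constructor = ms_per_element([11166, 9083], RESULTS)

print(f"with node constructor:    {with_constructor:.0f} ms per element")
print(f"without node constructor: {without_constructor:.0f} ms per element")
print(f"slowdown factor:          {with_constructor / without_constructor:.1f}")
```

This gives roughly 6346 ms versus 220 ms per element, so the query with the node constructor is about 29 times slower.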

Why is there such a great difference when nodes are constructed?
Why does it take 5 minutes to return an answer with 46 elements from a not-so-large database?

Many thanks for any help or suggestions.

Aureliano

Aureliano,

I assume you loaded a single, fairly large document into Tamino. This will not give you optimal performance.

Tamino works like a database. It is very good at finding a matching document in a large set of documents of the same type. So in the case of XMark you should not store the data as is (it is all contained in a single document) but store the relevant parts in separate doctypes. If only one document is stored, Tamino always has to handle this single document and cannot take advantage of the indices.

So you should create doctypes for
annotation
open_auction
closed_auction
item
person
category
(don’t take this for granted - I just had a quick look at the data)

instead of site. This will result in much better performance. Please refer also to the “Performance Guide” in the documentation.

We did this a while ago (with an older version of Tamino) for a customer. I have notified the people involved, so you should get help and advice on this particular benchmark.

So in short here are the steps I would suggest:
- create doctypes for the subtrees (e.g. item, person) based on the DTD
- create relevant indices
- create smaller documents for loading (person, item)
- formulate the Queries

The performance will be much better.
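
The splitting step could look roughly like the following Python sketch. This is not the Perl script or the schemas mentioned in this thread. It assumes the XMark data sits in one XML document with a single root, the element names are the ones suggested in this thread and may need adjusting, and the function name split_site is made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Element names suggested in this thread as separate doctypes (adjust as needed).
DOCTYPES = {"item", "category", "edge", "person", "open_auction", "closed_auction"}


def split_site(source, doctypes=DOCTYPES):
    """Yield (name, xml_string) for every outermost subtree whose tag is in doctypes.

    source is a file name or a binary file object holding the large document.
    """
    depth = 0  # > 0 while inside a subtree that will be extracted
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if elem.tag not in doctypes:
            continue
        if event == "start":
            depth += 1
        else:
            depth -= 1
            if depth == 0:
                elem.tail = None  # drop whitespace that follows the element
                yield elem.tag, ET.tostring(elem, encoding="unicode")
                elem.clear()  # free the memory held by the finished subtree


# Tiny stand-in for the real file, only to show the call.
sample = (
    b"<site><people><person id='person0'><name>A</name></person></people>"
    b"<closed_auctions><closed_auction><seller person='person0'/>"
    b"</closed_auction></closed_auctions></site>"
)
for name, doc in split_site(io.BytesIO(sample)):
    print(name, len(doc))
```

Each yielded string can then be written to its own file and loaded into the doctype of the same name.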

regards,

Timm

had a second look:

doctypes should be:
item
category
edge
person
open_auction
closed_auction

regards,

Timm

Hi, Aureliano,

XMark is based on the assumption that one
really large document is used for all of the data,
which is complete nonsense.

We implemented XMark last year. At
that time we had Tamino 3 without XQuery. In
other words, we were forced to emulate
content modification. Tamino was quite fast
anyway, even with gigabytes of data. I would
personally assume that Tamino 4 is even
quicker, as no intermediate servlet is
required.

I’ll attach the following for you:

- The schema we have used.
- A modified version of the xmlgenner for
creating sample data. The program runs
on Windows.
xmark.tar.gz (291 KB)

Aureliano,
I did some testing yesterday.

Here is what I did:
- Split large document into smaller chunks (person, item, open_auction …)
- create separate doctypes for each
- load into Tamino (using inoxmld)
- modify TII with Duration

my results (no special indexing/tuning done yet):

query a:
for $a in input()/closed_auction
where $a/annotation/description/parlist/listitem/parlist/listitem/text/emph/keyword/text()!=""
return

I get:
<ino:time ino:date="30.10.2003" ino:time="08:56:50.447" ino:duration="851" />

Query b:
for $a in input()/closed_auction
where $a/annotation/description/parlist/listitem/parlist/listitem/text/emph/keyword/text()!=""
return $a/seller/@person

I get:
<ino:time ino:date="30.10.2003" ino:time="09:01:36.829" ino:duration="761" />

Environment:
Tamino 4.1.4, DB created as medium
Win XP Prof
Dell Laptop, 800 MHz
Data generated with /f 0.2 = 23 MB of raw Data

looks much better, right?

If you are interested, contact me off list and I can send you the scripts and tools (perl script to split large document) as well as the schemas as a start.

timm.bohlmann@softwareag.com

regards,

Timm

Aureliano,

on the differences in the two queries:
It is likely due to an overflow in the internal document-cache of Tamino. This does not occur if you have smaller documents, as you can see in my results.
You can enlarge the document-cache by editing the registry at

…/Software AG/Tamino/servers/myDB/server parameters
(where myDB is the name of your database)

Add the string value
XQuery_document_cache_size
and set its value to 100 (meaning 100 MB); if this is not sufficient, try 150 or 200.

But the best advice is splitting.

regards,

Timm