Cursor issue and Tamino Performance

Hello everyone,

I really need the help of some experienced developers and would appreciate it very much. I have already successfully programmed a class that connects to the Tamino server; only the XQuery part is important to me.

So far it works fine with a very small document (5 nodes), but I get an “INOXYE9291, Transaction aborted because it has taken too long” for my other database, which has about 200,000 nodes.

I am still working on the cursor and hope this will help me out.

My first discovery was that the cursor won’t work at all without local transaction mode.

E.g., in the case of the smaller DB with 5 nodes, it runs fine with a cursor value of about 5 or higher. But if the cursor value is lower I get an exception:

TResponse response = accessor.xquery(xquery, 2);

The only solution was to add this line of code:

TLocalTransaction tloc = connection.useLocalTransactionMode();

and then it works fine, but as I mentioned, only with the smaller DB. The bigger database gets an INOXYE9291 exception every time at this line:

TResponse response = accessor.xquery(xquery, 5000);

(I have already increased the maximum transaction duration to 900 s.)

I expect a result with 80,809 record sets, so I think it makes sense to set the cursor value to 5000. I have inserted my Java code below and hope you can help me. I know the code is still very rough regarding the exceptions; maybe I am missing something.

package perfanx;

import com.softwareag.tamino.db.api.accessor.*;
import com.softwareag.tamino.db.api.common.*;
import com.softwareag.tamino.db.api.connection.*;
import com.softwareag.tamino.db.api.objectModel.*;
import com.softwareag.tamino.db.api.objectModel.dom.*;
import com.softwareag.tamino.db.api.response.*;
import org.jdom.*;
import org.jdom.input.*;
import java.io.*;
import com.softwareag.tamino.db.api.io.TStreamWriteException;
import com.softwareag.tamino.db.api.objectModel.sax.TSAXObjectModel;
import com.softwareag.tamino.db.api.objectModel.sax.*;

public class XQ_Tamino_Exp_6 {

  private TXMLObjectAccessor accessor = null;
  private TConnection connection = null;

  public XQ_Tamino_Exp_6(String query, String databaseURI, String collection) throws
      TConnectionException,
      TXQueryException, TStreamWriteException, TIteratorException,
      TNoSuchXMLObjectException {

    TConnectionFactory connectionFactory = TConnectionFactory.getInstance();
    connection = connectionFactory.newConnection(databaseURI);

    TXQuery xquery = TXQuery.newInstance(query);

    // assign the accessor to the instance field (not to a shadowing local variable)
    accessor = connection.newXMLObjectAccessor(
        TAccessLocation.newInstance(collection), TDOMObjectModel.getInstance());

    // cursoring only works inside local transaction mode
    TLocalTransaction tloc = connection.useLocalTransactionMode();

    // open a cursor that fetches 5000 objects per round trip
    TResponse response = accessor.xquery(xquery, 5000);
    TXMLObjectIterator iterator = response.getXMLObjectIterator();

    // Output as XML
    OutputAsXML(iterator);

    // a tloc.commit() belongs here to end the transaction cleanly
    connection.close();
  }
}



Thank you very much,

Houman Khorasani
University of Wisconsin Platteville

hi again,



as explained in my previous post: please compute the time you have PER NODE. perhaps you may want to add a small routine that checks whether the time per node is distributed fairly evenly, i.e. whether it takes x milliseconds for the very first, the 1000th and the 150000th node alike, or whether the first node takes only 1 ms and the 150000th node takes 100 ms.
there are many places in the complex application architecture that you are looking at where you may either be losing performance in general, or where a degradation may occur over time / with every additional node. you will need to determine more specifically which piece of the processing chain is responsible.
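
a minimal sketch of such a timing routine, using just the iterator calls that already appear in your code (the node numbers to stamp are only examples):

long start = System.currentTimeMillis();
long lastStamp = start;
long count = 0;
while (iterator.hasNext()) {
    TXMLObject xmlObject = iterator.next();   // retrieve the next node
    count++;
    // stamp the 1st, the 1000th and the 150000th node
    if (count == 1 || count == 1000 || count == 150000) {
        long now = System.currentTimeMillis();
        System.out.println("node " + count + ": " + (now - start)
            + " ms total, " + (now - lastStamp) + " ms since the last stamp");
        lastStamp = now;
    }
}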

you have not shared the explicit use case or the scenario for which you are building your solution, but, stated very briefly, what you are doing is this:
in a query to the database, you are ‘ordering’ an apparently rather large result set. this you get as a raw ascii (or more precisely utf-8) stream of characters. with the DOM object model, you are undertaking an enormous(!) effort to parse and analyze that character stream and turn it from a raw stream into numerous (java-) objects. please be aware that it is not totally uncommon that the object tree you generate out of your character stream consumes 10 times as much memory as your raw data stream. this is fairly independent of the DOM implementation or the programming language.
once you have done this fairly expensive deed, you simply take these expensive objects and

OutputAsXML(iterator);

reverse the process.

depending on what you want to do with the result set, you may want to consider


  • using a SAX based object model
  • using a Stream Accessor


you may want to read up in the documentation on the pros and cons of the stream accessors.
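
to give you a feel for the stream accessor route, here is a rough sketch. please double-check it against the API documentation: i am writing newStreamAccessor() and getInputStream() from memory, so treat these method names as assumptions.

TStreamAccessor streamAccessor =
    connection.newStreamAccessor(TAccessLocation.newInstance(collection));
TResponse response = streamAccessor.xquery(xquery);
// consume the raw utf-8 response stream directly, without building a DOM tree
InputStream in = response.getInputStream();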

hope this helps!

andreas f.

Hello Andreas,

Thank you very much for your response.



I would like to refer you to my other posting about the scenario. I have explained the structure and the XQuery statement there:
http://tamino.forums.softwareag.com/viewtopic.php?p=3429

Besides, I have a Pentium M 1.7 GHz and 1 GB RAM.

quote:

in a query to the database, you are ‘ordering’ an apparently rather large result set. this you get as a raw ascii (or more precisely utf-8) stream of characters. with the DOM object model, you are undertaking an enormous(!) effort to parse and analyze that character stream and turn it from a raw stream into numerous (java-) objects. please be aware that it is not totally uncommon that the object tree you generate out of your character stream consumes 10 times as much memory as your raw data stream. this is fairly independent of the DOM implementation or the programming language.
once you have done this fairly expensive deed, you simply take these expensive objects and

OutputAsXML(iterator);

reverse the process.

I do not even get that far. I debugged the class and it crashes after 900 seconds at this line:

TResponse response = accessor.xquery(xquery, 5000);

I already thought about using the TSAXObjectModel instead of the TDOMObjectModel. I just took the SAX example from Tamino itself, but it doesn’t parse at all. If you are familiar with SAX and the TSAXObjectModel of Tamino, could you please check whether these classes are alright?

Message (Message.class): http://www.houmie.com/research/Message.java.txt

DocumentDefaultHandler (docDefHandler): http://www.houmie.com/research/DocumentDefaultHandler.java.txt

ElementDefaultHandler (elDefHandler): http://www.houmie.com/research/ElementDefaultHandler.java.txt

MessageDefaultHandler: http://www.houmie.com/research/MessageDefaultHandler.java.txt

TSAXObjectModel saxObjectModel = new TSAXObjectModel("MessageSAXObjectModel",
    Message.class, Message.class, docDefHandler, elDefHandler);

TXMLObjectModel.register(saxObjectModel);
accessor = connection.newXMLObjectAccessor(
    TAccessLocation.newInstance(collection), saxObjectModel);

TXQuery xquery = TXQuery.newInstance(query);

TResponse response = accessor.xquery(xquery,1000);

TXMLObjectIterator iterator = response.getXMLObjectIterator();




I don’t know, but somehow I got stuck. I have already evaluated several open-source in-memory solutions. For example, SAXON works fine with my XQuery statement and returns 80,809 datasets in 8132 ms (about 8 seconds), and it does not even have advanced features like cursoring.

Tamino can’t do that in 900 secs… I thought I might have a leak in my code, but it seems alright.

At the same time I am also confused. Tamino should be able to handle small, medium and large databases. The one I have is half the size of a ‘small’ one.


My guess is that the problem could be solved by the TSAXObjectModel, or that it is a matter of Java VM memory allocation.
So far every XQuery solution has had the same issue: the Java VM memory must be 10 times bigger than the document. Maybe I could solve the problem if I increase it for Tamino…
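
(For example, by starting the VM with a bigger maximum heap; the main class here is just my test class, for illustration:)

java -Xmx512m perfanx.XQ_Tamino_Exp_6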

Regards

Houman Khorasani
University of Wisconsin Platteville

Hi Houman,

when i was writing ‘figure out where your application is taking its time’ you said that that is a good point, but you do not seem to be following up on that thought.
please try to find out these times:


  • time from start until the connection to the database is completed
  • time from query until the first document is returned.
  • time to process the first document
  • time to process documents 4999, 5000 and 5001
  • time to process documents 79999, 80000 and 80001 (increase max transaction duration to 99999 for that test)
  • (please keep an eye on memory consumption of the JVM during the last two tests; see the snippet below.)
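
for the memory check, plain JDK calls inside your iteration loop will do (count being the loop counter from the timing sketch in my previous post):

Runtime rt = Runtime.getRuntime();
long usedMB = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
System.out.println("node " + count + ": " + usedMB + " MB heap in use");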

i am sure that from the numbers you get there you will see that

  • yes, it does take some time to set up a connection to a database
  • once the first document is there, the time will be fairly constant in the beginning
  • the time will degrade after some time (that is, when the java VM is running out of memory and does significant garbage collecting)

admittedly, retrieving data from a full-scale database will not be as fast as retrieving the same data from an in-memory solution, but the comparison is far from fair (or we’ll continue the comparison when 1000 parallel users are posting queries at both tamino and saxon in parallel and doing concurrent, transactional updates :-).



but, perhaps we are on a completely false track altogether. do you - from the viewpoint of your application - even need to get all those 80000 documents in ONE GO and in the scope of ONE SINGLE transaction?
and then, what do you do with them? do you simply ‘read’ them, or do you intend to update some of them / all of them? are the updates logically interlinked so that the manipulation of the 80000 documents NEEDS to be in the scope of one database transaction?
perhaps you can read the documents outside the scope of a transaction (without the cursoring, do not use the LocalTransaction) or in considerably smaller and more manageable chunks, and do the updates in a separate processing step - within a transaction.
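
a sketch of that read-only variant - assuming the single-argument xquery() overload of the accessor, i.e. no cursoring and no local transaction mode:

// no useLocalTransactionMode(), no cursor parameter
TResponse response = accessor.xquery(xquery);
TXMLObjectIterator iterator = response.getXMLObjectIterator();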
or, if all manipulations of all docs are one transaction after all, can you justify the value of 900 seconds for the max transaction duration? how many parallel users are there, how many parallel threads may wish to manipulate the same data you are keeping a lock on?

again: we really don’t know enough about your complete usage scenario to provide really useful help. as i am suggesting above, maybe a scenario that neither you nor anybody else here has come up with yet will be the perfect answer to your problems.
continuing that thought: i am not sure whether simply switching to the SAX object model will be THE answer. it all pretty much depends on what you finally want to do with your data (see above), as you may run out of the max transaction duration with that object model as well.


sorry, not all that many answers, but basic food-for-thought,

andreas f.

Hello Andreas,

Thank you for your response. I think there is a misunderstanding here. I will now try to explain my scenario in more detail.

I do not have 80000 documents in my database.
I have one big document with 234,964 complex elements, each of which wraps three simple elements.



My XQuery statement is the following:

for $wb in doc("wisc_berkeley.xml")/data/row
where $wb/dat='20000801000000'
return
	<row>
		{ $wb/tick }
		{ $wb/dlay }
	</row>

Return all the <tick> and <dlay> from August 1, 2000.

That’s all. No updating or other requests. The result of the above XQuery statement will be exactly 80,809 <row> tags.

The connection time to the Tamino DB is 2864 ms.
The query time cannot be measured yet, because it crashes at this line:

TResponse response = accessor.xquery(xquery, 5000);

You are right about the concurrent/parallel access to Tamino vs. an in-memory solution. Tamino ‘should’ be better in that case. But I am still disappointed by the fact that Tamino can’t return the result I want. An XML database should be able to return any kind of XQuery result; it shouldn’t matter how big the result is.

Regarding my example, I need all the <tick> and <dlay> from August 1, 2000. There is no way to take smaller chunks out of this. A database must be able to handle any kind of query and return the results, even in a worst case.

What will happen if I take this XQuery statement?

for $wb in doc("wisc_berkeley.xml")/data/row
return
	<row>
                { $wb/dat }
		{ $wb/tick }
		{ $wb/dlay }
	</row>



Now the result set is about 234,964 elements, each with three simple elements. This would be the worst case.

Compare this to a relational database: does it matter there how big the result of a query is? It doesn’t. I can iterate over the result-set object and fetch the rows one after another.
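
In JDBC, for example, something like this simply streams the rows (conn is an existing java.sql.Connection; the table and column names are made up for illustration):

Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery(
    "SELECT tick, dlay FROM row_data WHERE dat = '20000801000000'");
while (rs.next()) {
    // fetch each row as it arrives; nothing is materialized up front
    System.out.println(rs.getString("tick") + " / " + rs.getString("dlay"));
}
rs.close();
stmt.close();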

Regards

Houman Khorasani
University of Wisconsin Platteville

If I use the Tamino Interactive Interface and limit the result to 16 (!), I get the result after 54 seconds.

Result


Regards

Houman Khorasani
University of Wisconsin Platteville

hello houman,

thank you for clarifying the point about your document design. that, in fact, explains a lot.
to say it straight out: with this data design, no matter what approach, object model or transaction duration you choose, you will not be able to get satisfactory results.
in a relational database, your approach is like having a single ‘row’ with some 80000 columns.

please change your data design to where you have 80000 unique documents such as:

<row>
   <dat>...
   <tick>...
   <dlay>...
</row>


and soon you will be one happy tamino user – I am sure!
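
the query side then stays nearly the same; something along these lines (i am writing collection(...) from memory and guessing at the collection name, so please check the exact input function in the tamino XQuery documentation):

for $wb in collection("wisc_berkeley")/row
where $wb/dat = '20000801000000'
return
    <row>
        { $wb/tick }
        { $wb/dlay }
    </row>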

gruss,
andreas f.

Hi Andreas,

Sorry for getting back to you so late. I had a lot of work to do with other XQuery solutions. Well, I was really wondering how your solution would work, so I tried it out.

quote:


please change your data design to where you have 80000 unique documents such as:

<row>
   <dat>...
   <tick>...
   <dlay>...
</row>
and soon you will be one happy tamino user -- I am sure!

First of all I had to remove my root tag called <data> to get the structure you suggested:

<data>
  <row>
    <dat>...
    <tick>...
    <dlay>...
  </row>
</data>

The problem is now that this new XML is not valid:

<row>
   <dat>...
   <tick>...
   <dlay>...
</row>
<row>
   <dat>...
   <tick>...
   <dlay>...
</row>
<row>
   <dat>...
   <tick>...
   <dlay>...
</row>



It needs a root tag which contains all the other <row>s.
I am not able to write a DTD for this new structure, so I can’t upload the schema into Tamino. Please help me figure out what exactly you meant by that.
How can I write a DTD for this kind of XML structure?

Thank you very much,

Houman Khorasani
University of Wisconsin Platteville

hello houman,

what i was suggesting was a different way of looking at your data.
before, you had ONE single document that contained one root node with 80000 leaves on the first level.
what i was suggesting was having 80000 unique documents. the schema for that would contain a doctype of ‘row’ that contains three consecutive nodes ‘dat’, ‘tick’ and ‘dlay’. this, of course, implies changes both when inserting and when processing or retrieving the objects. for inserting, this means not one insert of one document, but 80000 single insert operations.
again, i do not know all too much about your scenario (where the data comes from, …). your new approach would mean either changing your data-generating application to distinctly insert (and possibly even commit) each occurring ‘row’ data instance, or, if the application generates a flat file which you later insert into the database, having the application write the data stream in the so-called ‘mass loader format’, which looks like this:

<?xml version="1.0" ?>
<ino:request    <!-- ino namespace HERE -->  >
<ino:object>
    <row> <dat>xxx1</dat><tick>...</tick><dlay>...</dlay></row>
</ino:object>
<ino:object>
    <row> <dat>xxx2</dat><tick>...</tick><dlay>...</dlay></row>
</ino:object>
<ino:object>
    <row> <dat>xxx3</dat><tick>...</tick><dlay>...</dlay></row>
</ino:object>
...
</ino:request>  



Then, use the inoxmld mass loader or the java loader (see documentation on data loading utilities) to load the resulting document. when the loading is complete, you will have not one document but 80000 unique documents in the database.
Afterwards, the handling of them on the querying side is nearly the same as you had before (in your original post the actual query is missing :-( ).

I hope that this brings you one step closer to efficiently processing the data you have at hand. if the handling on either the data-producing or the data-consuming side is not clear, please do let me know. I don’t know about your or your project’s status at UofW, but perhaps alternatively you may want to consider the consulting services of software ag.

hope this helps,

andreas f.

Hi Andreas,

Thanks for the example. Now I understand how it should be. I am very close to making it work. :slight_smile: But I still need your help.

First of all I was thinking about how to change the structure in the way you suggested, and I decided to use XSLT to do it:

<?xml version="1.0"?> 
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:template match="data">
	<ino:request xmlns:ino="http://namespaces.softwareag.com/tamino/response">
		<xsl:apply-templates select="row"/>
	</ino:request>
  </xsl:template>

  <xsl:template match="row">
	<ino:object xmlns:ino="http://namespaces.softwareag.com/tamino/response">
		<row>
			<xsl:apply-templates select="dat"/>
			<xsl:apply-templates select="tick"/>
			<xsl:apply-templates select="dlay"/>
		</row>
	</ino:object>
  </xsl:template>

  <xsl:template match="dat">
  <dat>
    <xsl:value-of select="."/>
  </dat>
  </xsl:template>

  <xsl:template match="tick">
  <tick>
    <xsl:value-of select="."/>
  </tick>
  </xsl:template>

  <xsl:template match="dlay">
  <dlay>
    <xsl:value-of select="."/>
  </dlay>
  </xsl:template>
</xsl:stylesheet>

So my structure now looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<ino:request xmlns:ino="http://namespaces.softwareag.com/tamino/response">
 <ino:object>
   <row>
     <dat>20000801000000</dat>
     <tick>17328</tick>
     <dlay>48966</dlay>
   </row>
 </ino:object>
 <ino:object>
 ...
 </ino:object>
</ino:request>

So far it went perfectly. Then I realized I need a DTD for that:

<!ELEMENT dat ( #PCDATA ) >
<!ELEMENT dlay ( #PCDATA ) >
<!ELEMENT tick ( #PCDATA ) >
<!ELEMENT row ( dat, tick, dlay ) >

Then I opened the Tamino Schema Editor, imported the DTD, created a DocType called "row" and uploaded it to the Tamino server.

Screenshot: http://www.houmie.com/research/dtd.GIF

Then I started to transfer the data:

C:\Tamino\Tamino 4.1.4.1\bin>inoxmld database=ippm user=Houmie collection=wisc_berkeley/row input=c:/wisc_berkeley_out.xml norejects log=wisc.xml

Problem 1) It complained that <ino:request> has no schema. I decided to solve this problem quickly and simply deleted the namespace declaration from <ino:request> in my source file.

So it changed from:

<?xml version="1.0" encoding="UTF-8"?>
<ino:request xmlns:ino="http://namespaces.softwareag.com/tamino/response">
 <ino:object>
  ....
 </ino:object>
</ino:request>

to:

<?xml version="1.0" encoding="UTF-8"?>
<ino:request>
 <ino:object>
  ....
 </ino:object>
</ino:request>

Then I tried it again and it seemed to work fine. After 30 minutes I got this error message:

Problem 2)

<?xml version="1.0" encoding="utf-8" ?>
<ino:response xmlns:ino="http://namespaces.softwareag.com/tamino/response2" xmlns:xql="http://metalab.unc.edu/xql/">
  <ino:message>
    <ino:messageline>Tamino Data Loader v4.1.4.1 - Copyright (c) Software AG</ino:messageline>
    <ino:messageline>Loading from c:/wisc_berkeley_out.xml to Tamino database ippm</ino:messageline>
    <ino:messageline>Start: 2004-06-03T15:17:33</ino:messageline>
    <ino:messageline>Loading of documents 1-10483 triggered</ino:messageline>
    <ino:messageline>Loading of documents 10484-20858 triggered</ino:messageline>
    <ino:messageline>Loading of documents 20859-31233 triggered</ino:messageline>
    <ino:messageline>Loading of documents 31234-41608 triggered</ino:messageline>
    <ino:messageline>Loading of documents 41609-51979 triggered</ino:messageline>
    <ino:messageline>Loading of documents 51980-62304 triggered</ino:messageline>
    <ino:messageline>Loading of documents 62305-72677 triggered</ino:messageline>
    <ino:messageline>Loading of documents 72678-83081 triggered</ino:messageline>
    <ino:messageline>Loading of documents 83082-93533 triggered</ino:messageline>
    <ino:messageline>Loading of documents 93534-103908 triggered</ino:messageline>
    <ino:messageline>Loading of documents 103909-114283 triggered</ino:messageline>
    <ino:messageline>Loading of documents 114284-124658 triggered</ino:messageline>
    <ino:messageline>Loading of documents 124659-135033 triggered</ino:messageline>
    <ino:messageline>Loading of documents 135034-145405 triggered</ino:messageline>
    <ino:messageline>Loading of documents 145406-155780 triggered</ino:messageline>
    <ino:messageline>Loading of documents 155781-166249 triggered</ino:messageline>
    <ino:messageline>Loading of documents 166250-176624 triggered</ino:messageline>
    <ino:messageline>Loading of documents 176625-186999 triggered</ino:messageline>
    <ino:messageline>Loading of documents 187000-197374 triggered</ino:messageline>
    <ino:messageline>Loading of documents 197375-207749 triggered</ino:messageline>
    <ino:messageline>Loading of documents 207750-218124 triggered</ino:messageline>
    <ino:messageline>Loading of documents 218125-228499 triggered</ino:messageline>
    <ino:messageline>Loading of documents 228500-234964 triggered</ino:messageline>
  </ino:message>
  <ino:message ino:returnvalue="8599"><ino:messagetext ino:code="INOXME8599">Internal error</ino:messagetext></ino:message>
  <ino:message>
    <ino:messageline>COMMIT failed - data load aborted</ino:messageline>
    <ino:messageline>Elapsed time: 925 (s), finished: 2004-06-03T15:32:58</ino:messageline>
  </ino:message>
</ino:response>



The load was cancelled because of error INOXME8599, which is “An internal error has occurred.”

Well, what now? :frowning: Do you have any idea? Is my DTD bad? Is the DocType “row” correct? Does the DocType need any adjustment in its properties?

It’s really complicated to prepare and load an XML file into Tamino. But what you’ve said makes sense, and I have the feeling I’m quite close to the solution. :slight_smile: Do you have any idea?

Thanks again for your help,

Houman Khorasani
University of Wisconsin Platteville

hi houman,

your new data design looks just perfect. now if we can get your data loaded :wink:

ok. this error had been reported against 4.1.1 databases. your loader log looks as if you had tamino 4.1.4 installed. are you perhaps still using an ‘old’ 4.1.1 database?
if this is the case, please load one single data item manually, for example using the interactive interface.

if this is not the case, please try either the inoxmld command line switch ‘concurrentWrite’, or the java loader. also, you may want to report this to the software ag customer support center.

then again: the error occurs after 925 seconds. do i remember correctly that your max transaction duration was set to 900 seconds? try turning that screw; you now seem to have at least four times the amount of data.

by the way: with more than 200000 data items, i am fairly certain that your previous single-document approach was destined to fail.


don’t worry, we’ll get there!

gruss,
andreas f.

Hello Andreas,

I have the newest Tamino 4.1.4.1.

I tried the web data loader in the Tamino Manager, but I wasn’t surprised when it failed. So I tried the Java loader afterwards. After less than one third of my data (24 MB total incl. the ino:object tags) the database was full and it stopped loading. (I have the evaluation version with a 20 MB license limit.)

That’s a lot of overhead; it makes a 6 MB XML file as big as 20 MB of content!!!

But well, at least I was hoping that ONE query would work. And it works fine in the Interactive Interface, WHEN I have enabled the limitation of the result to 16.



I still can’t run any simple queries on my data through the Java API. Still the same error: Tamino access failure (INOXYE9291, Transaction aborted because it has taken too long).

My code is right, since I have made another database with just 5 XML tags and it works fine. Tamino can NOT handle bigger amounts of data, neither as one document nor as multiple documents.

This might be possible in C++ or in other APIs, but there is a problem with the Java API. I am wondering if Software AG has a testing team and ever ran stress tests on Tamino.

It is very frustrating. I have to exclude Tamino from my project. I hoped that a commercial product from Software AG would have more potential than an open-source one-man show, but the product is far from mature.

But I appreciate your help. You helped me a lot at a time when other users had already given up on me. You are a very kind person.

Best Regards,

Houman Khorasani
University of Wisconsin Platteville