Fast Mass Loader

Hi:

I am using the command-line fast mass loader (inoxmld) and am seeing the following issues.

When I load a 1 MB XML document, it seems to work fine. However, when I try to load the same data as a 10 MB XML document (I just increased the number of purchase orders in my XML file by an order of magnitude), I get the following error:

<ino:message ino:returnvalue="8710" ino:documentnumber="1"><ino:messagetext ino:code="INOXPE8710">Invalid token found or document incomplete</ino:messagetext><ino:messageline>Line 30970, Column 39: Invalid token found or document incomplete</ino:messageline></ino:message>
<ino:message ino:returnvalue="0"><ino:messageline>XML mass loading completed, number of documents processed 1, loaded 0, rejected 1</ino:messageline></ino:message>

This tells me my document is not well-formed. So I ran it through a parser in a separate Java program, and it looks fine.
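For anyone who wants to repeat this kind of independent check, here is a minimal sketch in Python rather than Java. It streams the file through a SAX parser, so memory use stays low even for a 10 MB document, and reports the line and column of the first error so it can be compared with the position the loader prints. The function name check_well_formed is made up for this example and has nothing to do with Tamino.

```python
import io
import xml.sax


def check_well_formed(source):
    """Parse `source` (file name or binary file object) and report the result.

    Returns (True, None) when the document is well-formed, otherwise
    (False, "line L, column C: message") for the first error found.
    """
    parser = xml.sax.make_parser()
    # A do-nothing handler: we only care whether parsing succeeds.
    parser.setContentHandler(xml.sax.ContentHandler())
    try:
        parser.parse(source)
    except xml.sax.SAXParseException as exc:
        where = f"line {exc.getLineNumber()}, column {exc.getColumnNumber()}"
        return False, f"{where}: {exc.getMessage()}"
    return True, None


if __name__ == "__main__":
    # Tiny in-memory documents stand in for the real file here.
    print(check_well_formed(io.BytesIO(b"<a><b/></a>")))
    print(check_well_formed(io.BytesIO(b"<a><b></a>")))
```

If this check passes but the loader still rejects the file at a fixed position, that suggests a limit in the loader rather than a problem in the document.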

If I strip out the ino:object and ino:request tags, I can load the same document without problems using the Java client API. I can also load it using the “Tamino Interactive Interface”.

Also, “inoxmld” crashes from time to time.

Any thoughts?

Hello,
Be aware that in version 2.3 the XML mass loader is provided as a test version and is not supported (see readme.txt). XML documents larger than 1 MB may cause problems.
In the upcoming version 3.1 this bug is fixed, and the tool, then called “DataLoader”, is an official and supported tool.
Regards
Uli

Hi,

The mass load utility inoxmld is for loading masses of XML documents, not for loading one (or a few) very large XML documents:

inoxmld can be used to load masses of XML documents into Tamino, e.g. 7000 documents with a total size of 230 MB in one file.
I set up a 10 GB Tamino DB containing 155000 documents with inoxmld and it worked fine. It’s usually the size of an individual document that causes problems:

In general IT IS NOT A GOOD IDEA to use large XML documents (e.g. 10 MB). Using one large XML document (instead of many small documents) is bad XML / database design and has many disadvantages:

- The time to retrieve the whole document is high.
- The time to update the document is high.
- The amount of memory needed is high (try to view a 10 MB XML document in Internet Explorer and look at the amount of memory this operation consumes…).
- Locking prevents other users from working: usually a user will be changing only a part of a very large document, but other users cannot look at or change other parts while the document is locked.

I strongly recommend changing the XML design, because one runs into serious trouble when working with very large documents. Here is an example of how using several schemas instead of one big schema can reduce problems:

OLD XML design:

1 schema, 1 document stored in the whole database, size of the document: 30 MB
This is really bad because the whole database contains just one document!

NEW XML design:
4 schemas, 15000 documents
By using 4 schemas instead of just one, the number of documents increased from 1 to 15000! This is a much better database design. Who needs a database / XML server if there are only a few documents stored in it?
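
As a rough illustration of the "many small documents" idea, the sketch below splits one large XML file into one small document per record (for example, one per purchase order). It is written in Python and uses only the standard library. The function name split_documents and the element name "order" are made up for this example. It assumes the record elements are not nested inside each other, and it does not add any Tamino-specific wrapper elements, so how the resulting documents are fed to the loader is left open.

```python
import io
import xml.etree.ElementTree as ET


def split_documents(source, record_tag):
    """Yield every <record_tag> element of a large XML file as its own
    small XML document (a string), without loading the whole file at once.

    Assumption: record elements are not nested inside each other.
    For namespaced elements, record_tag must be given as "{uri}name".
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            elem.tail = None  # drop whitespace that followed the record
            yield ET.tostring(elem, encoding="unicode")
            root.clear()  # release records that were already emitted


if __name__ == "__main__":
    # A tiny stand-in for a file holding many purchase orders.
    big = b"<orders><order id='1'><item>a</item></order><order id='2'/></orders>"
    for doc in split_documents(io.BytesIO(big), "order"):
        print(doc)
```

Because the file is streamed, the memory needed depends on the size of the largest single record, not on the size of the whole file.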

Of course one might argue that one doesn’t need an XML server (and could use a relational DB instead) if one has to use several schemas anyway. However, there is a BIG difference between the number of tables needed for a relational DB and the number of schemas needed for Tamino: Tamino needs far fewer schemas than an RDB needs tables. Thus Tamino’s performance is much better than that of an RDB (especially when the XML structure is complex), and Tamino is of course much easier to handle.


I hope this prevents others from running into trouble in the future,

Best regards,


Jan Harmsen
Technology Consultant
Partner Engineering