Tamino Data Loader v3.1

Hi all,

We are using Tamino 3.1.2.1 with Tamino Data Loader v3.1. We have found that the data loading process is not fast enough, although it may be a problem with our data. Loading around 7000 test documents, about 65 MB in total, took me more than one day (I am guessing, because I aborted after 20 hours, by which time around 4000 documents had been loaded). The main cause is probably a “fulltext” field with a “text index”. Also, the loading speed drops over time: the first 1000 documents may need 1 hour, the second 1000 may need 2 hours, etc.

I just want to know: is this usual? Is there anyone with similar data who can load much faster (which would mean it is indeed my mistake)? Furthermore, I am trying to split my test documents into multiple files of 1000 documents each; I think that will be faster, won't it (although I then need to issue the inoxmld command multiple times)? The sketch below shows roughly how I am splitting the file.
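(A rough sketch only: I am assuming the mass-load input is a single XML file whose root element wraps the individual documents; "doc" and the file names below are placeholders for my real doctype.)

# Rough sketch: split one big mass-load input file into chunks of
# 1000 documents each. Assumes the input looks like
# <docs><doc>...</doc><doc>...</doc>...</docs>.
import xml.etree.ElementTree as ET

CHUNK_SIZE = 1000

def split_input(path, chunk_size=CHUNK_SIZE):
    root = ET.parse(path).getroot()
    docs = list(root)                      # the individual documents
    for i in range(0, len(docs), chunk_size):
        chunk_root = ET.Element(root.tag)  # same wrapper element
        chunk_root.extend(docs[i:i + chunk_size])
        out = "chunk_%04d.xml" % (i // chunk_size)
        ET.ElementTree(chunk_root).write(out, encoding="utf-8",
                                         xml_declaration=True)
        print("wrote", out)

if __name__ == "__main__":
    split_input("testdata.xml")

I would then run the inoxmld command once per chunk file.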

Best regards,
Lun

hello lun,

if you have 7000 documents that make up a total of 65MB, then you are right, the loading is slow (unless you have a reeeaaaly sloooow computer).
however, if each individual document is approx. 65MB AND heavily indexed, then this is something that should be looked at, but it is not completely out of proportion.


some pieces of information about these topics:

* when inserting a document, it is parsed inside the tamino xml server. independent of platform or programming language, it is not uncommon for the dom tree of an xml document to grow to 10 times the size of the original document. this amount of main memory is then consumed to insert a document (see the sketch after this list).

* i cannot think of circumstances under which it would make sense to split up data files and invoke the mass loader multiple times (this is withOUT the concurrentWrite option).
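to get a feeling for the dom blow-up, you can try something like the following (a quick python sketch; the pure-python dom and tracemalloc are used only because they are easy to measure - a server-side parser will differ in detail):

# quick sketch: compare the size of an xml file on disk with the
# (approximate) memory its dom tree needs.
import os
import tracemalloc
import xml.dom.minidom

def dom_blowup(path):
    file_size = os.path.getsize(path)
    tracemalloc.start()
    dom = xml.dom.minidom.parse(path)   # build the full dom tree
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print("file size : %d bytes" % file_size)
    print("dom memory: %d bytes (~%.1fx the file)" % (peak, peak / file_size))
    return dom

if __name__ == "__main__":
    dom_blowup("testdoc.xml")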

if this information does not clarify your situation, please contact your support center for further analysis.

regards,
andreas f.

Hello Lun!

What is the platform you are running on?
Did you apply any hot fixes to your Tamino 3.1.2.1 installation?

Regards,
Thorsten

Hi,

Thanks for all the replies first…

The data is just “7000 documents making up 65 MB”, so it is very, very slow. Although the machine is my development machine, it is a P4 1.7 GHz with 256 MB RAM, running Windows 2000 Professional, which I think is good enough. It has Tamino 3.1.2.1 installed, with hotfix h17. Each document, I think, is around 6 KB, but it is “heavily indexed”.

I tried to split my data into pieces of 1000 documents each. It did load much, much faster: the first few batches of 1000 documents took only several minutes each to load. But after 4000 documents had been loaded, the process hung again…

Could it be a problem with the data? Because last time, when I loaded the data from a single XML file, it also hung at around document number 4xxx… The Data Loader hung to such an extent that when I pressed Ctrl-C, it showed “triggering rollback…”, and this statement also hung. I then pressed Ctrl-C like crazy, with the same result. I had to press Ctrl-Alt-Del to stop the process (I saw that inoxmld.exe was actually consuming 0% CPU, which is strange, while inosrv.exe used only 2% or 3%). Even though W2K showed the process as stopped, I don't think it really was, because my machine was still very, very slow, and I could not access the collection (“The collection is being used by another user”). In the end I had to reboot to get my machine back. Then I tried loading the data again, and the loop began anew…

Best regards,
Lun

Hello Lun,
to make it a little bit clearer what is going on when you use the Data Loader:
The Data Loader (inoxmld.exe) takes your input file, chops it into nice pieces (around 1 MB each) and sends these pieces to the Tamino Server. It is quite normal that it uses very few CPU cycles. When you say it is hanging, it just sits there, waiting for an answer from the server.
Pressing Ctrl-C will show you the message “Triggering rollback …” that you saw, but this is only a notification that your interrupt was accepted. The “real” action happens somewhat later, when the Data Loader client gets control back: to properly abort the data load, it sends a disconnect/rollback message to the server. This restores the original doctype contents and removes the exclusive lock that is held by the Data Loader session.
Killing the client before it can send this message means the lock will stay until the next database restart (where the restore and unlocking will be done). Moreover, as you experienced, the server will not know about the “dead” client and will continue what it is doing.
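Just to illustrate the Ctrl-C behaviour (this is not the Data Loader's actual code, only the general pattern such a client follows - a small Python sketch):

# Illustrative pattern only, not Tamino code. A client blocked in a
# network call cannot react to Ctrl-C immediately: the signal handler
# just notes the interrupt, and the rollback is sent once the client
# gets control back, i.e. when the pending call returns.
import signal
import time

abort_requested = False

def on_interrupt(signum, frame):
    global abort_requested
    abort_requested = True
    print("Triggering rollback ...")   # only a notification

signal.signal(signal.SIGINT, on_interrupt)

def load(chunks, send_to_server, rollback):
    for chunk in chunks:
        send_to_server(chunk)          # may block for a long time
        if abort_requested:            # checked only between calls
            rollback()                 # the "real" action happens here
            break

if __name__ == "__main__":
    def fake_send(chunk):
        time.sleep(2)                  # stands in for the blocking server call
    def fake_rollback():
        print("rollback sent, lock released")
    load(range(100), fake_send, fake_rollback)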
You described that your machine got very slow and that inosrv consumed very little CPU. Most probably that means the machine is doing heavy I/O and/or paging. You can verify that by looking at the appropriate columns in the Windows Task Manager. If it is not paging but “real” I/O, please see my remark below about the buffer pool size.
If your machine is constantly paging, consider buying some more memory :-)

Back to the lock on the collection: you can remove such a lock “by hand” by issuing the command (see also the manual)

http://myserver/tamino/mydb?_admin=ino:CancelMassLoad("mycoll/mydoctype")

in a browser address line (change it to fit your database etc.). This will allow you to access the collection again.
Rebooting your machine is not necessary; just restart your database.
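If you would rather issue this command from a script than type it into the browser, something like the following should do (a Python sketch; host, database, collection and doctype are placeholders for your installation):

# Sketch: send the CancelMassLoad admin command over HTTP instead of
# typing the URL into a browser. Adjust host, database, collection
# and doctype to your installation.
from urllib.parse import quote
from urllib.request import urlopen

def cancel_mass_load(host, db, coll, doctype):
    cmd = 'ino:CancelMassLoad("%s/%s")' % (coll, doctype)
    url = "http://%s/tamino/%s?_admin=%s" % (host, db, quote(cmd))
    with urlopen(url) as response:
        print(response.read().decode("utf-8", "replace"))

if __name__ == "__main__":
    cancel_mass_load("myserver", "mydb", "mycoll", "mydoctype")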

As to your performance problems:
Please make sure that you have enough temporary working space (let's say a factor of 10-12 for your heavily indexed data, which makes about 800 MB in your case) and that your buffer pool is large enough (the default size is surely too small - use 60 MB or larger).
Breaking the input into pieces and loading them one by one will make the process slower with each run, because on every run the already-loaded index values have to be merged with the new ones.
Normally you will not want to do this; load the data all at once instead.

If you suspect there is a problem with the data: have you tried loading documents 4000-5000 first?

All the best,
Hermann

Hi Hermann,

Thanks for the very enlightening answer. But how do I increase the temporary working space? I can be 100% sure that my temporary working space lies in the directory “D:\data\tamino\mydb” and that the available disk space there is over 10 GB, but the temporary working space always seems to be 7 MB.

BTW, when I talk about “hanging”, it is really abnormal. The loader hung at something like “loading of documents 1003-1234 triggered”, so I went to have lunch (probably around half an hour). When I got back, it was still there! So I pressed Ctrl-C and went to sleep for around another half an hour. It was still at “Triggering rollback”!

I can be sure that my hard disk is not paging; I can see that the CPU usage is only 2 (or 3)%, consumed by inosrv.exe, with all the rest owned by “System Idle”. Because my PC became very, very slow, I went to the System Management Hub and issued an emergency shutdown of the database; I had tried a normal shutdown and a rollback shutdown first, but both failed. My PC finally became normal after the database was shut down. I can reproduce the above experience on a “clean” (freshly set up) machine with the same hardware configuration.

So I will try increasing the temporary working space (how?), the buffer pool size, and the journal space (is that needed?). Thanks for your useful advice.

Many thanks and regards,
Lun

Hi Lun,

some more remarks:
- you need not do anything with your temporary working space - it is enough that it points to a directory with plenty of space.
- regarding your slow system, I still think it is some I/O issue. In the Windows Task Manager, try selecting the following items via View -> “Select Columns…”: Memory Usage, Page Faults, Virtual Memory Size, I/O Reads and I/O Writes. These should give you some hint about what is going on. Clearly, CPU cycles are not your problem. (Or see the little script after these remarks.)
- when using a “normal” data load (i.e. without the concurrentWrite option), journal size should not be an issue.
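If you prefer watching these numbers from a script rather than from the Task Manager, something along these lines would do it (a sketch using the third-party psutil package - not part of Tamino; num_page_faults is only reported on Windows):

# Sketch: print a process's I/O counters and page faults periodically,
# similar to the Task Manager columns mentioned above. Needs the
# third-party psutil package; num_page_faults exists on Windows only.
import time
import psutil

def watch(process_name, interval=5):
    procs = [p for p in psutil.process_iter(["name"])
             if (p.info["name"] or "").lower() == process_name]
    while True:
        for p in procs:
            io = p.io_counters()
            mem = p.memory_info()
            print("%s  reads=%d  writes=%d  read_bytes=%d  write_bytes=%d  "
                  "page_faults=%s"
                  % (p.info["name"], io.read_count, io.write_count,
                     io.read_bytes, io.write_bytes,
                     getattr(mem, "num_page_faults", "n/a")))
        time.sleep(interval)

if __name__ == "__main__":
    watch("inosrv.exe")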

All the best,
Hermann

I need Data Loader version 3.1. Can you please mail me the files required for the Tamino Data Loader and the Java Loader?
Thanks,
Abhishek
abhi@ee.columbia.edu