I am using Tamino 3.1 and am trying to create a database with the Reuters Corpus, which consists of 800,000 separate XML files. The Data Loader only seems to load documents one at a time.
Is there another utility that I could use, or am I just using the Data Loader in the wrong way?
hello couturier,
i assume that your observation is right: if you feed the java loader multiple files, it will load them one at a time.
for the scenario you are describing, it is probably best to pre-process all your individual files and put them into the mass loader (inoxmld) format.
the format looks like this (i hope the bulletin board doesn't turn any : colons into smilies)
<?xml version="1.0" encoding="utf-8" ?>
<ino:request xmlns:ino="http://namespaces.softwareag.com/tamino/response2">
<ino:object ino:docname=''>
…
</ino:object>
<ino:object ino:docname=''>
…
</ino:object>
</ino:request>
if you want to determine the format yourself, simply unload some documents, either with inoxmld or with the java loader. with the java loader you have to specify the format -loadReqest, and you can specify a query to limit the number of documents unloaded.
the inoxmld mass loader is very efficient for initially loading a database with large numbers of documents.
hope this helps.
andreas f.
silly smilies, just as i assumed.
the ino:object is of course an i n o : o b j e c t .
i’ll try again here:
<?xml version="1.0" encoding="utf-8" ?>
<ino:request xmlns:ino="http://namespaces.softwareag.com/tamino/response2">
<ino:object ino:docname=''>
<PAYLOAD>...</PAYLOAD>
</ino:object>
<ino:object ino:docname=''>
<PAYLOAD>...</PAYLOAD>
</ino:object>
</ino:request>
Thanks for the help, Andreas
Once I understood that PAYLOAD was meant to represent my data rather than a container for my data, things worked great!
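For anyone landing here with the same problem, the pre-processing step could be sketched roughly as below. This is only a minimal sketch, not an official Tamino tool: the function name `build_load_file`, the `*.xml` file pattern, and the choice of using each file name as `ino:docname` are my own assumptions (the example above leaves `ino:docname` empty, which you get with `use_docnames=False`). Each source document goes directly inside an `ino:object`, i.e. it takes the place of PAYLOAD rather than being wrapped in it.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from xml.sax.saxutils import quoteattr

# Namespace URI taken from the example earlier in this thread.
INO_NS = "http://namespaces.softwareag.com/tamino/response2"


def build_load_file(src_dir, out_path, use_docnames=True):
    """Wrap every *.xml file in src_dir in an ino:object and write one load file.

    Hypothetical helper: returns the number of documents written.
    """
    out_resolved = Path(out_path).resolve()
    # Collect the inputs first so the output file is never picked up as an input.
    sources = [p for p in sorted(Path(src_dir).glob("*.xml"))
               if p.resolve() != out_resolved]

    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        out.write('<?xml version="1.0" encoding="utf-8" ?>\n')
        out.write(f'<ino:request xmlns:ino="{INO_NS}">\n')
        for path in sources:
            # Parsing honours each file's own encoding declaration and drops
            # its XML declaration, which must not appear inside the wrapper.
            root = ET.parse(path).getroot()
            payload = ET.tostring(root, encoding="unicode")
            # Assumption: the file name is used as the document name.
            docname = quoteattr(path.name if use_docnames else "")
            out.write(f"<ino:object ino:docname={docname}>\n")
            out.write(payload.rstrip() + "\n")
            out.write("</ino:object>\n")
            count += 1
        out.write("</ino:request>\n")
    return count
```

Calling something like `build_load_file("reuters_xml", "reuters_load.xml")` (both paths are placeholders) would then give one file to hand to the mass loader. With 800,000 documents it may be more practical to write several smaller load files, and it is worth comparing the result against a few documents unloaded from Tamino before loading everything.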