Data Loader : Loading 800K xml files

I am using Tamino 3.1 and am trying to create a database with the Reuters Corpus which consists of 800 000 separate xml files. The Data Loader only seems to load documents one at a time.

Is there another utility that I could use, or am I just using the Data Loader in the wrong way?

hello couturier,

i assume your observation is right: if you feed the java loader multiple files, it will load them one at a time.
for the scenario you are describing, it is probably best to pre-process all your individual files and combine them into the mass loader (inoxmld) format.
the format looks like this (i hope the bulletin board doesn't turn any colons into smilies):

<?xml version="1.0" encoding="utf-8" ?>
<ino:request xmlns:ino="http://namespaces.softwareag.com/tamino/response2">
<ino:object ino:docname=''>

</ino:object>
<ino:object ino:docname=''>

</ino:object>
</ino:request>


if you want to determine the format yourself, simply unload some documents, either with inoxmld or with the java loader. with the java loader you have to specify the format -loadReqest, and you can specify a query to limit the number of documents unloaded.

the inoxmld mass loader is very efficient for initially loading a database with large amounts of documents.


hope this helps.

andreas f.

silly smilies, just as i assumed.

the ino:object is of course an i n o : o b j e c t .

i’ll try again here:

<?xml version="1.0" encoding="utf-8" ?>
<ino:request xmlns:ino="http://namespaces.softwareag.com/tamino/response2">
<ino:object ino:docname=''>
  <PAYLOAD>...</PAYLOAD>
</ino:object>
<ino:object ino:docname=''>
  <PAYLOAD>...</PAYLOAD>
</ino:object>
</ino:request>
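
the pre-processing step could be sketched roughly as follows. this is only an illustration in python, not a tamino tool: the function names are made up, the input files are assumed to be utf-8, and using the file name as ino:docname is an assumption (the example above leaves it empty). it only strips the xml declaration from each document, so a DOCTYPE or byte order mark would need extra handling.

```python
import os
import re
from xml.sax.saxutils import quoteattr

# namespace taken from the example envelope above
INO_NS = "http://namespaces.softwareag.com/tamino/response2"

# matches a leading XML declaration such as <?xml version="1.0" encoding="utf-8"?>
_XML_DECL = re.compile(r"^\s*<\?xml\s[^>]*\?>\s*")


def strip_xml_declaration(text):
    """Remove a leading XML declaration so the document can be nested."""
    return _XML_DECL.sub("", text, count=1)


def build_load_request(documents):
    """Wrap (docname, xml_text) pairs in a single ino:request envelope.

    Each document replaces the PAYLOAD placeholder directly; it is not
    wrapped in an extra container element.
    """
    parts = [
        '<?xml version="1.0" encoding="utf-8" ?>',
        '<ino:request xmlns:ino="%s">' % INO_NS,
    ]
    for docname, xml_text in documents:
        # quoteattr escapes the value and adds the surrounding quotes
        parts.append("<ino:object ino:docname=%s>" % quoteattr(docname))
        parts.append(strip_xml_declaration(xml_text).strip())
        parts.append("</ino:object>")
    parts.append("</ino:request>")
    return "\n".join(parts) + "\n"


def load_request_from_files(paths):
    """Read the given files (assumed utf-8) and build one load request."""
    def read_all():
        for path in paths:
            with open(path, encoding="utf-8") as handle:
                yield os.path.basename(path), handle.read()
    return build_load_request(read_all())
```

with 800 000 files it is probably more practical to call load_request_from_files on batches of paths and write one output file per batch, rather than building a single huge file in memory.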

Thanks for the help, Andreas

Once I understood that PAYLOAD was meant to represent my data rather than a container for my data, things worked great!