Data Loader : Loading 800K xml files

Couturier · May 7, 2003, 3:40am

I am using Tamino 3.1 and am trying to create a database with the Reuters Corpus which consists of 800 000 separate xml files. The Data Loader only seems to load documents one at a time.

Is there another utility that I could use, or am I just using the Data Loader in the wrong way?

afr · May 7, 2003, 2:28pm

hello couturier,

i assume that your observation is right: if you feed the java loader multiple files, it will load them one at a time.
for the scenario you are describing, it is probably best to pre-process all your individual files and put them into the mass loader (inoxmld) format.
it looks like this (i hope the bulletin board doesn't turn any : colons into smilies)

<?xml version="1.0" encoding="utf-8" ?>
<ino:request xmlns:ino="http://namespaces.softwareag.com/tamino/response2">
<ino:object ino:docname=''>
…
</ino:object>
<ino:object ino:docname=''>
…
</ino:object>
</ino:request>

if you want to determine the format yourself, simply unload some documents, either with inoxmld or with the java loader. with the java loader you have to specify the format -loadReqest, and you can specify a query to limit the number of files unloaded.

the inoxmld mass loader is very efficient for initially loading a database with large amounts of documents.

hope this helps.

andreas f.

afr · May 7, 2003, 2:31pm

silly smilies, just as i assumed.

the ino:object is of course an i n o : o b j e c t .

i’ll try again here:

  
<?xml version="1.0" encoding="utf-8" ?>
<ino:request xmlns:ino="http://namespaces.softwareag.com/tamino/response2">
<ino:object ino:docname=''>
  <PAYLOAD>...</PAYLOAD>
</ino:object>
<ino:object ino:docname=''>
  <PAYLOAD>...</PAYLOAD>
</ino:object>
</ino:request>
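
the pre-processing step could be sketched roughly like this in python. this is only an illustration of the wrapping described above, not a tool that ships with tamino: the names build_load_file and strip_declaration are made up for this example, the source files are assumed to be utf-8 unless you pass another encoding, ino:docname is left empty as in the sample, and DOCTYPE declarations are not handled. each file's root element takes the place of PAYLOAD.

```python
import re
from pathlib import Path
from typing import Iterable

# Namespace URI copied from the sample request in this thread.
INO_NS = "http://namespaces.softwareag.com/tamino/response2"

# Matches an optional BOM, leading whitespace, and a leading XML declaration.
XML_DECL = re.compile(r"^\ufeff?\s*<\?xml[^>]*\?>\s*")


def strip_declaration(text: str) -> str:
    """Remove a leading XML declaration so the document can be nested."""
    return XML_DECL.sub("", text, count=1).strip()


def build_load_file(
    sources: Iterable[Path],
    target: Path,
    source_encoding: str = "utf-8",  # assumption: adjust to your corpus
) -> int:
    """Wrap each source document in an ino:object and write one combined file.

    Documents are streamed to the target one at a time, so the whole
    corpus never has to fit in memory. Returns the number of documents written.
    """
    count = 0
    with target.open("w", encoding="utf-8", newline="\n") as out:
        out.write('<?xml version="1.0" encoding="utf-8" ?>\n')
        out.write(f'<ino:request xmlns:ino="{INO_NS}">\n')
        for path in sources:
            body = strip_declaration(path.read_text(encoding=source_encoding))
            # ino:docname is left empty, exactly as in the sample above.
            out.write("<ino:object ino:docname=''>\n")
            out.write(body + "\n")
            out.write("</ino:object>\n")
            count += 1
        out.write("</ino:request>\n")
    return count
```

you would call it with something like build_load_file(sorted(Path("corpus").glob("*.xml")), Path("load.xml")), where both paths are placeholders. with 800 000 files it may be worth splitting the output into several load files, but that is a judgment call rather than something the loader requires as far as this thread shows.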

Couturier · May 8, 2003, 10:27pm

Thanks for the help, Andreas

Once I understood that PAYLOAD was meant to represent my data rather than a container for my data, things worked great!
