Data Loader with entity references: how to?

Hi all,

I’m trying to load some XML files that have some entity references into my collection, but I keep on getting errors. Any help would be greatly appreciated.

Here is what my input file looks like:

<?xml version="1.0" encoding="utf-8">
<!DOCTYPE foo [



]>
<ino:request>
<ino:object [ino:docname="foo1"]>


Alpha looks like α


</ino:object>
<ino:object [ino:docname="foo2"]>


Beta looks like β


</ino:object>
<ino:object [ino:docname="foo3"]>


Gamma looks like γ


</ino:object>
</ino:request>

I am evaluating Tamino for a pharmaceutical company. We use hundreds of entity references and have thousands of XML files, so declaring the entities inline in each XML file is not an option.

Is there a way to import thousands of XML files with a lot of entity references (ISOnum, ISOtech, ISOlat, ISOgrk…, basically all the ISO entity sets) into Tamino?

I don’t mind using a tool like Omnimark to create my input file the way it needs to be.

Thanks

The document above is not well-formed, so here is a version that should work. If you put all your internal entity declarations at the start of a large input file for the mass loader, you should have no problems.


<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE foo [



]>
<ino:request>
<ino:object ino:docname="foo1">


Alpha looks like α


</ino:object>
<ino:object ino:docname="foo2">


Beta looks like β


</ino:object>
<ino:object ino:docname="foo3">


Gamma looks like γ


</ino:object>
</ino:request>
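
For thousands of files, a layout like the one above could be generated by a script rather than written by hand. The following is a minimal Python sketch, not a Tamino tool. The `<!ENTITY ...>` lines did not survive in the listing above, so the three declarations used here (numeric character references for `alpha`, `beta` and `gamma`) are assumptions, as are the helper name `build_loader_file`, the `<foo>` document bodies, and the `xmlns:ino` declaration, whose URI is borrowed from the Data Loader response quoted later in this thread.

```python
from xml.sax.saxutils import quoteattr

# Assumed declarations; real ones would be copied from the ISO entity sets.
ENTITIES = {"alpha": "&#945;", "beta": "&#946;", "gamma": "&#947;"}

# Assumption: the namespace URI that appears on ino:response in this thread.
INO_NS = "http://namespaces.softwareag.com/tamino/response2"


def build_loader_file(docs, entities=ENTITIES, root="foo"):
    """Return one input file as a string, with every entity declared up front.

    docs maps a document name to its already serialized XML body.
    """
    lines = ['<?xml version="1.0" encoding="utf-8"?>', f"<!DOCTYPE {root} ["]
    for name, value in entities.items():
        lines.append(f'<!ENTITY {name} "{value}">')
    lines.append("]>")
    lines.append(f'<ino:request xmlns:ino="{INO_NS}">')
    for docname, body in docs.items():
        lines.append(f"<ino:object ino:docname={quoteattr(docname)}>")
        lines.append(body)
        lines.append("</ino:object>")
    lines.append("</ino:request>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = {
        "foo1": "<foo>Alpha looks like &alpha;</foo>",
        "foo2": "<foo>Beta looks like &beta;</foo>",
        "foo3": "<foo>Gamma looks like &gamma;</foo>",
    }
    print(build_loader_file(sample), end="")
```

Parsing the result with any XML parser before handing it to the loader is a cheap way to confirm that every entity reference in the bodies has a matching declaration.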

Hi,

Thanks for your help, the version you posted works fine indeed. But when I apply it to my real data, I get some weird behavior.

When I include up to 72 entities (good.xml) it works fine, but when I include entity #73 (bad.xml), I get an error:
<ino:message>
<ino:messageline>Tamino Data Loader v3.1 Copyright (c) Software AG</ino:messageline>
<ino:messageline>Loading from input1.xml to Tamino database USANS</ino:messageline>
<ino:messageline>Start: Fri Jan 03 20:24:56 2003</ino:messageline>
</ino:message>
<ino:message>
<ino:messageline>Invalid input file format - tag <ino:request… not complete</ino:messageline>
</ino:message>
<ino:message>
<ino:messageline>Mass load aborted</ino:messageline>
<ino:messageline>Elapsed time: 11 (s), finished: Fri Jan 03 20:25:07 2003</ino:messageline>
</ino:message>

After investigation, I found out that it depends on the number of characters that come before the <ino:request> element. In my example, the 73rd entity crossed the limit.
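
One way to check this observation is to compare how many bytes precede `<ino:request` in the working and failing files. The snippet below is a minimal Python sketch for that measurement only. The helper name `prolog_length` is made up for illustration, and the actual limit is not stated anywhere in this thread, so the script just reports offsets.

```python
import sys


def prolog_length(data: bytes, marker: bytes = b"<ino:request") -> int:
    """Return how many bytes precede the first marker, or -1 if it is absent."""
    return data.find(marker)


if __name__ == "__main__":
    # Usage: python prolog_length.py good.xml bad.xml
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            print(path, prolog_length(fh.read()))
```

If the failing file always reports a larger offset than the working one, that supports the idea that the loader only scans a fixed amount of input for the opening tag.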

Is there a way to include and resolve hundreds of entities?
ISOdia.ent has 14 entities
ISOnum.ent has 74 entities
ISOpub.ent has 84 entities
ISOtech.ent has 61 entities


ISOamsr.ent has 82 entities

My DTD includes all of these entities, and I have about 50 thousand files (and that’s only one project, the lightest).

Can Tamino handle this?

Thanks for your help,
./Malick.

Attached:
Good.xml, a subset of the entities that allows the processing to happen, with only 3 of the 50000 entries
Bad.xml, which is good.xml plus one more entity; with it the process fails
Input.xml, all of the entities that I need to be in the file; it contains all ISO* entities used across all the files and only 3 of the 50000 entries
usan_standalone.dtd, the DTD with all the entities copied into it
usan2002_a.xml and usan2002_h.xml, sample data.
input1.xml (3.35 KB)

More attachments
usans.zip (120 KB)

Attached is the correct input file I would like to use, with all the entities referenced.

I get an error:
<?xml version="1.0" encoding="utf-8" ?>
<ino:response xmlns:ino="http://namespaces.softwareag.com/tamino/response2" xmlns:xql="http://metalab.unc.edu/xql/">
<ino:message>
<ino:messageline>Tamino Data Loader v3.1 Copyright (c) Software AG</ino:messageline>
<ino:messageline>Loading from input1.xml to Tamino database USANS</ino:messageline>
<ino:messageline>Start: Mon Jan 06 09:23:50 2003</ino:messageline>
</ino:message>
<ino:message>
<ino:messageline>Invalid input file format - ino:request missing</ino:messageline>
</ino:message>
<ino:message>
<ino:messageline>Mass load aborted</ino:messageline>
<ino:messageline>Elapsed time: 16 (s), finished: Mon Jan 06 09:24:06 2003</ino:messageline>
</ino:message>
</ino:response>
input1.xml (14.9 KB)

Hi Malick,
You’ll need at least v3.1.2 Hotfix 3 to get DOCTYPE declarations to work with the Data Loader.
There is a special format for this - you have to wrap the DOCTYPE declaration with:
<ino:documentprolog>
<![CDATA[
... your DOCTYPE declaration ...
]]>
</ino:documentprolog>

and place this directly after each <ino:object …> that needs this. AFAIK there is currently no way to define it “globally” for all documents in the data loader file.

All the best,
Hermann
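
Since the prolog has to be repeated for every object, generating it is more practical than pasting it by hand. Below is a minimal Python sketch of the wrapping described in the last reply. It is not taken from the Tamino documentation: the helper name `wrap_object`, the position of the document body right after the prolog, and the sample `<foo>` body with its `alpha` declaration are all assumptions made for illustration.

```python
from xml.sax.saxutils import quoteattr


def wrap_object(docname: str, doctype: str, body: str) -> str:
    """Return one <ino:object> whose DOCTYPE travels in an ino:documentprolog."""
    if "]]>" in doctype:
        # A CDATA section cannot contain its own terminator.
        raise ValueError("DOCTYPE declaration contains ']]>'")
    return "\n".join([
        f"<ino:object ino:docname={quoteattr(docname)}>",
        "<ino:documentprolog>",
        "<![CDATA[",
        doctype,
        "]]>",
        "</ino:documentprolog>",
        body,  # assumed position: the document follows its prolog
        "</ino:object>",
    ])


if __name__ == "__main__":
    # Hypothetical DOCTYPE, shared by every document in the batch.
    shared_doctype = '<!DOCTYPE foo [\n<!ENTITY alpha "&#945;">\n]>'
    print(wrap_object("foo1", shared_doctype, "<foo>Alpha looks like &alpha;</foo>"))
```

Keeping the DOCTYPE in one shared string means the entity list is maintained in a single place even though it is written out once per object.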