Hi all,
I’m trying to load some XML files that have some entity references into my collection, but I keep on getting errors. Any help would be greatly appreciated.
Here is what my input file looks like:
<?xml version="1.0" encoding="utf-8">
<!DOCTYPE foo [
]>
<ino:request>
<ino:object [ino:docname="foo1"]>
Alpha looks like α
</ino:object>
<ino:object [ino:docname="foo2"]>
Beta looks like β
</ino:object>
<ino:object [ino:docname="foo3"]>
Gamma looks like γ
</ino:object>
</ino:request>
I am evaluating Tamino for a pharmaceutical company and we use hundreds of entity references, and have thousands of XML files. Declaring the entities inline w/ each XML file is not an option.
Is there a way to import thousands of XML files w/ a lot of Entity References (ISOnum, isotech, isolat, isogrk…, all the isoz basically) into Tamino?
I don’t mind using a tool like Omnimark to create my input file the way it needs to be.
Thanks
The document above is not well-formed, so here is a version that should work. If you put all your internal entity references at the outset of a large input file for the mass loader, you should have no problems.
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE foo [
]>
<ino:request>
<ino:object ino:docname="foo1">
Alpha looks like α
</ino:object>
<ino:object ino:docname="foo2">
Beta looks like β
</ino:object>
<ino:object ino:docname="foo3">
Gamma looks like γ
</ino:object>
</ino:request>
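The forum rendering appears to have dropped the entity declarations from the internal subset and to have shown the references already resolved. As a minimal sketch of what such a file might look like: the entity names (alpha, beta, gamma) and their numeric character references are assumptions for illustration, as is binding the ino prefix to the namespace URI that the loader's own response output uses.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE foo [
  <!-- Assumed declarations; the originals were not preserved in the post. -->
  <!-- 945, 946 and 947 are the decimal code points of the Greek letters. -->
  <!ENTITY alpha "&#945;">
  <!ENTITY beta  "&#946;">
  <!ENTITY gamma "&#947;">
]>
<!-- Namespace binding is an assumption, taken from the loader's response output -->
<ino:request xmlns:ino="http://namespaces.softwareag.com/tamino/response2">
<ino:object ino:docname="foo1">
Alpha looks like &alpha;
</ino:object>
<ino:object ino:docname="foo2">
Beta looks like &beta;
</ino:object>
<ino:object ino:docname="foo3">
Gamma looks like &gamma;
</ino:object>
</ino:request>
```

Declaring each entity as a numeric character reference keeps the file independent of any external .ent file, at the cost of repeating the declarations in every loader input file.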
Hi,
Thanks for your help; the version you posted does work. But when I apply it to my real data, I get some odd behavior.
When I include up to 72 entities (good.xml) it works fine, but when I include entity #73 (bad.xml) I get an error:
<ino:message>
<ino:messageline>Tamino Data Loader v3.1 Copyright (c) Software AG</ino:messageline>
<ino:messageline>Loading from input1.xml to Tamino database USANS</ino:messageline>
<ino:messageline>Start: Fri Jan 03 20:24:56 2003</ino:messageline>
</ino:message>
<ino:message>
<ino:messageline>Invalid input file format - tag <ino:request… not complete</ino:messageline>
</ino:message>
<ino:message>
<ino:messageline>Mass load aborted</ino:messageline>
<ino:messageline>Elapsed time: 11 (s), finished: Fri Jan 03 20:25:07 2003</ino:messageline>
</ino:message>
After some investigation, I found that it depends on the number of characters before the <ino:request> element is encountered. In my example, the 73rd entity crossed the limit.
Is there a way to include and resolve hundreds of entities?
ISOdia.ent has 14 entities
ISOnum.ent has 74 entities
ISOpub.ent has 84 entities
ISOtech.ent has 61 entities
…
…
ISOamsr.ent has 82 entities
My DTD includes all of these entities, and I have about 50 thousand files (and that’s only one project, the lightest).
Can Tamino handle this?
Thanks for your help,
./Malick.
Attached:
Good.xml, a subset of the entities that allows the processing to happen and only 3/50000 entries
Bad.xml, added one entity to good.xml and the process failed
Input.xml, all of the entities that I need to be in the file; it contains all the ISO* entities used across all the files and only 3/50000 entries.
usan_standalone.dtd, the DTD w/ all the entities copied within
usan2002_a.xml and usan2002_h.xml, sample data.
input1.xml (3.35 KB)
usans.zip (120 KB)
Attached is the correct input file I would like to use, with all the entities referenced.
I get an error:
<?xml version="1.0" encoding="utf-8" ?>
<ino:response xmlns:ino="http://namespaces.softwareag.com/tamino/response2" xmlns:xql="http://metalab.unc.edu/xql/">
<ino:message>
<ino:messageline>Tamino Data Loader v3.1 Copyright (c) Software AG</ino:messageline>
<ino:messageline>Loading from input1.xml to Tamino database USANS</ino:messageline>
<ino:messageline>Start: Mon Jan 06 09:23:50 2003</ino:messageline>
</ino:message>
<ino:message>
<ino:messageline>Invalid input file format - ino:request missing</ino:messageline>
</ino:message>
<ino:message>
<ino:messageline>Mass load aborted</ino:messageline>
<ino:messageline>Elapsed time: 16 (s), finished: Mon Jan 06 09:24:06 2003</ino:messageline>
</ino:message>
</ino:response>
input1.xml (14.9 KB)
Hi Malick,
You’ll need at least v3.1.2 Hotfix 3 to get DOCTYPE declarations to work with the data loader.
There is a special format for this - you have to wrap the DOCTYPE declaration with:
<ino:documentprolog>
<![CDATA[
... your DOCTYPE declaration ...
]]>
</ino:documentprolog>
and place this directly after each <ino:object …> that needs this. AFAIK there is currently no way to define it “globally” for all documents in the data loader file.
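Applied to the three-object example from earlier in the thread, the wrapper described above might look like the sketch below. The entity declarations, the foo root element inside each object, and the namespace binding on ino:request are assumptions for illustration; only the ino:documentprolog/CDATA wrapper and its position directly after the ino:object start tag come from the description above.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Namespace binding is an assumption, taken from the loader's response output -->
<ino:request xmlns:ino="http://namespaces.softwareag.com/tamino/response2">
<ino:object ino:docname="foo1">
<ino:documentprolog>
<![CDATA[
<!DOCTYPE foo [
  <!ENTITY alpha "&#945;">
]>
]]>
</ino:documentprolog>
<!-- Hypothetical document body; foo stands in for the real root element -->
<foo>Alpha looks like &alpha;</foo>
</ino:object>
<ino:object ino:docname="foo2">
<ino:documentprolog>
<![CDATA[
<!DOCTYPE foo [
  <!ENTITY beta "&#946;">
]>
]]>
</ino:documentprolog>
<foo>Beta looks like &beta;</foo>
</ino:object>
<ino:object ino:docname="foo3">
<ino:documentprolog>
<![CDATA[
<!DOCTYPE foo [
  <!ENTITY gamma "&#947;">
]>
]]>
</ino:documentprolog>
<foo>Gamma looks like &gamma;</foo>
</ino:object>
</ino:request>
```

Because the prolog is repeated per object, each document only needs the declarations for the entities it actually uses, which keeps the repeated block small when a preprocessing tool generates the loader file.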
All the best,
Hermann