Can't load Word document

I am a newbie to Tamino :) I installed the Non-XML Indexer software and the server extension and defined the schemas, but when I tried inserting a Word document I got the following error:

<?xml version="1.0" encoding="windows-1252" ?>
<ino:response xmlns:ino="http://namespaces.softwareag.com/tamino/response2" xmlns:xql="http://metalab.unc.edu/xql/">
  <ino:message ino:returnvalue="0">
    <ino:messageline>document processing started</ino:messageline>
  </ino:message>
  <ino:message ino:returnvalue="8711">
    <ino:messagetext ino:code="INOXPE8711">Document not well-formed</ino:messagetext>
    <ino:messageline>Line 1, Column 1: Invalid document structure</ino:messageline>
  </ino:message>
</ino:response>
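For anyone who wants to check such a response programmatically, here is a minimal Python sketch that pulls out every message whose `ino:returnvalue` is not 0. The function name `failed_messages` is my own invention, and the sample is a trimmed copy of the response above (XML declaration and the unused `xql` namespace left out); it is not part of any Tamino API.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the response shown above.
INO = "http://namespaces.softwareag.com/tamino/response2"

# Trimmed copy of the error response (illustration only).
SAMPLE = """<ino:response xmlns:ino="http://namespaces.softwareag.com/tamino/response2">
  <ino:message ino:returnvalue="0">
    <ino:messageline>document processing started</ino:messageline>
  </ino:message>
  <ino:message ino:returnvalue="8711">
    <ino:messagetext ino:code="INOXPE8711">Document not well-formed</ino:messagetext>
    <ino:messageline>Line 1, Column 1: Invalid document structure</ino:messageline>
  </ino:message>
</ino:response>"""


def failed_messages(response_xml):
    """Return (code, text, detail_lines) for each message with a non-zero return value."""
    root = ET.fromstring(response_xml)
    failures = []
    for msg in root.iter(f"{{{INO}}}message"):
        # Attributes carry the ino: prefix, so they are namespaced too.
        if msg.get(f"{{{INO}}}returnvalue") == "0":
            continue
        text_el = msg.find(f"{{{INO}}}messagetext")
        code = text_el.get(f"{{{INO}}}code") if text_el is not None else None
        text = text_el.text if text_el is not None else None
        lines = [ln.text for ln in msg.findall(f"{{{INO}}}messageline")]
        failures.append((code, text, lines))
    return failures


if __name__ == "__main__":
    for code, text, lines in failed_messages(SAMPLE):
        print(code, text, lines)
```

The error at line 1, column 1 suggests the Word file's binary content was being parsed as XML.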


Any clues on what might be going wrong? I am also attaching the Word document that I tried inserting. Also, once I succeed in inserting the document, would it be possible to add to or change the XML metadata that was captured by the Non-XML Indexer? If yes, then how?

Thanks for all your help.

-Rajeev
Practical.doc (34 KB)

Hi,
When you insert your document you need to specify the collection name and document type (plus a docname if you want to retrieve it by name). For example, using the Interactive Interface, see the screenshot below.
Then you can query the document using a query like this:

http://localhost/tamino/welcome_4_1_4/Non-XML?_XQL=Non-XML-Doctype[properties/content~="simple"]

or by docname like this:

http://localhost/tamino/welcome_4_1_4/Non-XML/Non-XML-Doctype/Practical.doc
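If you build these URLs from a script instead of typing them into the browser, the brackets and quotes in the query should be percent-encoded. Here is a rough Python sketch using only the standard library. The helper names `query_url` and `docname_url` are made up for illustration, and the base URL, collection and doctype names are simply the ones from the example above.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Database URL from the example above; adjust host and database name as needed.
BASE = "http://localhost/tamino/welcome_4_1_4"


def query_url(collection, xql):
    """Build a query URL with the expression percent-encoded in _XQL."""
    return f"{BASE}/{quote(collection, safe='')}?{urlencode({'_XQL': xql})}"


def docname_url(collection, doctype, docname):
    """Build the collection/doctype/docname URL for fetching a document by name."""
    parts = (collection, doctype, docname)
    return BASE + "/" + "/".join(quote(p, safe="") for p in parts)


if __name__ == "__main__":
    xql = 'Non-XML-Doctype[properties/content~="simple"]'
    url = query_url("Non-XML", xql)
    print(url)
    # Decoding the query string gives back the original expression.
    print(parse_qs(urlsplit(url).query)["_XQL"][0])
    print(docname_url("Non-XML", "Non-XML-Doctype", "Practical.doc"))
```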


Hope this helps.

Thanks!! It worked.

Now, how can I modify or add to the XML metadata for this document?

As I understand it, the metadata is extracted from the document (in this case, a Word document) by the server extension plug-in when the document is loaded. So the information you see comes from the document's Word properties. If you change the properties under File > Properties in Word, save the changed document, and then reload it into Tamino, you will effectively modify the metadata.
Does this help?

Hmm... what if the non-XML document I am trying to insert has some additional XML metadata that I want to associate with it? How could I achieve that?

On the other hand, if I am inserting a JPEG image, most likely the indexer can't extract any useful metadata from it. If I want to associate XML metadata with such non-XML content, how should I go about it?

Thanks for all your help!

-Rajeev

Hi Rajeev,

Good questions!
Additional XML data for a nonXML document can only be written by a server extension. The flow of control is something like:


  • Tamino gets a nonXML document
  • the schema for that document tells Tamino to call the server extension


    <tsd:shadowXML>
      <tsd:onBinaryInsert>SXSBlobIndexer.putBinary</tsd:onBinaryInsert>
      <tsd:onTextInsert>SXSBlobIndexer.putText</tsd:onTextInsert>
    </tsd:shadowXML>


  • the extension generates XML and returns it to Tamino
  • Tamino writes nonXML and XML
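To make the four steps above concrete, here is a toy Python model of the flow. It is purely illustrative and has nothing to do with the real server extension API: `ToyStore`, `put_binary` and the `<size>` element are all hypothetical, and only the `properties` element name is borrowed from the query earlier in this thread.

```python
class ToyStore:
    """Toy stand-in for the database: maps doctypes to insert hooks."""

    def __init__(self, hooks):
        # doctype name -> function that turns raw bytes into shadow XML
        # (plays the role of the tsd:onBinaryInsert entry in the schema)
        self.hooks = hooks
        self.docs = {}

    def insert(self, doctype, docname, payload):
        # Step 1: the store receives a nonXML document (payload).
        # Step 2: the schema entry for the doctype names the extension hook.
        hook = self.hooks[doctype]
        # Step 3: the extension generates XML and returns it.
        shadow_xml = hook(payload)
        # Step 4: the nonXML payload and the generated XML are stored together.
        self.docs[docname] = (payload, shadow_xml)
        return shadow_xml


def put_binary(payload):
    """Hypothetical hook: a real indexer would extract properties and content."""
    return f"<properties><size>{len(payload)}</size></properties>"


if __name__ == "__main__":
    store = ToyStore({"Non-XML-Doctype": put_binary})
    print(store.insert("Non-XML-Doctype", "Practical.doc", b"raw bytes"))
```

The point of the model is that the shadow XML is produced only inside the hook, so anything extra you want stored has to be generated there.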


So you cannot store additional XML with a nonXML document without using a server extension.

NonXML Indexer (NIXE) is such a server extension. It has plugins for several mime types, but currently not for JPGs. Some JPGs have a lot of metadata (generated by digital cameras for example), or you could imagine using OCR software to create content data for scanned documents.

In the future it will be possible to write your own plugin for any MIME type, but currently the only way to write additional XML to Tamino is to write your own server extension.

Best regards,
Martin