Non-XML encoding problem

I am trying to upload non-XML data containing Greek characters to Tamino. The upload works fine and the content is encoded correctly. Only the office:meta elements, such as dc:title, are not encoded correctly. Tamino stores the Greek characters like:

I’m slightly confused as to exactly what you are doing.

What are you uploading to Tamino as non-XML and where do the “office:meta” tags come from?

I’m using the schema that is included with the Tamino Non-XML Indexer. So when I do an upload, I have the following in Tamino:


<office:document-meta xmlns:dc="" xmlns:meta="" xmlns:office="">
<meta:generator>Microsoft Office Word</meta:generator>

I have found some more information. This should be in the NXE README.

When the NXE server extension reads document properties, 8-bit characters are interpreted according to the platform’s default character set. This is fine as long as the document being stored in Tamino was written on a platform with the same default character set. However, if you receive a document from another region of the world and want to process it with NXE, you cannot expect to get the correct indexing information, unless the creator used Unicode, of course.
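
To make the effect concrete, here is a minimal sketch in Python. It is not NXE code. The function name read_property, the sample title, and the choice of code pages (cp1253 for a Greek Windows system, cp1252 for a Western European one) are assumptions made only for illustration; the actual default character sets involved depend on your platforms.

```python
# Illustration only: simulates 8-bit property bytes being interpreted
# with a platform default character set. Not part of NXE or Tamino.

GREEK_TITLE = "Τίτλος"  # hypothetical dc:title value


def read_property(raw: bytes, platform_charset: str) -> str:
    """Decode 8-bit property bytes the way a reader would if it simply
    applied the given platform default character set (assumed behaviour)."""
    return raw.decode(platform_charset)


# The author's platform writes the title as 8-bit characters (assumed cp1253).
raw_bytes = GREEK_TITLE.encode("cp1253")

# Reader on a platform with the same default character set: text survives.
same_region = read_property(raw_bytes, "cp1253")

# Reader on a platform with a different default (assumed cp1252): the same
# bytes map to unrelated Latin characters, so the indexed value is wrong.
other_region = read_property(raw_bytes, "cp1252")

# If the creator used Unicode, the bytes carry no platform dependency.
unicode_bytes = GREEK_TITLE.encode("utf-16-le")
from_unicode = unicode_bytes.decode("utf-16-le")

print("same charset :", same_region)
print("other charset:", other_region)
print("unicode      :", from_unicode)
```

In this sketch the wrongly decoded value can still be repaired by re-encoding it with the reader's character set and decoding it with the author's, but that only works when you know both character sets and every byte has a mapping in each.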