nonXML encoding problem

Hello,
I’m trying to upload non-XML data containing Greek characters to Tamino. The upload works fine and the document content is encoded correctly. Only the office:meta tags such as dc:title aren’t encoded correctly. Tamino stores the Greek characters like:

I’m slightly confused as to exactly what you are doing.

What are you uploading to Tamino as non-XML and where do the “office:meta” tags come from?

Hi,
I’m using the schema that is included with the Tamino Non-XML Indexer. So when I do an upload, I have the following in Tamino:


-
data…

<office:document-meta xmlns:dc="DCMI: DCMI Metadata Terms" xmlns:meta="http://openoffice.org/2000/meta" xmlns:office="http://openoffice.org/2000/office">
<office:meta>
<meta:generator>Microsoft Office Word</meta:generator>
<dc:title>

I have found some more information. This should be in the NXE README.

When the NXE server extension reads document properties, 8-bit characters are interpreted according to the platform’s default character set. This is fine as long as the document being stored in Tamino was written on a platform with the same default character set. However, if you receive a document from another region of the world and want to process it with NXE, you cannot expect to get the correct indexing information, unless the creator used Unicode, of course.
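
To make the effect concrete, here is a minimal Python sketch of the mismatch described above. It is not NXE code. The sample title and the two code pages (cp1253 for the Greek author's machine, cp1252 for the server's default) are assumptions chosen purely for illustration.

```python
# Illustration only: not taken from NXE. The title and code pages are assumed.
greek_title = "Ελληνικά"

# An 8-bit property value as it might be written on a Greek-locale platform.
raw_8bit = greek_title.encode("cp1253")

# Reading those bytes with a different platform default (assumed cp1252)
# succeeds without error but yields the wrong characters.
read_with_other_default = raw_8bit.decode("cp1252")

# Reading them with the same character set they were written in is lossless.
read_with_same_default = raw_8bit.decode("cp1253")

# A Unicode encoding names itself explicitly, so the platform default
# plays no part in decoding it.
raw_unicode = greek_title.encode("utf-16")
read_unicode = raw_unicode.decode("utf-16")

print(read_with_other_default)
print(read_with_same_default)
print(read_unicode)
```

The first line printed is garbled Latin text, while the other two show the original Greek title. This matches the symptom in this thread, assuming the properties were stored as 8-bit text: the server reads the bytes without any error, so the wrong characters end up in the index unnoticed.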