I am trying to upload non-XML data containing Greek characters to Tamino. The upload works fine and the content is encoded correctly. Only the office:meta tags, such as dc:title, aren’t encoded correctly. Tamino stores the Greek characters like:
I’m slightly confused as to exactly what you are doing.
What are you uploading to Tamino as non-XML and where do the “office:meta” tags come from?
I’m using the schema that is included with the Tamino Non-XML Indexer. So when I do an upload, I have the following in Tamino:
<office:document-meta xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:meta="http://openoffice.org/2000/meta" xmlns:office="http://openoffice.org/2000/office">
<meta:generator>Microsoft Office Word</meta:generator>
I have found some more information. This should be in the NXE README.
When the NXE server extension reads document properties, 8-bit characters are interpreted according to the platform’s default character set. This is fine as long as the document being stored in Tamino was written on a platform with the same default character set. However, if you receive a document from another region of the world and want to process it with NXE, you cannot expect to get the correct indexing information - unless the creator used Unicode, of course.
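To illustrate the effect described above, here is a minimal Python sketch. It is not NXE code: the helper `read_property`, the sample title, and the choice of `cp1253` (Greek Windows) and `latin-1` (a Western European default) are all assumptions made for the example. It only shows what happens when 8-bit bytes are decoded with a character set other than the one they were written in.

```python
def read_property(raw: bytes, platform_charset: str) -> str:
    """Hypothetical stand-in for reading an 8-bit document property.

    The bytes are decoded with whatever charset the platform uses by
    default, which is the behaviour the README text describes.
    """
    return raw.decode(platform_charset)


# Assumed sample title, written on a platform whose default is cp1253.
title = "Ελληνικά"
raw = title.encode("cp1253")

# Same default charset on the reading side: the title survives.
same_platform = read_property(raw, "cp1253")

# Different default charset (latin-1 assumed): every byte still decodes,
# but to the wrong characters, so the indexed value is garbled.
other_platform = read_property(raw, "latin-1")

# If the creator used Unicode, the platform default no longer matters.
unicode_raw = title.encode("utf-8")
from_unicode = unicode_raw.decode("utf-8")

print(same_platform == title)   # title preserved
print(other_platform == title)  # title garbled
print(from_unicode == title)    # title preserved
```

If this matches what you see for dc:title, the document properties were probably written in a different 8-bit character set than the default on the machine running Tamino.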