We are storing non-XML documents using the Java API in Tamino 188.8.131.52. We store them using the iso-8859-1 encoding:
TNonXMLObject document = TNonXMLObject.newInstance(
    new java.io.StringReader(nonXmlModel.getContent()));
document.setDocname(nonXmlModel.getDocName());
document.setDoctype(nonXmlModel.getDocType());
document.setContentType("text/html; charset=iso-8859-1");
However, when we fetch the documents using Java, we have to decode them as UTF-8 to get the correct text:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
nonXmlObject.writeTo(baos);
nonXmlModel = new NonXMLModel();
nonXmlModel.setContent(baos.toString("UTF-8"));
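A minimal JDK-only sketch of the same fetch step with the charset passed as a `Charset` constant instead of a string name. The `ContentSource` interface and `readContent` method are hypothetical stand-ins for any object with a `writeTo(OutputStream)` method, such as `nonXmlObject` in the snippet above; they are not part of the Tamino API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class FetchAsUtf8 {

    // Hypothetical stand-in for anything with a writeTo(OutputStream) method.
    interface ContentSource {
        void writeTo(OutputStream out) throws IOException;
    }

    // Buffers the raw bytes and decodes them explicitly as UTF-8.
    // new String(byte[], Charset) avoids the checked
    // UnsupportedEncodingException of baos.toString("UTF-8") and never
    // depends on the platform default charset, unlike baos.toString().
    static String readContent(ContentSource source) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        source.writeTo(baos);
        return new String(baos.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // "æøå" written as Unicode escapes so the source file encoding does not matter.
        byte[] utf8 = "Valby \u00e6\u00f8\u00e5".getBytes(StandardCharsets.UTF_8);
        System.out.println(readContent(out -> out.write(utf8)));
    }
}
```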
If we run the query in the browser with UTF-8 encoding, the result is displayed incorrectly, but a query with ISO-8859-1 encoding displays correctly.
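Both observations are consistent with the same bytes simply being decoded with the wrong charset. A minimal JDK-only sketch, using a hypothetical `roundTrip` helper that encodes text with one charset and decodes it with another: ISO-8859-1 bytes read as UTF-8 turn æ, ø and å into replacement characters, while UTF-8 bytes read as ISO-8859-1 turn each of them into two characters. Tamino is not involved at all.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingMismatchDemo {

    // Encodes text with one charset and decodes it with another,
    // mimicking a client that guesses the wrong encoding.
    static String roundTrip(String text, Charset written, Charset readAs) {
        return new String(text.getBytes(written), readAs);
    }

    public static void main(String[] args) {
        // "æøå" written as Unicode escapes so the source file encoding does not matter.
        String text = "Valby \u00e6\u00f8\u00e5";

        // ISO-8859-1 bytes read as UTF-8: the single bytes for æ, ø, å do not
        // form valid UTF-8 sequences and are replaced with U+FFFD.
        System.out.println(roundTrip(text, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8));

        // UTF-8 bytes read as ISO-8859-1: each letter turns into two characters.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));

        // Matching charsets reproduce the original text.
        System.out.println(roundTrip(text, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1));
        System.out.println(roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8));
    }
}
```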
Is there a bug in the Java API?
Many regards from
InterResearch A/S - Trekronergade 126 F - 2500 Valby
Phone: +45 70 27 28 72 - Mobile: +45 20 10 21 74 - Fax: +45 36 14 80 20
E-mail: email@example.com - Homepage: http://www.interresearch.net/
We are currently looking into this. Are we right in assuming that you are storing HTML pages in this application?
Ok, I got an answer on this. Here is the scoop:
Any document, regardless of whether it is XML or non-XML, that "smells" like encoded text is converted by Tamino to its internal encoding format. When you retrieve the document from Tamino, it is encoded according to the default encoding (a Tamino parameter), which by default is "iso-8859-1", unless the "accept-charset" HTTP header specifies otherwise. The Tamino API for Java sets "accept-charset" to "UTF-8", because that is the natural choice when the result will be converted to Java Strings anyway. This is why you get the document back in UTF-8 and have to use "UTF-8" when converting it, e.g. to a String.
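The retrieval behavior described above can be modelled in a few lines. This is a toy sketch, not Tamino's implementation: the hypothetical `serve` method keeps the document as a Java String (standing in for the internal format) and encodes it on output with the charset named in accept-charset, or with ISO-8859-1 (the default mentioned above) when none is given. It also simplifies accept-charset to a single charset name, whereas real headers may list several with q-values.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class AcceptCharsetModel {

    // Hypothetical stand-in for the server-side default encoding parameter.
    static final Charset DEFAULT_OUTPUT = StandardCharsets.ISO_8859_1;

    // Toy model: the stored text is encoded on the way out with the
    // requested charset, or with the default when none is requested.
    // Only a single charset name is accepted (no lists, no q-values).
    static byte[] serve(String stored, String acceptCharset) {
        Charset out = (acceptCharset == null || acceptCharset.isEmpty())
                ? DEFAULT_OUTPUT
                : Charset.forName(acceptCharset);
        return stored.getBytes(out);
    }

    public static void main(String[] args) {
        String stored = "Valby \u00e6\u00f8\u00e5";

        // Like the Java API: it asks for UTF-8, so it must decode as UTF-8.
        byte[] fromApi = serve(stored, "UTF-8");
        System.out.println(new String(fromApi, StandardCharsets.UTF_8).equals(stored)); // prints true

        // A client that sends no accept-charset gets the ISO-8859-1 default
        // and must decode (or display) the result as ISO-8859-1.
        byte[] fromPlainClient = serve(stored, null);
        System.out.println(new String(fromPlainClient, StandardCharsets.ISO_8859_1).equals(stored)); // prints true
    }
}
```

In this model a client has to decode with the charset used for the response, not the one declared when the document was stored, which matches why the Java API result must be read as UTF-8 even though the document was stored with charset=iso-8859-1.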
Is this a problem for you?
[This message was edited by Christian Gengenbach on 20 Dec 2002 at 10:46.]