Problems with encoding non-XML documents using Java API

HC_Hammerstoft · December 17, 2002, 4:47pm

Hi,

We are storing non-XML documents using the Java API in Tamino 3.1.2.1. We store them using the iso-8859-1 encoding:
TNonXMLObject document = TNonXMLObject.newInstance(
    new java.io.StringReader(nonXmlModel.getContent()));
document.setDocname(nonXmlModel.getDocName());
document.setDoctype(nonXmlModel.getDocType());
document.setContentType("text/html; charset=iso-8859-1");

However, when we fetch the documents using Java, we have to decode them as UTF-8 to get the right format:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
nonXmlObject.writeTo(baos);
nonXmlModel = new NonXMLModel();
nonXmlModel.setContent(baos.toString("UTF-8"));
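
To illustrate why the charset passed when decoding matters, here is a minimal sketch that uses only JDK classes and no Tamino calls. The class name CharsetRoundTrip and the sample text are made up for illustration, and the byte array merely stands in for whatever writeTo() delivers, on the assumption that those bytes are UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {
    // Decode the buffered bytes with the charset chosen by the caller.
    static String decode(ByteArrayOutputStream baos, Charset charset) {
        return new String(baos.toByteArray(), charset);
    }

    // Hypothetical stand-in for the fetched document: the text as UTF-8 bytes.
    static ByteArrayOutputStream utf8Buffer(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        baos.write(bytes, 0, bytes.length);
        return baos;
    }

    public static void main(String[] args) {
        // "Smørrebrød", written with escapes so the source file stays ASCII.
        String original = "Sm\u00f8rrebr\u00f8d";
        ByteArrayOutputStream baos = utf8Buffer(original);

        // Decoding with the charset the bytes were written in restores the text.
        System.out.println(decode(baos, StandardCharsets.UTF_8).equals(original));
        // Decoding the same bytes as ISO-8859-1 turns each two-byte character
        // into two separate characters, so the text no longer matches.
        System.out.println(decode(baos, StandardCharsets.ISO_8859_1).equals(original));
    }
}
```

If the bytes really are UTF-8, only the first decode gives back the original text, which would match what we see with baos.toString("UTF-8").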

If we run the query in the browser with the UTF-8 encoding, the result is in the wrong format:
http://server01/tamino/defgo/reports/html/@6?_encoding=UTF-8
but a query with the iso-8859-1 encoding is correct:
http://server01/tamino/defgo/reports/html/@6?_encoding=iso-8859-1

Is there a bug in the Java API?

Many regards from

Ole
Manager

InterResearch A/S - Trekronergade 126 F - 2500 Valby
Phone: +45 70 27 28 72 - Mobile: +45 20 10 21 74 - Fax: +45 36 14 80 20
E-mail: oen@interresearch.dk - Homepage: http://www.interresearch.net/

Christian_Gengenbach · December 18, 2002, 4:11pm

Hi,

we are currently looking into this. Is the assumption right that you are storing HTML pages in this application?

Cheers,
Christian Gengenbach.

Christian_Gengenbach · December 20, 2002, 1:07am

Ok, I got an answer on this. Here is the scoop:
Any document, regardless of whether it is XML or non-XML, that “smells” like encoded text will be converted by Tamino to its internal encoding format. When you get the document out of Tamino, it will be encoded according to the default encoding (a Tamino parameter), which is “iso-8859-1” by default, unless “accept-charset” is specified in the HTTP header. The Tamino API for Java sets “accept-charset” to “UTF-8”, because that is the natural thing to do if you want to convert the content to Java Strings anyway. This is why you get it back in “UTF-8” and have to use “UTF-8” to convert it, e.g. to a String.
Is this a problem for you?
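
For reference, this is roughly what requesting a specific charset looks like at the HTTP level. It is only a sketch with plain JDK classes, not the Tamino API for Java: the class and method names are made up, and the URL is the one from the original post. The connection is prepared but never opened, so the code only shows where the header goes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class AcceptCharsetRequest {
    // Prepare (but do not open) an HTTP request that asks for the given charset.
    static HttpURLConnection prepare(String url, String charset) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
            // Request header that, per the explanation above, overrides the
            // server's default encoding for the response.
            conn.setRequestProperty("Accept-Charset", charset);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection conn =
            prepare("http://server01/tamino/defgo/reports/html/@6", "iso-8859-1");
        // Show the header that would be sent; no network access happens here.
        System.out.println(conn.getRequestProperty("Accept-Charset"));
    }
}
```

Whatever charset is requested this way is the one to use when turning the response bytes into a String.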

[This message was edited by Christian Gengenbach on 20 Dec 2002 at 10:46.]
