Another encoding problem

tschmalz · March 22, 2006, 8:24pm

Hello,

A client sends us XML documents encoded in iso-8859-2.
We receive these documents on a custom service (we're not using wm.tn.receive).
Currently, we call the following services:

xmlStringToXMLNode (without encoding specified)
recognize
routeBizdoc

We were testing how special characters are handled, and currently those characters are stored in TN as follows:
±æê³ó¶ñ¼¿ ¡ÆÊ£ÓÑ¦¬¯
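
One explanation consistent with this output (the thread itself never confirms it) is that the ISO-8859-2 bytes were decoded as ISO-8859-1 somewhere before they reached TN. The Python sketch below reproduces the line exactly under that assumption. The Polish sample text is reconstructed by mapping each garbled character back to its ISO-8859-2 byte, not taken from the client's documents, and Python's codecs only stand in for whatever the server does internally.

```python
# Hypothetical sample, reconstructed from the garbled line above:
# each garbled character maps back to one ISO-8859-2 byte.
original = "ąćęłóśńźż ĄĆĘŁÓŃŚŹŻ"

# The client encodes the text as ISO-8859-2: one byte per character.
wire_bytes = original.encode("iso-8859-2")

# Assumed mistake: the receiver decodes those bytes as ISO-8859-1.
misread = wire_bytes.decode("iso-8859-1")
print(misread)  # ±æê³ó¶ñ¼¿ ¡ÆÊ£ÓÑ¦¬¯

# As long as nothing else has touched the string, the damage is reversible:
# re-encode with the wrong charset, then decode with the right one.
recovered = misread.encode("iso-8859-1").decode("iso-8859-2")
print(recovered == original)  # True
```

Nothing raises an error along the way, because every byte value is a legal ISO-8859-1 character; that silence is what makes this kind of mismatch hard to spot.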

When I set the encoding to iso-8859-2 on the xmlStringToXMLNode service, I get the following string:
???ó??? ???Ó???

I tried one more thing: I forced the bizdoc contentPart MimeType to text/xml; charset=iso-8859-2, and got:
ÂąĂŚĂŞÂłĂłÂśĂąÂźÂż ÂĄĂ�Ă�ÂŁĂ�Ă�ÂŚÂŹÂŻ

So here some Polish characters seem to be OK, but they are mixed with Â, Ă, etc., letters that don't belong there.
Is this because it is UTF-8 encoded as 2 bytes? I don't understand…
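
The 2-byte idea fits the output if one assumes a second conversion on top of the first (again an assumption, not something the thread establishes): the already-misread string is encoded as UTF-8, where each of these characters takes 2 bytes, and those bytes are then read back as ISO-8859-2, one character per byte. A minimal Python sketch of that sequence, using the lowercase half of the stored string:

```python
# The lowercase half of the string as stored in TN (copied from the post).
misread = "±æê³ó¶ñ¼¿"

# Step 1 (assumed): the string is stored or sent as UTF-8. Every character
# here is in U+0080..U+07FF, so each becomes two bytes, e.g. "±" -> C2 B1.
utf8_bytes = misread.encode("utf-8")

# Step 2 (assumed): a consumer told "charset=iso-8859-2" decodes one byte
# per character. 0xC2 is "Â" and 0xC3 is "Ă" in ISO-8859-2, which is where
# the stray letters come from; the second byte lands on a Polish letter.
shown = utf8_bytes.decode("iso-8859-2")
print(shown)  # ÂąĂŚĂŞÂłĂłÂśĂąÂźÂż
```

In the uppercase half, some second bytes (for example 0x86, from Æ = C3 86) map to invisible control characters in ISO-8859-2, which is likely why they appear as �.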

I also tried bytesToString and stringToBytes with every possible parameter, and I always get one of those 3 lines…

So, my questions:

What can I do?
How can I be sure that the client really sends iso-8859-2?
Is the line <?xml version="1.0" encoding="iso-8859-2"?> really useful?
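
On the last two questions, a short sketch using Python's standard xml.etree.ElementTree (not the webMethods API; the <name> document is a made-up sample). It shows that the declaration is what lets a parser decode raw bytes correctly: without it, XML 1.0 requires the parser to assume UTF-8 (or UTF-16 with a byte order mark). The declaration only helps while the document is still bytes; once it has been decoded into a string, the characters are already fixed. The is_valid_utf8 helper is hypothetical and only a heuristic: it can rule UTF-8 out, but it cannot prove that the bytes really are ISO-8859-2.

```python
import xml.etree.ElementTree as ET

# Made-up sample element containing Polish letters.
body = "<name>Łódź</name>"
declared = '<?xml version="1.0" encoding="iso-8859-2"?>'

# With the declaration, a parser given raw bytes knows how to decode them.
with_decl = (declared + body).encode("iso-8859-2")
print(ET.fromstring(with_decl).text)  # Łódź

# Without it, the parser must assume UTF-8. Single-byte ISO-8859-2 letters
# such as 0xA3 ("Ł") are not valid UTF-8, so parsing fails.
without_decl = body.encode("iso-8859-2")
try:
    ET.fromstring(without_decl)
    parse_failed = False
except ET.ParseError as err:
    parse_failed = True
    print("ParseError:", err)


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical heuristic: True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Polish text in ISO-8859-2 is almost never valid UTF-8, so this can rule
# UTF-8 out; it cannot tell ISO-8859-2 apart from another single-byte charset.
print(is_valid_utf8(with_decl))             # False
print(is_valid_utf8(body.encode("utf-8")))  # True
```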

Thanks!
Regards

Sergej_Gratchev · March 28, 2006, 11:33am

Hi,

Well, if you look at the ISO 8859-2 table (http://en.wikipedia.org/wiki/ISO_8859-2), you'll see that some of these characters are not present there. That's probably why you get a lot of question marks.
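
This can be checked with a few lines of Python, assuming the service substitutes '?' for unmappable characters the way Python's errors="replace" does (Java's String.getBytes(Charset) likewise replaces them with the charset's default replacement, usually '?'). Of the stored characters, only ó and Ó have a slot in ISO 8859-2, which shares them with ISO 8859-1:

```python
# The string as stored in TN, copied from the first post.
stored = "±æê³ó¶ñ¼¿ ¡ÆÊ£ÓÑ¦¬¯"

# Encode to ISO 8859-2, substituting "?" for every character the charset
# cannot represent instead of raising an error.
encoded = stored.encode("iso-8859-2", errors="replace")

# Only ó, Ó and the space exist in ISO 8859-2, so only they survive.
print(encoded.decode("iso-8859-2"))  # ????ó???? ????Ó????
```

The line in the first post shows fewer question marks, but the pattern is the same: only ó and Ó come through.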

Try bytesToString / stringToBytes on characters that do exist in ISO 8859-2, such as š, and you'll see that it works better. If you then convert š using the Western European ISO 8859-1 encoding, you'll get a question mark again, because š is not part of that standard. You could also use UTF-8, since it covers Polish as well as any other special characters you might need, so you can mix characters like ą and ±.
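
The same three points, sketched with Python's codecs standing in for bytesToString / stringToBytes (an assumption: these are not the webMethods services, and their '?' substitution is assumed to match errors="replace"):

```python
text = "š"

# š exists in ISO 8859-2 (byte 0xB9), so it round-trips cleanly.
latin2 = text.encode("iso-8859-2")
print(latin2, latin2.decode("iso-8859-2"))  # b'\xb9' š

# š is not in ISO 8859-1, so a replacing encoder turns it into "?".
print(text.encode("iso-8859-1", errors="replace"))  # b'?'

# UTF-8 covers both ą (Polish) and ± (absent from ISO 8859-2).
mixed = "ą±"
print(mixed.encode("utf-8").decode("utf-8") == mixed)  # True

# In ISO 8859-2, ą survives (0xB1) but ± is lost.
print(mixed.encode("iso-8859-2", errors="replace"))  # b'\xb1?'
```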
