Another encoding problem

Hello,

A client sends us XML documents encoded in ISO-8859-2.
We receive these documents through a custom service (we’re not using wm.tn.receive).
Currently, we call the following services:

  • xmlStringToXMLNode (without encoding specified)
  • recognize
  • routeBizdoc

We were testing how special characters are handled, and currently they are stored in TN as follows:
±æê³ó¶ñ¼¿ ¡ÆÊ£ÓѦ¬¯

When I set the encoding to iso-8859-2 on the xmlStringToXMLNode service, I get the following string:
???ó??? ???Ó???

I tried one more thing: I forced the MIME type of the bizdoc contentPart to text/xml; charset=iso-8859-2, and got:
¹Ìê³óœùŸ¿ ¥��£��Œ¯

So here some Polish characters seem to be OK, but there are also letters like Â, Ă, etc. that don’t belong there.
Is this because these characters take 2 bytes in UTF-8? I can’t understand it…
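On the UTF-8 question: every Polish letter lies outside ASCII, so UTF-8 stores it as two bytes. A reader that assumes a single-byte charset turns each pair into two characters, and the first of them shows up as Ă, Ä or Ĺ in ISO-8859-2 (Ã, Ä or Å in Latin-1). The Python sketch below illustrates that general mechanism only. It does not attempt to reproduce the exact line above.

```python
# Polish letters assumed to be in the test input; none of them is in ASCII.
letters = "ąćęłóśńźżĄĆĘŁÓŃŚŹŻ"

# UTF-8 stores each of these letters as exactly two bytes.
two_bytes_each = all(len(ch.encode("utf-8")) == 2 for ch in letters)
print(two_bytes_each)                 # True

# Reading one UTF-8 letter with a single-byte charset yields two characters.
utf8_o = "ó".encode("utf-8")          # b'\xc3\xb3'
print(utf8_o.decode("iso-8859-2"))    # Ăł
print(utf8_o.decode("iso-8859-1"))    # Ã³

# The distinct lead bytes used by these letters, shown as ISO-8859-2 characters.
lead_bytes = bytes(sorted({ch.encode("utf-8")[0] for ch in letters}))
print(lead_bytes.decode("iso-8859-2"))  # ĂÄĹ
```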

Also, I tried bytesToString and stringToBytes with every possible parameter, and I always get one of these three lines…

So, my questions:

  1. What can I do?
  2. How can I be sure that the client really sends ISO-8859-2?
  3. Is the line <?xml version="1.0" encoding="iso-8859-2"?> really useful?

Thanks!
Regards
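For questions 2 and 3, a practical check is to inspect the raw bytes before any string conversion. The minimal Python sketch below uses hypothetical helper names (declared_encoding, is_valid_utf8) and a made-up two-letter payload. It reads the encoding named in the XML declaration and tests whether the bytes are valid UTF-8. It also shows that a parser given raw bytes uses the declaration to decode them, while the same bytes without a declaration are parsed as UTF-8 and rejected.

```python
import re
import xml.etree.ElementTree as ET


def declared_encoding(raw: bytes):
    """Return the encoding named in the XML declaration, or None if there is none."""
    # The declaration is plain ASCII, so it can be matched on the raw bytes.
    m = re.match(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']', raw)
    return m.group(1).decode("ascii") if m else None


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical payload: "ąć" encoded in ISO-8859-2 (bytes 0xB1 0xE6).
body = "<name>ąć</name>".encode("iso-8859-2")
with_decl = b'<?xml version="1.0" encoding="iso-8859-2"?>' + body

print(declared_encoding(with_decl))   # iso-8859-2
print(is_valid_utf8(with_decl))       # False

# Given raw bytes, the parser decodes them with the declared encoding.
print(ET.fromstring(with_decl).text)  # ąć

# Without a declaration the parser assumes UTF-8 and rejects the same bytes.
try:
    ET.fromstring(body)
    rejected = False
except ET.ParseError as err:
    rejected = True
    print("parse error:", err)
```

Bytes that fail the UTF-8 check but decode as ISO-8859-2 into readable Polish text are a strong hint that the client really sends ISO-8859-2. This is only a heuristic, since any byte sequence decodes without error in a single-byte charset. The declaration matters when the parser receives bytes. Once the document has already been turned into a string, the declaration cannot repair characters that were decoded with the wrong charset earlier.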

Hi,

Well, if you look at the ISO 8859-2 table (http://en.wikipedia.org/wiki/ISO_8859-2), you’ll see that some of these characters are not present there. That’s probably why you get so many question marks.
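The question marks can be reproduced directly. When a string is encoded into a charset that has no code for one of its characters, the encoder substitutes a replacement, which is ? in Python with errors="replace" and, for most charsets, in Java’s String.getBytes(Charset). A minimal Python sketch using the garbled lowercase letters from the first post:

```python
garbled = "±æê³ó¶ñ¼¿"

# Encode into ISO-8859-2, replacing characters the charset cannot represent.
encoded = garbled.encode("iso-8859-2", errors="replace")
print(encoded)                        # b'????\xf3????'
print(encoded.decode("iso-8859-2"))   # ????ó????

# Only ó survives: it is the one character here that also exists in ISO-8859-2.
survivors = [ch for ch in garbled if ch.encode("iso-8859-2", errors="replace") != b"?"]
print(survivors)                      # ['ó']
```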

Try bytesToString / stringToBytes on characters that do exist in ISO 8859-2, such as š, and you’ll see that it works better. However, if you convert š using the Western European ISO 8859-1 encoding, you’ll get a question mark again, because š is not part of that standard. You could also use UTF-8, which covers Polish and any other special characters you might need, so you can mix characters like ą and ±.
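The round trips suggested here can be sketched in Python, with str.encode and bytes.decode standing in for stringToBytes and bytesToString (an assumption for illustration; the webMethods services themselves are not shown):

```python
# š exists in ISO 8859-2 (byte 0xB9), so the round trip is lossless.
latin2 = "š".encode("iso-8859-2")
print(latin2)                          # b'\xb9'
print(latin2.decode("iso-8859-2"))     # š

# š does not exist in ISO 8859-1, so the encoder falls back to a question mark.
latin1 = "š".encode("iso-8859-1", errors="replace")
print(latin1)                          # b'?'

# Neither single-byte charset can hold both ą and ±: each loses one of them.
print("ą±".encode("iso-8859-2", errors="replace"))  # b'\xb1?'
print("ą±".encode("iso-8859-1", errors="replace"))  # b'?\xb1'

# UTF-8 can represent any Unicode character, so the mix survives intact.
mixed = "ą±š"
print(mixed.encode("utf-8").decode("utf-8") == mixed)  # True
```

Note that ą in ISO 8859-2 and ± in ISO 8859-1 share the byte 0xB1, which is why the two cannot coexist in either single-byte charset and why ± can appear where ą was intended.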