Converting ISO88591 to ASCII text

kevinmccj · January 17, 2006, 4:03am

Hi,

Is it possible to convert XML received with special characters (Western European charset, ISO-8859-1) into plain ASCII text?
I did some R&D and found that we need to use pub.string:bytesToString or pub.string:stringToBytes. Am I on the right track? Can anyone provide a sample implementation?

Thanks in advance
Kevin

reamon · January 17, 2006, 8:42am

Do a Google search of iso-8859-1. You’re sure to find some interesting information.

kevinmccj · January 17, 2006, 8:11pm

Rob,

Thanks for your suggestion. I already did some R&D on that. I want to explore the out-of-the-box features of webMethods for converting these special characters.

Thanks
Kevin

reamon · January 18, 2006, 6:56pm

Since 8859-1 is a superset of ASCII, I’m not sure what you mean by “convert these special chars.” If you’re referring to character encoding conversion, like 8859-1 to UTF-8 (the default encoding used by IS), then you’re on the right track with bytesToString and stringToBytes. Keep in mind that XML posted to IS is automatically processed, depending on the mechanism used. The XML declaration needs to properly identify the encoding used.
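To make the conversion concrete, here is a minimal sketch in plain Python rather than in IS flow. It assumes that bytesToString and stringToBytes behave like an ordinary decode and encode with a named charset; the variable names are illustrative only.

```python
# ISO-8859-1 bytes for "carrés": é is the single byte 0xE9.
latin1_bytes = b"carr\xe9s"

# Rough analogue of bytesToString with encoding ISO-8859-1.
text = latin1_bytes.decode("iso-8859-1")

# Rough analogue of stringToBytes with encoding UTF-8:
# é becomes the two bytes 0xC3 0xA9.
utf8_bytes = text.encode("utf-8")

# Pure ASCII text has the same bytes in ASCII, ISO-8859-1 and UTF-8,
# which is why 8859-1 is described as a superset of ASCII.
ascii_bytes = "lettre".encode("ascii")
same_in_latin1 = ascii_bytes == "lettre".encode("iso-8859-1")
same_in_utf8 = ascii_bytes == "lettre".encode("utf-8")
```

The conversion only works when the charset given to the decode step matches the bytes actually received.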

littlebird · February 1, 2006, 4:09am

Hi,

I’m also having problems with special characters. I receive an XML file containing characters like “é” and “à” but I’m not able to see them. After running wm.tn.doc.xml:bizdocToRecord, this is the kind of data I get:

Bloc quadrillÃƒÂƒÃ‚Â© 5 carrÃƒÂƒÃ‚Â©s/pouce, lettre

It is supposed to be:

Bloc quadrillé 5 carrés/pouce, lettre

There is <?xml version="1.0" encoding="iso-8859-1"?> in the header of the file. I tried using pub.string:bytesToString with the encoding set to “ISO-8859-1”, but I have the same issue. What can I do to solve this?

Littlebird

Chris_L · February 1, 2006, 9:43pm

Try setting the encoding to UTF-8 and see if that helps.

littlebird · February 2, 2006, 2:05am

The text still looks like this:

Bloc quadrillÃƒÂ© 5 carrÃƒÂ©s/pouce, lettre

I used pub.string.bytesToString

gsr_sreedhar · February 7, 2006, 1:26pm

What database encoding is being used? Try changing that.

littlebird · February 15, 2006, 9:06pm

Hi gsr_sreedhar,

Where can I see that? I don’t know what you are talking about.

Michael_Deng · February 16, 2006, 12:27am

Hi littlebird,
I retrieved the XML document from TN and didn’t see the issue you described.

wm.tn.doc.xml:bizdocToRecord
pub.xml:documentToXMLString
pub.flow:debugLog
In the log file, I could see ‘Bloc quadrillé 5 carrés/pouce, lettre’

Here are the steps I used to send the document to TN:

pub.xml:xmlStringToXMLNode — input string is ‘<?xml version='1.0' encoding='iso-8859-1'?> Bloc quadrillé 5 carrés/pouce, lettre’
wm.tn.doc.xml:routeXml

littlebird · February 18, 2006, 2:15am

Hi,

I am continuing my research to try to find the problem.

When this customer submits an order to us, they do it over HTTP to wm.tn:receive.

Here are the contents of wm.tn:receive:

 wm.tn.doc:recognize
 wm.tn:submit
 wm.tn.admin:clean

I made a copy of this flow and added an Exit Flow at the end. I also added a savepipeline and an exit at the beginning of the flow that is supposed to be executed by the processing rule. Once that was done, I submitted the order to my copy of “receive”. In TN, the Document Type, the Sender and the Receiver were “unknown”, but the content contained the accents. No pipeline file was created for the second flow (the one invoked by the processing rule).

The next step was to remove the exit from my copy of “receive”. I submitted the same order. In TN, the document type, the sender and the receiver were correct, but the content did not include the accents (e.g. “é” was replaced by “Ã©”). A pipeline file was created.

Questions: What happens between the end of “receive” and the beginning of the flow invoked by the processing rule? What could change the accents in the content of this order in TN?

Note: For some other customers there is no problem with the accents.

Here is the beginning of the order.

<?xml version="1.0" encoding="iso-8859-1"?> EDI_DC40 100 0000000043761107 46C 30 1 2

reamon · February 18, 2006, 4:12am

What encoding is the “other customer” specifying in their XML header?

littlebird · February 20, 2006, 10:44pm

Hi,

Yes, the encoding is specified in the header, like this:

<?xml version="1.0" encoding="UTF-8"?>

reamon · February 23, 2006, 3:33am

Take one of the XML docs from the customer that is failing and change the encoding to UTF-8 to see if that is the proper encoding to use. If it works, then ask the customer to change their header when sending the doc.

littlebird · February 23, 2006, 8:07pm

Hi reamon,

The test you suggested works.

littlebird · February 24, 2006, 1:22am

Apart from having the customer change their encoding, can I do something to correct the situation on my side?

reamon · February 25, 2006, 4:33am

No. If it turns out that this is indeed caused by the incorrect encoding attribute, then the customer is sending bad data. It would be like them sending ASCII data while indicating that they are sending EBCDIC. It’s wrong for them to do that. The proper thing (and in this case, the only thing) to do is to correct that error.
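The effect of a wrong encoding attribute can be reproduced outside IS. The following is a small Python sketch, assuming the customer's bytes are really UTF-8 while the header claims ISO-8859-1; it is an illustration of the symptom, not a description of how TN processes the document internally.

```python
original = "Bloc quadrillé 5 carrés/pouce, lettre"

# What the customer actually sends: UTF-8 bytes (é is 0xC3 0xA9).
utf8_bytes = original.encode("utf-8")

# What a receiver does when it trusts an "iso-8859-1" declaration:
# each of the two bytes is read as its own character, Ã and ©.
misread = utf8_bytes.decode("iso-8859-1")

# What the receiver gets when the declaration says UTF-8.
correct = utf8_bytes.decode("utf-8")

print(misread)
print(correct)
```

The misread string shows the same “Ã©” pattern reported earlier in the thread, while the correctly declared one keeps its accents.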

littlebird · March 1, 2006, 2:34am

Thank you for your help.

uchava.14139 · March 2, 2006, 3:57am

Can you hex dump the string at both the source and the target?
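A hex dump makes the encoding visible byte by byte. The helper below is a hypothetical Python sketch (hex_dump is not a webMethods service) showing what “é” looks like in each of the cases discussed in this thread.

```python
def hex_dump(data: bytes) -> str:
    """Return the bytes as space-separated two-digit hex values."""
    return " ".join(f"{b:02x}" for b in data)


# é as a single ISO-8859-1 byte.
latin1_hex = hex_dump("é".encode("iso-8859-1"))

# é as two UTF-8 bytes.
utf8_hex = hex_dump("é".encode("utf-8"))

# é after UTF-8 bytes were misread as ISO-8859-1 and written out
# again as UTF-8: four bytes, which displays as "Ã©".
double_hex = hex_dump("é".encode("utf-8").decode("iso-8859-1").encode("utf-8"))
```

Comparing the dump at the source with the dump at the target shows at which hop the bytes change.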
