Converting ISO-8859-1 to ASCII text

Hi,

Is it possible to convert XML received with special characters (Western European charset, ISO-8859-1) into plain ASCII text?
I did some R&D and found that we need to use pub.string:bytesToString or pub.string:stringToBytes. Am I on the right track? Can anyone provide a sample implementation?

Thanks in advance
Kevin

Do a Google search of iso-8859-1. You’re sure to find some interesting information.

Rob,

Thanks for your suggestion. I already did some R&D on that. I want to explore the out-of-the-box features of webMethods for converting these special chars.

Thanks
Kevin

Since 8859-1 is a superset of ASCII, I’m not sure what you mean by “convert these special chars.” If you’re referring to character encoding conversion, like 8859-1 to UTF-8 (the default encoding used by IS), then you’re on the right track with bytesToString and stringToBytes. Keep in mind that XML posted to IS is automatically processed, depending on the mechanism used. The XML declaration needs to properly identify the encoding used.
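To make the distinction above concrete, here is a minimal sketch in plain Python (not webMethods code). The first function is the same decode-then-encode round trip that bytesToString followed by stringToBytes performs; the second is one possible reading of "convert to ASCII", namely stripping accents. Both function names are made up for this illustration.

```python
import unicodedata


def latin1_bytes_to_utf8(data: bytes) -> bytes:
    """Decode ISO-8859-1 bytes to text, then encode that text as UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")


def fold_to_ascii(text: str) -> str:
    """Drop accents by decomposing characters and discarding non-ASCII parts.

    This is lossy: letters with no ASCII decomposition are removed entirely.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


utf8_bytes = latin1_bytes_to_utf8(b"carr\xe9s")   # b"carr\xc3\xa9s"
plain = fold_to_ascii("Bloc quadrill\u00e9 5 carr\u00e9s/pouce, lettre")
print(utf8_bytes, plain)
```

Note that the first function changes only the byte representation and the text stays the same, while the second actually changes the text.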

Hi,

I’m also having problems with special characters. I receive an XML file containing characters like “é” and “à”, but I’m not able to see them. Once I’ve run wm.tn.doc.xml:bizdocToRecord, this is the kind of data I get:

Bloc quadrillÃ© 5 carrÃ©s/pouce, lettre

It is supposed to be:

Bloc quadrillé 5 carrés/pouce, lettre

The file header contains <?xml version="1.0" encoding="iso-8859-1"?>. I tried using pub.string:bytesToString with the encoding set to “ISO-8859-1”, but I have the same issue. What can I do to solve this?

Littlebird

Try setting the encoding to UTF-8 and see if that helps.

The text still looks like this:

Bloc quadrillÃ© 5 carrÃ©s/pouce, lettre

I used pub.string:bytesToString.

What database encoding is being used? Try changing it and test again.

Hi gsr_sreedhar,

Where can I see that? I don’t know what you are talking about.

Hi littlebird,
I retrieved the XML document from TN and didn’t see the issue you described.

wm.tn.doc.xml:bizdocToRecord
pub.xml:documentToXMLString
pub.flow:debugLog
In the log file, I could see ‘Bloc quadrillé 5 carrés/pouce, lettre’

Here are the steps I used to send the document to TN:

pub.xml:xmlStringToXMLNode — input string is ‘<?xml version='1.0' encoding='iso-8859-1'?> Bloc quadrillé 5 carrés/pouce, lettre’
wm.tn.doc.xml:routeXml

Hi,

I’ve continued investigating to try to find the problem.

When this customer submits an order to us, they do it by HTTP to wm.tn:receive.

Here are the contents of wm.tn:receive:

 wm.tn.doc:recognize
 wm.tn:submit
 wm.tn.admin:clean

I made a copy of this flow and added an Exit Flow at the end. I also added a savepipeline and an exit at the beginning of the flow that is supposed to be executed by the processing rule. Once that was done, I submitted the order to my copy of “receive”. In TN, the Document Type, the Sender and the Receiver were all “unknown”, but the content contained the accents. No pipeline file was created for the second flow (the one invoked by the processing rule).

The next step was to remove the exit from my copy of “receive”. I submitted the same order. In TN, the document type, the sender and the receiver were correct, but the content no longer included the accents (e.g. “é” was replaced by “Ã©”). A pipeline file was created.

Questions: What happens between the end of “receive” and the beginning of the flow invoked by the processing rule? What could change the accents in the content of this order in TN?

Note: For some other customers there is no problem with the accents.

Here is the beginning of the order.

<?xml version="1.0" encoding="iso-8859-1"?> EDI_DC40 100 0000000043761107 46C 30 1 2

What encoding is the “other customer” specifying in their XML header?

Hi,

Yes, the encoding is in the header, like the one below.

<?xml version="1.0" encoding="UTF-8"?>

Take one of the XML docs from the customer that is failing and change the encoding to UTF-8 to see if that is the proper encoding to use. If it works, then ask the customer to change their header when sending the doc.
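The test described above can also be run outside IS. The following is a rough Python sketch, not anything from webMethods: it parses a document once with the declaration as sent and once with only the declared encoding swapped to UTF-8. The helper name redeclare and the sample payload are made up for the illustration; the sample simulates UTF-8 bytes labelled as iso-8859-1.

```python
import re
import xml.etree.ElementTree as ET


def redeclare(xml_bytes: bytes, new_encoding: str) -> bytes:
    """Rewrite only the encoding value in the XML declaration; the payload bytes are untouched."""
    pattern = rb'(<\?xml[^>]*?encoding=)(["\'])[^"\']*\2'

    def swap(match):
        quote = match.group(2)
        return match.group(1) + quote + new_encoding.encode("ascii") + quote

    return re.sub(pattern, swap, xml_bytes, count=1)


# Hypothetical sample: the text is encoded as UTF-8 but declared as iso-8859-1.
sample = (
    '<?xml version="1.0" encoding="iso-8859-1"?>'
    "<item>Bloc quadrill\u00e9</item>"
).encode("utf-8")

as_declared = ET.fromstring(sample).text                      # parsed as ISO-8859-1
as_utf8 = ET.fromstring(redeclare(sample, "UTF-8")).text      # parsed as UTF-8

print(repr(as_declared))
print(repr(as_utf8))
```

If the second parse shows the accents correctly and the first does not, the declaration in the incoming document does not match the bytes actually sent.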

Hi reamon,

The test you suggested works.

Other than having the customer change their encoding, can I do something to correct this on my side?

No. If it turns out that it is indeed caused by the incorrect encoding attribute, then the customer is sending bad data. It would be like them sending ASCII data but indicating that they are sending EBCDIC. It’s wrong for them to do that. The proper thing (and in this case, the only thing) to do is to correct that error.
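For gathering evidence to show the sender, a heuristic check along these lines may help. It is only a sketch in Python with a made-up function name, and it flags suspicious documents rather than repairing them: genuine ISO-8859-1 text with accented letters almost never happens to be valid UTF-8, so a payload that is declared iso-8859-1, contains bytes above 0x7F, and still decodes cleanly as UTF-8 is probably mislabelled.

```python
def looks_mislabelled(xml_bytes: bytes) -> bool:
    """Heuristic: declared iso-8859-1, has non-ASCII bytes, yet is valid UTF-8."""
    head = xml_bytes[:200].lower()
    declared_latin1 = (
        b'encoding="iso-8859-1"' in head or b"encoding='iso-8859-1'" in head
    )
    has_high_bytes = any(byte > 0x7F for byte in xml_bytes)
    if not (declared_latin1 and has_high_bytes):
        return False
    try:
        xml_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return False  # not valid UTF-8, so the declaration is plausible
    return True


header = '<?xml version="1.0" encoding="iso-8859-1"?>'
body = "<item>carr\u00e9s</item>"

suspect = (header + body).encode("utf-8")        # UTF-8 bytes, wrong label
genuine = (header + body).encode("iso-8859-1")   # bytes match the label

print(looks_mislabelled(suspect), looks_mislabelled(genuine))
```

A positive result is a reason to raise the issue with the sender, not proof on its own.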

Thank you for your help.

Can you hex dump the string at both the source and the target?
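As a rough idea of what such a hex dump shows, here is a small Python sketch (the function name is made up). The same character produces different bytes depending on the encoding, which is what makes the dump useful for telling the two apart.

```python
def hex_dump(text: str, encoding: str) -> str:
    """Return the bytes of text in the given encoding as space-separated hex."""
    return text.encode(encoding).hex(" ")


latin1_hex = hex_dump("\u00e9", "iso-8859-1")   # "e9"
utf8_hex = hex_dump("\u00e9", "utf-8")          # "c3 a9"
print(latin1_hex, "|", utf8_hex)
```

If the source bytes for “é” are c3 a9, the document is UTF-8 regardless of what its header says; a single e9 byte means ISO-8859-1.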