UTF-16 XML encoding in IS 4.6

Guest · May 20, 2003, 9:50pm

I’m having trouble getting an XML document into UTF-16 encoding in IS 4.6. I can set the encoding attribute in the document header to UTF-16 just fine, but when I send the document to another webMethods service via HTTP it gives me the following error:

com.wm.app.b2b.server.ServiceException: com.wm.lang.xml.WMDocumentException: com.wm.util.LocalizedCharConversionException: [B2BCORE.0042.9201] Incorrect character encoding (Missing byte-order mark)

If I change my code to specify UTF-8 as the encoding, it works just fine.

I’ve tried using stringToBytes to encode the XML document to UTF-16, but that gives me this error:

com.wm.app.b2b.server.ServiceException: java.io.UnsupportedEncodingException: UTF-16

What’s the proper way to encode an XML document in UTF-16 and send it over HTTP?

Thanks,
Skip

Guest · May 21, 2003, 2:37am

I believe webMethods supports only UTF-8 for encoding XML.

If not, any ideas?

reamon · May 21, 2003, 3:17am

I’m not sure this is a wM thing. Rather I believe it is a JVM thing. Researching…

reamon · May 21, 2003, 3:35am

I think you’re on the right track with your steps of 1) stringToBytes with the UTF-16 encoding; 2) http post the bytes with the encoding header set to UTF-16.
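As a rough plain-Java illustration of step 1 (this is not the webMethods stringToBytes service itself), String.getBytes with the standard UTF-16 charset produces big-endian bytes preceded by a BOM. StandardCharsets needs Java 7 or later; on the 1.2/1.3 JVMs discussed in this thread you would pass the encoding name as a string, and which names are accepted depends on the JVM. The class name is made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class Utf16Encode {
    // Plain-Java analogue of stringToBytes with encoding UTF-16.
    // Java's "UTF-16" encoder writes a big-endian BOM (FE FF) before the data.
    static byte[] encode(String xml) {
        return xml.getBytes(StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><order/>";
        byte[] body = encode(xml);
        // 2 BOM bytes + 2 bytes per character (all characters here are in the BMP)
        System.out.printf("%d chars -> %d bytes, starting %02X %02X%n",
                xml.length(), body.length, body[0] & 0xFF, body[1] & 0xFF);
    }
}
```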

The problem would seem to be to determine why stringToBytes is failing. What JVM are you running? I did a test with stringToBytes using UTF-16 as the encoding and did not encounter an error. I’m running IS 4.6 on the IBM 1.3 JVM.

http://java.sun.com/j2se/1.3/docs/api/java/lang/package-summary.html#charenc has the list of encodings that every J2SE v1.3 implementation is required to support. It includes UTF-16.
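One quick way to check what a particular JVM accepts is to try each name with the old String.getBytes(String) API, which throws the same java.io.UnsupportedEncodingException reported in the first post. This is a minimal sketch written for a current JDK (it uses generics and varargs); the class and method names are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.List;

public class EncodingProbe {
    // Encoding names every J2SE 1.3+ implementation is required to support.
    static final String[] REQUIRED = {
        "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"
    };

    // True if this JVM can encode with the given name. The old
    // String.getBytes(String) API throws UnsupportedEncodingException otherwise.
    static boolean canEncode(String name) {
        try {
            "x".getBytes(name);
            return true;
        } catch (UnsupportedEncodingException e) {
            return false;
        }
    }

    // Returns the names from the argument list that this JVM rejects.
    static List<String> unsupported(String... names) {
        List<String> out = new ArrayList<>();
        for (String n : names) {
            if (!canEncode(n)) {
                out.add(n);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("Unsupported required names: " + unsupported(REQUIRED));
        // Legacy names mentioned later in the thread; results vary by JVM.
        System.out.println("UnicodeBig: " + canEncode("UnicodeBig"));
        System.out.println("UnicodeLittle: " + canEncode("UnicodeLittle"));
    }
}
```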

JDK 19 Documentation - Home has additional info.

HTH

begebege · May 21, 2003, 11:28am

Hi,

I believe webMethods is using its own names for the list of encodings and mapping them to the java encoding names. For instance UTF8, instead of UTF-8, is a good choice for the smtp service.

Also, there are two flavors of UTF-16: little-endian and big-endian (i.e. the order of the bytes).
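To see the difference concretely, the sketch below (plain Java, made-up class name) encodes the same two characters with Java's UTF-16BE and UTF-16LE charsets. Neither of those charsets writes a BOM, so the only difference is the byte order inside each 16-bit code unit.

```java
import java.nio.charset.StandardCharsets;

public class Endianness {
    // Formats bytes as space-separated uppercase hex, e.g. "00 3C".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "<?";
        // Same code units ('<' = 0x003C, '?' = 0x003F), opposite byte order.
        System.out.println("UTF-16BE: " + hex(s.getBytes(StandardCharsets.UTF_16BE))); // 00 3C 00 3F
        System.out.println("UTF-16LE: " + hex(s.getBytes(StandardCharsets.UTF_16LE))); // 3C 00 3F 00
    }
}
```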

bruno

eduardo.saltelli.8599 · May 21, 2003, 7:16pm

You need to make sure that the output of stringToBytes begins with a BOM.

The BOM for little-endian UTF-16 is the byte sequence 0xFF 0xFE, and for big-endian UTF-16 it is 0xFE 0xFF.
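A minimal sketch of checking for those two byte sequences at the start of a byte array (plain Java; the class and method names are made up). Note that a UTF-32LE BOM also starts with FF FE; this sketch only distinguishes the two UTF-16 orders.

```java
import java.nio.charset.StandardCharsets;

public class BomCheck {
    // Returns "UTF-16BE" if the data starts with FE FF, "UTF-16LE" if it starts
    // with FF FE, or null otherwise. (A UTF-32LE BOM also starts with FF FE;
    // this sketch only cares about UTF-16.)
    static String utf16BomOrder(byte[] data) {
        if (data.length < 2) {
            return null;
        }
        int b0 = data[0] & 0xFF;
        int b1 = data[1] & 0xFF;
        if (b0 == 0xFE && b1 == 0xFF) {
            return "UTF-16BE";
        }
        if (b0 == 0xFF && b1 == 0xFE) {
            return "UTF-16LE";
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] fromJava = "<order/>".getBytes(StandardCharsets.UTF_16);
        System.out.println("Java UTF-16 output: " + utf16BomOrder(fromJava));
        System.out.println("UTF-8 output: " + utf16BomOrder("<order/>".getBytes(StandardCharsets.UTF_8)));
    }
}
```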

Guest · May 21, 2003, 7:17pm

Thanks for all your replies. Rob, you were right on track with the JVM comment. My server is still running 1.2.2 JVM, which requires an encoding name of “UnicodeBig” (or “UnicodeLittle”, if you want little-endian) instead of “UTF-16”. Once I’d done that, the string was properly encoded by using the stringToBytes function. So I was able to get the sending of a UTF-16 encoded XML file working properly.

I ran into another problem immediately after that - webMethods doesn’t appear to properly recognize incoming UTF-16 encoded XML. I set up a simple service that receives an XML document over HTTP, and was able to receive the document fine when using the standard (UTF-8) encoding. UTF-16 documents, though, came through as blank - they were recognized as XML, but with no elements. Obviously this is useless, so I ended up changing my submission to use Content-Type “text/html” so that the webMethods parser wouldn’t automatically convert to XML, then converting the InputStream myself using the proper encoding. Does that sound like the right approach, or is there a way to make the webMethods XML content-handler recognize a document that uses UTF-16 encoding?

Thanks,
Skip

Guest · May 21, 2003, 7:20pm

Eduardo, what exactly is a BOM and how would I specify it? Is it something that gets added to the bytes generated by stringToBytes, or is it something specified somewhere in the HTTP header? Or something else entirely? Maybe that would fix my problem with receiving the UTF-16 encoded XML.

Thanks,
Skip

eduardo.saltelli.8599 · May 21, 2003, 8:05pm

BOM stands for Byte Order Mark. It provides information to the XML parser as to the encoding of the content about to be processed.

So you would add the two bytes at the beginning of the bytes generated by stringToBytes.
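If an encoder does not emit a BOM itself (Java's UTF-16BE charset, for example, never does), the two bytes can be prepended by hand. A minimal plain-Java sketch with made-up names; for big-endian output the result should be byte-for-byte what Java's own UTF-16 encoder produces.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AddBom {
    // Big-endian UTF-16 byte-order mark.
    static final byte[] BOM_BE = {(byte) 0xFE, (byte) 0xFF};

    // Encodes as UTF-16BE (which writes no BOM) and prepends the BOM by hand.
    static byte[] utf16BeWithBom(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_16BE);
        byte[] out = new byte[BOM_BE.length + body.length];
        System.arraycopy(BOM_BE, 0, out, 0, BOM_BE.length);
        System.arraycopy(body, 0, out, BOM_BE.length, body.length);
        return out;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><order/>";
        byte[] manual = utf16BeWithBom(xml);
        byte[] builtin = xml.getBytes(StandardCharsets.UTF_16); // BOM + big-endian
        System.out.println("Same bytes as Java's UTF-16 encoder: " + Arrays.equals(manual, builtin));
    }
}
```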

Ed

reamon · May 21, 2003, 8:48pm

It would seem that stringToBytes should add the BOM, otherwise it is not properly UTF-16 encoded, right?

Skip, given that the encoding name “UTF-16” isn’t supported by your JVM directly, can you have the doc submitter specify one of the encoding names that is supported (UnicodeBig or UnicodeLittle)? That should work. If they can’t change the encoding name, then your approach seems reasonable.

Guest · May 21, 2003, 9:19pm

Well, I checked the output of stringToBytes, and it is indeed already adding the BOM when I use “UnicodeBig” as the encoding. Just to be safe, I also checked the output of “UnicodeLittle” encoding - both match up to what Eduardo listed as the correct values. So that doesn’t appear to be an issue.

Changing the encoding name is a good suggestion, Rob, but it didn’t help. I submitted a test XML document with “UnicodeBig” as the encoding type with the same result as “UTF-16”. In fact, it doesn’t seem to matter what is in the encoding type attribute of the XML header. I don’t think that webMethods is even able to read the XML header, much less determine the encoding from it. Somewhere in the content handler for XML, the server must be converting the incoming stream to a String. It seems that there just isn’t a check for the UTF-16 encoding when that conversion occurs.

By the way, my local machine (with Sun’s 1.3.1 JVM) also fails to receive the UTF-16 encoded XML properly. So this problem doesn’t appear to be JVM related, or at least not related to 1.2 vs. 1.3. I don’t have an easy way to test 1.4.

Skip

fred.hartman.5916 · May 21, 2003, 10:15pm

Section 4.3.3, paragraph 2, of the XML spec states: “Entities encoded in UTF-16 must begin with the Byte Order Mark described by Annex F” (Extensible Markup Language (XML) 1.0 (Fifth Edition)).
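For entities that arrive without a BOM, the XML spec's non-normative Appendix F (Autodetection of Character Encodings) describes recognising the byte layout of the opening "<?xml". The sketch below covers only three of those cases (no UCS-4 or EBCDIC), uses made-up names, and gives only a first guess that the encoding declaration still has to confirm.

```java
import java.nio.charset.StandardCharsets;

public class XmlEncodingSniff {
    // First-guess encoding family for a BOM-less XML entity, based on how the
    // opening "<?xm" is laid out in bytes (a subset of XML 1.0 Appendix F).
    static String sniffWithoutBom(byte[] d) {
        if (d.length < 4) {
            return null;
        }
        int b0 = d[0] & 0xFF, b1 = d[1] & 0xFF, b2 = d[2] & 0xFF, b3 = d[3] & 0xFF;
        if (b0 == 0x00 && b1 == 0x3C && b2 == 0x00 && b3 == 0x3F) {
            return "UTF-16BE"; // "<?" as 16-bit big-endian code units
        }
        if (b0 == 0x3C && b1 == 0x00 && b2 == 0x3F && b3 == 0x00) {
            return "UTF-16LE"; // "<?" as 16-bit little-endian code units
        }
        if (b0 == 0x3C && b1 == 0x3F && b2 == 0x78 && b3 == 0x6D) {
            return "ASCII-compatible"; // "<?xm": UTF-8, ISO-8859-x, etc.; read the declaration
        }
        return null;
    }

    public static void main(String[] args) {
        String decl = "<?xml version=\"1.0\"?>";
        System.out.println(sniffWithoutBom(decl.getBytes(StandardCharsets.UTF_16BE)));
        System.out.println(sniffWithoutBom(decl.getBytes(StandardCharsets.UTF_16LE)));
        System.out.println(sniffWithoutBom(decl.getBytes(StandardCharsets.UTF_8)));
    }
}
```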

I know the server isn’t converting any stream or bytes to a string before giving it to the webMethods XML parser, but it is possible that an incorrect encoding is being passed explicitly to the parser, which would stop the parser from looking at the XML prolog.

Changing the HTTP Header to describe the content as “text/html” would tell the parser to default to HTML encoding rules, which are different from those of XML (for example look for a META tag to define the charset, rather than looking for an XML prolog).

What happens if you use the HTTP header Content-Type=text/xml; charset="utf-16"?

Another wild guess is that the legal charset names differ between the HTTP Content-Type header, Java APIs like getBytes(), and XML prologs. webMethods will try to map the variants (like UTF-8, UTF8, and utf-8) appropriately, but it may be missing the mapping for one of the 16-bit charset names.
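Java itself does this kind of alias mapping: Charset.forName accepts a label case-insensitively, resolves aliases such as UTF8, and reports a canonical name. The sketch below (made-up class name; java.nio.charset requires Java 1.4 or later) also strips the quotes that may surround a charset value in a Content-Type header. It says nothing about which labels webMethods itself maps.

```java
import java.nio.charset.Charset;

public class CharsetNames {
    // Maps a charset label (from a Content-Type header, an XML declaration, or a
    // service input) to Java's canonical charset name, or null if unknown here.
    static String canonical(String label) {
        if (label == null) {
            return null;
        }
        String s = label.trim();
        // A charset parameter in an HTTP header may be quoted, e.g. "utf-16".
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            s = s.substring(1, s.length() - 1).trim();
        }
        try {
            // Lookup is case-insensitive and resolves aliases such as UTF8.
            return Charset.isSupported(s) ? Charset.forName(s).name() : null;
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException (empty or malformed labels).
            return null;
        }
    }

    public static void main(String[] args) {
        String[] labels = {"UTF-8", "UTF8", "utf-8", "utf-16", "\"utf-16\"", "UnicodeBig"};
        for (String l : labels) {
            System.out.println(l + " -> " + canonical(l));
        }
    }
}
```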

reamon · May 21, 2003, 10:45pm

How about trying the http content-encoding header?

Guest · May 22, 2003, 4:58am

Fred, the suggestion of adding the charset to the content type did it. I used the following header field:

Content-Type=text/xml;charset=UnicodeBig
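For comparison, this is roughly what the working request looks like from a plain-Java client using HttpURLConnection (not pub.client:http). The body is UTF-16 with a BOM, and the charset is stated in the Content-Type header. Skip's working header used the label UnicodeBig; UTF-16 is the registered IANA name, and which label a given receiving server accepts is something to test rather than assume. The class name is made up, and the target URL is whatever you pass as the first argument.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PostUtf16Xml {
    // Header value that names the body's charset explicitly.
    static String contentType(String charsetLabel) {
        return "text/xml; charset=" + charsetLabel;
    }

    // POSTs the XML as UTF-16 (BOM + big-endian) and returns the HTTP status.
    // charsetLabel only affects the header; the body encoding is fixed.
    static int post(URL url, String xml, String charsetLabel) throws IOException {
        byte[] body = xml.getBytes(StandardCharsets.UTF_16);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", contentType(charsetLabel));
            conn.setFixedLengthStreamingMode(body.length);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String label = args.length > 1 ? args[1] : "UTF-16";
        System.out.println("Content-Type: " + contentType(label));
        if (args.length > 0) { // only touch the network when a URL is given
            String xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><order/>";
            System.out.println("HTTP " + post(URI.create(args[0]).toURL(), xml, label));
        }
    }
}
```

Running it with a URL and the label UnicodeBig reproduces the header from the post above.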

My servers then recognized the XML document, under both JVM 1.2 and JVM 1.3. Thanks much for the tip!

I did also try setting the encoding field in the data structure of the pub.client:http service inputs, but that appeared to make no difference.

Thanks,
Skip

begebege · May 22, 2003, 12:27pm

I guess the encoding field of the http service is meant to specify whether the data is 7bit or base64 encoded, i.e. the Content-Transfer-Encoding header.

khazanchi · August 7, 2003, 5:40am

Hi Skip,

In your post above you mentioned that you changed the Content-Type to text/html and submitted the content. I am doing the same thing, but I get only the first line of the XML.

Could you please let me know what you did to get the InputStream in your flow service and turn it into a standard XML document?

regards,

Guest · August 7, 2003, 8:08pm

I did savePipeline in my receive function and then used restorePipeline to look at the pipeline coming in. That told me that there was a pipeline variable called contentStream that is an instance of com.wm.app.b2b.server.BoundedInputStream. You can then use streamToBytes and bytesToString to convert it to an XML string. That gives you the proper XML that you can then submit to TN if you want.
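In plain Java, the streamToBytes plus bytesToString combination amounts to reading the whole stream and decoding it with an explicit charset. A minimal sketch: contentStream is simulated with a ByteArrayInputStream, and the class and method names are made up. With the UTF-16 charset, Java's decoder reads the BOM to choose the byte order and does not include it in the resulting String.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StreamToXmlString {
    // Reads the whole stream (like streamToBytes), then decodes the bytes
    // with the given charset (like bytesToString).
    static String read(InputStream in, Charset cs) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return new String(buf.toByteArray(), cs);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the contentStream pipeline variable.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><order id=\"1\"/>";
        InputStream contentStream = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_16));
        System.out.println(read(contentStream, StandardCharsets.UTF_16));
    }
}
```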

khazanchi · August 7, 2003, 11:06pm

Skip,

My environment is 6.0.1, which supposedly supports UTF-16. Can you suggest a similar approach for it? The class com.wm.app.b2b.server.BoundedInputStream is no longer available.

Also, what type did you use for the input of your flow service: node or string?

Also, if I go this route, the partner has to hit my flow service directly. Can I do anything on the Trading Networks side, since my partner would hit TN first? The XML I am working with comes from an external partner, who can send it to TN but not directly to any other flow service.

Please suggest the best alternative.

Guest · August 11, 2003, 9:50pm

I’m also using 6.0.1 (SP1), so I don’t know what you mean about that class not being available. It works just fine on my server. The input type should be object, although it will show up as a string when you do the save/restore that I suggested.

You might try reading the TN “Building your Network” doc, specifically the section on Gateway services. In my version of the PDF, it’s Appendix B. You may be able to use this functionality to convert the format before TN attempts to process the incoming document.

varadharajulareddy_vallapureddy · September 26, 2017, 10:49am

Hi Folks,

We got the following errors in wM version 9.12 when receiving plain XML with UTF-16 encoding from an SAP system, which sends <?xml version="1.0" encoding="UTF-16"?>:

[8009]2017-09-25 13:10:58 CEST [ISC.0076.0007W] XMLCoder decode invalid data type: com.wm.net.HttpInputStream

[8008]2017-09-25 13:10:58 CEST [ISP.0090.0004I] I1.imp.p1.services:preOrderResponseService – null

Note: Could you please let us know whether webMethods supports encoding="UTF-16" or not?

When we changed to encoding="UTF-8", it worked fine.
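On the receiving side, one common approach (shown here in plain Java with JAXP, not as a webMethods service) is to hand the parser the raw bytes as an InputStream rather than first turning them into a String, so the parser can use the BOM and the XML declaration to pick the encoding. The element names are made up, and whether this addresses the XMLCoder error above is an assumption; the sketch only shows that a standard parser accepts a UTF-16 document that starts with a BOM.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class ParseUtf16Xml {
    // Parses raw XML bytes. Passing an InputStream (not a String or Reader)
    // leaves encoding detection to the parser: BOM first, then the declaration.
    static Document parse(InputStream in)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(in);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><order><status>OK</status></order>";
        // BOM + big-endian bytes, as a UTF-16 sender might produce them.
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_16);
        Document doc = parse(new ByteArrayInputStream(bytes));
        System.out.println(doc.getDocumentElement().getTagName() + ": "
                + doc.getDocumentElement().getTextContent());
    }
}
```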
