UTF-16 XML encoding in IS 4.6

I’m having trouble getting an XML document into UTF-16 encoding in IS 4.6. I can set the encoding attribute in the document header to UTF-16 just fine, but when I send the document to another webMethods service via HTTP it gives me the following error:

com.wm.app.b2b.server.ServiceException: com.wm.lang.xml.WMDocumentException: com.wm.util.LocalizedCharConversionException: [B2BCORE.0042.9201] Incorrect character encoding (Missing byte-order mark)

If I change my code to specify UTF-8 as the encoding, it works just fine.

I’ve tried using stringToBytes to encode the XML document to UTF-16, but that gives me this error:

com.wm.app.b2b.server.ServiceException: java.io.UnsupportedEncodingException: UTF-16

What’s the proper way to encode an XML document in UTF-16 and send it over HTTP?

Thanks,
Skip

I believe webMethods supports only UTF-8 encoding for XML.

If not, any ideas?

I’m not sure this is a wM thing. Rather I believe it is a JVM thing. Researching…

I think you’re on the right track with your steps of 1) stringToBytes with the UTF-16 encoding; 2) http post the bytes with the encoding header set to UTF-16.

The problem would seem to be to determine why stringToBytes is failing. What JVM are you running? I did a test with stringToBytes using UTF-16 as the encoding and did not encounter an error. I’m running IS 4.6 on the IBM 1.3 JVM.

http://java.sun.com/j2se/1.3/docs/api/java/lang/package-summary.html#charenc has the list of encodings that any J2SE v1.3 is required to support. It includes UTF-16.

http://java.sun.com/j2se/1.3/docs/guide/intl/encoding.doc.html has additional info.

HTH

Hi,

I believe webMethods is using its own names for the list of encodings and mapping them to the java encoding names. For instance UTF8, instead of UTF-8, is a good choice for the smtp service.

Also, there are two flavors of UTF16, little endian and big endian (i.e. the order of the bytes).

bruno

You need to ensure that you include a BOM on the results from the stringToBytes.

The BOM for little-endian Unicode is the byte sequence 0xFF 0xFE, and for big-endian it is 0xFE 0xFF.
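To make the BOM values concrete, here is a minimal sketch in plain Java (not the webMethods stringToBytes service). It assumes a JVM recent enough to have java.nio.charset.StandardCharsets; the class and method names are hypothetical. On such a JVM the "UTF-16" charset writes a big-endian BOM when encoding, while "UTF-16BE" writes none, so the second helper shows how the two bytes could be added by hand.

```java
import java.nio.charset.StandardCharsets;

public class BomSketch {
    // True if the array starts with a UTF-16 BOM (FE FF or FF FE).
    static boolean hasUtf16Bom(byte[] b) {
        if (b.length < 2) {
            return false;
        }
        int b0 = b[0] & 0xFF;
        int b1 = b[1] & 0xFF;
        return (b0 == 0xFE && b1 == 0xFF) || (b0 == 0xFF && b1 == 0xFE);
    }

    // Encodes text as big-endian UTF-16 and puts the bytes FE FF in front.
    static byte[] encodeBigEndianWithBom(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_16BE); // no BOM here
        byte[] out = new byte[body.length + 2];
        out[0] = (byte) 0xFE;
        out[1] = (byte) 0xFF;
        System.arraycopy(body, 0, out, 2, body.length);
        return out;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><a/>";
        // "UTF-16" adds the BOM itself; "UTF-16BE" does not.
        System.out.println(hasUtf16Bom(xml.getBytes(StandardCharsets.UTF_16)));
        System.out.println(hasUtf16Bom(xml.getBytes(StandardCharsets.UTF_16BE)));
        System.out.println(hasUtf16Bom(encodeBigEndianWithBom(xml)));
    }
}
```

Checking the first two bytes like this is a quick way to see whether a given encoding name already produces a BOM before adding one manually.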

Thanks for all your replies. Rob, you were right on track with the JVM comment. My server is still running a 1.2.2 JVM, which requires an encoding name of “UnicodeBig” (or “UnicodeLittle”, if you want little-endian) instead of “UTF-16”. Once I’d done that, the string was properly encoded by the stringToBytes function. So I was able to get the sending of a UTF-16 encoded XML file working properly.

I ran into another problem immediately after that - webMethods doesn’t appear to properly recognize incoming UTF-16 encoded XML. I set up a simple service that receives an XML document over HTTP, and was able to receive the document fine when using the standard (UTF-8) encoding. UTF-16 documents, though, came through as blank - they were recognized as XML, but with no elements. Obviously this is useless, so I ended up changing my submission to use Content-Type “text/html” so that the webMethods parser wouldn’t automatically convert to XML, then converting the InputStream myself using the proper encoding. Does that sound like the right approach, or is there a way to make the webMethods XML content-handler recognize a document that uses UTF-16 encoding?

Thanks,
Skip

Eduardo, what exactly is a BOM and how would I specify it? It is something that gets added to the bytes generated by stringToBytes, or is it something specified in the HTTP header somewhere? Or something else entirely? Maybe that would fix my problem with receiving the UTF-16 encoded XML.

Thanks,
Skip

BOM stands for Byte Order Mark. It provides information to the XML parser as to the encoding of the content about to be processed.

So you would add the two bytes at the beginning of the bytes generated by stringToBytes.

Ed

It would seem that stringToBytes should add the BOM, otherwise it is not properly UTF-16 encoded, right?

Skip, given that the encoding name “UTF-16” isn’t supported by your JVM directly, can you have the doc submitter specify one of the encoding names that is supported (UnicodeBig or UnicodeLittle)? That should work. If they can’t change the encoding name, then your approach seems reasonable.

Well, I checked the output of stringToBytes, and it is indeed already adding the BOM when I use “UnicodeBig” as the encoding. Just to be safe, I also checked the output of “UnicodeLittle” encoding - both match up to what Eduardo listed as the correct values. So that doesn’t appear to be an issue.

Changing the encoding name is a good suggestion, Rob, but it didn’t help. I submitted a test XML document with “UnicodeBig” as the encoding type with the same result as “UTF-16”. In fact, it doesn’t seem to matter what is in the encoding type attribute of the XML header. I don’t think that webMethods is even able to read the XML header, much less determine the encoding from it. Somewhere in the content handler for XML, the server must be converting the incoming stream to a String. It seems that there just isn’t a check for the UTF-16 encoding when that conversion occurs.

By the way, my local machine (with Sun’s 1.3.1 JVM) also fails to receive the UTF-16 encoded XML properly. So this problem doesn’t appear to be JVM related, or at least not related to 1.2 vs. 1.3. I don’t have an easy way to test 1.4.

Skip

Section 4.3.3 para 2 of the XML spec, states “Entities encoded in UTF-16 must begin with the Byte Order Mark described by Annex F” http://www.w3.org/TR/REC-xml#charencoding

I know the server isn’t converting any stream or bytes to a string before giving it to the webMethods XML parser, but it is possible that an encoding that is incorrect is getting explicitly passed to the parser that would disable the parser looking in the XML prolog.

Changing the HTTP Header to describe the content as “text/html” would tell the parser to default to HTML encoding rules, which are different from those of XML (for example look for a META tag to define the charset, rather than looking for an XML prolog).

What happens if you use the HTTP header
Content-Type=text/xml ; charset="utf-16"
?

Another wild guess is that the names of legal charsets differ between the HTTP Content-Type setting, Java APIs like getBytes(), and XML prologs. webMethods will try to convert the variants (like UTF-8 and UTF8 and utf-8) appropriately, but may be missing the conversion for one of the 16-bit charset names.

How about trying the http content-encoding header?

Fred, the suggestion of adding the charset to the content type did it. I used the following header field:

Content-Type=text/xml;charset=UnicodeBig

My servers then recognized the XML document, under both JVM 1.2 and JVM 1.3. Thanks much for the tip!
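As an illustration of why the charset parameter can matter, the sketch below shows how a receiver might pick the decoding charset from a Content-Type value. This is plain Java with hypothetical class and method names, and it is only an assumption about the general technique, not a description of what the webMethods content handler actually does. It uses the name "utf-16" from the earlier suggestion, since which charset names resolve depends on the JVM.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ContentTypeCharsetSketch {
    // Returns the charset named in a Content-Type value, or the fallback
    // when no charset parameter is present.
    // Charset.forName throws if the JVM does not know the name.
    static Charset charsetOf(String contentType, Charset fallback) {
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length())
                               .trim()
                               .replace("\"", ""); // tolerate a quoted value
                return Charset.forName(name);
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        byte[] body = "<a>text</a>".getBytes(StandardCharsets.UTF_16);
        // With the parameter, the body is decoded as UTF-16.
        Charset cs = charsetOf("text/xml; charset=\"utf-16\"", StandardCharsets.UTF_8);
        System.out.println(cs + " -> " + new String(body, cs));
        // Without it, this sketch falls back to UTF-8 and the text is mangled.
        Charset fb = charsetOf("text/xml", StandardCharsets.UTF_8);
        System.out.println(fb + " -> " + new String(body, fb));
    }
}
```

A receiver that only looks at the header this way would never reach the encoding attribute in the XML prolog, which is consistent with the prolog value making no difference in the tests described in this thread.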

I did also try setting the encoding field in the data structure of the pub.client:http service inputs, but that appeared to make no difference.

Thanks,
Skip

I guess the encoding field of the http service is meant to specify whether the data is 7bit or base64 encoded, i.e. the Content-Transfer-Encoding header.

Hi! Skip,

In your post above you mentioned that you changed the Content-Type to text/html and submitted the content. I am doing the same thing, but I get just the first line of the XML.

Could you please let me know what you did to get the InputStream in your flow service and turn it into a standard XML document?

regards,

I did savePipeline in my receive function and then used restorePipeline to look at the pipeline coming in. That told me that there was a pipeline variable called contentStream that is an instance of com.wm.app.b2b.server.BoundedInputStream. You can then use streamToBytes and bytesToString to convert it to an XML string. That gives you the proper XML that you can then submit to TN if you want.
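For readers who want to see the same two steps outside of flow, here is a rough plain-Java equivalent of reading a stream into bytes and then decoding the bytes with an explicit encoding. It does not use the streamToBytes or bytesToString services themselves; the class and method names are hypothetical, and a ByteArrayInputStream stands in for contentStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StreamDecodeSketch {
    // Step 1: drain the stream into a byte array.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }

    // Step 2: decode the bytes with the encoding the sender used.
    static String decode(InputStream in, Charset cs) throws IOException {
        return new String(readAll(in), cs);
    }

    public static void main(String[] args) throws IOException {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><order id=\"1\"/>";
        // Stand-in for the incoming stream: UTF-16 bytes with a BOM.
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_16));
        String decoded = decode(in, StandardCharsets.UTF_16);
        // The UTF-16 decoder consumes the BOM, so the text round-trips.
        System.out.println(decoded.equals(xml));
    }
}
```

The key point is that the encoding is passed explicitly at the decode step rather than left to a default.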

Skip,

My environment is 6.0.1, which supposedly supports UTF-16. Can you suggest a similar route for it? The class com.wm.app.b2b.server.BoundedInputStream is no longer available.

Also, what input type did you specify in the input to your flow service? Was it node or string?

Also, if I go this route then the partner has to hit my flow service directly. Can I do anything at the Trading Networks end, since my partner would hit TN first? The XML I am trying to work on would come from an external partner. They can send the XML to TN but not directly to any other flow service.

Please suggest the best alternative.

I’m also using 6.0.1 (SP1), so I don’t know what you mean about that class not being available. It works just fine on my server. The input type should be object, although it will show up as a string when you do the save/restore that I suggested.

You might try reading the TN “Building your Network” doc, specifically the section on Gateway services. In my version of the PDF, it’s Appendix B. You may be able to use this functionality to convert the format before TN attempts to process the incoming document.

Hi Folks,

We got the following errors with UTF-16 encoding in WM version 9.12 while receiving plain XML from an SAP system:

[8009]2017-09-25 13:10:58 CEST [ISC.0076.0007W] XMLCoder decode invalid data type: com.wm.net.HttpInputStream

[8008]2017-09-25 13:10:58 CEST [ISP.0090.0004I] I1.imp.p1.services:preOrderResponseService – null

They are sending <?xml version="1.0" encoding="UTF-16"?>

Note: Could you please let us know whether webMethods supports encoding="UTF-16" or not?

When the encoding is changed to UTF-8, it works fine.