Can’t query an HTML page with charset=unicode using pub.web:queryDocument

Hello !

I’m building a simple web flow in B2B 4.0 and I’ve run into the following issue:
I have a flow service with “loadDocument” and “QueryDocument”. I can’t query the following HTML:
"

" It seems that the "unicode" charset is causing the problem. The error I get is: Could not obtain the Document View from the Server ... com.wm.util.LocalizedCharConversionException Incorrect character encoding (Missing byte-order mark)

I’ve tried specifying the encoding in the loadDocument parameters (for example UTF-16), but without success.

Any ideas?

Best regards,
Marius

Two guesses:

  1. Although the HTTP headers are good and the content is UTF-16, charset=unicode doesn’t define whether the data is big endian or little endian, so without a BOM the parser doesn’t know how to parse the data.
    FAQ - UTF-8, UTF-16, UTF-32 & BOM: http://www.unicode.org/faq/utf_bom.html#22

  2. Check the HTTP headers using pub.flow:getTransportInfo. The web server may be doing something odd with the encoding. Then look at the characters in the stream (fetch the page with pub.client:http and inspect the bytes to confirm that they really are Unicode).
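
To make both guesses concrete, here is a rough standalone sketch in Python (not a webMethods service) of what inspecting the raw bytes could look like. It checks for a byte-order mark and, if none is present, guesses the UTF-16 byte order from where the zero bytes fall, which only works for mostly-ASCII content such as HTML markup. The function names and the little-endian default are my own assumptions for illustration, not anything B2B does internally.

```python
import codecs

# BOM signatures. UTF-32 is listed before UTF-16 because the UTF-32 LE BOM
# starts with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no known BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def guess_utf16_byte_order(data: bytes, sample_size: int = 512):
    """Heuristic for BOM-less UTF-16 that is mostly ASCII.

    An ASCII character in little-endian UTF-16 is (char, 0x00), so the zero
    bytes sit at odd offsets; in big-endian they sit at even offsets.
    Returns None when the sample gives no clear answer.
    """
    sample = data[:sample_size]
    even_nuls = sample[0::2].count(0)
    odd_nuls = sample[1::2].count(0)
    if even_nuls == odd_nuls:
        return None
    return "utf-16-le" if odd_nuls > even_nuls else "utf-16-be"


def decode_page(data: bytes, default: str = "utf-16-le"):
    """Decode page bytes, preferring a BOM, then the heuristic, then `default`.

    The little-endian default is an assumption (hypothetical choice for
    pages labelled charset=unicode); adjust it for your source.
    """
    encoding, skip = detect_bom(data)
    if encoding is None:
        encoding = guess_utf16_byte_order(data) or default
    return data[skip:].decode(encoding), encoding
```

If the bytes turn out to be BOM-less UTF-16, one possible workaround is to decode them yourself along these lines and hand the resulting text (or a re-encoded copy with an explicit charset) to the parser, rather than relying on charset=unicode.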

Thanks for the response !

I don’t have any control over how the HTML is built. I need to find a way to load it and parse it using flow services.
I’ve tried it on webMethods version 3.0 and it works fine. I assume that 3.0 just ignores any unicode specifier.
IE is able to render it.