While attempting to parse an XML file containing
Chinese characters, I am having difficulty using
stringToDocument to produce a node that is
recognized by recordToDocument. Any ideas?
The steps I used to parse the XML are:
getFile
bytesToString
stringToDocument
getDocumentType
Your best bet is to use the following sequence:
getFile (as stream)
stringToDocument (passing stream)
getDocumentType
Unless you pass the encoding to the bytesToString service, the default encoding of the VM is used for interpreting the bytes. This is most likely the issue that you’re having.
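To illustrate the point about default encodings, here is a minimal plain-Java sketch (not webMethods flow code) of what happens when bytes are decoded with the wrong charset. The class and method names are made up for this example, and ISO-8859-1 merely stands in for whatever default your VM happens to use.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Hypothetical helper: decode bytes with an explicitly named charset
    // instead of relying on the JVM default.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Two Chinese characters inside a small XML fragment.
        String original = "<name>\u4e2d\u6587</name>";
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset the bytes were written in restores the text.
        String matching = decode(utf8Bytes, "UTF-8");
        // Decoding with a different charset (standing in for a mismatched
        // VM default) yields garbled characters.
        String mismatched = decode(utf8Bytes, "ISO-8859-1");

        System.out.println("matching charset round-trips:   " + matching.equals(original));
        System.out.println("mismatched charset round-trips: " + mismatched.equals(original));
    }
}
```

The same reasoning applies to bytesToString: if you do go through a string, name the encoding explicitly rather than leaving it to the VM.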
You’ll need to ensure that your XML document declares the correct encoding in its XML prologue, e.g.
<?xml version="1.0" encoding="UTF-16"?>
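As a rough illustration of why the prologue matters when the parser is handed raw bytes, the sketch below uses the standard JAXP parser in plain Java rather than the webMethods services. The class and method names are hypothetical. The parser is given a byte stream, so it works out the encoding from the byte order mark and the declaration itself, and no intermediate string decoding takes place.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class StreamParseDemo {
    // Hypothetical helper: parse raw XML bytes and return the text of the
    // root element. The parser detects the encoding on its own.
    static String rootText(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Wrapped so callers need not declare checked exceptions.
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?>"
                + "<name>\u4e2d\u6587</name>";
        // In Java, the UTF_16 charset encodes with a leading byte order mark.
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_16);

        String text = rootText(bytes);
        System.out.println("characters preserved: " + text.equals("\u4e2d\u6587"));
    }
}
```

If the declared encoding did not match the actual bytes, the parser would either fail or produce the wrong characters, which is why the prologue has to be accurate.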
Thanks. Big5 and GB2312 are not supported. I wrote
a utility to convert Big5-encoded XML to Unicode XML, and now I can parse the XML fine.
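The utility itself was not posted, so the following is only a guess at what such a conversion could look like in plain Java. It assumes UTF-8 as the Unicode target, assumes the JVM ships the Big5 charset (most full JDKs do), and assumes the encoding attribute in the declaration uses double quotes. The class and method names are invented for this sketch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Big5ToUtf8 {
    // Hypothetical converter: re-encode Big5 XML bytes as UTF-8 and update
    // the declared encoding so that it matches the new bytes.
    static byte[] convert(byte[] big5Xml) {
        // Assumes the Big5 charset is available in this JVM.
        String text = new String(big5Xml, Charset.forName("Big5"));

        // Only touch the XML declaration, never the document body.
        if (text.startsWith("<?xml")) {
            int end = text.indexOf("?>");
            if (end > 0) {
                // Handles a double-quoted encoding attribute only.
                String decl = text.substring(0, end)
                        .replaceFirst("encoding=\"[^\"]*\"", "encoding=\"UTF-8\"");
                text = decl + text.substring(end);
            }
        }
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"Big5\"?>"
                + "<name>\u4e2d\u6587</name>";
        byte[] big5Bytes = xml.getBytes(Charset.forName("Big5"));

        byte[] utf8Bytes = convert(big5Bytes);
        System.out.println(new String(utf8Bytes, StandardCharsets.UTF_8));
    }
}
```

The converted bytes can then go through the usual getFile, stringToDocument, getDocumentType sequence.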