Error when trying to handle & character in XML message in webMethods 10.3

Mikael_Lund · March 2, 2022, 6:50am

Hi

I need to encode special characters like < and > and & and ’ and " in a webMethods service. To test it I have made this simple xml document:

<?xml version="1.0" encoding="utf-8"?>
<main>
<element> here with less than < greater than >       ampersant  &  character simplequote  '  dobblequote "</element>
</main>

which I assign to a string variable.

Then I let that string variable be input to the flow service

pub.xml:xmlStringToXMLNode setting encoding = utf-8 and isXML = true

the resulting node is then input to this service

pub.xml:xmlNodeToDocument, where I map the output document to a document type.

However in this last step I get this error:

Launch started: 2022-03-02 07:33:34.242
Configuration name: encoding (1)
Configuration location: C:/Users/milun/workspace103/.metadata/.plugins/org.eclipse.debug.core/.launches/encoding (1).launch
 
com.wm.app.b2b.server.ServiceException: [ISC.0042.9325] Element <element> is missing end tag
	at pub.xml.xmlNodeToDocument(xml.java:1037)

The funny thing is that if I omit the & character from the small XML string value I created, so that it looks like this:

<?xml version="1.0" encoding="utf-8"?>
<main>
<element> here with less than < greater than >       ampersant    character simplequote  '  dobblequote "</element>
</main>

then it works.

Can anyone tell me why and also what to do about it?

Kind regards Mikael

Holger_von_Thomsen · March 2, 2022, 9:34am

Hi Mikael,

your sample XML is not well-formed.

Can you describe your use case in a bit more detail, please?
At certain points there is automatic encoding/decoding in place.

Regards,
Holger

Mikael_Lund · March 2, 2022, 10:04am

Hi Holger

the challenge is, that I receive a whole xml message in a String variable. That xml message may contain characters like & and < and >. I currently send this string value to an external party who then runs into trouble when he wants to change the string “xml” into an xml document due to the special characters. Therefore, I would like to encode those characters so that they do not cause trouble when trying to “convert” the string xml to a real xml message.

My test xml sample is this:

<?xml version="1.0" encoding="utf-8"?>

<main>
<element> here with less than < greater than > ampersant & character simplequote ’ dobblequote "</element>
</main>

I hope this makes it a bit more clear.

Br Mikael

John_Carter4 · March 2, 2022, 10:45am

You could use the service ‘pub.string:URLEncode’ to encode the string before mapping it into your document and then call documentToXmlNode. We also have ‘pub.string:base64Encode’ if you want to use that encoding instead.

However, the recipient will need to know what type of encoding you used.
regards,
John.

reamon · March 2, 2022, 4:17pm

Set encode to true when calling documentToXMLString. That will encode any characters in any of the fields that need to be encoded, e.g. & to &amp; and < to &lt;. Unless you know with 100% certainty that the values in a particular document will never contain such characters, encode should always be set to true.

reamon · March 2, 2022, 4:33pm

Proceed with caution if one uses pub.string:URLEncode. The rules for URL encoding differ from those for escaping markup characters. Setting encode to true in the call to documentToXMLString is the way to go.

Side note: the doc for the encode input describes what is done and refers to it as “HTML encoding” but that isn’t quite accurate either as HTML encoding rules again are slightly different from XML encoding.
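To make the difference concrete, here is a minimal sketch outside webMethods, using only the Python standard library. It is not the Integration Server implementation; the sample value and variable names are made up for illustration. It contrasts XML escaping of a field value with URL (percent) encoding of the same value.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape

# Illustrative field value containing markup-significant characters.
value = 'less than < greater than > ampersand & quote "'

# XML escaping of element content: by default only &, < and > are replaced.
xml_escaped = escape(value)

# Extra entities can be passed in, which matters inside attribute values.
xml_attr_escaped = escape(value, {'"': "&quot;", "'": "&apos;"})

# URL (percent) encoding is a different scheme; an XML parser will not decode it.
url_encoded = quote(value)

print(xml_escaped)
print(xml_attr_escaped)
print(url_encoded)
```

The first two results can be placed in an XML document and any XML parser will return the original value. The percent-encoded result is well-formed as text, but the recipient would have to know to URL-decode it separately.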

Holger_von_Thomsen · March 2, 2022, 5:04pm

Hi Mikael,

just some other questions which come up here:
Where and how do you retrieve the XML string variable?
How do you send the converted xml document to the external party?

Answering these questions might help us to determine further steps to be checked.

Regards,
Holger

John_Carter4 · March 2, 2022, 5:15pm

Good catch @reamon, I forgot about the ‘encode’ attribute in documentToXmlString. That’s a much better candidate than mine.
thanks,
John.

reamon · March 2, 2022, 6:36pm

Replies about using encode in documentToXMLString aside, getting back to the original behavior when using the string from @Mikael_Lund …

As @Holger_von_Thomsen noted, this is not well-formed XML, nor is it valid XML: the & character cannot be there. It would need to be &amp;. But xmlStringToXMLNode parses it with no complaints.

However, proceeding to the next step, calling xmlNodeToDocument fails with

Malformed entity reference: & character simplequote ' dobblequote "

As expected. But this differs from the error you encountered. Is there a different string you used that generated the “missing end tag” error?

If you’re starting with an XML string for processing, it must be well-formed before it is passed to xmlStringToXMLNode, etc.
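As a rough illustration of what "well-formed" means in practice, the sketch below runs a shortened version of the sample through a strict parser from the Python standard library. This is an assumption-laden stand-in, not the webMethods parser, which as noted above is more lenient at the xmlStringToXMLNode step. The helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if a strict XML parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Bare & in element content: rejected by a conforming parser.
bad = "<main><element>ampersant & character</element></main>"

# Same content with the ampersand escaped: accepted.
good = "<main><element>ampersant &amp; character</element></main>"

print(is_well_formed(bad))
print(is_well_formed(good))

# After parsing, the escaped form yields the original character again.
print(ET.fromstring(good)[0].text)
```

A check like this can be used to reject a bad message early and report it to the sender, rather than to repair it.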

reamon · March 2, 2022, 8:22pm

Thanks @toni.petrov for editing the original post to expose the markup properly. That’s a much different scenario.

But the issue is the same. The XML is malformed. It cannot have a plain & in the element value. That must be &amp; for it to be valid and processed correctly. One should not try to URLEncode that string (or search and replace, etc.) to replace the & in that specific string. Doing that is not a good approach, and URLEncode will not do what you want.

To emphasize an earlier note: if you’re going to start with an XML string, that string must be valid.

Mikael_Lund · March 3, 2022, 8:02am

Hi Holger

thanks for your reply. I receive the xml document as one big string value and I pass it on as such. However, sometimes the troublesome character & is present in that string. So one option would be to encode the whole string value, but I don’t know if this is the correct way to approach it. (Maybe first do a string replace of &amp; to & and then a string replace of & to &amp;, so as not to mess up any correctly encoded &s?)

Another option would be to convert the string into a known document (first pub.xml:xmlStringToXMLNode and then pub.xml:xmlNodeToDocument) and then do a replace of the & in the one tag where the problem lies. My challenge is though, that I don’t know of a smart way to do a replace on the value of this one tag without having to map each field in the whole document (any suggestions would be highly appreciated).

Kind regards Mikael

Holger_von_Thomsen · March 3, 2022, 9:35am

Hi Mikael,

can you check how the xml string is generated before it is sent to you?

The sender may need to adjust the generation so that you only receive correctly encoded xml strings.

Regards,
Holger

reamon · March 3, 2022, 5:40pm

If the & sometimes appears unescaped in that string, the system that is generating the XML has a bug. It is not valid to have an unescaped & in a value in XML. There is nothing reliable that can be done on the system that receives this XML to accommodate or correct it.

Converting the string to a document first and then replacing the & will not work, because the XML is invalid and the parser will not be able to parse it accurately.

There is no smart way to do that replace. The source must fix their error. The recipient cannot do anything to fix it.

reamon · March 3, 2022, 6:09pm

The specification for XML is at https://www.w3.org/TR/xml/: Extensible Markup Language (XML) 1.0 (Fifth Edition)

In section 2.4 it states:

The ampersand character (&) and the left angle bracket (<) *MUST NOT* appear in their literal form, except when used as markup delimiters, or within a [comment](https://www.w3.org/TR/xml/#dt-comment), a [processing instruction](https://www.w3.org/TR/xml/#dt-pi), or a [CDATA section](https://www.w3.org/TR/xml/#dt-cdsection). If they are needed elsewhere, they *MUST* be [escaped](https://www.w3.org/TR/xml/#dt-escape) using either [numeric character references](https://www.w3.org/TR/xml/#dt-charref) or the strings "&amp;" and "&lt;" respectively.

The reason for this constraint is that it is impossible to reliably parse XML when a literal & is in element content. Depending on the specific XML involved, you might get lucky with a search/replace, but there is a good probability that it will fail you at some point in the future. The source system must fix their error.
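A small sketch of why search/replace is only a guess, again in plain Python rather than a flow service. Both functions are hypothetical illustrations: guess_repair is a regex heuristic one might be tempted to write, and two_step_replace is the replace-then-replace idea raised earlier in the thread.

```python
import re

# Hypothetical heuristic: escape every & that does not already start a
# predefined entity or a numeric character reference.
BARE_AMP = re.compile(r"&(?!(?:amp|lt|gt|quot|apos|#\d+|#x[0-9A-Fa-f]+);)")


def guess_repair(xml_text: str) -> str:
    return BARE_AMP.sub("&amp;", xml_text)


def two_step_replace(xml_text: str) -> str:
    # Unescape &amp; first, then escape every & again.
    return xml_text.replace("&amp;", "&").replace("&", "&amp;")


# Case 1: the intent is obvious and the guess happens to be right.
case1 = guess_repair("<name>Smith & Sons</name>")

# Case 2: suppose the sender meant the literal characters "&amp;" (say, in a
# sentence about escaping) and forgot to escape them. The text already looks
# like an entity, so the heuristic leaves it alone and a parser will read it
# as a single "&". The data is silently changed.
case2 = guess_repair("<note>write &amp; for an ampersand</note>")

# Case 3: the two-step replace damages entities that were already correct.
case3 = two_step_replace("<v>a &lt; b</v>")

print(case1)
print(case2)
print(case3)
```

In case 3 the correctly escaped &lt; becomes &amp;lt;, so the recipient sees the text "&lt;" instead of "<". Neither approach can know what the sender intended, which is why the fix belongs at the source.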

Side note: try to avoid CDATA sections as a “work-around”. They can be somewhat painful to deal with and are almost never necessary.
