Well-formed XML check in IS

Tong_Wang · September 30, 2005, 1:45am

Our partner is sending us an XML string with some special characters at the beginning. It’s not failing in IS (it seems IS can parse it to a node even though it’s not valid XML), but it’s failing in another system that uses a schema to do the parsing.

I’d like to know if there is a service in IS that does just basic XML validation, to prevent non-well-formed XML from entering the system.
A sample of the failing doc looks like this:
??<order> …</order>

Ramesh_Sambandan · September 30, 2005, 1:48am

Tong Wang,

You can use WmPublic/pub.schema:validate to validate the xml.

ramesh.

Tong_Wang · September 30, 2005, 1:55am

I don’t want to do full schema validation; I only want to check whether it’s well-formed XML.

Ramesh_Sambandan · September 30, 2005, 2:13am

Tong Wang,

Invoke pub.xml:xmlStringToXMLNode with an invalid XML string and isXML set to true, and it automatically throws an error.

ramesh.

Tong_Wang · September 30, 2005, 7:32pm

Ramesh, when the extra characters are outside the root tag, as in the sample I gave, pub.xml:xmlStringToXMLNode won’t fail; it will still generate the node.
Thanks for your comments anyway.

gupta_r.17495 · September 30, 2005, 7:55pm

But xmlNodeToDocument will fail. Did you try this?

HTH,
RMG

gupta_r.17495 · September 30, 2005, 9:03pm

But as long as the XML comes with <?xml>, it will not fail and still parses the structure.

Ramesh_Sambandan · September 30, 2005, 9:03pm

Tong Wang,

queryXMLNode will throw an error.

ramesh.

Tong_Wang · September 30, 2005, 9:23pm

Ramesh, I tested with queryXMLNode, and it still doesn’t fail.
I think that once the node is generated, the offending characters have already been trimmed off.
Thanks,

Guest · October 2, 2005, 8:25pm

If it’s always coming with the same extra characters, why not try using string replace to take those chars out…

wMusers.Com1 · October 3, 2005, 5:38pm

I don’t think that it is unreasonable to expect a partner to send you well-formed XML. My first course of action would be to work with them to correct the root cause, before jumping through a lot of hoops to fix their problem for them.

Since you have stated that you can create a valid IS XML node from the string they are sending, why not just convert that node back into a string? The extra characters should be gone now, right?

Mark

Tong_Wang · October 3, 2005, 7:59pm

Thanks, Mark,
That makes sense, but we want to flag the error so we can contact the partner and have them fix the problem. Although not many partners are sending these kinds of extra characters, we want a generic solution that handles any new cases in the future.
Manju, replacing is not acceptable; we might replace the same character in other parts of the payload.
Thanks anyway.

It seems we don’t have an XML check service in IS, so I may look at Java resources to do that.
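
For readers looking at the Java route mentioned above, here is a minimal sketch of a well-formedness-only check using the standard JAXP SAX parser, with no schema involved. The class and method names (`WellFormedCheck`, `isWellFormed`) are hypothetical, and wrapping this in an IS Java service is left out. It assumes the payload is already available as a `String`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not an IS built-in service.
final class WellFormedCheck {

    private WellFormedCheck() {}

    /** Returns true only if the whole string parses as well-formed XML. */
    static boolean isWellFormed(String xml) {
        SAXParser parser;
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Partner input is untrusted, so ask for the parser's secure-processing limits.
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            parser = factory.newSAXParser();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("Could not create XML parser", e);
        }
        try {
            // DefaultHandler ignores the content; its fatalError() rethrows parse errors.
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // Any well-formedness violation, including text before the root element.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Unlike the lenient behavior described in this thread, this check rejects a string such as `??<order></order>` because non-whitespace text before the root element is not well-formed. It does a full parse, so it costs more than a simple string test.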

wMusers.Com1 · October 3, 2005, 11:09pm

In this case, your special characters actually appear outside the root node of the document. I believe this is what allows IS to create a valid node rather than throw an exception. If you have malformed XML anywhere inside the root node, you get an exception every time. You can test this by removing or misspelling an end tag.

Given IS’ correct handling of malformed XML inside the root node of a document and your partner’s inclusion of extraneous characters outside of the root node, one workaround would be to create a substring of the xml string from the characters up to, but not including the “<?xml>”. If this string is not empty, then you can reject the document as invalid.

Mark
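
The substring workaround Mark describes could be sketched in Java roughly as follows. This is only an illustration: `LeadingJunk.before` is a hypothetical helper, and it cuts at the first `<` rather than at `<?xml` so that documents without an XML declaration are handled the same way. That variation is an assumption, not something stated in the post.

```java
// Hypothetical helper, not an IS built-in service.
final class LeadingJunk {

    private LeadingJunk() {}

    /**
     * Returns the text that precedes the first '<' in the string.
     * An empty result means the string starts directly with markup.
     * If there is no '<' at all, the whole string is returned.
     */
    static String before(String xml) {
        int firstMarkup = xml.indexOf('<');
        return firstMarkup < 0 ? xml : xml.substring(0, firstMarkup);
    }
}
```

A caller would reject the document when `LeadingJunk.before(xml)` is not empty, and could include the returned characters in the error report sent back to the partner.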

reamon · October 3, 2005, 11:19pm

pub.xml:xmlStringToXMLNode and xmlNodeToDocument are indeed the services to enforce well-formed XML (or allow non-well-formed). xmlStringToXMLNode does some basic validation (e.g. matching tags) but it allows data leading up to the prolog/first tag and ignores it. In other cases, docs that are not well-formed generate errors.

I’ve seen threads that describe various parsers that ignore whitespace in front of the prolog, something that is supposedly against XML rules but I can’t find anything that says that explicitly. The parser in Integration Server clearly allows any amount of junk prior to the prolog. It will even successfully parse with leading junk in front of the first tag when no prolog is present. Very forgiving.

IMO, Mark’s suggestion to just strip the leading junk using built-in services is the path of least resistance. If you really, really need to detect leading junk, detecting that the first character is a ‘<’ should be sufficient–xmlStringToXMLNode and xmlNodeToDocument should be able to do the rest. Or you could swap out the XML parser and use one that detects leading junk but that’s most likely more hassle than it’s worth.
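
The first-character test suggested here is small enough to show in full. The sketch below is a hypothetical helper (`FirstCharCheck.startsWithMarkup`), assuming the payload is a non-streamed `String`. It is deliberately strict and treats leading whitespace as junk too, which is a policy choice rather than a requirement.

```java
// Hypothetical helper, not an IS built-in service.
final class FirstCharCheck {

    private FirstCharCheck() {}

    /** Returns true if the string is non-empty and its very first character is '<'. */
    static boolean startsWithMarkup(String xml) {
        return xml != null && !xml.isEmpty() && xml.charAt(0) == '<';
    }
}
```

This only looks at one character, so it adds almost no overhead. Everything after that first character is still left to the IS parser to check.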

Tong_Wang · October 4, 2005, 1:42am

Thanks, Mark & Rob.
I think I will use the approach of checking whether the first character is ‘<’. This will have the least performance impact, which is critical for us.

Thanks to everyone who tried to help.

Tong
