SAX vs. DOM - little speed difference?

I’ve been experimenting with the SAX and DOM object models on our database. After doing my best to optimize our code (using OptimizeIt), I find that there’s little speed advantage of SAX vs. DOM.

What I’ve found is that for smaller documents (Nodes numbering in the hundreds), there is no significant speed difference (and in fact, SAX can be slower). For large documents (with thousands of Nodes), SAX is faster by only about 25%.

From my observations, what SAX gains by reducing the number of object allocations (no DOM tree creation), it loses in stack activations. The call stack for the element handler callbacks is extremely deep and consumes most of the time spent.
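
For anyone who wants to try a comparison like this outside of Tamino, here is a minimal timing sketch. It assumes the JAXP parsers bundled with the JDK (not the Apache parsers shipped with Tamino), and the class name SaxVsDomTiming and the synthetic <item> documents are hypothetical, so the numbers it prints will not match the ones quoted in this thread.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical micro-benchmark: counts elements with SAX and with DOM.
public class SaxVsDomTiming {

    // Build a synthetic document with the given number of <item> elements.
    public static byte[] buildXml(int items) {
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 0; i < items; i++) {
            sb.append("<item id=\"").append(i).append("\">value</item>");
        }
        return sb.append("</root>").toString().getBytes(StandardCharsets.UTF_8);
    }

    // SAX: one callback per element, no tree is allocated.
    public static int countWithSax(byte[] xml) throws Exception {
        final int[] count = {0};
        SAXParserFactory.newInstance().newSAXParser().parse(
            new ByteArrayInputStream(xml),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String local,
                                         String qName, Attributes atts) {
                    count[0]++;
                }
            });
        return count[0];
    }

    // DOM: build the whole tree first, then count its elements.
    public static int countWithDom(byte[] xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml));
        return doc.getElementsByTagName("*").getLength();
    }

    public static void main(String[] args) throws Exception {
        final int runs = 200;
        for (int items : new int[] {8, 500, 5000}) {
            byte[] xml = buildXml(items);
            // Warm up the JIT before timing anything.
            for (int i = 0; i < runs; i++) {
                countWithSax(xml);
                countWithDom(xml);
            }
            long t0 = System.nanoTime();
            for (int i = 0; i < runs; i++) countWithSax(xml);
            long t1 = System.nanoTime();
            for (int i = 0; i < runs; i++) countWithDom(xml);
            long t2 = System.nanoTime();
            System.out.printf("%d items: SAX %.2f ms, DOM %.2f ms%n",
                items, (t1 - t0) / 1e6, (t2 - t1) / 1e6);
        }
    }
}
```

The handler here does almost nothing per callback, so this only measures raw parsing cost. A deep chain of application callbacks, as described above, would add to the SAX side.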

Can anyone comment? I’m rather disappointed by the payback for the effort of using the SAX object model. The documents we’re storing are quite complex: EDI XML with DTDs from xcbl.org.

The code I’ve implemented is quite a lot (the objects, elements, and datasources are fairly abstracted from one another), but I’m willing to share if someone thinks they can shed some light.

- steve

What is the average size of these documents? Have you experimented to see how performance varies with different sizes? Where are you reading the sample documents from?

What JVM are you using and which XML parsers?

Kind regards,

Simon

I should have included configuration info, but (considering the forum) I really just wanted to start a discussion of SAX vs. DOM from an architecture point of view.

I’ve done the tests with Sun’s JRE 1.3.1 and 1.4.1. FYI, 1.4.1 is much faster with reflection (we’ve mapped XML XPath expressions to object methods/fields and use the reflection API to set ivars), but otherwise there’s little difference with regard to XML parsing.
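
To illustrate the kind of path-to-field mapping described above, here is a rough sketch. It is not the code from this project: the class PathFieldMapper, the Order class, and the paths are all hypothetical, and it only handles String and int fields.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Hypothetical mapper: binds a simple element path to a field and
// sets that field through the reflection API.
public class PathFieldMapper {

    // Hypothetical target object for the example.
    public static class Order {
        public String buyerName;
        public int quantity;
    }

    private final Map<String, Field> fieldsByPath = new HashMap<>();

    // Register a path such as "/order/buyer" for a named field.
    // The Field lookup is done once here, not on every document.
    public void map(String path, Class<?> type, String fieldName)
            throws NoSuchFieldException {
        Field field = type.getDeclaredField(fieldName);
        field.setAccessible(true);
        fieldsByPath.put(path, field);
    }

    // Set the mapped field on the target; returns false if the path is unmapped.
    public boolean apply(Object target, String path, String text)
            throws IllegalAccessException {
        Field field = fieldsByPath.get(path);
        if (field == null) {
            return false;
        }
        if (field.getType() == int.class) {
            field.setInt(target, Integer.parseInt(text.trim()));
        } else {
            field.set(target, text);
        }
        return true;
    }
}
```

Caching the Field objects up front keeps the per-node reflection cost down to the set call itself, which is probably where a JRE difference would show most.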

All documents are fetched from Tamino using the Java API (API4J/3.1.14). The parsers are the Apache ones shipped with Tamino.

And as I said - SAX outperforms DOM when the number of nodes is much greater (for now, I’ll disregard size in terms of kilobytes). Small documents have little difference, and for very small documents (tens of elements) DOM is faster.

I’ve got an OptimizeIt sampler screen here (it’s a great product, IMO) and see that parsing about 200 XML documents with 8 nodes each (so each is about 0.5 KB at most) takes 7.5 ms with SAX and 4.3 ms with DOM. This is starting at the stack frame of TSAXInputStreamInterpreter.doInterpret() and TDOMInputStreamInterpreter.doInterpret(), respectively. I can verify that the SAX/DOM difference in my own code is insignificant.

And again, to move this discussion back up to the big picture and “best practices”, I’d like to know other people’s experiences with the different APIs. Is anyone else interested in pulling the node values into their program? If so, what are you using? (getNode().getItem().getWhatever() gets tedious and fragile really fast.)

Or are most applications simply passing the XML onto a browser or other UI (with perhaps some XSLT)?


Duh - I’m misreading the results - let me restate the numbers:

SAX - 7.5 ms
DOM - 7.8 ms

I neglected to add in the DOM->Object mapping step. Of course, with SAX this is combined with the XML parsing. With DOM, it’s performed on the DOM Element immediately after parsing.
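
The difference between the two approaches can be sketched roughly as follows, assuming the JDK's bundled JAXP parsers. The class OnePassVsTwoPass is hypothetical and much simpler than the real mapping: it only collects the text of leaf elements into a map keyed by element path.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical comparison: mapping during the parse (SAX) versus
// mapping as a second step over a finished tree (DOM).
public class OnePassVsTwoPass {

    // SAX: path tracking and value extraction happen inside the callbacks.
    public static Map<String, String> viaSax(byte[] xml) throws Exception {
        final Map<String, String> values = new LinkedHashMap<>();
        final Deque<String> path = new ArrayDeque<>();
        final StringBuilder text = new StringBuilder();
        SAXParserFactory.newInstance().newSAXParser().parse(
            new ByteArrayInputStream(xml),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String local,
                                         String qName, Attributes atts) {
                    path.addLast(qName);
                    text.setLength(0);
                }
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
                @Override
                public void endElement(String uri, String local, String qName) {
                    String value = text.toString().trim();
                    if (!value.isEmpty()) {
                        values.put("/" + String.join("/", path), value);
                    }
                    path.removeLast();
                    text.setLength(0);
                }
            });
        return values;
    }

    // DOM: step 1 builds the tree, step 2 walks it to extract values.
    public static Map<String, String> viaDom(byte[] xml) throws Exception {
        Node root = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml))
            .getDocumentElement();
        Map<String, String> values = new LinkedHashMap<>();
        walk(root, "", values);
        return values;
    }

    private static void walk(Node node, String parentPath,
                             Map<String, String> values) {
        String path = parentPath + "/" + node.getNodeName();
        boolean hasChildElement = false;
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElement = true;
                walk(c, path, values);
            }
        }
        if (!hasChildElement) {
            String value = node.getTextContent().trim();
            if (!value.isEmpty()) {
                values.put(path, value);
            }
        }
    }
}
```

Timing only the parse call for DOM leaves out the walk step, which is the mistake corrected above. A fair comparison times viaSax against the whole of viaDom.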

In terms of percent, the difference is insignificant.

- steve

For readers interested in DOM vs. SAX: Here is another interesting discussion.