Persistent DOM

Hi,

The main idea is to have a pointer into a document stored inside Tamino. From there on you can use DOM commands directly on the structure in Tamino. The advantage when working with large XML documents is that you do not have to download and parse the entire document each and every time.

Consider the following example (pseudo Java):
MyPoint = Tamino.getpointer("http://path/db/coll?_xql=persons[@name~='j']");
System.out.println(MyPoint.getNodeName()); // results in xql:result
System.out.println(MyPoint.getFirstChild().getNextSibling().getLastChild()...);

This way you don't have to retrieve all documents, parse them, etc. This should boost performance on large documents.
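
To make the idea a bit more concrete, here is a minimal sketch of such a pointer in plain Java. Everything in it is an assumption for illustration: `NodeStore`, `LazyNode` and the in-memory map merely stand in for a document held on the server, and none of this is an existing Tamino API. The point is only that navigating loads the nodes you actually touch and nothing else.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the server: maps a node id to its name and child ids.
class NodeStore {
    static class Entry {
        final String name;
        final List<Integer> children;

        Entry(String name, List<Integer> children) {
            this.name = name;
            this.children = children;
        }
    }

    private final Map<Integer, Entry> entries = new HashMap<>();
    int fetches = 0; // counts simulated round trips to the "server"

    void put(int id, String name, Integer... children) {
        entries.put(id, new Entry(name, Arrays.asList(children)));
    }

    Entry fetch(int id) {
        fetches++;
        return entries.get(id);
    }
}

// Hypothetical pointer: holds only a node id and loads the node on first use.
class LazyNode {
    private final NodeStore store;
    private final int id;
    private NodeStore.Entry entry; // stays null until the node is first needed

    LazyNode(NodeStore store, int id) {
        this.store = store;
        this.id = id;
    }

    private NodeStore.Entry entry() {
        if (entry == null) {
            entry = store.fetch(id);
        }
        return entry;
    }

    String getNodeName() {
        return entry().name;
    }

    LazyNode getFirstChild() {
        List<Integer> c = entry().children;
        return c.isEmpty() ? null : new LazyNode(store, c.get(0));
    }

    LazyNode getLastChild() {
        List<Integer> c = entry().children;
        return c.isEmpty() ? null : new LazyNode(store, c.get(c.size() - 1));
    }
}

class LazyDomDemo {
    // Made-up sample tree: a result node with three persons, the last one has a name child.
    static NodeStore sampleStore() {
        NodeStore store = new NodeStore();
        store.put(0, "xql:result", 1, 2, 3);
        store.put(1, "person");
        store.put(2, "person");
        store.put(3, "person", 4);
        store.put(4, "name");
        return store;
    }

    public static void main(String[] args) {
        NodeStore store = sampleStore();
        LazyNode root = new LazyNode(store, 0);
        System.out.println(root.getNodeName());
        System.out.println(root.getLastChild().getLastChild().getNodeName());
        System.out.println("nodes fetched: " + store.fetches + " of 5");
    }
}
```

In this toy run the first two `person` nodes are never loaded, which is the behaviour I would hope for from a real implementation. How updates and transactions would fit in is left open here.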

Quote from partner:
"Ideally, what we would like is an interface (.Net or COM) to the Tamino server that would enable us to access individual nodes in a Tamino document without retrieving the entire document from the server.

We work with very large documents and we always need to work with the entire document, so extracting xml is not an option.

I know that we can cut up our documents into smaller fragments, but then the advantage of using Tamino instead of SQL Server or Oracle becomes very small.

The perfect solution for us would be a (.Net) XPathNavigator implementation with update extensions.

Alternatively, a subset of the XML DOM could be an option. Something like selectNodes, selectSingleNode, childNodes, etc. with node-level updates."
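
For readers who think in Java rather than .Net, the calls the partner lists look roughly like this on an ordinary in-memory DOM. This is only a sketch: it uses the standard `javax.xml` APIs on a document that has already been parsed locally, which is exactly the step the partner would like to avoid. The helper names `selectNodes` and `selectSingleNode` are borrowed from the quote, and the sample XML is made up.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodeLevelAccessDemo {
    // Parses a small XML string into an in-memory DOM (stand-in for a stored document).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Rough counterpart of selectNodes: all nodes matching the XPath expression.
    static NodeList selectNodes(Node context, String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (NodeList) xpath.evaluate(expr, context, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalArgumentException("bad XPath: " + expr, e);
        }
    }

    // Rough counterpart of selectSingleNode: the first matching node, or null.
    static Node selectSingleNode(Node context, String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Node) xpath.evaluate(expr, context, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalArgumentException("bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
            "<persons><person name='jim'><city>Oslo</city></person>"
          + "<person name='ann'><city>Rome</city></person></persons>");

        NodeList people = selectNodes(doc, "/persons/person");
        System.out.println("persons found: " + people.getLength());

        // Node-level update: change one text value without touching the rest.
        Node city = selectSingleNode(doc, "/persons/person[@name='jim']/city");
        city.setTextContent("Berlin");
        System.out.println("updated city: " + city.getTextContent());
    }
}
```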

This might have to be a read-only feature, because updates might cause problems with transactions and sessions. If we had this feature, it would be easier to develop applications that work with large documents. Until now our answer to handling large documents has been "restructure your XML". Customers do not want to hear this, because they might be stuck with large XML data from other customers (consider NITF, ebXML, etc., and I have seen even larger XML documents).

Jacob,

I have difficulty understanding exactly what you need. It looks to me like a clash of paradigms:

DOM:
"I want full control over the entire document or set of documents. I want to be able to navigate and manipulate every single element and attribute that I might use."
In short: give me everything, I will decide later what I need.

Database:
"I know exactly which part of the document I am interested in: which element, attribute or subdocument. I want to retrieve exactly this and nothing else from the server. The server should handle the query."
In short: I know what I need, give me just that.

Tamino is currently aimed at the database paradigm.

The requirements you mention:
Quote:
"we would like is an interface (.Net or COM) to the Tamino server that would enable us to access individual nodes in a Tamino document without retrieving the entire document"

This should be possible with Tamino. Just formulate the appropriate query, and Tamino will return an XML document containing exactly what you need.
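
As a small illustration of "formulate the appropriate query", the sketch below builds a request URL in Java. It assumes the query is passed as the `_xql` parameter, as in Jacob's example above. The URL `http://path/db/coll` and the expression are simply his placeholders, and `buildQueryUrl` is a hypothetical helper, not part of any Tamino client library.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class QueryUrlDemo {
    // Appends a query expression to a collection URL as the _xql parameter,
    // percent-encoding characters such as [, ], @ and quotes.
    static String buildQueryUrl(String collectionUrl, String xql) {
        try {
            return collectionUrl + "?_xql=" + URLEncoder.encode(xql, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder URL and expression taken from the first post.
        String url = buildQueryUrl("http://path/db/coll", "persons[@name~=\"j\"]");
        System.out.println(url);
    }
}
```

The narrower the expression, the smaller the returned document. Whether that is enough for the partner depends on whether they can say in advance which nodes they need.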

Quote:
"We work with very large documents and we always need to work with the entire document, so extracting xml is not an option."

In this case you might end up requesting every part of the document at different points in time. So how does that differ from reading the whole document at once? The traffic will be the same in the end.

If I interpret your posting correctly, it looks to me as if you are looking for some mechanism that allows for a DOM-like client supported by a DOM server, which takes care of parsing and providing the DOM representation. In short, all the functionality of DOM for the client without the ‘ugly’ parts (parsing, navigating and resources). As far as I know, this is not in the scope of Tamino. It might be worthwhile discussing it, but I would expect a solution or implementation to be pretty heavy (just one point: this server implementation may need to hold lots of DOM representations, probably of large size, for a lot of users - sounds a bit like an application server task).

Anyhow, I am curious to see the responses,

Timm

Hi all,

I think the major question is how the application works with the data and what the relations between the parts of large documents are.

My understanding is that Jacob’s main problem is the lack of node-level update. I’d dare say that most of the points above show the lack of maturity of XML technology or the lack of - you shouldn’t take that personally - experience with XML technology.

Everybody is arguing that XML offers the “natural” way of structuring data, without taking into account that this is pretty nice for single-user applications but has side effects in multi-user environments.

Of course it’s possible to pack all data used in a furniture-retail-system (customers, addresses, offers, products, conditions, users etc.) into one huge document.
And with node-level updates it would even be possible to access and change data easily. But portions of data often have relations and constraints on these relations. One area of the document tree might have to be validated against data found in another area - well, similar to what we have with relational databases.

So, to keep the data consistent for the application we will need quite a few locks. In relational databases we usually lock rows, but in hierarchical databases we lock (sub-)trees.
That can get us into trouble quickly, because it can seriously harm parallelism.
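
A toy sketch in Java of why subtree locks hurt parallelism: two locks collide as soon as one locked path is an ancestor of the other. The class `SubtreeLockDemo`, the path strings and the "exclusive locks only" rule are simplifying assumptions for illustration. This is not a description of how Tamino actually locks.

```java
import java.util.ArrayList;
import java.util.List;

class SubtreeLockDemo {
    // Two subtree locks conflict when one path equals the other or is an ancestor of it.
    static boolean conflicts(String pathA, String pathB) {
        return isSameOrAncestor(pathA, pathB) || isSameOrAncestor(pathB, pathA);
    }

    // Paths are written without a trailing slash, e.g. "/shop/customers".
    static boolean isSameOrAncestor(String ancestor, String path) {
        return path.equals(ancestor) || path.startsWith(ancestor + "/");
    }

    private final List<String> held = new ArrayList<>();

    // Grants an exclusive lock only if it overlaps with no lock already held.
    boolean tryLock(String path) {
        for (String other : held) {
            if (conflicts(other, path)) {
                return false;
            }
        }
        held.add(path);
        return true;
    }

    public static void main(String[] args) {
        SubtreeLockDemo locks = new SubtreeLockDemo();
        System.out.println(locks.tryLock("/shop/customers/c42")); // granted
        System.out.println(locks.tryLock("/shop/products"));      // granted, disjoint subtree
        System.out.println(locks.tryLock("/shop/customers"));     // refused, contains c42
        System.out.println(locks.tryLock("/shop"));               // refused, contains everything
    }
}
```

The higher up in the tree a transaction has to lock (for example, to validate against data in another area), the more other transactions it blocks.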

I guess this is the point where customers will have to decide whether they prefer ease of definition/structure over speed. My guess is that they will either quickly abandon a schema that is badly designed (for parallel transactions) or not buy our database because they say it’s too slow.

Coming back to the start: The question is what the structure, semantics and constraints of the “unchangeable” schemata look like and how the application uses the database. If (which is probably rare) there are only very few and weak relations/constraints, then node-level update alone will be enough. Usually the application will not be able to avoid reading (and most probably locking) all subtrees that contain data needed for validation. Even with an application server, the only thing that could (sometimes) be avoided would be the transport to the client.

So, Jacob, it’s up to you to present some use cases/schemata to keep the conversation going… :wink:

GreetinX
Juergen