Any versus non-declared content

Dear colleagues,

We would appreciate any advice or recommendations on the next … options.

We have a very complex XML record format (around three thousand different nodes), and we decided not to define the schema completely but only some concrete elements. The elements defined in the schema are the indexed ones, and the rest is stored in an Any node. I am wondering whether it would be better, from some point of view, to use non-declared content instead.

In this second case, we could set the structureIndex feature to Full, which would speed up the retrieval of records that contain concrete elements not declared in the schema. However, I think this would greatly increase the complexity of the indexes for the XML server itself and would probably make updates too slow.

Each record is 5k, with half of that size devoted to such Any or non-declared content.

Any comment would be appreciated. Regards.
Ignacio

Hello Ignacio,

as far as I understand your scenario, there are two topics which
are more or less independent of each other:
(1) writing a schema that does not fully describe the document
structure
(2) defining indexes for such a schema, in particular
structure index = full

ad (1):
There are basically three possibilities to allow arbitrary
content:
- using xs:any allows sub-trees with arbitrary content at the
point where the xs:any is defined,
- open content (doctype option) allows arbitrary content for
each element that has a complexType definition
- defining an element without a type allows arbitrary content
for this element

Which one to choose depends on how exactly you can describe your
document structure. My feeling is that for your scenario xs:any
is the best choice.


ad (2):
structure index = full can be used in any of the scenarios
described above. The question however is whether it will be of
any use at all. You say that you already have defined indexes
for the known parts of the document. structure index = full
will help only if there are queries against the arbitrary
contents, and even then it will support only the so-called
existence tests.

Example:
Assume there is an undeclared element named "abc" which is
used in the following query:
/Document[abc = "value"]
With structure index = full this query is first evaluated as if
it were
/Document[abc], i.e. find all documents where the element
"abc" exists. This set of documents is then post-processed with
the real query in order to find the documents where abc has the
requested value.
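
The two-phase evaluation described above can be illustrated with a small sketch. This is not how the server implements its structure index; it is a simplified model in Python, assuming a hypothetical in-memory index keyed by element name only, and made-up sample records in which "abc" plays the role of the undeclared element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample records; "abc" stands for an element that is
# not declared in the schema.
DOCS = {
    1: "<Document><title>a</title><abc>value</abc></Document>",
    2: "<Document><title>b</title><abc>other</abc></Document>",
    3: "<Document><title>c</title></Document>",
}


def build_structure_index(docs):
    """Map each element name to the ids of the documents containing it.

    Simplification: a real structure index would track paths, not just
    names; this toy version only records that a name occurs somewhere.
    """
    index = {}
    for doc_id, xml_text in docs.items():
        for elem in ET.fromstring(xml_text).iter():
            index.setdefault(elem.tag, set()).add(doc_id)
    return index


def query_child_equals(docs, index, name, value):
    """Evaluate /Document[name = value] in two phases."""
    # Phase 1: existence test, answered from the index alone.
    candidates = index.get(name, set())
    # Phase 2: post-process only the candidates with the real predicate.
    hits = []
    for doc_id in sorted(candidates):
        root = ET.fromstring(docs[doc_id])
        if any(child.text == value for child in root.findall(name)):
            hits.append(doc_id)
    return candidates, hits


index = build_structure_index(DOCS)
candidates, hits = query_child_equals(DOCS, index, "abc", "value")
print(sorted(candidates))  # [1, 2]  documents where abc exists
print(hits)                # [1]     documents where abc = "value"
```

In this model a pure existence query such as /Document[abc] is answered by phase 1 alone, while a value comparison still has to read every candidate document.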

Hope this helps,
Best regards,
Manfred Michels

Hi Manfred

Thanks for your reply, and yes, it helps.

In our context, it makes sense to ask about the existence of an undeclared element. I mean, we would like to run a query such as /Document[abc = "value"] where abc is not declared. In fact, the expected query will be /Document[abc], so we only need to know whether the element exists.

My doubt was whether structure index = full applies to Any nodes, to undeclared content, or to both, which you have clarified for me.

The next question is whether there is any recommendation, from a performance point of view, about using Any, non-declared content or, as you reminded me, elements with no type in this scenario. The set of possible non-declared elements is around 2500, although only around 50 appear per document, that is:

Thanks again.
Ignacio