indexing

We’d like to be able to quickly query text within a large-ish number of documents (100 - 200K). Our documents look like this:


<root>
	<header>
		...
	</header>
	<body>
		<section>
			<text>Some text we want to query</text>
		</section>
		<section>
			<text>More text we want to query</text>
		</section>
		...
	</body>
</root>

In the schema for these docs I’ve specified a “text” index for the “text” elements and “standard” indices from the “section” element all the way up the tree to “root”. Query time has not improved, and the index size (as shown in SMH) has hardly grown, which makes me a little suspicious.

Any suggestions? Thanks in advance for any help.

Hi “TaminoD”,
Sounds odd!
Have you tried the DisplayIndex function to determine what has been indexed? (See below)

If that doesn’t make you any wiser, perhaps you could attach the schema and a sample document?

Finn

from the documentation:

Syntax
The function call has the following syntax:

_=ino:DisplayIndex("CollectionName", "ElementPath", "StartValue", "Size", "IndexType")
where CollectionName is the name of the collection, ElementPath is the absolute path of the indexed element, StartValue is the first index value that you want to display, Size is the number of values to display for the index (must be a positive integer), and IndexType specifies the index type, which can be “standard” or “text”.

Example
_=ino:DisplayIndex("Customers","Customer/Name","B","10","standard")

Thanks for the tip. Unfortunately, I can’t get the admin command to work. I’m issuing a GET (i.e. an X-Machine command in a browser) for a URL like

https://host.edu/tamino/db/collection/{?_admin=ino:DisplayIndex("root","root","B","10","standard")}

I’m a little confused about the “StartValue” parameter, but regardless, the command isn’t working; I get back (in my browser) error 8320, “Error parsing the XQL query”. Any ideas what I might be doing wrong? Malformed URL? Or is another tool appropriate for issuing this command?

Thanks for any help with this.

Sorry, another closely related indexing question: could the problem be my failure to fill in the “refers”, “multiPath”, and “field-xpaths” fields for the indices? If so, what might be appropriate values, or where in the documentation might I find them?

Thanks again!

Arrgh, one more complication. The sample I posted was necessarily a simplification of the real thing we’re storing. I have actually made such docs with indices as indicated and our queries generally perform well enough - much better than without indexing, anyway. But the reality of our actual documents differs a bit in that what corresponds to the “section” element in the example above is recursive in that sections contain sections, so that, for instance, a document might contain:


<root>
  ...
  <body>
    <section>
      <section>
        <text>Some text we want to query</text>
      </section>
      <section>
        <text>Some more text needing an index</text>
      </section>
    </section>
  </body>
</root>

Perhaps the “section” elements need more than the standard index?

Why did you put in curly brackets?

My best bet for a DisplayIndex command would be

https://host.edu/tamino/db?_admin=ino:DisplayIndex("root","root/body/section/section/text","A","10","text")

That assumes your collection is named “root”.
BTW, in general you should be careful about using names that are reserved words, like “root” and “text”.
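
For anyone who would rather script the request than paste it into a browser, here is a minimal Python sketch that assembles a DisplayIndex URL of the same shape as the one above. The base URL, collection name and element path are simply the values used in this thread, the helper name display_index_url is made up for illustration, and percent-encoding the double quotes is an assumption about what the server accepts (a browser would normally do this for you). It uses plain double quotes and no literal curly brackets.

```python
from urllib.parse import quote


def display_index_url(base_url, collection, element_path,
                      start_value, size, index_type):
    """Assemble an _admin URL for ino:DisplayIndex (hypothetical helper)."""
    # Each argument is wrapped in plain ASCII double quotes.
    args = ",".join(
        f'"{value}"'
        for value in (collection, element_path, start_value, str(size), index_type)
    )
    command = f"ino:DisplayIndex({args})"
    # Percent-encode the quotes; leave ':', '(', ')', ',' and '/' readable.
    return f"{base_url}?_admin={quote(command, safe=':(),/')}"


url = display_index_url(
    "https://host.edu/tamino/db",       # database URL from this thread
    "root",                             # collection name (assumed, as above)
    "root/body/section/section/text",   # absolute path of the indexed element
    "A",                                # first index value to display
    10,                                 # number of values to display
    "text",                             # "standard" or "text"
)
print(url)
```

Changing the element path and index type lets you check each indexed element in turn.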

Another question… What do you mean by
“… “standard” indices from the “section” element all the way up …”?

  • Have you given each of the individual elements a “standard” index, or have you put a standard index on the “section” group?

Finn

Sorry to be so slow in responding - this project is one I can only attend to once a week or so. Anyway, I got the X-Machine command to work; the reason I was using curly brackets is that that was what was indicated in the documentation, and though they’re not meant to be literals, I took them as such.

It looks like the index is working at the first level, but not on nested/recursive elements, i.e. in


<document>
  <section>...</section>
  ...
</document>

“section” elements are indexed, but nested sections are not:


<document>
  <section>
    <section>...</section>
    ...
  </section>
  ...
</document>

In the latter case, only the outer section elements are indexed. So my question finally boils down to this: how do I specify such a recursive index?
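
To make the problem concrete, here is a small Python sketch that lists every distinct absolute path at which a “text” element occurs in a document. The sample is modelled on the ones in this thread, with one extra top-level “text” element added purely for illustration, and the function name text_paths is made up. It says nothing about how to declare a recursive index in the schema; it only shows which paths would each need to be covered (and could each be checked with DisplayIndex).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one section with its own text plus two nested sections.
SAMPLE = """<root>
  <body>
    <section>
      <text>Top-level text</text>
      <section>
        <text>Some text we want to query</text>
      </section>
      <section>
        <text>Some more text needing an index</text>
      </section>
    </section>
  </body>
</root>"""


def text_paths(xml_string):
    """Return the sorted, distinct absolute paths of all <text> elements."""
    root = ET.fromstring(xml_string)
    found = set()

    def walk(elem, path):
        if elem.tag == "text":
            found.add(path)
        for child in elem:
            walk(child, f"{path}/{child.tag}")

    walk(root, root.tag)
    return sorted(found)


for path in text_paths(SAMPLE):
    print(path)
```

Each additional level of nesting adds another distinct path, which is why an index defined for one path does not obviously cover the deeper ones.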

As for my “all the way up” comment, I was just trying to indicate that all the elements up to the root had been indexed, so that query paths wouldn’t include non-indexed elements.

Thanks again for your help with this, and any further suggestions.