Search across multiple document types

I need to create a search engine for an intranet site where the content is held in many document types in Tamino.

The user wants to be able to enter search criteria and to see a list of all qualifying documents. To achieve this I need to do one of the following:

  • Construct a query that will retrieve documents regardless of document type.
  • Run separate searches for each document type and then consolidate/sort the results (this is not desirable due to probable performance issues with large result sets).
  • Maintain separate indexing documents containing the searchable content plus the document type and key of the related document.

Can anybody out there on the Tamino list help with advice, warnings or other possible solutions?

Any help would be greatly appreciated.

Hello David.

I think that your goal is achievable with Tamino, and that the main thing to keep in mind is “design for performance”.

If you have multiple XML document types stored in a Tamino collection, you can query across all of them quite simply.
For example, with Tamino X-Query:

   /*[//name="John"]

Or, with Tamino XQuery4:

   for $a in input()/*
   where $a//name="John"
   return $a

These queries will both return any documents containing an element called "name" with the value "John". One caveat is, of course, that the documents have to have an element called "name".
In order to find documents where *any* element has the value "John", you could do this:

   /*[//*="John"]

(Or an equivalent XQuery4 version.)
These queries will not perform well, though: “//” is already expensive to evaluate, and combining it with “*” only adds to the cost.
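To make the difference between the two predicates concrete, here is a rough sketch in Python that mimics them outside Tamino, using the standard-library ElementTree parser on a few in-memory documents. The document types and the helper names (`has_named_match`, `has_any_match`) are made up for illustration; this only approximates the matching rule and says nothing about how Tamino evaluates or indexes the query.

```python
import xml.etree.ElementTree as ET


def string_value(element):
    """Concatenate all text inside an element, like the XPath string-value."""
    return "".join(element.itertext())


def has_named_match(xml_text, tag, value):
    """Rough analogue of /*[//tag="value"] for a single document."""
    root = ET.fromstring(xml_text)
    return any(string_value(e) == value for e in root.iter(tag))


def has_any_match(xml_text, value):
    """Rough analogue of /*[//*="value"]: every element is inspected."""
    root = ET.fromstring(xml_text)
    return any(string_value(e) == value for e in root.iter())


# Hypothetical documents of three different types.
docs = {
    "person": "<person><name>John</name></person>",
    "memo": "<memo><author>John</author><body>Hello</body></memo>",
    "order": "<order><item>Widget</item></order>",
}

named = sorted(k for k, d in docs.items() if has_named_match(d, "name", "John"))
anywhere = sorted(k for k, d in docs.items() if has_any_match(d, "John"))
print(named)     # only the document that has a <name> element
print(anywhere)  # also the memo, whose <author> element holds "John"
```

The second helper has to visit every element of every document, which gives a feel for why the wildcard form costs more.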

One last - very important - point on this: XQuery4 does not support non-XML documents yet. If you execute an XQuery4 expression (like the one shown) on a collection which contains non-XML documents, an error will occur. If you need to store non-XML documents, store them in a separate collection to avoid this.

Depending on the context of your search engine, it may be more appropriate to execute separate searches (rather than “one big one”). You probably don’t want to return 10,000 matches to the user in a single step - I would imagine that they get the matches in sets of 10/25/50 or such.
If this is the model employed, you could execute a specific query to get the best 10/25/50 matches, then execute another query to get the next best matches while the user is looking over the first set, and so on.
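As a minimal sketch of that paging model, assuming the matches are already ranked: the function below hands out one page at a time. The list here is only a stand-in for the database, and `fetch_page` is a hypothetical name; in a real application you would push the window into the query itself so that only the requested slice leaves the server.

```python
from itertools import islice


def fetch_page(ranked_matches, page_number, page_size=10):
    """Return one page of results; page_number starts at 1."""
    start = (page_number - 1) * page_size
    return list(islice(ranked_matches, start, start + page_size))


# Stand-in for 35 ranked matches coming back from a search.
all_matches = [f"doc-{i}" for i in range(1, 36)]

first = fetch_page(all_matches, 1)  # shown to the user immediately
last = fetch_page(all_matches, 4)   # a short final page
print(first[0], first[-1], len(last))
```

The next page can be requested in the background while the user reads the current one.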

I’m not sure if it will exactly match your requirements, but you could have a look at the Tamino Non-XML Indexer to see if it meets your needs for maintaining separate index documents. (It can be downloaded here: Tamino Non-XML Indexer)

This component creates an XML representation of a non-XML document, indexes this representation and then links the index and the non-XML document. A query will make use of the index and return the non-XML document.
There is also a specific discussion forum for it: Tamino Non-XML Indexer
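If you end up maintaining your own index documents instead, the idea could be sketched roughly as follows. The element names (`searchIndex`, `docType`, `docKey`, `content`) and the function `build_index_entry` are my own invention for illustration; this is not the format the Non-XML Indexer produces.

```python
import xml.etree.ElementTree as ET


def build_index_entry(doc_type, key, xml_text):
    """Build a hypothetical <searchIndex> document for one source document."""
    source = ET.fromstring(xml_text)
    # Collapse all text in the source into one whitespace-normalized string.
    content = " ".join("".join(source.itertext()).split())
    entry = ET.Element("searchIndex")
    ET.SubElement(entry, "docType").text = doc_type
    ET.SubElement(entry, "docKey").text = key
    ET.SubElement(entry, "content").text = content
    return entry


entry = build_index_entry(
    "memo", "42", "<memo><author>John</author> <body>Hello  world</body></memo>"
)
print(ET.tostring(entry, encoding="unicode"))
```

A search would then only need to query the one index document type and use the type and key to fetch the original document.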

I hope that this is helpful,