Hello,
What kind of structure does Tamino use to store non-XML objects in the database: a tree or a file?
Can I build an index on those non-XML objects?
How does Tamino organize its indexes? Does it optimize indexes for queries?
Regards
Currently Tamino doesn’t build indexes on non-XML objects. You can only query by ino:id or document name.
What sort of non-XML data would you like to index?
Thanks, Nigel
If Tamino cannot build an index on non-XML objects, how can it speed up query operations, especially when an XML document contains such objects?
By the way, could you tell me the physical storage structure of non-XML objects?
Currently Software AG doesn’t describe how non-XML data is stored, but essentially you can think of it as a blob.
What I do to index HTML is as follows:
The tool that loads the non-XML documents reads each HTML file, extracts its metadata and content words, and creates a metadata document. Both the metadata and the non-XML document are stored.
The metadata references the documents.
The application queries the metadata and gets Tamino URLs. The application then uses these URLs to retrieve the original documents.
The metadata implementation is based on RDF and implements the Dublin Core document metadata standard.
The method is also applicable to other non-XML formats: you just need to build a component that extracts the data for each document type.
The indexing method is uniform for all possible metadata vocabularies, so there is no fiddling about with Tamino schemas.
You can extend the metadata for individual instances manually, for example if some of it is missing for an instance or if the non-XML document has no metadata at all.
This should also be implementable as a server extension, but I haven’t done that yet.
If you want a copy of the implementation, just ask.
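For readers who want a feel for the extraction step described above, here is a minimal sketch in Python. It is not the implementation offered in this post: the class `MetaExtractor`, the function `build_rdf`, the mapping from HTML meta names to Dublin Core elements, and the example URL are all assumptions made for illustration. It covers only the metadata part (not the content words) and does not talk to Tamino at all; it just builds the RDF description that would be stored alongside the HTML document.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Assumed mapping from common HTML <meta name="..."> values to Dublin Core elements.
META_TO_DC = {"description": "description", "keywords": "subject", "author": "creator"}


class MetaExtractor(HTMLParser):
    """Collects the <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def build_rdf(html_text, document_url):
    """Return an RDF/XML string describing the HTML document at document_url."""
    parser = MetaExtractor()
    parser.feed(html_text)

    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    # rdf:about points back at the stored non-XML document.
    desc = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": document_url}
    )
    if parser.title.strip():
        ET.SubElement(desc, f"{{{DC_NS}}}title").text = parser.title.strip()
    for meta_name, dc_name in META_TO_DC.items():
        if meta_name in parser.meta:
            ET.SubElement(desc, f"{{{DC_NS}}}{dc_name}").text = parser.meta[meta_name]
    return ET.tostring(root, encoding="unicode")
```

An application would store the returned RDF as an XML document, query it by Dublin Core element, and read the `rdf:about` value to find the URL of the original HTML.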
Hi Nigel,
As far as I can see, Tamino’s search engine is based on a Dublin Core and RDF implementation. I’ve been studying how I can create an application that retrieves nixe documents based on semantics.
I wonder whether it is necessary to create any additional schema in Tamino to do that, because nixe already has all the information we use to query (description, subject, etc.). My question is how I can use loadlists and stoplists to store every word in the “doctype/properties/content” element. Should I configure something in Tamino, or do I have to create an application to do this kind of indexing?
Thanks in advance, Ito
Hi Ito,
Please note that Nigel’s answer to this thread dates from December 01. Things have changed in the meantime. You are mentioning two different topics in your post.
Loadlists and stoplists are terms from the realm of text retrieval. You can define loadlists in Tamino to declare certain terms as crucial to your application and thus enhance querying using text retrieval. This is documented with Tamino 4.2 and works for any Tamino collection, nixe or other. Tamino 4.2 does not document the concept of stoplists, though if you are in need of those, we can do something for you.
The concept of nixe is different. Tamino offers the possibility to store non-XML documents accompanied by so-called shadow documents. These are XML documents that allow better querying. The query results are then excerpts of the shadow document. This information can then be used to obtain the original data requested. The user must provide both a schema for the shadow documents and an SXS function that maps the non-XML data into an XML instance adhering to that schema. The nixe project now offers such schema-SXS pairings for certain well-known non-XML data formats such as Word, PDF, etc. With Tamino 4.2 the nixe sources are packaged with Tamino.
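To make the shadow-document idea concrete, here is a small Python illustration of the kind of mapping such a function performs. It is not Tamino or nixe code: in Tamino the mapping would be an SXS function, and the function name `make_shadow` and the element names `shadow`, `source`, `properties`, `size` and `content` are hypothetical, chosen only for this sketch. The input is assumed to be plain text.

```python
import xml.etree.ElementTree as ET


def make_shadow(raw: bytes, source_name: str) -> str:
    """Map a plain-text document to a hypothetical XML shadow document."""
    text = raw.decode("utf-8", errors="replace")
    root = ET.Element("shadow")
    # Reference back to the original non-XML document.
    ET.SubElement(root, "source").text = source_name
    props = ET.SubElement(root, "properties")
    ET.SubElement(props, "size").text = str(len(raw))
    # Whitespace-normalized text, so that it can be searched with text retrieval.
    ET.SubElement(props, "content").text = " ".join(text.split())
    return ET.tostring(root, encoding="unicode")
```

A query would then run against the shadow document (for example on `content`), and the `source` value in the result would be used to fetch the original data.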
Regards,
Juliane.