Tamino 2.3.1.4 and 3.1.1.1 APIs

Hi,

Can anyone explain these problems to me?

A. We are using the Tamino 2.3.1.4 Java API (we will probably upgrade to 3.1.1.1 in the near future). My problem is: why does the older API's TaminoClient.Ping() not work against the new 3.1.1.1 database? It throws a SAX exception (misplaced XML…).

B. Our application uses the Tamino 2.3.1.4 Java API. We want to know how many hits match a query. taminoResult.getTotalCount() does not return the total number of hits if I set the TaminoClient page size to 1 (taminoClient.setPageSize(1)), but if I set this parameter to a larger number (>1) it works correctly. I don't know why, and it is very confusing.

C. Schema: there are only 60000 documents in the database, and the search is very slow. I think I have misunderstood something about the schema definition. Do I need to define all nodes for searching?
How can I improve the search (must I define the correct paths for the nodes)? Is 60000 documents too many? What about stopwords?

PS:
- I cannot guarantee that our document structure will not change later!
- Upgrading our source code is not a one-day job.

Thanks in advance.

DZSOLT

If you don't mind me saying so, there are three completely different questions in this post, and they have nothing to do with each other.

It would be more convenient if you confined yourself to one question per posting.

Please post the schema and a query that is very slow, and I'll take a look at it.

Hi Nigel,

Yes, you are right; next time I will separate my questions.

Thanks for the answer.
I am sending you the schema definition and some queries.

Bye
zsolt
schema_and_queries.txt (7.8 KB)

I have had a quick look.

Take the query:

/CompleteDocument[( (BaseDocument/DocUserFields/Type='NITF') or (BaseDocument/DocUserFields/Type='ARTICLE')) and ((DocContent/Document/NITF/HEAD/TITLE~='titlestring') or (DocContent/Document/ARTICLE/HEAD/TITLE~='titlestring')) and (DocContent/Document[(NITF or ARTICLE)]~='freetext')] sortby(@ino:id desc)/BaseDocument

One problem with this query, and I think a major one, is the expressions
BaseDocument/DocUserFields/Type='NITF'
and
BaseDocument/DocUserFields/Type='ARTICLE'

Type is text-indexed according to your TSD2 schema. If you want to compare this value for equality, you should set the index on Type to "standard". If you really want to find the word "nitf" within the Type text value, use the ~= operator instead of =.

I think this will make quite a difference.
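A toy model in Java may make the distinction concrete. It is not Tamino's implementation: `=` is modelled as comparing the whole stored value with the literal, and `~=` as checking whether the search word occurs among the value's words, ignoring case. The class and method names (`OperatorDemo`, `equalsMatch`, `containsWord`) are hypothetical, and the word-splitting rule is an assumption.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical toy model of the two comparison styles; NOT Tamino's code.
final class OperatorDemo {

    // Models "=": the whole stored value must equal the literal exactly.
    static boolean equalsMatch(String storedValue, String literal) {
        return storedValue.equals(literal);
    }

    // Models "~=": the word must occur as one word of the stored text.
    // Assumption: words are split on anything that is not a letter or digit
    // and are compared case-insensitively.
    static boolean containsWord(String storedValue, String word) {
        String needle = word.toLowerCase(Locale.ROOT);
        return Arrays.stream(storedValue.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .anyMatch(needle::equals);
    }

    public static void main(String[] args) {
        System.out.println(equalsMatch("NITF", "NITF"));           // true
        System.out.println(containsWord("NITF", "nitf"));          // true
        System.out.println(equalsMatch("NITF ARTICLE", "NITF"));   // false
        System.out.println(containsWord("NITF ARTICLE", "nitf"));  // true
    }
}
```

The toy only shows what each operator tests. The point of the advice above is that each operator is served by a different kind of index (text for ~=, standard for =), which this sketch does not model.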

The query is certainly searching outside the schema and index definitions. The NITF part of the document structure is (I presume) not included in the schema. As a consequence, Tamino 2.3 has to examine each document individually
to verify DocContent/Document/NITF/HEAD/TITLE~='titlestring'
whenever the left-hand part of the and expression is true.

If this set of documents is small, it doesn't matter.

In Tamino 3.1 there is a structure index which might help you a bit: it can, for instance, tell the processor whether an element DocContent/Document/NITF/HEAD/TITLE exists within a document without actually looking at the document.
But it doesn't know what value TITLE has.
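A rough Java sketch of that idea, built on assumptions of our own: each document is reduced to a map from element path to text, and a hypothetical structure index records only which paths each document contains. The index rules out documents that lack the path, but every remaining candidate still has to be opened to test the TITLE value. `StructureIndexDemo` and `search` are illustrative names, and the value test is a plain case-insensitive substring check, a simplification of ~=.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of a structure index; not Tamino internals.
final class StructureIndexDemo {
    // Each "document" is simplified to a map: element path -> text content.
    private final List<Map<String, String>> docs = new ArrayList<>();
    // Structure index: path -> ids of the documents that contain that path.
    private final Map<String, Set<Integer>> pathIndex = new HashMap<>();
    // How many documents had to be opened to check a value.
    int documentsOpened = 0;

    void add(Map<String, String> doc) {
        int id = docs.size();
        docs.add(doc);
        for (String path : doc.keySet()) {
            pathIndex.computeIfAbsent(path, p -> new HashSet<>()).add(id);
        }
    }

    // Returns ids of documents whose text at `path` contains `text`.
    List<Integer> search(String path, String text) {
        List<Integer> hits = new ArrayList<>();
        String needle = text.toLowerCase(Locale.ROOT);
        // The index says which documents have the element at all...
        for (int id : pathIndex.getOrDefault(path, Set.of())) {
            // ...but the value itself must still be read from the document.
            documentsOpened++;
            if (docs.get(id).get(path).toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(id);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        StructureIndexDemo db = new StructureIndexDemo();
        String nitfTitle = "DocContent/Document/NITF/HEAD/TITLE";
        db.add(Map.of(nitfTitle, "Bundeskanzler interview"));
        db.add(Map.of("DocContent/Document/ARTICLE/HEAD/TITLE", "Weather report"));
        db.add(Map.of(nitfTitle, "Sports results"));
        System.out.println(db.search(nitfTitle, "bundeskanzler")
                + " opened=" + db.documentsOpened);
    }
}
```

Here the ARTICLE document is never opened, while both NITF documents are, because only their values can decide the match.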

In the second query I would fix
BaseDocument/DocUserFields/Type='NITF'
as described above, and move this expression to the left of (DocContent/Document[(NITF)]~='Bundeskanzler') to reduce the set of documents Tamino has to crunch the hard way.
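Why the order can matter is easiest to see with short-circuit evaluation. The Java sketch below is only an analogy under our own assumptions, not a description of Tamino's query processor: a cheap test on Type (standing in for an indexed lookup) is combined with an expensive full-text test using `&&`, and a counter records how often the expensive test runs. `PredicateOrderDemo`, `Doc`, `isNitf` and `mentions` are hypothetical names.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical analogy for predicate ordering; not Tamino's optimizer.
final class PredicateOrderDemo {
    static final class Doc {
        final String type;
        final String body;
        Doc(String type, String body) { this.type = type; this.body = body; }
    }

    static int expensiveCalls = 0;

    // Cheap check: stands in for an equality lookup on an indexed Type field.
    static boolean isNitf(Doc d) {
        return d.type.equals("NITF");
    }

    // Expensive check: stands in for reading the whole document text.
    static boolean mentions(Doc d, String word) {
        expensiveCalls++;
        return d.body.contains(word);
    }

    // Counts matching documents; resets the expensive-call counter first.
    static long count(List<Doc> docs, Predicate<Doc> query) {
        expensiveCalls = 0;
        return docs.stream().filter(query).count();
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("NITF", "... Bundeskanzler ..."),
                new Doc("ARTICLE", "... Bundeskanzler ..."),
                new Doc("ARTICLE", "... other text ..."),
                new Doc("NITF", "... other text ..."));

        long hits = count(docs, d -> mentions(d, "Bundeskanzler") && isNitf(d));
        System.out.println("expensive first: hits=" + hits + ", scans=" + expensiveCalls);

        hits = count(docs, d -> isNitf(d) && mentions(d, "Bundeskanzler"));
        System.out.println("cheap first:     hits=" + hits + ", scans=" + expensiveCalls);
    }
}
```

Both orders return the same hit, but with the cheap test first the expensive test runs only for the two NITF documents instead of all four.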

The third query has the same problem as above.

Queries 4 and 5 are big, involving the whole database. I'll have a look on Monday to see what, if anything, can be done. Do you set a page size in the query?

Hi Nigel,

About the page size: yes, it is set to 20.

I think you need more information. The TYPE node can contain only a single uppercase word (like NITF, ARTICLE and so on). TYPE is always present in the query.
TITLE is placed at a different path in each TYPE of document.

1. If I use the exact XPath (like DocContent/Document/NITF/HEAD/TITLE~='onewordintitle'), will it be faster (I think it must be) than DocContent/Document/NITF//TITLE~='onewordintitle'?

2. Which index type can be used for free-text search (searching for a word, or a set of words, in the whole document)?

3. The root node is the TYPE of the document. I must combine the TYPE search, the free-text search and other searches on specific nodes. Maybe I can test the query by changing the order of its components.
Must I define every node of every document exactly in the schema (for indexing, and only for indexing)? I have six collections and sixty types; I might be finished by next Christmas. Is there any way to automate this?

4. Why can't I simply store a document in the database (like DocContent/Document)? Why do I need a schema?

PS:
Take Oracle (8.1.7) with the interMedia extension: I put a document into a CLOB column. I did not expect anything else from Oracle; it builds a table from the CLOB column and generates an index on that table (as far as I last knew). I have nothing else to do; I just use the 'contains' operator, and that's all. It works.

Best regards,
DZSOLT

Answer to question 4:

You can simply store a document in Tamino, without any schema information. Tamino will process the document and give you correct results at query time.

But if you want Tamino to handle the documents quickly, you should prepare it so that it knows a bit about the data and the queries and can therefore optimize the storage. This is done with a schema, and that is actually the only reason you would need one.

Hiran
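The trade-off Hiran describes can be sketched with a toy word index in Java. This is an illustration under our own assumptions, not how Tamino stores or indexes data: without an index, every query reads every stored document; with an index built once at store time (here a hypothetical `WordIndex` mapping each lower-cased word to the ids of the documents containing it), a query becomes a single map lookup. In Tamino, the schema is where such indexes are declared; the class name and the word-splitting rule are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy: schemaless full scan vs. a word index built at store time.
final class WordIndex {
    private final List<String> docs = new ArrayList<>();
    private final Map<String, Set<Integer>> index = new HashMap<>();

    // Assumption: words are runs of letters/digits, compared case-insensitively.
    private static String[] words(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
    }

    // Storing a document also updates the index: extra work at store time.
    int store(String text) {
        int id = docs.size();
        docs.add(text);
        for (String w : words(text)) {
            if (!w.isEmpty()) {
                index.computeIfAbsent(w, k -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // No index: read and split every stored document.
    Set<Integer> scan(String word) {
        String needle = word.toLowerCase(Locale.ROOT);
        Set<Integer> hits = new TreeSet<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String w : words(docs.get(id))) {
                if (w.equals(needle)) {
                    hits.add(id);
                    break;
                }
            }
        }
        return hits;
    }

    // With the index: one map lookup, no document is read.
    Set<Integer> lookup(String word) {
        return index.getOrDefault(word.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    public static void main(String[] args) {
        WordIndex db = new WordIndex();
        db.store("Bundeskanzler speech");
        db.store("Weather report");
        db.store("Interview with the Bundeskanzler");
        System.out.println(db.scan("bundeskanzler"));   // [0, 2]
        System.out.println(db.lookup("bundeskanzler")); // [0, 2]
    }
}
```

Both methods give the same answer; the difference is that `scan` grows with the size of the whole collection, while `lookup` does not, which is why declaring indexes matters more as the number of documents grows.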