Hi All,
I would like to ask whether there are any techniques or tricks for using tf:containsText on UTF-8 content. I have an XML database with about 1GB of data and a 4GB index, containing both Chinese and English text. Each XML record ranges from 2XXKB to 9XXKB. The database uses UTF-8 encoding.
For English queries, I use a query like tf:containsText(title,'Hong Kong') to do the search. For Chinese, as there are no spaces between words, I need to use a query like tf:containsText(title,'*??*'), with an asterisk before and after the keyword. I used this setup before and it was OK. However, now that the database is considerably larger, the difference in query time is large: English queries take a few seconds, but Chinese queries take 30 seconds or more.
When I use the old X-Query, the performance is fine, taking at most 20 seconds. Is there any setting I can change, or anything else I can do, to improve the performance when using XQuery? I know that there used to be a Chinese tokenizer, but it has been discontinued. Thank you for your help.