Question about using tf:containsText function on Chinese content

KF_Low · September 16, 2013, 6:50am

Hi All,

I would like to ask whether there are any techniques or tricks for using tf:containsText on UTF-8 content. I have an XML database with about 1GB of data and a 4GB index, containing both Chinese and English. Each XML record varies from 2XXKB to 9XXKB. The database is in UTF-8 encoding.

For English queries, I use a query like tf:containsText(title,'Hong Kong') to do the search. For Chinese, as there are no spaces between the words, I need to use a query like tf:containsText(title,'*??*'), with an asterisk before and after the keyword. I have used this setup before and it was OK. However, the time difference is large now that the database is relatively bigger: an English query takes a few seconds, while a Chinese query takes 30 seconds or more.
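
To illustrate why the two query shapes could differ so much in cost, here is a rough Python sketch of a toy word index. It is only an assumption about how word-based text indexes behave in general, not a description of Tamino's internals; the function names, the sample records, and the Chinese keyword are all made up for the example.

```python
from collections import defaultdict


def build_index(docs):
    """Map each whitespace-separated token to the ids of the records containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def exact_lookup(index, word):
    """Whole-word search: one dictionary lookup, independent of vocabulary size."""
    return set(index.get(word.lower(), set()))


def infix_lookup(index, fragment):
    """'*fragment*' search: every distinct token in the index has to be inspected."""
    hits = set()
    for token, doc_ids in index.items():
        if fragment in token:
            hits |= doc_ids
    return hits


# Hypothetical sample records; the Chinese titles contain no spaces,
# so each one ends up in the index as a single long token.
docs = {
    1: "Hong Kong harbour",
    2: "香港維多利亞港",
    3: "遊覽香港",
}
index = build_index(docs)

english_hits = exact_lookup(index, "hong")   # direct lookup
chinese_hits = infix_lookup(index, "香港")    # scan over all tokens
```

In this toy model the wildcard search grows with the number of distinct tokens in the index, while the whole-word search does not, which would be consistent with the slowdown appearing only on the larger database.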

With the old X-Query, the performance is fine, at most 20 seconds. I would like to ask whether there is any setting I can change or anything else I can do to improve the performance when using XQuery. I know that there used to be a Chinese tokenizer, but it has been discontinued. Thank you for the help.
