about transliteration

Dear all,

I would like to ask a question about transliteration. I mainly work with Chinese, and my data contains both Simplified and Traditional Chinese characters. I use transliteration to define a mapping so that when one searches with a character in Simplified Chinese, data containing the word in either its Simplified form or the corresponding Traditional form is retrieved. Which operators can make use of this transliteration mapping? In my tests, only the “~=” operator of X-Query takes advantage of it. I also tried the “starts-with” and “between” operators and the “tf:containsText” function of XQuery, and these did not use the mapping. What I actually want is to find data that starts with a given Chinese search word in either form. Can anyone kindly help? Thanks.

Hi,

transliteration affects all operators that perform text search. It does not affect operations that do substring matching, such as starts-with, substring-after, etc.
Thus, transliteration affects ~= in X-Query,
and tf:containsText, tf:containsNearText, tf:containsAdjacentText, etc. in XQuery.
So if you find an example where tf:containsText does not honor transliteration, please report it as an error.
The operations based on substring matching, i.e. contains, substring-after, substring-before, starts-with and ends-with, can be called with a collation, either by adding the collation explicitly to the function call or by specifying a default collation. A collation can also define characters to be equal to each other. I hope this helps.

Regards

Harald

Oh, sorry, I think I made a mistake when I said that “tf:containsText” does not work with transliteration; I apologize for that. As for the language of the collation, should I choose “zh”? Does it include both Simplified and Traditional Chinese? I tried “en” before, but it seemed to cause problems even for some simple queries. Thank you for your help.

Hi,

please make sure to use the correct syntax for collations, e.g.

if (starts-with("Hugi","Hu","collation?language=en")) then 1 else 0
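
The other option, a default collation, might look roughly like the sketch below. It assumes that the standard XQuery prolog declaration is accepted as written here; the exact form may differ between Tamino versions, so please check it against the XQuery reference for your installation. The collation URI is simply the one from the call above.

```xquery
(: Assumption: standard XQuery prolog syntax for the default collation.
   Verify the exact form against the XQuery reference of your version. :)
declare default collation "collation?language=en";

(: No third argument here, so starts-with falls back to the
   default collation declared in the prolog. :)
if (starts-with("Hugi", "Hu")) then 1 else 0
```

With a default collation in place, the other substring functions (contains, substring-before, substring-after, ends-with) should use it as well when no collation argument is passed.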

I do not know whether ICU (the underlying collation management software) supports a collation that unifies Simplified and Traditional Chinese out of the box, but as far as I know, you can define your own collations with ICU. Please refer to the ICU user manual for details.

Regards

Harald