UTF-8 question

Hi All,

I’m planning to switch from x-application to the passthru servlet, but I still can’t solve some problems. One of them is that I can’t find the documents I want in the db. The default encoding of my db is set to UTF-8, and the stylesheet that displays the result starts like this:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet … (omitted)>
<xsl:output method="html" omit-xml-declaration="yes" indent="yes" encoding="UTF-8"/>
<xsl:strip-space elements=""/>

which will also display, at the top, the query box together with the entered query string (if any). So at the beginning the page contains only a query box. When I enter a query like /article[title ~= 'XX'], no documents are found. I then looked at the query string, which looks like this:

http://localhost/servlet/TaminoFilter/tamino/mydb/mycollection?_xql=%2Farticle%5Btitle+%7E%3D+%27%E7%B3%BB%E6%9C%83*%27%5D&_xslsrc=browse.xsl

I then copied the _xql value and queried Tamino directly, without passthru:

http://localhost/tamino/mydb/mycollection?_xql="same as above"
No records (the ino:query contains strange characters)

http://localhost/tamino/mydb/mycollection?_xql="same"&_encoding=utf-8
Correctly matches my documents (the ino:query is beautiful)

From these tests, I suppose it’s a character encoding problem. I believe it could be solved if I could somehow tell passthru about the encoding, but the documentation states that I can’t.

I then tried removing encoding="UTF-8" from <xsl:output … />, and, my god, it works! But my entire HTML source becomes &#xxxxxx. And I can’t display the entered query in a text element within a form: it is encoded as &#xxxxxx, since disable-output-escaping="yes" is not allowed on an xsl:value-of inside xsl:attribute.

Has anyone had a similar experience with passthru and UTF-8? Can anyone point out what I have done wrong?

Thanks and regards,
Lun

Could I start by asking which version of Passthru you are using? Is it the new one (4.1.1)?

Michael Kay

Hi Michael,

I don’t know which version of Passthru I’m using. When I followed the passthru configuration, the test page appeared, but it said something like "passthru run out of jar, can’t determine version … "

I haven’t applied any updates or patches since installing Tamino 3.1.2.1.

Thanks!

Thanks. You’re using the old version, it seems (the new one has only been available for a couple of weeks, and you need to download it from the Download area of the Developer Community - see the menu on the left).

Moving to the new version won’t necessarily solve the problems, but it may be worth doing since you’re only just starting to use Passthru. The new version has a number of improvements: it’s better at managing information in HTTP headers, and it gives you a free choice of XSLT processors.

Internally, we had a long discussion about character encoding problems with Passthru in May this year, and I’m not sure we ever got to the bottom of discovering exactly what works and what doesn’t: there are many variables involved.

The %xx encoding that you observe in the expanded query URL looks correct. Special characters (non-ASCII characters, and reserved characters such as “=” and “[”) are supposed to be encoded in hex when a URL is transmitted over HTTP.
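To make the point about hex encoding concrete, here is a minimal Python sketch that decodes the _xql value from the URL in the first post under two different charsets. It only illustrates how the same percent-encoded bytes yield different text depending on the charset the receiver assumes; it says nothing about what the passthru servlet or Tamino actually do internally. The helper name `decode_param` is made up for this example.

```python
from urllib.parse import unquote_plus

# The _xql value from the URL in the original post, joined onto one line.
RAW_XQL = "%2Farticle%5Btitle+%7E%3D+%27%E7%B3%BB%E6%9C%83*%27%5D"


def decode_param(raw: str, charset: str) -> str:
    """Percent-decode a query-string value, reading the bytes as `charset`."""
    return unquote_plus(raw, encoding=charset)


# Interpreted as UTF-8, the six escaped bytes form two Chinese characters.
as_utf8 = decode_param(RAW_XQL, "utf-8")

# Interpreted as ISO-8859-1, each byte becomes its own (wrong) character.
as_latin1 = decode_param(RAW_XQL, "iso-8859-1")

print(as_utf8)
print(repr(as_latin1))
```

If the second form is what reaches the database, a query for the intended title cannot match, which would be consistent with the "strange characters" seen in ino:query.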

The _encoding parameter in the URL affects the encoding of the results of a query when the query is sent directly to Tamino. But I think that it is ignored when the query is sent via the passthru servlet. Tamino will then return the document using its default encoding, which is iso-8859-1. This shouldn’t normally be a problem because the XSLT processor will recognize it as iso-8859-1 and process it correctly; it’s only a problem if the document contains XML element or attribute names that can’t be encoded in iso-8859-1.

The encoding property in the xsl:output declaration in your stylesheet affects the encoding of the HTML output produced by the stylesheet. This is unrelated to the encoding of the input to the stylesheet (that is, the XML document returned by Tamino). It should be possible to specify any encoding here that your browser recognizes. For modern browsers, the default encoding of UTF-8 is fine. The only reason for using a different encoding is if you want to look at the generated HTML using a text editor that doesn’t understand UTF-8. Perhaps this is the problem you were alluding to when you wrote “but then my entire HTML source becomes ‘xxxxx’”.
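One way a page can end up as a run of &#...; references: when a serializer writes to an encoding that cannot represent a character, it falls back to a numeric character reference. The Python sketch below imitates that fallback with the standard `xmlcharrefreplace` error handler. It illustrates the general mechanism only, and is not a claim about which output encoding the passthru servlet's XSLT processor picks when none is specified; the helper name `serialize` is made up for this example.

```python
def serialize(text: str, charset: str) -> bytes:
    """Encode text, replacing unrepresentable characters with &#NNNN; references."""
    return text.encode(charset, errors="xmlcharrefreplace")


title = "香港"

# UTF-8 can represent the characters directly: three bytes each.
utf8_bytes = serialize(title, "utf-8")

# ISO-8859-1 cannot, so each character becomes a numeric character reference.
latin1_bytes = serialize(title, "iso-8859-1")

print(utf8_bytes)
print(latin1_bytes)
```

A browser renders both forms as the same two characters; only the page source differs.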

I’m not sure about the problems of entering a new query in a text box and using disable-output-escaping. Perhaps we should tackle that one separately when we’ve sorted out getting the query results displayed correctly.

I know I haven’t solved the problem for you here, but I hope I’ve given you some information that will point you in the right direction. If you still have problems, please try to describe exactly what you are doing, and exactly what output you are getting. In particular, with encoding issues, try to distinguish clearly between what you see displayed on the screen and the actual bytes that are present in a message or file.
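For separating "what is displayed" from "what bytes are there", a small helper along these lines can be handy. This is only a sketch: `describe` is a hypothetical name, and the string is just the example title used later in this thread.

```python
def describe(text: str, charset: str = "utf-8") -> str:
    """Return the code points of `text` and its bytes in `charset`, as hex."""
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in text)
    raw_bytes = text.encode(charset).hex(" ").upper()
    return f"{code_points} | {raw_bytes}"


# Two characters on screen, six bytes on the wire when sent as UTF-8.
print(describe("香港"))
```

Comparing this output with the %xx sequence in a URL, or with a hex dump of a response, shows whether the bytes on the wire are the ones you expect.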

Michael Kay

Hi Michael,

First, thank you for your detailed explanation, but I’m still really lost. What I am actually doing is trying to switch from x-application to passthru, to make search, list and view pages. But I’m having problems with character encoding.

I’d like to know if my approach is correct. First, I have a JSP page that displays a search box (basically you just enter the XQL). In the JSP, the page directive’s contentType is set to “text/html;charset=utf-8”, and the HTML meta content-type is also set to “text/html;charset=utf-8”, i.e.,
<%@ page contentType="text/html;charset=UTF-8" %>

I’m using Tomcat 3.3a final, I have . I just followed exactly the same approach as the JSP pages generated by x-app.

So, I enter a query that contains Chinese characters and submit it to the TaminoFilter, using XSL to transform the result into the list page. But the expected matching documents don’t appear, i.e. the ino:message is "XQL Request processed, no object returned".

I’m pretty sure it’s related to how the XQL is encoded in the _xql parameter sent to TaminoFilter. By trial and error, I discovered that it works if I change the charset in my JSP page, i.e., change to
<%@ page contentType="text/html;charset=iso-8859-1" %>

But then, obviously, I can’t display Chinese characters in my search page.

I hope you can still understand my problem despite my poor English. All I want is a working example of passthru that can search for UTF-8 characters, or some clear direction on which charset I have to use in my search page and in the DecodeInterceptor in Tomcat. Or should I not use JSP for the search page at all?

By “Tamino will then return the document using its default encoding, which is iso-8859-1.”, do you mean my per-database “default encoding” setting? If not, where can I change it to utf-8?

Many many thanks for your patience!

Best regards,
Lun

OK, I did see something after I changed to passthru 4 (the latest version) this morning and looked at the Tamino response. When I use UTF-8 as the charset in the JSP page and submit to

http://localhost/passthru4/servlet/TaminoFilter/tamino/mydb/mycln?_xql%281%2C10%29=%2Farticle%5B%28title+%7E%3D+%27*%E9%A6%99%E6%B8%AF*%27%29%5D

<?xml version="1.0" encoding="utf-8" ?>
<ino:response xmlns:ino="http://namespaces.softwareag.com/tamino/response2" xmlns:xql="http://metalab.unc.edu/xql/">
<xql:query>/article[(title ~= '*香港*')]</xql:query>
<ino:message ino:returnvalue="0">
<ino:messageline>XQL Request processed, no object returned</ino:messageline>
</ino:message>
</ino:response>
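As a cross-check on the request itself, the query string of the URL above can be parsed outside the servlet. The Python sketch below does this with the standard library, assuming the bytes are meant as UTF-8; it only confirms what the browser sent, not how the TaminoFilter or Tamino interpret it.

```python
from urllib.parse import parse_qs

# Query string of the URL above (everything after the "?").
QUERY = (
    "_xql%281%2C10%29="
    "%2Farticle%5B%28title+%7E%3D+%27*%E9%A6%99%E6%B8%AF*%27%29%5D"
)

# parse_qs percent-decodes both names and values; "+" becomes a space.
params = parse_qs(QUERY, encoding="utf-8")

for name, values in params.items():
    print(name, "=", values[0])
```

The decoded parameter name is `_xql(1,10)` and the value is the XQL expression with the two Chinese characters intact, so the request leaves the browser as well-formed UTF-8.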

If, on the other hand, I submit to

http://localhost/passthru4/servlet/TaminoFilter/tamino/mydb/mycln?_xql%281%2C10%29=%2Farticle%5B%28title+%7E%3D+%27*%E9%A6%99%E6%B8%AF*%27%29%5D&_xslsrc=http://localhost/passthru4/xsl/browse.xsl

<?xml version="1.0" encoding="UTF-8" ?>
<ino:response xmlns:ino="http://namespaces.softwareag.com/tamino/response2" xmlns:xql="http://metalab.unc.edu/xql/">
xql:query/article[(title ~= '*
