TAMINO (sorting, searching) - Polish fonts

Hello!

I have a problem with sorting and searching objects (records) in my Tamino XML database when using Polish fonts (?, ?, ?, ?, ?, ?, ?,

Hello:

I store XML with Latin fonts:

<?xml version="1.0" encoding="ISO-8859-1" ?>

You should specify your encoding.

If you have the record with search-type text and search by ~="

I have an encoding line in my Tamino XML database:

<?xml version="1.0" encoding="iso-8859-1" ?>

So I can ONLY store records with Polish fonts, but I cannot SEARCH/SORT this database. I use this command:

http://wwwpocz/tamino/db005/UnixTeam?_xql=Pracownik[Nazwisko="Pawłowski"] (ł)

in the Polish version of Internet Explorer 5.0, and I get no records (I should get one record)! :-((

Dariusz Baumann

Hi,

Using Tamino Interactive Interface it is possible to query:

Pracownik[Nazwisko~="Pawłowski"]

and get the correct result, provided the encoding is set to iso-8859-2. If the query is done from IE5 by specifying it in the URL, then I suspect it is not going to work, because I believe the URL is not Unicode encoded.

Can you query using TII? The other thing to consider is indexing: ‘standard’ indexing is required for sorting and ‘text’ indexing is required for text retrieval (as in the example above), or you could use both.

A few points (not an answer, I’m afraid, but I hope it helps anyway!)

Firstly, terminology: we’re talking about character sets, not about fonts. Arial and Helvetica are different fonts, but they just change the visual appearance of the characters. A “Polish font” handles a different set of characters - when you’re searching, that’s a vital distinction.

Secondly: character coding in XML. XML basically handles the Unicode character set, but XML documents can use various encodings of this character set. For Polish you could use iso-8859-2 (the Central and Eastern European characters) or UTF-8 (full Unicode, in a variable-length encoding), for example. Using iso-8859-1 sounds like a bad idea, because that only handles Western European characters. Saying they are iso-8859-1 characters if they aren’t is going to get you into trouble, although the software won’t be able to detect the difference.
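To make the difference between these encodings concrete, here is a minimal Java sketch that prints the bytes one Polish letter turns into under several charsets. The class name PolishBytes and the helper hex are made up for this illustration, and the letter is written as a Unicode escape so the source file itself stays plain ASCII.

```java
import java.nio.charset.Charset;

public class PolishBytes {
    // Encode text in the named charset and return the bytes as hex, e.g. "C4 85".
    // Characters the charset cannot represent are replaced by its default
    // replacement byte (a question mark for ISO-8859-1).
    static String hex(String text, String charsetName) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(Charset.forName(charsetName))) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String aOgonek = "\u0105"; // Polish "a with ogonek", as an escape
        String[] charsets = {"ISO-8859-1", "ISO-8859-2", "windows-1250", "UTF-8"};
        for (String cs : charsets) {
            System.out.println(cs + ": " + hex(aOgonek, cs));
        }
    }
}
```

This prints 3F for ISO-8859-1 (the letter is lost and becomes "?"), B1 for ISO-8859-2, B9 for windows-1250 and C4 85 for UTF-8. So a file saved in one of these encodings but declared as another will be misread for at least some Polish letters.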

Thirdly: encoding of special characters in URLs. URLs, unlike XML, aren’t fully international. There are conventions for encoding non-ASCII characters in a URL (as %HH%HH, giving the hex values of the UTF-8 representation), but these aren’t universally applied, and I’ve no idea if this would work in a Tamino query. Can anyone else shed light on this?
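As an illustration of the %HH convention only, the following Java sketch percent-encodes the query with the standard java.net.URLEncoder. The class name PercentEncodeQuery and the helper encode are made up here, and whether Tamino accepts a query encoded this way is an open question in this thread, so treat it as an experiment rather than a fix.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class PercentEncodeQuery {
    // Percent-encode a query value using the bytes of the given charset.
    static String encode(String query, String charsetName) {
        try {
            return URLEncoder.encode(query, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // \u0142 is the Polish "l with stroke", written as an escape
        // so the result does not depend on the source file encoding.
        String query = "Pracownik[Nazwisko=\"Paw\u0142owski\"]";
        System.out.println(encode(query, "UTF-8"));
        System.out.println(encode(query, "ISO-8859-2"));
    }
}
```

With UTF-8 the query becomes Pracownik%5BNazwisko%3D%22Paw%C5%82owski%22%5D, and with ISO-8859-2 the letter is sent as the single byte %B3 instead. The server has to assume the same charset as the client for either form to match the stored name.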

1) URLs in IE 5.0
http://wwwpocz/tamino/db005/UnixTeam?_xql=Pracownik[Nazwisko="Pawłowski"] - it doesn’t work!
http://wwwpocz/tamino/db005/UnixTeam?_xql=Pracownik[Nazwisko~="Pawłowski"] - it doesn’t work!

2) Tamino Interactive Interface 2.2.1.2
http://wwwpocz/tamino/db005/UnixTeam?_xql=Pracownik[Nazwisko="Pawłowski"] - it works! :-))
http://wwwpocz/tamino/db005/UnixTeam?_xql=Pracownik[Nazwisko~="Pawłowski"] - it doesn’t work!

3) JBuilder 4.0
http://wwwpocz/tamino/db005/UnixTeam?_xql=Pracownik[Nazwisko="Pawłowski"] - it doesn’t work!
http://wwwpocz/tamino/db005/UnixTeam?_xql=Pracownik[Nazwisko~="Pawłowski"] - it works! :-))

I don’t understand why. Could someone please explain it to me? Thanks!
In JBuilder, calling setCharset("iso-8859-2") had no effect.

Does anybody use JBuilder to connect to a Tamino XML database?

Dariusz Baumann
POLAND

PS. Tamino Interactive Interface
When the query has no Polish fonts I can search by “=” or “~=”, but when the query does have Polish fonts I can ONLY search by “=”! Why? Does anybody know?

In the case where I got the query :

Pracownik[Nazwisko~="Pawłowski"]

to work I was using Tamino v2.3. Here the TII has an additional field called encoding that I set to iso-8859-2. The default encoding for a database can be set to this value in the event that the browser does not send the required encoding to Tamino (use Tamino Manager - select database, database properties).

I don’t think this query will work by just specifying a URL, e.g. http://localhost/tamino… because each URL is treated as a string of one-byte characters, which causes problems with characters such as “ł”.

I suspect your different query results are the result of encoding problems; however, the version you have (2.2.1.2) is quite old and v2.3 has many improvements.

Hello!

I have installed a newer version of Tamino, 2.2.1.9, whose TII has an additional field called encoding. I got the expected results only when I used encoding="UTF-8". When I changed it to encoding="ISO-8859-2", I found nothing!

For a programmer the best way to connect/modify/insert/delete is Java (not TII!). I am using JBuilder with JDK 1.3, and I have all the necessary packages from Software AG.

I think encoding is the point! How can I set the encoding to "UTF-8" from Java?

One possibility was:

setEncoding("UNICODE-1-1-UTF-8");

but that is not "UTF-8", and I couldn’t find any records with Polish fonts!

RFC 1945 covers encoding. These are the charsets it lists:
charset = "US-ASCII"
| "ISO-8859-1" | "ISO-8859-2" | "ISO-8859-3"
| "ISO-8859-4" | "ISO-8859-5" | "ISO-8859-6"
| "ISO-8859-7" | "ISO-8859-8" | "ISO-8859-9"
| "ISO-2022-JP" | "ISO-2022-JP-2" | "ISO-2022-KR"
| "UNICODE-1-1" | "UNICODE-1-1-UTF-7" | "UNICODE-1-1-UTF-8"

"UTF-8" is not among them!

Best regards
Dariusz Baumann
POLAND

PS.
1) Is the encoding in version 2.3 of Tamino “UTF-8”?
2) By the way, Polish fonts are supported by “ISO-8859-2” (but in Java and TII it didn’t work!)
3) Any help will be very much appreciated! Please help me.

I think iso-8859-2 is possibly confusing things, so it’s probably best to stick with Unicode when coding in Java.

I created a document in Unicode and stored it in Tamino.

I then wrote a program (below) to read that document back, and this works.

import com.softwareag.tamino.api.dom.*;

public class polish
{
    public static void main (String[] args)
    {
        try
        {
            // Connect to the database; the client uses utf-8 by default.
            TaminoClient tc = new TaminoClient ("http://localhost/tamino/xml23") ;

            // Text search on Nazwisko; the inner quotes must be escaped in Java.
            TaminoResult r = tc.query ("Pracownik[Nazwisko~=\"Pawłowski\"]", "pcollection") ;
        }
        catch (Throwable e) { e.printStackTrace(); }
    }
}

In my example I saved the document as Unicode as well and compiled with javac -encoding unicode.

The setEncoding() method is not present in the code because it is deprecated in v2.3; the TaminoClient object sets the encoding to utf-8 anyway.

You must be right that the best approach is to set the encoding to Unicode instead of ISO-8859-2 or Windows-1250. In my opinion it is very strange that my native Tamino XML database must use Unicode. :-(

Today I installed the new version 2.3.1.1 of Tamino. In TII I could find records with Polish fonts only when I set the encoding to Windows-1250 (when I set utf-8 or ISO-8859-2 I didn’t find anything!).

In my JBuilder Java project I still have not found any records with Polish fonts (I installed the new jar package from Tamino 2.3.1.1). :-(

I would be very grateful for any other clues! :-))

Best regards
Dariusz Baumann
POLAND

I achieved full success!!! :-)

I can tell you what I did:
1) Converted the XML file from Windows-1250 into a new file in ISO-8859-2 using a freeware Polish converter named cpl.
2) Changed the encoding declaration in the new file to:
<?xml version="1.0" encoding="ISO-8859-2"?>
Previously I had forgotten about this entirely! :-(
3) Created a new database, then loaded the schema and the newly encoded XML file.
4) Searched for records with Polish fonts in my sample project using JBuilder.
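For anyone without the cpl converter, steps 1 and 2 can be sketched in plain Java. This is not the tool used above, just an equivalent under stated assumptions: the class name RecodeXml, its helper methods and the two file arguments are made up for this example, and the declaration is assumed to use double quotes. The java.nio.file calls also need a newer JDK than the 1.3 mentioned earlier.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;

public class RecodeXml {
    static final Charset CP1250 = Charset.forName("windows-1250");
    static final Charset LATIN2 = Charset.forName("ISO-8859-2");

    // Rewrite the encoding attribute of the XML declaration, if there is one.
    // Only a double-quoted attribute value is handled in this sketch.
    static String fixDeclaration(String xml, String newEncoding) {
        return xml.replaceFirst("(<\\?xml[^?]*encoding=\")[^\"]*(\")",
                                "$1" + newEncoding + "$2");
    }

    // Decode Windows-1250 bytes, update the declaration, re-encode as ISO-8859-2.
    static byte[] recode(byte[] cp1250Bytes) {
        String xml = new String(cp1250Bytes, CP1250);
        return fixDeclaration(xml, "ISO-8859-2").getBytes(LATIN2);
    }

    public static void main(String[] args) throws IOException {
        // args[0]: input file saved as Windows-1250 (hypothetical path)
        // args[1]: output file to write as ISO-8859-2 (hypothetical path)
        byte[] in = Files.readAllBytes(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), recode(in));
    }
}
```

Run it as java RecodeXml input.xml output.xml, then load output.xml as in step 3. The point of doing both steps together is that the bytes and the declared encoding always agree, which is what was missing before.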

Everything works marvellously! :-)))

Best regards from Poland
Dariusz Baumann

PS. Tamino version 2.3.1.1