I think that, from the ADABAS point of view, it’s irrelevant whether the characters are stored in ISO-8859 or UTF-8 format. But what about Natural? How would Natural display a UTF-8 encoded umlaut or other special characters?
As a first test, I wrote the following simple program:
[code]
write
'UTF-8 encoded umlaut A is:' H'C384' /
'UTF-8 encoded ligature sz is:' H'C39F' /
'UTF-8 encoded euro symbol is:' H'E282AC'
end
[/code]
This works quite well if I set my terminal emulation to UTF-8 first! But at the end of the line, some characters from the previous screen are still displayed.
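To double-check which characters the three hex constants really stand for, here is a small sketch in Python (not Natural, simply because it can be run anywhere). The helper name `describe` is made up for this example; the byte values are the ones from the test program.

```python
# Byte sequences copied from the H'...' constants in the test program.
SAMPLES = ("C384", "C39F", "E282AC")


def describe(raw: bytes) -> str:
    """Decode raw UTF-8 bytes and report the code point(s) they contain."""
    text = raw.decode("utf-8")
    points = " ".join(f"U+{ord(ch):04X}" for ch in text)
    return f"{raw.hex().upper()} -> {text} ({points}, {len(raw)} bytes)"


for hex_value in SAMPLES:
    print(describe(bytes.fromhex(hex_value)))
```

Each constant decodes to exactly one character, although it occupies two or three bytes in the source.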
The next problem comes with maps. For example, if I put the euro symbol on a map in UTF-8, it is written into the map source with a length of 3 bytes, but it is displayed with a width of one character. This causes problems in the map editor: for example, it’s impossible to fill a map line of width 79 with 79 characters. On top of that, it’s almost impossible to edit such a text constant.
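The mismatch between byte length and display width can be illustrated with a short Python sketch. It assumes that the editor counts bytes against the line width (which is what the behaviour suggests, but I have not verified it) and that every character takes one screen column. The names `MAP_WIDTH` and `max_chars_in_bytes` are invented for the example.

```python
MAP_WIDTH = 79  # line width used in the example above


def max_chars_in_bytes(text: str, limit: int) -> int:
    """Count how many leading characters of text fit into limit UTF-8 bytes."""
    used = 0
    count = 0
    for ch in text:
        size = len(ch.encode("utf-8"))
        if used + size > limit:
            break
        used += size
        count += 1
    return count


line = "\u20ac" * MAP_WIDTH  # 79 euro signs, i.e. 79 columns on screen
print("characters:", len(line))
print("UTF-8 bytes:", len(line.encode("utf-8")))
print("characters that fit into 79 bytes:", max_chars_in_bytes(line, MAP_WIDTH))
```

Under these assumptions a line of euro signs runs out of room after 26 characters, because each one costs three bytes.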
Here’s my test-map:
[code]
MAP2: PROTOTYPE --- CREATED BY UNIX 6.1.1 ---
INPUT USING MAP 'XXXXXXXX'
FORMAT PS=003 LS=080 ZP=OFF SG=OFF KD=OFF IP=OFF
[/code]
Hello Steven Wild!
Thanks for the link to the Unicode-documents. Here are some excerpts:
At the moment I’m working with Natural 6.1.1 for Solaris. It seems that I have to wait a little bit …
If I understand this correctly, Natural uses UTF-16, and all related statements (like EXAMINE, MOVE SUBSTR) are adapted to handle 2 bytes per character. But UTF-16 does not mean that every character can be represented by 2 bytes. Characters above U+FFFF are represented by 4 bytes (a surrogate pair).
See: UTF-16 - Wikipedia
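A quick way to see the 2-byte versus 4-byte cases is to look at the UTF-16 code units directly. The following Python sketch is only an illustration of the encoding itself, not of anything Natural does; `utf16_units` is a name I made up, and the musical G clef (U+1D11E) is just an arbitrary character above U+FFFF.

```python
def utf16_units(text: str) -> list:
    """Return the UTF-16 code units of a string as a list of integers."""
    raw = text.encode("utf-16-be")  # big-endian, no byte order mark
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


for ch in ("\u00c4", "\u20ac", "\U0001d11e"):
    units = utf16_units(ch)
    shown = " ".join(f"{u:04X}" for u in units)
    print(f"U+{ord(ch):04X}: {len(units)} code unit(s), {2 * len(units)} bytes -> {shown}")
```

The first two characters need one 16-bit unit each, the third needs a surrogate pair (two units, 4 bytes).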
So my question is: Will Natural 6.2 (Open Systems) support UTF-16 with or without surrogates?
That’s the reason why UTF-8 became the quasi-standard for Unicode representation.
Next question: Does Natural 6.2 only support UTF-16?
This could be a problem, because most of the XML documents I’ve seen so far use UTF-8. Even my PuTTY terminal emulation can only handle UTF-8. OK, it would be possible to write a converter for the XML files (but I will not do this in Natural)…
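Such a converter outside of Natural could look roughly like the Python sketch below. The function names, the default encodings and the chunk size are my own choices for illustration. Note that it only transcodes the bytes: an `encoding="..."` attribute in the XML declaration is not rewritten, so that would still have to be handled separately.

```python
def transcode(data: bytes, source: str = "utf-8", target: str = "utf-16-le") -> bytes:
    """Re-encode a complete byte string from one encoding to another."""
    return data.decode(source).encode(target)


def convert_file(src_path: str, dst_path: str,
                 source: str = "utf-8", target: str = "utf-16-le",
                 chunk_size: int = 64 * 1024) -> None:
    """Copy a text file chunk by chunk while changing its encoding.

    newline="" keeps the original line endings untouched.
    """
    with open(src_path, "r", encoding=source, newline="") as src, \
         open(dst_path, "w", encoding=target, newline="") as dst:
        while True:
            chunk = src.read(chunk_size)  # reads characters, not bytes
            if not chunk:
                break
            dst.write(chunk)
```

For example, `transcode(b"\xe2\x82\xac")` turns the three UTF-8 bytes of the euro sign into the two bytes of its little-endian UTF-16 form.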
With Nat 4.2 and Nat 6.2, Natural introduces a new format ‘U’. Natural for UNIX and Windows handles this format internally as UTF-16, but only for ‘U’ variables. All other variables are treated as before.
From my point of view it would be best to read the documentation first, because many of these questions are answered there.
First of all: I don’t have any documentation for Nat 6.2. I only have the links to the documents that Stephen Wild mentioned above. There I read about the new U format in Natural (which is UTF-16) and the corresponding new W format in ADABAS (which is UTF-8). The conversion between Natural and Adabas is done automatically.
From my point of view UTF-8 is the quasi-standard. And my question still remains: Does Natural 6.2 only support UTF-16?
In other words: What about Workfiles and XML-docs with UTF-8 encoding?
Hello Matthias,
Natural internally stores Unicode characters in UTF16 format. This allows quick scanning in a Unicode string as each character starts at a fixed location.
The MOVE statement was enhanced to allow conversion from one code page to another. Please refer to
MOVE ENCODED
You may specify UTF8 as the source and UTF16 as the target, or vice versa, to perform the conversion.
Surrogate characters in UTF16 require 4 bytes of storage and must be handled in pairs of U-characters. The EXAMINE statement has been enhanced to detect surrogates.
EXAMINE [FULL [VALUE [OF]]] {op1 | SUBSTR(op1,op2,op3)}
[POSITION-clause] [FOR]
[CHARPOSITION op4] [CHARLENGTH op5]
[[GIVING] POSITION IN op6] [[GIVING] LENGTH IN op7]
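To show why surrogates make a difference between counting characters and counting 16-bit units, here is a small Python sketch. It is only an illustration of the general idea; how the CHARPOSITION and CHARLENGTH clauses actually behave is not described in this thread, so nothing here should be read as EXAMINE's real semantics. All function names are invented.

```python
def to_units(text: str) -> list:
    """Split a string into UTF-16 code units (integers)."""
    raw = text.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


def pair_at(units: list, i: int) -> bool:
    """True if units[i] and units[i + 1] form a high/low surrogate pair."""
    return (0xD800 <= units[i] <= 0xDBFF
            and i + 1 < len(units)
            and 0xDC00 <= units[i + 1] <= 0xDFFF)


def count_characters(units: list) -> int:
    """Count characters, treating each surrogate pair as one character."""
    count = 0
    i = 0
    while i < len(units):
        i += 2 if pair_at(units, i) else 1
        count += 1
    return count


def unit_position(units: list, char_pos: int) -> int:
    """Map a 1-based character position to its 1-based code unit position."""
    i = 0
    current = 1
    while i < len(units):
        if current == char_pos:
            return i + 1
        i += 2 if pair_at(units, i) else 1
        current += 1
    raise ValueError("character position out of range")


units = to_units("A\U0001d11eB")  # 'A', one character above U+FFFF, 'B'
print(len(units), count_characters(units), unit_position(units, 3))
```

The sample string has three characters but four code units, so the third character starts at unit position 4.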