Unicode - MOVE NORMALIZED - a good example

Billy · October 6, 2006, 5:54pm

Hi,

I read the manual and I confess that I didn’t understand this statement. I understand how MOVE ENCODED works, but I am finding MOVE NORMALIZED difficult to understand.

Look at the example below (this example is included in the SYSEXV library):

The character (

Lukas_Hundemer · October 6, 2006, 7:23pm

Unicode homepage http://www.unicode.org/
BTW, in SYSEXPG(UNICOX01) you will find a Natural program that writes out all Unicode characters. It will only display the characters supported by your system, though.
UH denotes a constant (a Unicode value written in hexadecimal). Variables are defined with format U.
As written in the example mentioned, there is a ‘Unicode Normalization Form C’ (NFC). This is the composed form, which is generally the shortest way to represent a character. Natural expects Unicode variables to be in NFC format (see the next page of the same example). So if you read data from an external source (like a file) that is not in NFC format, you should normalize it using MOVE NORMALIZED.
Look at the Windows Character Map (in my Windows XP it is under Accessories → System Tools) and select a Unicode font such as Arial Unicode. Tick “Advanced view”, then set Character set: Unicode and Group by: Unicode Subrange. There is a subrange named “Combining Diacritical Marks”. It lists all the marks that can be added to the preceding character, such as U+0308 (combining diaeresis).
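
For readers without a Natural environment at hand, the effect of NFC normalization can be sketched in Python. This is only an illustration of the Unicode behaviour using Python's standard unicodedata module, not Natural code; the names decomposed, composed and code_points are made up for the example.

```python
import unicodedata

# "a" followed by U+0308 (combining diaeresis): two code points, not in NFC.
decomposed = "\u0061\u0308"

# NFC composes the pair into the single code point U+00E4.
composed = unicodedata.normalize("NFC", decomposed)


def code_points(text):
    """Return the code points of text as 4-digit hex strings."""
    return [f"{ord(ch):04X}" for ch in text]


print(code_points(decomposed))  # ['0061', '0308']
print(code_points(composed))    # ['00E4']
```

Both strings display as the same character, but only the second one is in the form that, according to the post above, Natural expects in U variables.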

U variables in Natural are UTF-16, which is endian dependent (see SYSEXV(V62UINTR)). UH constants are UTF-16BE, which is endian independent. Thus after
MOVE UH'00610308' TO U1
U1 will contain '61000803' on Windows and '00610308' on mainframe and Unix. But since this is the internal representation, you should not need to care.
Either way, you can move your program to any platform and the statement will always work fine.
If you would like to see the original UH value on any platform, you can use a statement like:

 MOVE ENCODED #UNICODE TO #ALPHA IN CODEPAGE 'UTF-16BE'

The trick is to use the code page UTF-16BE, which is the big-endian, platform-independent representation. #ALPHA (A) should have twice the length of #UNICODE (U), since each UTF-16 code unit takes two bytes. This also works the other way around (maybe that answers your question 2).
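
The two byte orders and the length doubling can be checked outside Natural with a short Python sketch. It only mirrors what the MOVE ENCODED statement above is described as doing and is not Natural code; the variable names are hypothetical.

```python
# "a" + combining diaeresis, the same two code points as UH'00610308'.
text = "\u0061\u0308"

big_endian = text.encode("utf-16-be")
little_endian = text.encode("utf-16-le")

print(big_endian.hex())     # 00610308
print(little_endian.hex())  # 61000803

# Each UTF-16 code unit takes two bytes, so the byte string is twice as long
# as the text: this is why the alphanumeric target needs double the length.
assert len(big_endian) == 2 * len(text)

# The conversion also works in the other direction.
assert big_endian.decode("utf-16-be") == text
```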

system · October 6, 2006, 8:48pm

That’s not correct. It doesn’t depend on which operating system you use. It depends on the hardware the operating system runs on, and on whether the operating system abstracts away endianness or not. Have a look at the first post under the heading “Endian”.

Greetings
Sascha Wiegandt
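
The point that byte order follows the machine rather than the operating system name can be illustrated with a small Python sketch. It assumes nothing about Natural's internals: it only shows that Python's suffix-less "utf-16" codec writes a byte order mark followed by data in the native order of the machine it runs on.

```python
import sys

text = "\u0061\u0308"

# "utf-16" without a suffix uses the byte order of the machine running the
# code and prefixes a two-byte byte order mark (BOM).
native = text.encode("utf-16")
bom, payload = native[:2], native[2:]

print(sys.byteorder)  # "little" or "big", depending on the hardware
print(payload.hex())  # 61000803 on little-endian, 00610308 on big-endian
```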

Lukas_Hundemer · October 6, 2006, 10:28pm

Oops. I generalized too quickly from my own environments.
Thanks.
