Unicode - MOVE NORMALIZED - a good example

Hi,

I read the manual and I confess that I didn’t understand this statement. I understand how “move encoded” works, but I am finding “move normalized” difficult to understand.

Look at the example below (this example is included in the SYSEXV library):

The character (

  1. Unicode homepage http://www.unicode.org/
    BTW, in SYSEXPG(UNICOX01) you will find a Natural program that writes out all Unicode characters. It will only write the characters supported by your system, though.

  2. UH is a constant. You can define U variables.

  3. As written in the example mentioned, there is a ‘Unicode Normalization Form C’ (NFC). This is the composed form and generally the shortest way to represent a character. Natural expects Unicode variables to be in NFC format (see the next page of the same example). So if you read data from an external source (like a file) that is not in NFC format, you should normalize it using MOVE NORMALIZED.

  4. Look at the Windows Character Map (on my Windows XP it is under Accessories -> System Tools) and select a Unicode font such as Arial Unicode. Tick “Advanced view”, then set Character set: Unicode and Group by: Unicode Subrange. There is a subrange named “Combining Diacritical Marks”. It lists all the marks that can be attached to the preceding character, such as U+0308 (combining diaeresis).
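
To make points 3 and 4 concrete outside Natural, here is a minimal sketch in Python. It uses Python's standard `unicodedata` module purely as an illustration of what NFC does to the sequence 'a' followed by the combining diaeresis; it is an assumption that MOVE NORMALIZED produces the same result for this input, based on the description above.

```python
import unicodedata

# 'a' (U+0061) followed by COMBINING DIAERESIS (U+0308): two code points
decomposed = "\u0061\u0308"

# NFC composes the pair into the single precomposed character U+00E4
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))         # 2 1
print([f"U+{ord(c):04X}" for c in composed])  # ['U+00E4']
```

Both strings display as the same letter, but only the second one is in NFC.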

U variables in Natural are stored as UTF-16, whose byte order is platform dependent (see SYSEXV(V62UINTR)). UH constants are written in UTF-16BE, which does not depend on the platform's byte order. Thus after
MOVE UH'00610308' TO U1
U1 will contain ‘61000803’ on Windows and ‘00610308’ on mainframe and Unix. But since this is the internal representation, you should not care :wink:
On the other hand, you can move your program to any platform and the statement will always work fine.
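
The two byte sequences quoted above can be reproduced with a short Python sketch. This is only an illustration of UTF-16 byte order in general, not of Natural's internals; the variable names are made up for the example.

```python
# Same code points as the constant UH'00610308': U+0061 followed by U+0308
text = "\u0061\u0308"

# Big-endian: high byte of each 16-bit code unit comes first
big_endian = text.encode("utf-16-be").hex()

# Little-endian: low byte of each 16-bit code unit comes first
little_endian = text.encode("utf-16-le").hex()

print(big_endian)     # 00610308
print(little_endian)  # 61000803
```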
If you would like to see the original UH value on any platform you can use a statement like

 MOVE ENCODED #UNICODE TO #ALPHA IN CODEPAGE 'UTF-16BE'

The trick is to use the codepage UTF-16BE, which is the big-endian representation and therefore independent of the platform. #ALPHA (A) should have twice the size of #UNICODE (U), since each UTF-16 code unit takes two bytes. This also works the other way around (maybe that answers your question 2).
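
As a rough sketch of the size rule and the round trip, again in Python rather than Natural: `text` stands in for #UNICODE and `raw` for #ALPHA, and both names are hypothetical.

```python
text = "\u0061\u0308"            # stands in for #UNICODE: two UTF-16 code units
raw = text.encode("utf-16-be")   # stands in for #ALPHA: the big-endian bytes

# Each UTF-16 code unit takes two bytes, so the byte buffer is twice as long.
# (len(text) counts code points; for characters in the Basic Multilingual
# Plane, as here, that equals the number of UTF-16 code units.)
print(len(text), len(raw))       # 2 4

# The other direction: decoding the bytes gives back the original string
back = raw.decode("utf-16-be")
print(back == text)              # True
```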

That's not correct. It doesn’t depend on which operating system you use. It depends on the hardware the operating system runs on, and on whether the operating system abstracts from endianness or not. Have a look at the first post under the headline Endian.
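
A small Python sketch shows the byte order coming from the machine rather than from the operating system's name. It is only an illustration: it packs U+0061 as a 16-bit integer in the native byte order of whatever machine runs it.

```python
import struct
import sys

# "=H" packs one unsigned 16-bit integer in the machine's native byte order
native = struct.pack("=H", 0x0061)

# sys.byteorder reports "little" or "big" for the machine running the script
expected = "6100" if sys.byteorder == "little" else "0061"

print(sys.byteorder, native.hex())
```

The same script prints a different byte sequence on a little-endian machine than on a big-endian one, whatever operating system is installed.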

Greetings
Sascha Wiegandt

uuups. I generalized too fast from my environments :oops:
thanks