Unicode homepage http://www.unicode.org/
BTW, in SYSEXPG(UNICOX01) you will find a Natural program that writes out all Unicode characters. It will only write the characters your system supports, though.
UH denotes a hexadecimal Unicode constant, not a variable format. Variables are defined with format U.
As written in the example mentioned, there is a 'Unicode Normalization Form C' (NFC). This is the composed form, which is usually the shortest way to represent a character. Natural expects Unicode variables to be in NFC (see the next page of the same example). So if you read data from an external source (such as a file) that is not in NFC, you should normalize it with the MOVE NORMALIZED statement.
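To make the idea of NFC concrete, here is a minimal sketch in Python (not Natural) using the standard `unicodedata` module. It only illustrates what normalization does to the character pair discussed here; the variable names are my own, and it says nothing about how MOVE NORMALIZED is implemented.

```python
import unicodedata

# Decomposed form: 'a' (U+0061) followed by COMBINING DIAERESIS (U+0308).
decomposed = "\u0061\u0308"

# NFC composes the pair into the single precomposed character U+00E4.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))          # 2 1
print([f"U+{ord(c):04X}" for c in composed])   # ['U+00E4']
```

Both strings display as the same letter, but only the second one is in NFC.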
Look at the Windows Character Map (on my Windows XP it is under Accessories -> System Tools) and select a Unicode font such as Arial Unicode MS. Tick "Advanced view", set "Character set" to Unicode and "Group by" to "Unicode Subrange". There is a subrange named "Combining Diacritical Marks". It contains all the marks that can be added to the preceding character, such as UH'0308' (U+0308, combining diaeresis).
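If you have no Character Map at hand, the same information can be looked up programmatically. The following is a small Python sketch (again, not Natural) that queries the Unicode character database for U+0308 and lists the size of the "Combining Diacritical Marks" block, assuming its standard range U+0300 to U+036F.

```python
import unicodedata

mark = "\u0308"

# Official character name, general category (Mn = nonspacing mark)
# and canonical combining class of the mark.
print(unicodedata.name(mark))        # COMBINING DIAERESIS
print(unicodedata.category(mark))    # Mn
print(unicodedata.combining(mark))   # 230

# The "Combining Diacritical Marks" block spans U+0300..U+036F.
block = [chr(cp) for cp in range(0x0300, 0x036F + 1)]
print(len(block))                    # 112
```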
U variables in Natural are stored as UTF-16, whose byte order depends on the platform (see SYSEXV(V62UINTR)). UH constants are always written as UTF-16BE, so they look the same on every platform. Thus, after
MOVE UH'00610308' TO U1
U1 will contain '61000803' on Windows and '00610308' on mainframe and Unix. But since this is only the internal representation, you should not need to care.
The upside is that you can move your program to any platform and the statement will always work fine.
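The two byte sequences mentioned above can be reproduced outside Natural. This is just a Python sketch of the UTF-16 byte orders, using the standard `utf-16-le` and `utf-16-be` codecs; it does not inspect Natural's actual memory layout.

```python
# 'a' (U+0061) followed by COMBINING DIAERESIS (U+0308),
# i.e. the characters written as UH'00610308'.
text = "\u0061\u0308"

# Little-endian: the low byte of each 16-bit code unit comes first.
little = text.encode("utf-16-le").hex()

# Big-endian: the high byte comes first, matching the UH notation.
big = text.encode("utf-16-be").hex()

print(little)  # 61000803
print(big)     # 00610308
```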
If you would like to see the original UH value on any platform, you can use a statement such as
MOVE ENCODED #UNICODE TO #ALPHA IN CODEPAGE 'UTF-16BE'
The trick is to use the code page UTF-16BE, which is the big-endian representation and therefore the same on every platform. #ALPHA (A) should be twice the size of #UNICODE (U), because each UTF-16 code unit takes two bytes. This also works the other way around (which may answer your question 2).
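The sizing rule and the round trip can be sketched in Python as follows. The helper names `to_utf16be` and `from_utf16be` are hypothetical and only stand in for the two directions of the conversion; this is not what MOVE ENCODED does internally.

```python
def to_utf16be(text: str) -> bytes:
    """Encode a string as big-endian UTF-16 bytes (no byte order mark)."""
    return text.encode("utf-16-be")


def from_utf16be(raw: bytes) -> str:
    """Decode big-endian UTF-16 bytes back into a string."""
    return raw.decode("utf-16-be")


text = "\u0061\u0308"      # two UTF-16 code units
raw = to_utf16be(text)

# Each UTF-16 code unit takes two bytes, hence "twice the size".
code_units = len(raw) // 2

print(raw.hex())                  # 00610308
print(code_units, len(raw))       # 2 4
print(from_utf16be(raw) == text)  # True
```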