Unicode in Adabas

We have existing databases supporting the Western European character set code page on our IBM z/OS mainframe. Our plan is to take this forward and support double-byte and multi-byte character data in the same databases.

Considering the direction Natural is going in supporting Unicode in version 4.2, what should I do with my Adabas data to support storage of Unicode data (e.g., to allow for Chinese characters)? Do I only need to convert Alpha fields to wide alpha format for those which may contain this kind of data? Or do I need to do more?

I have already decided that inverted list sequence can remain in normal EBCDIC order.

Thanks,

Brian Johnson

Hello Brian,

Natural 4.2 will read Wide field data (field format W) as Unicode (UTF-16).

Since Unicode is the default encoding for W fields in Adabas, the easiest approach would be to leave the internal file encoding for W fields at the default. This avoids conversion overhead.
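To make the byte layout concrete, here is an illustrative sketch (plain Python, not Adabas code) of what a W-field value looks like as big-endian UTF-16, the default internal encoding mentioned above:

```python
# Illustrative only: a two-character Chinese string as UTF-16 code units.
# Adabas stores W fields in UTF-16 by default; Natural 4.2 reads them as such.
text = "中文"  # U+4E2D, U+6587

utf16 = text.encode("utf-16-be")  # big-endian, as on z/OS
print(len(text), "characters ->", len(utf16), "bytes")  # 2 characters -> 4 bytes
print(utf16.hex())                                      # 4e2d6587
```

Each character occupies one 16-bit code unit here, so a W field needs twice the bytes of a single-byte alpha field for the same character count.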

Is there any requirement to process or store specific DBCS encodings for Japanese, Chinese or Korean text data?

You would need to convert the Alpha field data to Wide.

The sort sequence of Unicode is an ‘extension’ of the ASCII sort order. W field descriptors will have their Unicode values in the index. This is not the usual EBCDIC order!

EBCDIC order could be achieved by writing a Collation Exit that orders the Unicode characters in the appropriate way. However, this requires extra effort.
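The difference in sort order is easy to see in a small sketch (illustrative Python, not Adabas code), using the stdlib cp037 codec as a stand-in for a Western European EBCDIC code page:

```python
# Unicode code-point order vs. EBCDIC byte order.
# cp037 is Python's built-in EBCDIC (US/Canada) codec, used here only
# as an example single-byte EBCDIC page.
values = ["a", "A", "1"]

unicode_order = sorted(values)                               # by code point
ebcdic_order = sorted(values, key=lambda s: s.encode("cp037"))  # by EBCDIC byte

print(unicode_order)  # ['1', 'A', 'a'] -- digits first, as in ASCII
print(ebcdic_order)   # ['a', 'A', '1'] -- lowercase first, digits last
```

In ASCII/Unicode, digits sort before uppercase before lowercase; in EBCDIC it is the reverse, which is why a W-field descriptor's index order will surprise anyone expecting EBCDIC sequence.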

Kind regards,
Marbod

Thank you, Marbod. As a follow-up question, then, besides converting whichever fields I need to establish as WIDE alpha character fields, do I need to do anything else?

Specifically, do I need to enable Adabas for UES support, and if so, how is this done?

I understand what you are saying about inverted lists on WIDE alpha fields being in Unicode (UTF-16) order, which is ASCII-based and not EBCDIC, so that’s okay as long as we understand that.

Guess I have another question on best practices… because the green screens are not able to accommodate multi-byte characters (i.e., Chinese), is it best to leave existing fields as regular alphas and just add new fields to store data that is only accessible by applications with alternate front-ends (i.e., web-based screens)?

Thanks,

Brian

Hello Brian,

Before you can define W fields or use collation descriptors, you have to UES-enable the Adabas database. You can do this with ADADEF, either at database creation (ADADEF DEFINE) or with the ADADEF MODIFY function.

In the nucleus JCL you’ll need to add the UES-related DD statements required by Entire Conversion Services (e.g., DDECSOJ) and the underlying SMARTS runtime. Please see the Adabas Operations Manual for further details.

With respect to multi-byte encodings, the Asian green screens can display a much larger set of characters from alpha fields stored in mixed DBCS.

Using W fields separate from A fields seems the right way to go.

There are migration paths to get the existing data into a form suitable for the internationalized application:

  • Using ACODE= in the Adabas open command with e.g. ACODE=4091 for UTF-8

  • Using a W format override of an A field, e.g. AA,W,… requests the alpha field AA in W format

Of course, this does not work well for updates if characters are entered that are not represented in the character set of the single-byte alpha code page.
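The update pitfall can be sketched as follows (illustrative Python, not Adabas code; the stdlib cp037 codec stands in for the single-byte EBCDIC alpha code page):

```python
# A character outside the single-byte alpha code page cannot be
# converted back from Unicode, so the update would fail or lose data.
ok = "Müller"   # representable in a Latin-1-based EBCDIC page such as cp037
bad = "北京"     # Chinese characters, not in any single-byte Western page

print(ok.encode("cp037").hex())  # encodes (and round-trips) fine

try:
    bad.encode("cp037")
except UnicodeEncodeError as e:
    print("cannot store in single-byte alpha field:", e.reason)
```

This is why the read-side overrides (ACODE= or a W-format override of an A field) are a migration aid rather than a full solution: once users can enter Chinese text, it needs a true W field to land in.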

Kind regards,
Marbod

Thanks, Marbod, for your help. I will follow your suggestions to expand our system’s multi-lingual capabilities.