Next: Date Up: Attribute descriptions Previous: Checksum



In the computing world of the 1990s, two major character sets are in wide use: EBCDIC on IBM mainframes and their clones, and ISO/ASCII on everything else. EBCDIC is an 8-bit character set, offering characters in the range 0 ... 255, while ISO/ASCII is a 7-bit character set, with characters in the range 0 ... 127. On most machines, ISO/ASCII text is stored in 8-bit characters.

In terms of numbers of computers, ISO/ASCII is by far the most common, since it is the character set used by all personal computers and workstations.

Unfortunately, a 128-character set with 95 printable characters and 33 control characters is inadequate for most non-English languages. Many European languages require accented characters or additional letters, and Chinese, Japanese, and Korean have thousands of pictographic characters.
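The 95/33 split can be checked directly. The following is a minimal Python sketch, assuming the usual convention that codes 32 ... 126 (space through tilde) are printable and that codes 0 ... 31 together with 127 (DEL) are control characters:

```python
# Classify the 128 ISO/ASCII codes into printable and control characters.
# Assumption: space (32) counts as printable, DEL (127) as a control character.
printable = [c for c in range(128) if 32 <= c <= 126]
control = [c for c in range(128) if c < 32 or c == 127]

print(len(printable), len(control))
```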

Computer vendors have dealt with this by offering ISO `code pages', which are variations in the encoding of characters 128 ... 255, and sometimes even in the encoding of punctuation characters in the range 0 ... 127.
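To illustrate why the code page matters, here is a small Python sketch that interprets the same byte under two ISO 8859 code pages. The codec names iso8859_1 and iso8859_3 are those of Python's standard library; the byte value 0xA1 is chosen only as an example.

```python
# One byte value above 127, interpreted under two ISO 8859 code pages.
raw = bytes([0xA1])

as_latin1 = raw.decode("iso8859_1")   # ISO-8859-1: inverted exclamation mark
as_latin3 = raw.decode("iso8859_3")   # ISO-8859-3: capital H with stroke

# Bytes in the 7-bit range decode identically under both of these code pages.
plain = b"codetable"
same = plain.decode("iso8859_1") == plain.decode("iso8859_3")

print(repr(as_latin1), repr(as_latin3), same)
```

Without a record of which code page was intended, a reader of the file has no way to tell which of the two characters the author meant.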

Standards bodies are actively working on the development of a new character set that will support all, or almost all, of the world's present and past languages. One of these efforts is a 16-bit character set called Unicode, and another is a 32-bit character set called ISO 10646. Work is now underway to merge the two into a character set called ISO 10646M (M for merged).

Given the speed at which committees work, and the enormous effect that a change in text encoding would have on millions of computers and people, it seems unlikely that the impact of these efforts will be felt for another decade.

The code page problem, however, does have to be dealt with. The standard file headers provide for this with an attribute entry like

%%%     codetable       = "ISO/ASCII",

If the file is encoded in, say, code page ISO-8859-3, then the header could say so:

%%%     codetable       = "ISO-8859-3",
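A program reading such a file could use the codetable entry to choose a decoder. The following Python sketch is one way to do that, not part of the file header software: the function name codetable_codec, the regular expression, the alias table, and the assumption that header lines begin with %%% are all hypothetical choices made for illustration.

```python
import codecs
import re

# Hypothetical pattern for a header line such as
#   %%%     codetable       = "ISO-8859-3",
CODETABLE_RE = re.compile(r'^%%%\s*codetable\s*=\s*"([^"]*)"\s*,?\s*$')

# Hypothetical aliases for names that Python does not recognise directly.
ALIASES = {"ISO/ASCII": "ascii"}


def codetable_codec(header_lines):
    """Return the Python codec name for the codetable entry, or None.

    Raises LookupError if the entry names a code table Python does not know.
    """
    for line in header_lines:
        match = CODETABLE_RE.match(line)
        if match:
            name = match.group(1)
            return codecs.lookup(ALIASES.get(name, name)).name
    return None


header = ['%%%     checksum        = "...",',
          '%%%     codetable       = "ISO-8859-3",']
print(codetable_codec(header))
```

The returned name can be passed to bytes.decode() to read the rest of the file.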

Of course, if an ASCII file were transferred to a system with EBCDIC, the file would not be readable until the character values were translated to EBCDIC. The checksum described in the preceding section would then be incorrect, but at least the fact that the file header stated that the code was originally ISO/ASCII would explain any translation peculiarities that cropped up later.
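The effect of such a translation on a checksum can be sketched in Python. Two assumptions are made here purely for illustration: cp037 (one of several EBCDIC variants) stands in for the target system's character set, and zlib.crc32 stands in for the checksum, which is not necessarily the algorithm used by the file header software.

```python
import zlib

text = "Hello, world\n"

ascii_bytes = text.encode("ascii")
ebcdic_bytes = text.encode("cp037")   # assumed EBCDIC variant

# The same characters have different byte values, so a checksum over the
# bytes changes (zlib.crc32 is only a stand-in checksum here).
crc_ascii = zlib.crc32(ascii_bytes)
crc_ebcdic = zlib.crc32(ebcdic_bytes)

# Translating back to the code table named in the header restores the
# original bytes, and with them the original checksum.
restored = ebcdic_bytes.decode("cp037").encode("ascii")

print(crc_ascii == crc_ebcdic, zlib.crc32(restored) == crc_ascii)
```

Knowing from the header that the file was originally ISO/ASCII is what makes the reverse translation, and hence checksum verification, possible.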

The attribute name codetable was chosen over codepage because the latter notion is restricted to variants of ISO/ASCII.


Nelson H. F. Beebe