CRITERIA FOR SATISFACTORY SUPPORT OF CHARACTER DATA
===================================================
The reaction of some people on reading the above
criticisms, or on having experienced these difficulties
personally, will no doubt be to reject FORTRAN completely as
a language in which any kind of character manipulation is to
be done. There
is certainly some validity to this view. However, as one
Conference participant remarked, there is really no choice
in the matter, for FORTRAN 66 is the only "(almost)
machine-independent high-level 'assembly' language" that we
have for scientific computation.

FORTRAN is available on essentially all medium- and
large-scale computers in the world today, and on many
microcomputers as well. It has been in existence for nearly
twenty-five years, and is one of the two or three
still-existing original high-level programming languages. It
is widely understood by scientists and engineers the world
over.

A widely-implemented ANSI and ISO Standard has been
in existence for fourteen years, and in fact, FORTRAN was
probably the first language to be so standardized.

An enormous amount of FORTRAN software,
representing a huge investment of money and programmer
years, already exists, and sophisticated and extensive
scientific subroutine libraries such as IMSL, Harwell,
Boeing, NAG, EISPACK, FUNPACK, and LINPACK are widely
available.

FORTRAN's lack of structured control statements,
but unfortunately not its limited variety of data types, can
be largely overcome by programming in a preprocessor
language, such as RATFOR or SFTRAN3, which can then be
translated into Portable FORTRAN.

Finally, and importantly, there exist automated
tools such as the PFORT Verifier, which can be used to test
FORTRAN software for adherence to Portable FORTRAN syntax,
grammar, and usage.

In constructing a set of character primitives for
widespread implementation on a variety of host machines, two
goals must be kept in mind. First of all, the primitives
should provide frequently-needed functions. Examples of
these include packing and unpacking of characters, obtaining
integer equivalents, comparing and moving strings, and
letter case and character set conversions. Second, they
should permit machine-independent implementation of programs
which manipulate character data.
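
As an illustration of the first kind of function named
above, packing and unpacking of characters, the following is
a minimal sketch in Python rather than FORTRAN. The names
pack and unpack, the 8-bit character field, and the four
characters per word are assumptions made for this example
only; they are not the primitives defined in this proposal.

```python
# Hypothetical helpers for illustration; not the proposal's primitives.
BITS_PER_CHAR = 8     # assumed width of one character field
CHARS_PER_WORD = 4    # assumed number of characters per machine word


def pack(text):
    """Pack up to CHARS_PER_WORD ASCII characters into one integer.

    The leftmost character ends up in the most significant field, and
    short input is padded on the right with blanks.
    """
    if len(text) > CHARS_PER_WORD:
        raise ValueError("too many characters for one word")
    word = 0
    for ch in text.ljust(CHARS_PER_WORD):
        code = ord(ch)                 # integer equivalent of the character
        if code > 127:
            raise ValueError("not a 7-bit ASCII character")
        word = (word << BITS_PER_CHAR) | code
    return word


def unpack(word):
    """Recover the CHARS_PER_WORD characters stored in a packed word."""
    mask = (1 << BITS_PER_CHAR) - 1
    chars = []
    for position in range(CHARS_PER_WORD):
        shift = BITS_PER_CHAR * (CHARS_PER_WORD - 1 - position)
        chars.append(chr((word >> shift) & mask))
    return "".join(chars)
```

Under these assumptions, unpack(pack("AB")) gives "AB"
followed by two blanks. The word size and field width are
exactly the machine-dependent details that a set of
primitives would hide from the calling program.
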

The second goal carries with it an important
decision: a standard character set must be adopted, or at
least be made available via function calls. Only then can
operations such as sorting by collating sequence, or the use
of integer equivalents of characters to govern the flow of
control in programs such as parsers and lexical analyzers,
be implemented so that the same results are guaranteed,
independent of the host computer.
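
The dependence of collating sequence on the host character
set can be seen in a small sketch, again in Python for
illustration only. It assumes code page 037, one EBCDIC
variant, as a stand-in for a non-ASCII host; the function
name ascii_equivalent is hypothetical.

```python
symbols = ["b", "B", "2", "a", "A", "1"]

# Order by ASCII integer equivalents: digits, upper case, lower case.
by_ascii = sorted(symbols, key=lambda s: s.encode("ascii"))

# Order by the codes of an assumed EBCDIC host (code page 037):
# lower case, upper case, digits.
by_ebcdic = sorted(symbols, key=lambda s: s.encode("cp037"))


def ascii_equivalent(native_code):
    """Map one code of the assumed EBCDIC host to its ASCII integer."""
    return ord(bytes([native_code]).decode("cp037"))


print(by_ascii)    # ['1', '2', 'A', 'B', 'a', 'b']
print(by_ebcdic)   # ['a', 'b', 'A', 'B', '1', '2']
```

A program that sorts on native codes therefore produces
different output on the two hosts, while one that first
converts each character with a function such as
ascii_equivalent gives the same order on both.
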

There is fortunately at present an
internationally-agreed-upon character set, known as ASCII
(American Standard Code for Information Interchange),
defined in ANSI Standard X3.4-1968 and revised in X3.4-1977.
It has been adopted in Japan as the Japanese Industrial
Standard Code for Information Interchange (JISCII) (1969),
and by the International Organization for Standardization as ISO DR
1052 (1967). Unfortunately, at present the American "Big 3"
computer manufacturers IBM, CDC, and UNIVAC do not provide
wide support for ASCII, although both UNIVAC and CDC are
evidently moving in that direction.

ASCII is a 7-bit code offering 2**7 or 128
different characters, made up of 32 standard control
characters, followed by a space, then the special characters
!"#$%&'()*+,-./, the digits 0-9, the special characters
:;<=>?@, upper-case letters A-Z, special characters [\]^_`,
lower-case letters a-z, special characters {|}~, and
finally, a DELete control character. With the exception of
the DELete control character, the special characters
following the letters may be replaced with national
characters for those alphabets having more than 26 letters.
Standardization work is going on at present to expand the
code to 8 bits, and Cyrillic and Japanese Katakana
characters have already been assigned to codes in the
range 128-255 for use in the Soviet Union and Japan.
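
The layout of the 7-bit code described above can be written
down as a classification by integer equivalent. The sketch
below is in Python for illustration; the function name
ascii_class and the class labels are assumptions of this
example.

```python
from collections import Counter


def ascii_class(code):
    """Classify a 7-bit ASCII code by its position in the code table."""
    if not 0 <= code <= 127:
        raise ValueError("not a 7-bit ASCII code")
    if code < 32 or code == 127:       # 32 control characters plus DELete
        return "control"
    if code == 32:
        return "space"
    if 48 <= code <= 57:               # 0-9
        return "digit"
    if 65 <= code <= 90:               # A-Z
        return "upper"
    if 97 <= code <= 122:              # a-z
        return "lower"
    return "special"                   # the four runs of special characters


counts = Counter(ascii_class(code) for code in range(128))
# The special characters that fall between the upper- and lower-case letters.
specials_after_upper = "".join(chr(code) for code in range(91, 97))
```

Counting the classes over all 128 codes gives 33 control
characters (including DELete), 1 space, 10 digits, 26
upper-case letters, 26 lower-case letters, and 32 special
characters.
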

This proposal recommends the adoption of ASCII as the
standard character set, and defines functions allowing
access to it even on those computers which do not yet use
it. It is worth noting in passing that the new U.S.
Department of Defense programming language, ADA [SIGP79],
has specified that all character data shall be in the ASCII
character set, independent of the host computer.