


     CRITERIA FOR SATISFACTORY SUPPORT OF CHARACTER DATA
     ===================================================
 
          The reaction of some people on reading the above
 criticisms, or having experienced them personally, will no
 doubt be to reject FORTRAN completely as a language in which
 to do any kind of character manipulation. There is certainly
 some validity to this view. However, as one Conference
 participant remarked, there is really no choice in the
 matter, for FORTRAN 66 is the only "(almost)
 machine-independent high-level 'assembly' language" that we
 have for scientific computation.
 
          FORTRAN is available on essentially all medium- and
 large-scale computers in the world today, and on many
 microcomputers as well. It has been in existence for nearly
 twenty-five years, and is one of the two or three
 still-existing original high-level programming languages. It
 is widely understood by scientists and engineers the world
 over.
 
          A widely implemented ANSI and ISO Standard has existed
 for fourteen years; indeed, FORTRAN was probably the first
 programming language to be standardized in this way.
 
          An    enormous   amount    of   FORTRAN   software,
 representing  a  huge investment  of  money  and  programmer
 years,  already  exists,  and  sophisticated  and  extensive
 scientific  subroutine  libraries  such  as  IMSL,  Harwell,
 Boeing,  NAG,  EISPACK,  FUNPACK,  and  LINPACK  are  widely
 available.
 
          FORTRAN's lack  of structured  control  statements,
 but unfortunately not its limited variety of data types, can
 be   largely  avoided  by  programming   in  a  preprocessor
 language, such  as  RATFOR or  SFTRAN3,  which can  then  be
 translated into Portable FORTRAN.
 
          Finally,  and importantly,  there  exist  automated
 tools such as the PFORT  Verifier, which can be used to test
 FORTRAN  software for adherence to  Portable FORTRAN syntax,
 grammar, and usage.
 
          In  constructing a set of  character primitives for
 widespread implementation on a variety of host machines, two
 goals  must be kept  in mind.  First of  all, the primitives
 should  provide frequently-needed  functions.   Examples  of
 these include packing and unpacking of characters, obtaining
 integer  equivalents,  comparing  and  moving  strings,  and
 letter case  and character  set conversions.   Second,  they
 should permit machine-independent implementation of programs
 which manipulate character data.
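 As a concrete picture of three of the frequently-needed
 functions just listed (packing, unpacking, and letter case
 conversion), here is a minimal sketch. The names pack,
 unpack, and to_upper, the 7-bit field width, and the choice
 of five characters per word are all illustrative assumptions,
 not part of this proposal; a real implementation would follow
 the host's word size.

```python
# Hypothetical packing primitives; BITS and PER_WORD are illustrative assumptions.
BITS = 7        # bits per character field (7-bit ASCII codes)
PER_WORD = 5    # characters per word: 5 x 7 = 35 bits fit in a 36-bit word
SPACE = 32      # ASCII code used to pad a partly filled word


def pack(codes):
    """Pack up to PER_WORD character codes into one integer word, left-justified."""
    if len(codes) > PER_WORD:
        raise ValueError("too many characters for one word")
    word = 0
    for i in range(PER_WORD):
        c = codes[i] if i < len(codes) else SPACE   # pad on the right with blanks
        if not 0 <= c < 2 ** BITS:
            raise ValueError("character code out of range: %d" % c)
        word = (word << BITS) | c                   # shift earlier fields left
    return word


def unpack(word):
    """Return the PER_WORD character codes held in a packed word, leftmost first."""
    mask = 2 ** BITS - 1
    return [(word >> (BITS * (PER_WORD - 1 - i))) & mask for i in range(PER_WORD)]


def to_upper(code):
    """Convert an ASCII lower-case letter code (97-122) to upper case (65-90)."""
    return code - 32 if 97 <= code <= 122 else code


w = pack([ord(ch) for ch in "Hi"])
print([chr(c) for c in unpack(w)])   # ['H', 'i', ' ', ' ', ' ']
print(chr(to_upper(ord("q"))))       # Q
```

 Because to_upper relies on ASCII code values, it gives the
 same answer on every host only if character codes are first
 expressed in ASCII, which is the point of the second goal.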
 
          The second goal carries with it an important
 decision: a standard character set must be adopted, or at
 least be available via function calls, so that operations
 such as sorting by collating sequence, or using the integer
 equivalents of characters to govern the flow of control in
 programs such as parsers and lexical analyzers, can be
 implemented in a way that guarantees the same results
 regardless of the host computer.
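 The following sketch illustrates why a common collating
 sequence matters. It sorts the same strings once by ASCII
 codes and once by the codes of an EBCDIC host, using Python's
 standard cp037 codec (EBCDIC code page 037) as a stand-in for
 a non-ASCII machine; the choice of code page is an
 assumption made only for illustration.

```python
# Collate the same strings by ASCII codes and by EBCDIC (code page 037) codes.
words = ["apple", "Apple", "10", "zebra"]


def ascii_key(s):
    """Sort key: the ASCII code of each character."""
    return s.encode("ascii")


def ebcdic_key(s):
    """Sort key: the EBCDIC code page 037 byte for each character."""
    return s.encode("cp037")


print(sorted(words, key=ascii_key))   # ['10', 'Apple', 'apple', 'zebra']
print(sorted(words, key=ebcdic_key))  # ['apple', 'zebra', 'Apple', '10']

# A range test such as "code('a') <= c <= code('z')" in a lexical analyzer
# also behaves differently: the lower-case letters occupy 26 contiguous ASCII
# codes, but in EBCDIC they fall in three runs spanning 41 code positions.
span_ascii = ord("z") - ord("a") + 1
span_ebcdic = "z".encode("cp037")[0] - "a".encode("cp037")[0] + 1
print(span_ascii, span_ebcdic)        # 26 41
```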
 
          Fortunately, there is at present an internationally
 agreed-upon character set, known as ASCII (American Standard
 Code for Information Interchange), defined in ANSI Standard
 X3.4-1968 and revised in X3.4-1977. It has been adopted in
 Japan as the Japanese Industrial Standard Code for
 Information Interchange (JISCII) (1969), and by the
 International Organization for Standardization as
 ISO DR 1052 (1967). Unfortunately, the American "Big 3"
 computer manufacturers IBM, CDC, and UNIVAC do not at present
 provide wide support for ASCII, although both UNIVAC and CDC
 are evidently moving in that direction.
 
          ASCII  is  a   7-bit  code  offering  2**7  or  128
 different  characters,  made   up  of  32  standard  control
 characters, followed by a space, then the special characters
 !"#$%&'()*+,-./,  the  digits 0-9,  the  special  characters
 :;<=>?@, upper-case  letters A-Z, special characters [\]^_`,
 lower-case   letters  a-z,  special   characters  {|}~,  and
 finally, a DELete  control character.  With the exception of
 the   DELete  control  character,   the  special  characters
 following   the  letters  may  be   replaced  with  national
 characters for those  alphabets having more than 26 letters.
 Standardization work is currently under way to expand the
 code to 8 bits, and Cyrillic and Japanese Katakana
 characters have already been assigned to code positions in
 the range 128-255 for use in the Soviet Union and Japan.
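 The layout of the 128 ASCII codes described above can be
 written down as a small table of code ranges. The sketch
 below does this with a hypothetical classify function; the
 group names are our own labels, not names defined by the
 Standard.

```python
# The 128 ASCII codes grouped in the order described in the text.
GROUPS = [
    ("control", 0, 31),     # 32 standard control characters
    ("space", 32, 32),      # blank
    ("special", 33, 47),    # !"#$%&'()*+,-./
    ("digit", 48, 57),      # 0-9
    ("special", 58, 64),    # :;<=>?@
    ("upper", 65, 90),      # A-Z
    ("special", 91, 96),    # [\]^_`
    ("lower", 97, 122),     # a-z
    ("special", 123, 126),  # {|}~
    ("delete", 127, 127),   # DEL
]


def classify(code):
    """Return the group name of a 7-bit ASCII code (0-127)."""
    for name, lo, hi in GROUPS:
        if lo <= code <= hi:
            return name
    raise ValueError("not a 7-bit ASCII code: %d" % code)


print(classify(ord("A")), classify(ord("~")), classify(127))  # upper special delete
```

 The group sizes (32 + 1 + 15 + 10 + 7 + 26 + 6 + 26 + 4 + 1)
 add up to exactly 2**7 = 128 codes.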
 
          This proposal recommends adopting the ASCII character
 set as the standard one, and defines functions that allow
 access to it even on computers that do not yet use it. It is
 worth noting in passing that the new U.S. Department of
 Defense programming language, Ada [SIGP79], specifies that
 all character data shall be in the ASCII character set,
 independent of the host computer.
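 One way such access functions might work on a non-ASCII
 machine is through a pair of translation tables between host
 codes and ASCII codes. The sketch below assumes an EBCDIC
 host (modelled with Python's cp037 codec) and uses the
 hypothetical names iascii and ihost; it covers only the 128
 characters of 7-bit ASCII.

```python
# Host-code <-> ASCII translation tables for a non-ASCII host (sketch).
HOST_CODEC = "cp037"   # assumption: the host's native code is EBCDIC code page 037


def make_tables(codec=HOST_CODEC):
    """Return (host_to_ascii, ascii_to_host) dicts covering the 7-bit ASCII set."""
    to_ascii, to_host = {}, {}
    for a in range(128):
        h = chr(a).encode(codec)[0]   # host byte that stores the same character
        to_ascii[h] = a
        to_host[a] = h
    return to_ascii, to_host


TO_ASCII, TO_HOST = make_tables()


def iascii(host_byte):
    """ASCII integer equivalent of a host character code (hypothetical primitive)."""
    return TO_ASCII[host_byte]


def ihost(ascii_code):
    """Host character code for an ASCII integer equivalent (hypothetical primitive)."""
    return TO_HOST[ascii_code]


host = "A1".encode("cp037")           # bytes as an EBCDIC host would store them
print([iascii(b) for b in host])      # [65, 49]
```

 Sorting host data with iascii as the key then yields the
 ASCII collating sequence on any host, which is the
 machine-independence the proposal aims for.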