
                          BACKGROUND
                          ==========
 
          Before  describing the  proposed  primitives,  some
 background information is  useful. FORTRAN has never offered
 satisfactory  support   of  character  data.   Indeed,  some
 compilers extant  until  the mid-1960's  did not  even  have
 Hollerith data  items or  A FORMAT  descriptors, or  LOGICAL
 variables,  for that matter. When  limited character support
 became widely  available in  FORTRAN, it  was restricted  to
 Hollerith string constants of the form 8HCHEMISTRY, together
 with the A  FORMAT item.  Hollerith constants were permitted
 by the 1966 ANSI FORTRAN  Standard to occur only in DATA and
 FORMAT  statements,  and as  subroutine  arguments  in  CALL
 statements (but  not  in  FUNCTION references,  although  no
 compiler that I am  aware of enforces this restriction).  No
 CHARACTER data  type  was  introduced, and  characters  were
 forced to masquerade in the guise of other data types.
 
          Coding Hollerith  strings is  somewhat tedious  and
 error-prone,   because   of  the   necessity   of   counting
 characters.  Consequently,   many  manufacturers   permitted
 character   constants   to  be   surrounded   by   delimiter
 characters, for example,  "CHEMISTRY", but again, no general
 agreement  was reached  about what  the delimiter characters
 ought to  be. Single and double quotes  are most common, but
 asterisks and  not-equal signs  have also  been used.   When
 string  delimiters are used,  the question arises  as to how
 the  delimiter character  itself is  to be  represented in a
 string constant.  Usually, the  doubled-delimiter  approach,
 "O""MALLEY"  for the  string O"MALLEY, has  been adhered to,
 although  CDC's use  of the  asterisk as  a string delimiter
 simply prohibited its  appearance as a string character.  As
 a result of these  variations, only the Hollerith string can
 be  relied upon  for  portability,  and automated  means  of
 converting  between  the  different  string  conventions  in
 FORTRAN source programs are available at some installations.
 
          The 1966  implementation of  support for  character
 data is  just about the worst  possible.  The Hollerith form
 is certainly undesirable.   Even worse is the convention for
 internal storage of character strings.  These must always be
 stored  left-justified in a computer  word, and right-padded
 with blanks  if the number of  characters specified does not
 fill  an integral  number of  machine words.   The number of
 characters which  fit  in a  word ranges  from  1 to  10  on
 existing  computers  [BEEB79],  and  the  left-justification
 means that even if  one arranges to store only one character
 per word for word-length independence, the character will be
 occupying the  most-significant bit  positions and  probably
 the  sign bit as  well.  This means that  even comparison of
 characters for equality can result in an arithmetic overflow
 condition   on   those  machines   where   comparisons   are
 implemented  by subtraction.   It also  means that accessing
 the numerical value of  a character cannot be done portably,
 for  division by a power  of two to effect  a right shift of
 the bit  pattern will fail if the  sign position is occupied
 by a 1-bit.
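
          As a rough illustration of the last two points, the
 sketch below simulates left-justified storage using ordinary
 INTEGER arithmetic. It assumes a 32-bit two's-complement
 word holding one 8-bit EBCDIC character followed by three
 blanks; that machine model, and the function names LJPACK,
 LJNAIV, LJCODE and LJOVFL, are assumptions made for this
 example and are not taken from any standard or library.

```fortran
! Sketch only. Assumed model: 32-bit two's-complement word,
! one 8-bit EBCDIC code left-justified, three EBCDIC blanks.
      INTEGER FUNCTION LJPACK(ICODE)
!     Build the word for character code ICODE (0..255).
!     Codes of 128 or more set the sign bit.
      INTEGER ICODE, IHI
      IHI = ICODE
      IF (IHI .GE. 128) IHI = IHI - 256
!     4210752 is three EBCDIC blanks (hex 404040).
      LJPACK = IHI * 16777216 + 4210752
      RETURN
      END

      INTEGER FUNCTION LJNAIV(IW)
!     Naive "right shift" by dividing by 2**24.
!     Gives a wrong answer when the sign bit of IW is set.
      INTEGER IW
      LJNAIV = IW / 16777216
      RETURN
      END

      INTEGER FUNCTION LJCODE(IW)
!     Recover the code correctly under the assumed model:
!     turn truncating division into a floor, then undo the sign.
      INTEGER IW, K
      K = IW / 16777216
      IF (IW .LT. 0 .AND. MOD(IW, 16777216) .NE. 0) K = K - 1
      IF (K .LT. 0) K = K + 256
      LJCODE = K
      RETURN
      END

      INTEGER FUNCTION LJOVFL(IA, IB)
!     Return 1 if IA - IB would not fit in a 32-bit word,
!     else 0. The difference is formed in DOUBLE PRECISION
!     so that the check itself cannot overflow.
      INTEGER IA, IB
      DOUBLE PRECISION D
      D = DBLE(IA) - DBLE(IB)
      LJOVFL = 0
      IF (D .GT. 2147483647.0D0) LJOVFL = 1
      IF (D .LT. -2147483648.0D0) LJOVFL = 1
      RETURN
      END
```

 Under these assumptions, LJPACK(193), the word for EBCDIC A,
 is negative; LJNAIV returns -62 for it instead of 193, while
 LJCODE recovers 193. LJOVFL(LJPACK(92),LJPACK(193)) returns
 1, because the true difference of the two words does not fit
 in 32 bits.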
 
          Another problem is  that depending upon the FORTRAN
 type  of  the  variable  in  which  characters  are  stored,
 different results may be obtained on different machines. For
 example,  character   storage   in  LOGICAL   variables   is
 impossible on those machines which implement LOGICAL scalars
 and arrays  as bit  strings, and  on most  others, the  1966
 Standard's   prohibition  of  the  use   of  the  relational
 operators .EQ., .NE.,  .LT., etc.  between LOGICAL variables
 would  prevent character  comparisons.  Floating-point types
 are also  unsuitable, because  mantissa normalization  which
 may occur in assignments or in expression evaluation usually
 will scramble the  bits, destroying the characters stored in
 the  word. This leaves  INTEGER variables and  arrays as the
 only  possible repository  of character data,  and even this
 may  fail.  On the  IBM 7030 Stretch  computer, for example,
 integers   are  represented   internally  as  floating-point
 numbers, and unless assembly-language coding is resorted to,
 it is very inconvenient just to get character data correctly
 in and out of variables on that machine.
 
          The  1977 FORTRAN  Standard has made  an attempt to
 remedy these difficulties by the introduction of a CHARACTER
 data type,  but  is  still not  going  to offer  a  complete
 solution.
 
          First  of all,  the Hollerith data  type is dropped
 from the 1977 Standard. This means that a very large body of
 existing FORTRAN software which uses character data, even in
 an at-present widely portable fashion, may require extensive
 changes   to  run   with  a  FORTRAN   77  compiler,  unless
 manufacturers   can  be  pressed  to   continue  support  of
 character data stored  in Hollerith constants and variables.
 The  1977  Standard  prohibits  all  storage  equivalencing,
 either via COMMON and EQUIVALENCE statements, or by FUNCTION
 or SUBROUTINE  argument associations, between CHARACTER data
 and  all other  FORTRAN data  types.  This  was necessary to
 enable  FORTRAN  77  to  support  variable-length  character
 strings, so that declarations of the form
 
       SUBROUTINE A (B,C)
       CHARACTER B*(*),C(*)*(*)
 
 could be permitted,  allowing CHARACTER variables to inherit
 both a size and an array length from a calling program. This
 forces a  compiler to  generate  code to  pass to  a  called
 routine  the address of a  string descriptor containing size
 and dimension information, as well as the actual address of
 the character data.
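
          A minimal sketch of how a called routine might use
 the inherited length follows. The function name LENTRM is
 hypothetical; LEN is the FORTRAN 77 intrinsic that returns
 the length passed along with the actual argument.

```fortran
      INTEGER FUNCTION LENTRM(S)
!     Length of S with trailing blanks ignored (0 if all blank).
!     S takes its length from the caller's actual argument.
      CHARACTER*(*) S
      INTEGER I
!     Scan backwards from the inherited length LEN(S).
      DO 10 I = LEN(S), 1, -1
         IF (S(I:I) .NE. ' ') GO TO 20
   10 CONTINUE
      I = 0
   20 LENTRM = I
      RETURN
      END
```

 For example, LENTRM('CHEMISTRY   ') returns 9, although LEN
 reports 12 inside the function.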
 
          Second,  standardized library  support of character
 data in the form  of useful utility routines is non-existent
 in  the  1977  Standard,  apart  from  the  ICHAR  and  CHAR
 functions for converting between INTEGER and CHARACTER form.
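
          As an illustration, ICHAR can be used to obtain the
 numeric value of a digit character. This sketch assumes that
 the codes of the digits 0 to 9 are contiguous, which holds
 for ASCII and EBCDIC but should be treated as an assumption
 elsewhere; the function name IDIGIT is hypothetical.

```fortran
      INTEGER FUNCTION IDIGIT(C)
!     Value 0..9 of the digit character C, or -1 if C is not a
!     digit. Assumes contiguous digit codes (see text).
      CHARACTER*1 C
      INTEGER K
      K = ICHAR(C) - ICHAR('0')
      IF (K .LT. 0 .OR. K .GT. 9) K = -1
      IDIGIT = K
      RETURN
      END
```

 Under that assumption IDIGIT('7') returns 7 and IDIGIT('A')
 returns -1. The inverse mapping for K in the range 0 to 9 is
 CHAR(ICHAR('0') + K).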
 
          Third, null character  strings, that is, strings of
 zero  length, are not  permitted.  Null strings  are in fact
 quite  useful,   and   indeed,   even  necessary   in   some
 applications.   In  particular,  a  null  string  cannot  be
 simulated by any string of non-zero length.
 
          Fourth, the  1977  Standard  does not  specify  the
 character set to be  used.  The fact that many manufacturers
 employ  their private versions of  character sets, each with
 its own special character repertoire and collating sequence,
 only  continues to  perpetrate additional machine dependence
 upon FORTRAN users.