Supplement to Appendix H: Historical Floating-Point Architectures

Original version: Sat Dec 9 16:09:20 2023. Updated Tue Dec 12 09:59:53 2023, Thu Jan 4 10:46:10 2024.

Since I wrote the history appendix in The Mathematical-Function Computation Handbook, I have found additional information about unusual floating-point formats in early computers that my book does not mention.

The extreme variation in the implementation of floating-point data formats, instructions, and rounding behavior in pre-1980 computer designs was a significant barrier to software portability, which in those days primarily meant COBOL and Fortran. The almost universal adoption of IEEE 754 arithmetic, or subsets thereof, since the Intel 8087 coprocessor was introduced in 1980, has radically simplified the task of the numerical programmer. However, the existence of subsets, differences in underflow and overflow detection, and the lack of support for all of the features of IEEE 754 (including 128-bit formats, exception flags, exception handling, fused multiply-add, fast run-time rounding-mode control, subnormals, and decimal arithmetic) show that the computing industry has still not appreciated the value of a new arithmetic design that is now more than 45 years old!

Burroughs B5000 family

The Burroughs B5000 design began in early 1961, and the first machine was delivered in 1963. Its architecture differs greatly from other machines of its time. The B5000 is stack oriented, so instructions are short because they do not encode register numbers. Systems programming is done in any of three extended dialects of Algol 60, rather than in assembly language.

The B5000 word size is 48 bits, and memory is normally addressed by word. However, it can also be addressed by character.

Instructions are coded in 12-bit syllables, four per word. Each syllable consists of a 2-bit type code, and a 10-bit instruction code. Although that could permit 2**10 = 1024 different instructions, only about 60 are used, and some instructions employ the remaining syllable bits for data.

Operations are performed on a memory stack, but the top two locations are instead in fast registers. Pushing an operand onto the stack spills the second register entry into a memory array and replaces it with the original first entry. However, for speed, a status bit prevents the memory store if only the top stack item is in use before the push. Arithmetic +, -, *, and / operations replace the top two stack values with the result, so a chain of such operations is a series of push-and-operate instruction pairs, with one memory reference per pair.

For arithmetic, the 48-bit word contains a 1-bit flag bit, a 1-bit exponent sign, a 6-bit power-of-8 exponent field, a 1-bit significand sign, and a 39-bit (13-octal-digit) unnormalized integer significand. There are thus multiple representations of numbers whose octal significands have leading or trailing 0-digits: 0o1 * 8**0, 0o10 * 8**(-1), 0o100 * 8**(-2), …, 0o1_000_000_000_000 * 8**(-12) are all equivalent. Comparison instructions therefore have to normalize their operands before comparing their bit patterns.
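The encoding and its redundancy can be modeled with a short sketch; the helper name and field names are my own, not Burroughs terminology:

```python
from fractions import Fraction

def b5000_value(exp_sign, exponent, sig_sign, significand):
    """Value of a B5000 operand: +/-significand * 8**(+/-exponent).

    Fields: exp_sign (1 bit), exponent (6 bits, power of 8),
    sig_sign (1 bit), significand (39-bit unnormalized integer).
    """
    e = -exponent if exp_sign else exponent
    s = -significand if sig_sign else significand
    return Fraction(s) * Fraction(8) ** e

# Shifting the octal significand left one digit while decrementing the
# exponent leaves the value unchanged, so these all denote 1:
assert b5000_value(0, 0, 0, 0o1) == 1
assert b5000_value(1, 1, 0, 0o10) == 1
assert b5000_value(1, 12, 0, 0o1_000_000_000_000) == 1
```

That redundancy is exactly why the comparison instructions must normalize before comparing bit patterns.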

The flag bit indicates whether the word contains executable code, or data: such a distinction appeared later in some paged memory designs, but only at the level of pages (typically 512 or more words). The flag bit can only be set by the operating system, so it is impossible for user code to create new executable code. There is no flag bit in character data.

Integers are represented by zeroes in the exponent sign and exponent fields, and thus have a sign-magnitude representation with a range of [-(2**39 - 1), (2**39 - 1)], or [-549_755_813_887, +549_755_813_887]. There is no distinction between single-word integer and floating-point values, and thus no need for different instructions for them. However, separate instructions are provided for double-precision floating-point arithmetic.

Integer overflow is undetectable: an oversized result silently becomes an inexact floating-point value. The same is true on the CDC 6000 and 7000 machines. The designer of Pascal and later languages, Niklaus Wirth (1934–2024), found that misfeature a considerable aggravation for implementing reliable compilers: see his 1972 report On "Pascal", Code Generation, and the CDC 6000 Computer.

The single-precision floating-point nonzero magnitude range is [1 * 8**(-63), 549_755_813_887 * 8**(63)], or about [1.27e-57, 4.31e+68]. The asymmetry in the number range is justified by arguing that it helps to avoid the overflow region in practical computations. The 39-bit significand represents about 11 decimal digits.
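Those bounds are easy to confirm with a few lines of arithmetic (mine, not a manual's):

```python
import math

# Smallest and largest nonzero B5000 single-precision magnitudes:
# minimum significand 1 with minimum exponent, and the maximum 39-bit
# significand with maximum exponent.
smallest = 1 * 8.0**-63
largest = (2**39 - 1) * 8.0**63
print(f"{smallest:.2e}")   # ~ 1.27e-57
print(f"{largest:.2e}")    # ~ 4.31e+68
print(math.log10(2**39))   # ~ 11.7 decimal digits in 39 bits
```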

The base-8 exponent means the floating-point numbers suffer from wobbling precision, with loss of up to 2 leading bits. Multiplication and division by 2 may not be exact, but multiplication and division by 8 always are.
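The loss is visible in the bit counts of normalized significands (a hypothetical helper for illustration, not a Burroughs instruction):

```python
def significant_bits(significand):
    """Bits actually carrying information in a 39-bit significand."""
    return significand.bit_length()

# A normalized base-8 significand only needs a nonzero leading *octal*
# digit, so when that digit is 1, the top two bits of the word are zero:
assert significant_bits(0o4_000_000_000_000) == 39   # leading digit 4..7
assert significant_bits(0o1_000_000_000_000) == 37   # leading digit 1
```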

A floating-point overflow sets an exception bit, and wraps to the underflow region. An underflow sets a different exception bit, and wraps to the overflow region. In both cases, the significand is correct, but the exponent is off by 64.

For double-precision floating-point arithmetic, the second word has 9 leading 0-bits that are otherwise ignored, followed by a 39-bit significand that is treated as the fractional part of the integer in the first word. Precision is increased to 78 bits (about 23 decimal digits), and the range is extended at the low end: [2.31e-69, 4.31e+68].

The Burroughs hardware manuals report that excess intermediate bits in floating-point computations cause rounding, but give no details on how that adjustment is done. One manual's description of the double-precision multiply instruction says that it produces a 52-octal-digit product that is then truncated to 26 octal digits; another manual says rounded.

Burroughs B6000 family

The B6000 series, introduced in 1969 and sold through the 1980s, largely follows the design of the B5000 machines, but significantly changes the instruction and memory structure. The instruction set encoding is made even more compact by reducing the syllable size from 12 bits to 8 bits, although a few instructions use more than one syllable: as many as 7, crossing word boundaries. Instead of a high-order flag bit in a 48-bit B5000 word, B6000 words are extended to 51 bits, plus a parity bit, with the three high-order bits used as a type code settable only by the operating system and hardware, so that all 48 remaining bits are available for data. There are different type codes for single- and double-precision floating-point values, so separate arithmetic instructions for them are not needed: the codes allow transparent mixing of integer, single-precision, and double-precision operands.

The character size on the B6000 is increased from 6 to 8 bits, so ASCII, EBCDIC, and a Burroughs-specific 6-bit character encoding can all be supported, with the default being EBCDIC, as used in IBM mainframe operating systems, and those of several other market competitors.

The type codes are invisible to user software, so type punning via Fortran EQUIVALENCE or C family union statements or type casts cannot make character and numeric data accessible via a different type. C, of course, did not exist when the B6000 family was first marketed, but Algol, COBOL, and Fortran were the major programming languages on Burroughs machines.

Because of the stack architecture, Burroughs Fortran can easily support recursion, a feature absent from most Fortran implementations, and one not added to the language standard until the 1990s, and then only in a restricted form. Use of recursion in Burroughs Fortran requires a special directive in the source code, because it changes local variable storage from static allocation to stack allocation. Apart from that directive, no other source code changes are needed.

A B6000 single-precision floating-point value has the high-order bit set to 0: it is the flag bit of the B5000. The next two bits are the significand sign and exponent sign, followed by the 6-bit exponent, and the 39-bit integer significand. Although the B6000 bit field positions differ slightly from the B5000, the single-precision formats have identical range and precision in the two machine families: [1.27e-57, 4.31e+68], and about 11 decimal digits. Integers are the same in both families: the exponent-0 case of floating-point numbers.

The B6000 double-precision floating-point encoding also differs from that in the B5000: the power-of-8 exponent is a 15-bit integer value with 6 bits from the first word, and the top 9 bits of the second word. The first word holds 39 integer bits of the significand, and the second word, 39 fractional bits, or roughly 23 decimal digits. The large exponent size increases the nonzero magnitude range to about [2.82e-29_592, 1.95e+29_603], one of the widest in any computer design from major manufacturers.
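A quick logarithmic check of that range, assuming an exponent magnitude of up to 32_767 (my reading of the 15-bit field plus its sign):

```python
import math

# log10 of the smallest (1 * 8**-32767) and largest (~2**39 * 8**32767)
# nonzero B6000 double-precision magnitudes.
log_smallest = -32767 * math.log10(8)
log_largest = math.log10(2**39) + 32767 * math.log10(8)
print(log_smallest)   # ~ -29591.5, i.e. about 2.8e-29592
print(log_largest)    # ~  29603.3, i.e. about 2e+29603
```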

Livermore LARC

The Livermore Advanced Research Computer (LARC) was one of the earliest supercomputers. Only two were built, by Univac under government contract. The design began in 1958, and the first LARC was delivered in 1960. Manuals for the machine are available here.

The LARC has a 60-bit word, but uses decimal arithmetic. It represents each decimal digit with 5 bits, and each word stores a sign in the high-order digit, followed by 11 digits. Integers thus range from -99_999_999_999 to +99_999_999_999. The double-word format duplicates the sign digit, so the integer of largest magnitude has 22 digits.
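The digit packing can be sketched as follows; I assume plain binary-coded digits in each 5-bit field for illustration, since the actual LARC digit encoding may differ:

```python
def larc_pack(sign_digit, digits):
    """Pack a sign digit and 11 decimal digits into a 60-bit word,
    5 bits per digit, sign in the high-order position."""
    assert len(digits) == 11 and all(0 <= d <= 9 for d in digits)
    word = sign_digit
    for d in digits:
        word = (word << 5) | d
    return word

# Unpack the digits again to show the 11-digit magnitude limit:
word = larc_pack(0, [9] * 11)
value = 0
for i in range(11):
    value = value * 10 + ((word >> (5 * (10 - i))) & 0x1F)
assert value == 99_999_999_999
```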

For single-word floating-point arithmetic, the high-order digit is the sign, followed by two exponent digits (biased by 50), an implicit decimal point, and nine digits of the normalized fraction. Thus, the smallest nonzero magnitude is 0.1e-50, and the largest is 0.999_999_999e+49.

For double-word floating-point arithmetic, the high-order digit in each word is the sign, but there is no exponent field in the second word, giving 20 decimal digits for the fraction. Thus, such numbers range from 0.1e-50 to 0.999_999_999_999_999_999_99e+49.

On overflow, the exponent wraps around, giving a tiny result. On underflow, a tiny number becomes a large one. In both cases, a trap handler (called a “contingency routine” in LARC documentation) can take corrective action.

The floating-point arithmetic instructions (add, subtract, multiply, divide) normally truncate the result if it would require more digits than the format supports.

However, there are rounded multiply and divide instructions that add one to the last fraction digit if the next digit is 5 or more. There is also a single-precision multiply that produces an exact double-length product. There is no companion instruction for the double-precision format, because it would need a four-word result.
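The difference between the truncating and rounding instructions can be illustrated with Python's decimal module (an illustration of the rule, not the LARC's internal algorithm):

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

def larc_multiply(x, y, rounded=False):
    """Multiply two 9-digit fractions, keeping 9 fraction digits:
    truncate by default, or add one to the last retained digit when
    the next digit is 5 or more."""
    exact = Decimal(x) * Decimal(y)   # at most 18 digits: exact
    return exact.quantize(Decimal("1e-9"),
                          ROUND_HALF_UP if rounded else ROUND_DOWN)

a, b = Decimal("0.333333333"), Decimal("0.999999999")
print(larc_multiply(a, b))                 # 0.333333332 (truncated)
print(larc_multiply(a, b, rounded=True))   # 0.333333333 (rounded up)
```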

Los Alamos MANIAC I

The MANIAC I (Mathematical Analyzer Numerical Integrator and Automatic Computer Model I) was designed by Nick Metropolis at Los Alamos Scientific Laboratory, and only one was produced. It ran from 1952 to 1958. The MANIAC I was later restored at the University of New Mexico and ran from 1963 to 1965. There is one manual for the machine here.

The MANIAC I was succeeded in 1957 at Los Alamos by the MANIAC II. The MANIAC III appeared in 1964, but was built at the University of Chicago.

A MANIAC I word consists of 40 bigits (the term then used at Los Alamos for binary digits, although John Tukey at Princeton and Bell Laboratories had proposed in 1948 the short form that is now universal: bits).

A fixed-point number on the MANIAC consists of a sign bit in the high-order position, an implicit binary point, and 39 bits of the fraction, with the number in two's complement form. Thus, positive magnitudes range from 0 to +0x0.ffff_ffff_fep+0 (0 to 1 - 2**(-39)). The programmer needs to manage scaling if the number range (-1,+1) is not suited to the computation.
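The fixed-point interpretation can be sketched in a few lines (the helper name is mine):

```python
def maniac_fixed(word):
    """Value in [-1, 1) of a 40-bit MANIAC I word: a two's-complement
    fraction with the binary point after the sign bit."""
    assert 0 <= word < 2**40
    if word >= 2**39:          # sign bit set: negative value
        word -= 2**40
    return word / 2**39

assert maniac_fixed(0) == 0.0
assert maniac_fixed(2**39 - 1) == 1 - 2**-39   # largest: 0x0.ffff_ffff_fe
assert maniac_fixed(2**39) == -1.0             # most negative value
```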

Floating-point arithmetic is provided, but the MANIAC manual recommends on p. 81:

The use of the FPM [floating-point method] is, in general, discouraged for most computation as it greatly slows down the effective computer speed. In most problems, scaling may be accomplished without undue loss of significant figures. In cases where the scaling is difficult to accomplish, a scheme of self-adjusting scaling or the use of scaling checks may be employed as an aid to scaling.

A single-precision floating-point value consists of a sign in the high-order position, an implicit binary point, a 27-bit normalized fraction in [1/2,1), followed by a 12-bit two's complement exponent. That provides an exponent range of [-2048, +2047], a precision of about 8 decimal digits, and a positive number range of about (1.5e-617, 1.61e+616).

Los Alamos MANIAC II

As was common at the time, new computer models often differed greatly from their predecessors, and the MANIAC II continued that practice. A description of the system is available here.

The MANIAC II has 12_288 48-bit words. The floating-point format has a 1-bit exponent sign, a 3-bit exponent field for powers of 65_536 (2**16), a 1-bit fraction sign, an implicit binary point, and a 43-bit normalized fraction in [1/2,1), corresponding to about 12 decimal digits. The range of positive nonzero numbers is (0x1p-155, 0x1p+112), or roughly (2e-47, 5e+33).
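The stated range follows from the field widths; this is my arithmetic, reading the 3-bit exponent plus its sign as a magnitude of up to 7:

```python
import math

# Extremes of the MANIAC II format: the fraction is normally in
# [1/2, 1), but the smallest nonzero magnitude keeps only the
# fraction's lowest bit.
assert 65536**7 == 2**112                 # just above the largest value
assert 2**-43 * 65536.0**-7 == 2**-155    # smallest nonzero magnitude
print(112 * math.log10(2))    # ~  33.7, i.e. about 5e+33
print(-155 * math.log10(2))   # ~ -46.7, i.e. about 2e-47
```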

That base of 2**16 may be the largest of any significant machine ever built. Its designers were aware of the problem of wobbling precision in large bases, and wrote:

The Maniac's large base permits a considerable increase in the speed of floating-point arithmetic. Although such a large base implies the possibility of as many as 15 lead[ing] zeros, the large word size of 48 bits guarantees adequate significance.

Thus, with a 43-bit fraction and up to 15 leading zero bits, effective precision on the MANIAC II could vary from 28 to 43 bits, or from about 8 to 13 decimal digits. That is still more than the 27 bits (about 8 digits) on the MANIAC I.

University of Chicago MANIAC III

The MANIAC III design kept the 48-bit word of the model II, but increased the memory size to 16_384 words, and moved from vacuum tube to transistor technology. A brief summary of its architecture is available here. Its unusual arithmetic is described in a 1959 journal article, a 1962 conference proceedings article, and a 1963 journal article.

The second of those articles is eight two-column pages with many intricate details that we do not attempt to reproduce here.

Concluding observations

The complexity of arithmetic on the MANIAC III seems not to have been adopted by subsequent computer designs. By the mid-1960s, most computers had a floating-point format for a w-bit word with a 1-bit sign, an e-bit biased exponent, and w - e - 1 fraction bits, stored in that order. The number base, B, could be 2, 4, 8, 10, 16, or 256, and the implicit base point precedes the fraction, which, if normalized and nonzero, lies in [1/B, 1). The word width w is usually fixed at 24, 32, 34, 36, 40, 48, 52, 60, or 64. Machines with 16-bit words, such as the DEC PDP-11 and Intel 8086, use at least two words for floating-point numbers. Byte-addressable machines restrict the word size to 16, 32, or 64 bits.
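That conventional layout can be decoded generically; the following sketch parameterizes the field widths, base, and bias, and is an illustrative model rather than any particular machine's definition:

```python
from fractions import Fraction

def decode(sign, biased_exp, frac_field, frac_bits, base, bias):
    """Value of a conventional float: sign, biased exponent, and a
    frac_bits-wide fraction with the implicit base point before it."""
    f = Fraction(frac_field, 2**frac_bits)   # in [1/B, 1) if normalized
    v = f * Fraction(base) ** (biased_exp - bias)
    return -v if sign else v

# Example with base 2, bias 128, and 24 fraction bits (no hidden bit):
# a fraction field of 0b1000...0 means 1/2, and exponent 129 means 2**1.
assert decode(0, 129, 1 << 23, 24, 2, 128) == 1
```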

A few CPU designs, and some compilers, now also support 128-bit integers. Hardware support for the 128-bit floating-point formats, alas, remains uncommon: only IBM mainframe systems, IBM PowerPC, late models of the DEC Alpha and VAX CPUs, and a few chip models in the Sun SPARC family have it. Hardware support for the 128-bit format is notably absent from the AMD, ARM, and Intel CPU families that now dominate the markets for embedded devices, tablets, laptops, desktops, mobile telephones, and supercomputers. Compilers that supply a 128-bit floating-point datatype must therefore do so via calls to software library routines that are much slower than hardware could provide.