The cost of IEEE 754 exceptional operands and instructions

Last update: Thu Dec 5 17:59:33 2002

Because IEEE 754 arithmetic is often implemented by a combination of hardware and software, operations whose operands are exceptional values (subnormal, Infinity, or NaN), or whose results are exceptional values, can be much more expensive at run time than operations on normal operands: the exceptional cases may have to be handled in software.

To investigate this further, the benchmark program timops.c was run on a wide range of architectures to measure the performance hit from exceptional values. The shell script timops.sh runs the benchmark for each supported precision; the associated files are Makefile, ieeeftn.h, second.c, and store.c.

The benchmark program contains a loop whose trip count is adjusted so that the loop runs for at least one second with normal operands. The same trip count is then used to run that loop again with up to six types of operands, for which the

  1. product is normal;
  2. product underflows to a subnormal;
  3. product underflows to zero;
  4. product overflows to Infinity;
  5. operands are both Infinity;
  6. operands are both (quiet) NaN.

The particular operand values depend on the floating-point precision and on an IEEE 754 floating-point system, but are otherwise independent of the host CPU architecture.

On most current RISC systems, exceptional values are handled by software, but the trap to that software is transparent to the user, apart from taking longer than a hardware implementation would require.

One notable exception to user transparency is the Compaq (formerly DEC) Alpha architecture. Its designers chose to implement a heavily pipelined CPU that (except for the most recent Alpha 21264 and 21364 CPUs) cannot handle exceptional values. The default for C, C++, and Fortran compilers under both Compaq/DEC OSF/1 and GNU/Linux operating systems is to flush underflows abruptly to zero, and to immediately terminate execution on encountering an operand that is subnormal, Infinity, or NaN, or for which the instruction would generate a NaN or Infinity.

To produce IEEE 754 nonstop behavior on Compaq/DEC Alpha systems, special compilation options are required.

These options make the compilers generate different floating-point instructions that trap to software for exceptional operands or results, and also insert trap barrier instructions after floating-point operations. The trap barriers flush the instruction pipeline, allowing precise determination of the interrupt location, so that the software handler can find the instruction and its operands, and complete the job.

Because instruction pipelining is critical for modern high-performance CPUs, the performance hit from IEEE 754 nonstop behavior on Alpha processors can be expected to be severe, and the tables below confirm that expectation.

Notes on the tables

The complete output data from which the tables below are derived are recorded in timops.raw, which should be consulted for details of operating systems and for absolute times. The timops.awk program filters that file to produce the table entries. The fp_size column gives the size of the floating-point type in bytes. Numerical entries in the last five columns are the slowdown (when > 1) compared to the loop with normal values.

There are several observations to make about the data in the tables below:

Relative timing: sorted by CPU type

-----------------------------------------------------------------------------------------------
CPU                                 MHz Cmpiler fp_size   ufl->   ufl->   ofl->     NaN     Inf
                                                        subnorm    zero     Inf
-----------------------------------------------------------------------------------------------
AMD Athlon                         1400     gcc       4   4.974   3.376   1.000   1.000   0.991
AMD Athlon                         1400     gcc       8   4.802   3.198   1.009   1.000   1.009
AMD Athlon                         1400     gcc      12   1.007   1.000   1.000   1.007   1.013
DEC Alpha 21064 EV4                 100     gcc       4   1.001   1.007   -n/a-   -n/a-   -n/a-
DEC Alpha 21064 EV4                 100     gcc       4   8.252   8.139   8.261   7.532   7.517
DEC Alpha 21064 EV4                 100     gcc       8   1.006   0.999   -n/a-   -n/a-   -n/a-
DEC Alpha 21064 EV4                 100     gcc       8   8.835   8.697   9.572   7.886   7.815
DEC Alpha 21164 EV5                 466     c89       4   1.000   1.000   -n/a-   -n/a-   -n/a-
DEC Alpha 21164 EV5                 466     c89       4  43.636  34.727  21.879  21.121  21.439
DEC Alpha 21164 EV5                 466     c89       8   1.000   1.000   -n/a-   -n/a-   -n/a-
DEC Alpha 21164 EV5                 466     c89       8  66.043  49.217  27.913  27.130  27.333
DEC Alpha 21264                     667     c89       4   1.000   0.989   -n/a-   -n/a-   -n/a-
DEC Alpha 21264                     667     c89       4  53.359  42.239   0.989   1.000   0.989
DEC Alpha 21264                     667     c89       8   1.000   1.010   -n/a-   -n/a-   -n/a-
DEC Alpha 21264                     667     c89       8  78.552  57.885   1.000   1.021   1.000
DEC Alpha 21264                     667     c89      16   0.986   1.014   -n/a-   -n/a-   -n/a-
DEC Alpha 21264                     667     c89      16   1.057   1.000   0.986   0.957   0.986
HP PA-RISC 1.1 7100LC                80      cc       4  12.058  12.178   1.000  92.251   1.000
HP PA-RISC 1.1 7100LC                80      cc       8  16.955  16.841   1.000  11.278   1.000
IBM PowerPC                         133      cc       4   0.981   1.000   0.981   0.991   0.981
IBM PowerPC                         133      cc       8   1.007   1.007   1.014   1.000   1.007
IBM PowerPC                         133      cc       8   1.014   1.014   1.014   1.007   1.000
IBM PowerPC                         166      cc       4   0.991   1.000   0.991   0.991   0.991
IBM PowerPC                         166      cc       8   1.014   1.020   1.020   1.014   1.000
IBM PowerPC                         166      cc      16   1.009   1.009   0.991   0.991   0.991
IBM PowerPC                         233     gcc       4   1.006   1.013   1.026   1.000   1.000
IBM PowerPC                         233     gcc       8   1.006   1.013   1.019   1.000   1.000
IBM PowerPC                         533      cc       4   1.009   1.009   1.018   1.000   1.009
IBM PowerPC                         533      cc       8   0.991   0.991   0.991   0.983   0.991
IBM PowerPC                         533      cc       8   0.991   1.000   1.000   0.991   1.000
Intel IA-64 (emulated on IA-32)     600     gcc       4   1.012   1.018   1.009   0.941   0.953
Intel IA-64 (emulated on IA-32)     600     gcc       8   1.015   1.009   0.994   0.915   0.921
Intel IA-64 (emulated on IA-32)     600     gcc       8   1.015   1.011   0.998   0.917   0.923
Intel Pentium II                    450      cc       4   5.967   3.383   3.367   3.333   3.083
Intel Pentium II                    450      cc       8   3.655   2.236   2.227   2.291   2.145
Intel Pentium II                    450      cc      12   0.984   1.000   1.000   2.129   2.000
Intel Pentium II (Klamath)          300      cc       4   5.982   3.390   3.373   3.302   3.035
Intel Pentium II (Klamath)          300      cc       8   5.824   3.249   3.213   3.301   3.036
Intel Pentium II (Klamath)          300      cc      12   1.000   0.999   2.468   2.659   2.467
Intel Pentium III                  1266     gcc       4   6.014   3.408   3.394   3.317   3.056
Intel Pentium III                  1266     gcc       8   5.852   3.268   3.232   3.317   3.056
Intel Pentium III                  1266     gcc      12   1.010   1.010   1.000   2.588   2.402
Intel Pentium III (Katmai)          600     gcc       4   6.266   3.538   3.545   3.490   3.224
Intel Pentium III (Katmai)          600     gcc       8   6.176   3.437   3.423   3.514   3.246
Intel Pentium III (Katmai)          600     gcc      12   1.036   1.018   2.518   2.491   2.321
MIPS R10000                         180     c89       4   0.991   1.000   1.000  27.596   0.991
MIPS R10000                         180     c89       8   0.991   0.991   1.000  27.254   1.000
MIPS R10000                         180     c89      16   1.134   1.134   1.127   0.606   0.606
MIPS R10000                         195     c89       4   1.010   1.010   1.000  26.346   1.000
MIPS R10000                         195     c89       8   1.010   1.010   1.000  26.798   1.000
MIPS R10000                         195     c89      16   1.113   1.113   1.120   0.624   0.632
MIPS R4400                          150     c89       4  25.635  25.912   1.081  22.858   0.993
MIPS R4400                          150     c89       8  26.074  26.007   1.074  22.107   1.013
MIPS R4400                          150     c89      16  31.128  10.701   1.137   8.493   0.531
MIPS R4400                          175     c89       4  27.354  27.562   1.054  24.492   0.977
MIPS R4400                          175     c89       8  27.902  27.826   1.045  23.977   0.962
MIPS R4400                          175     c89      16  33.945  11.522   1.132   9.495   0.533
MIPS R5000                          180     c89       4   1.062   1.076   1.055  26.090   1.055
MIPS R5000                          180     c89       4   1.076   1.076   1.069  31.472   1.083
MIPS R5000                          180     c89       8   1.047   1.068   1.054  24.223   1.061
MIPS R5000                          180     c89       8   1.054   1.068   1.047  24.439   1.054
MIPS R5000                          180     c89      16   1.206   1.198   1.222   0.532   0.540
MIPS R5000                          180     c89      16   1.222   1.198   1.230   0.532   0.540
Sun UltraSPARC                      400     c89       4  16.675   1.031   0.995   1.015   1.015
Sun UltraSPARC                      400     c89       8  14.527   1.015   1.053   1.008   1.015
Sun UltraSPARC                      400     c89      16   1.015   1.026   1.031   0.701   0.716
Sun UltraSPARC II                   167     c89       4  16.761   1.017   1.009   1.009   1.017
Sun UltraSPARC II                   167     c89       8  12.586   1.006   0.994   1.000   1.006
Sun UltraSPARC II                   167     c89      16   1.002   1.000   1.002   0.705   0.701
Sun UltraSPARC II                   270     c89       4  18.618   0.993   0.993   1.000   0.986
Sun UltraSPARC II                   270     c89       8  14.640   0.995   1.000   0.995   1.000
Sun UltraSPARC II                   270     c89      16   1.000   1.000   1.000   0.697   0.701
Sun UltraSPARC II                   300     c89       4  16.961   1.008   1.008   1.008   1.008
Sun UltraSPARC II                   300     c89       8  13.068   1.000   1.000   1.011   1.000
Sun UltraSPARC II                   300     c89      16   1.000   1.004   0.989   0.706   0.709
Sun UltraSPARC II                   400     c89       4  16.777   1.005   1.000   1.000   1.000
Sun UltraSPARC II                   400     c89       8  12.818   1.008   1.000   1.000   1.000
Sun UltraSPARC II                   400     c89      16   1.010   1.010   1.010   0.694   0.694
Sun UltraSPARC II                   440     c89       4  16.824   0.995   0.995   1.000   1.000
Sun UltraSPARC II                   440     c89       8  13.198   1.008   1.016   1.016   1.000
Sun UltraSPARC II                   440     c89      16   1.021   1.000   1.021   0.688   0.704
Sun UltraSPARC IIe                  500     c89       4  16.981   1.000   1.013   1.000   1.006
Sun UltraSPARC IIe                  500     c89       8  13.179   1.000   1.000   1.009   1.000
Sun UltraSPARC IIe                  500     c89      16   0.994   1.000   0.994   0.697   0.690
Sun UltraSPARC III                  750     c89       4  13.417   0.942   0.897   0.942   0.910
Sun UltraSPARC III                  750     c89       8  11.223   1.000   0.995   1.000   1.000
Sun UltraSPARC III                  750     c89      16   0.992   1.000   0.992   0.659   0.675
TI SuperSPARC Viking                 40     gcc       4   1.000   1.009   0.991   0.991   0.991
TI SuperSPARC Viking                 40     gcc       8   0.996   1.000   0.984   0.988   0.984
TI SuperSPARC Viking                 40     gcc       8   1.016   1.012   1.000   1.000   1.000
TI SuperSPARC Viking/MXCC            50     gcc       4   1.005   1.000   0.995   0.995   0.989
TI SuperSPARC Viking/MXCC            50     gcc       8   1.005   1.000   0.990   0.990   0.995
TI SuperSPARC Viking/MXCC            50     gcc       8   1.010   1.010   1.000   1.000   0.995
-----------------------------------------------------------------------------------------------

Relative timing: sorted by precision and CPU type

-----------------------------------------------------------------------------------------------
CPU                                 MHz Cmpiler fp_size   ufl->   ufl->   ofl->     NaN     Inf
                                                        subnorm    zero     Inf
-----------------------------------------------------------------------------------------------
AMD Athlon                         1400     gcc       4   4.974   3.376   1.000   1.000   0.991
DEC Alpha 21064 EV4                 100     gcc       4   1.001   1.007   -n/a-   -n/a-   -n/a-
DEC Alpha 21064 EV4                 100     gcc       4   8.252   8.139   8.261   7.532   7.517
DEC Alpha 21164 EV5                 466     c89       4   1.000   1.000   -n/a-   -n/a-   -n/a-
DEC Alpha 21164 EV5                 466     c89       4  43.636  34.727  21.879  21.121  21.439
DEC Alpha 21264                     667     c89       4   1.000   0.989   -n/a-   -n/a-   -n/a-
DEC Alpha 21264                     667     c89       4  53.359  42.239   0.989   1.000   0.989
HP PA-RISC 1.1 7100LC                80      cc       4  12.058  12.178   1.000  92.251   1.000
IBM PowerPC                         133      cc       4   0.981   1.000   0.981   0.991   0.981
IBM PowerPC                         166      cc       4   0.991   1.000   0.991   0.991   0.991
IBM PowerPC                         233     gcc       4   1.006   1.013   1.026   1.000   1.000
IBM PowerPC                         533      cc       4   1.009   1.009   1.018   1.000   1.009
Intel IA-64 (emulated on IA-32)     600     gcc       4   1.012   1.018   1.009   0.941   0.953
Intel Pentium II                    450      cc       4   5.967   3.383   3.367   3.333   3.083
Intel Pentium II (Klamath)          300      cc       4   5.982   3.390   3.373   3.302   3.035
Intel Pentium III                  1266     gcc       4   6.014   3.408   3.394   3.317   3.056
Intel Pentium III (Katmai)          600     gcc       4   6.266   3.538   3.545   3.490   3.224
MIPS R10000                         180     c89       4   0.991   1.000   1.000  27.596   0.991
MIPS R10000                         195     c89       4   1.010   1.010   1.000  26.346   1.000
MIPS R4400                          150     c89       4  25.635  25.912   1.081  22.858   0.993
MIPS R4400                          175     c89       4  27.354  27.562   1.054  24.492   0.977
MIPS R5000                          180     c89       4   1.062   1.076   1.055  26.090   1.055
MIPS R5000                          180     c89       4   1.076   1.076   1.069  31.472   1.083
Sun UltraSPARC                      400     c89       4  16.675   1.031   0.995   1.015   1.015
Sun UltraSPARC II                   167     c89       4  16.761   1.017   1.009   1.009   1.017
Sun UltraSPARC II                   270     c89       4  18.618   0.993   0.993   1.000   0.986
Sun UltraSPARC II                   300     c89       4  16.961   1.008   1.008   1.008   1.008
Sun UltraSPARC II                   400     c89       4  16.777   1.005   1.000   1.000   1.000
Sun UltraSPARC II                   440     c89       4  16.824   0.995   0.995   1.000   1.000
Sun UltraSPARC IIe                  500     c89       4  16.981   1.000   1.013   1.000   1.006
Sun UltraSPARC III                  750     c89       4  13.417   0.942   0.897   0.942   0.910
TI SuperSPARC Viking                 40     gcc       4   1.000   1.009   0.991   0.991   0.991
TI SuperSPARC Viking/MXCC            50     gcc       4   1.005   1.000   0.995   0.995   0.989
AMD Athlon                         1400     gcc       8   4.802   3.198   1.009   1.000   1.009
DEC Alpha 21064 EV4                 100     gcc       8   1.006   0.999   -n/a-   -n/a-   -n/a-
DEC Alpha 21064 EV4                 100     gcc       8   8.835   8.697   9.572   7.886   7.815
DEC Alpha 21164 EV5                 466     c89       8   1.000   1.000   -n/a-   -n/a-   -n/a-
DEC Alpha 21164 EV5                 466     c89       8  66.043  49.217  27.913  27.130  27.333
DEC Alpha 21264                     667     c89       8   1.000   1.010   -n/a-   -n/a-   -n/a-
DEC Alpha 21264                     667     c89       8  78.552  57.885   1.000   1.021   1.000
HP PA-RISC 1.1 7100LC                80      cc       8  16.955  16.841   1.000  11.278   1.000
IBM PowerPC                         133      cc       8   1.007   1.007   1.014   1.000   1.007
IBM PowerPC                         133      cc       8   1.014   1.014   1.014   1.007   1.000
IBM PowerPC                         166      cc       8   1.014   1.020   1.020   1.014   1.000
IBM PowerPC                         233     gcc       8   1.006   1.013   1.019   1.000   1.000
IBM PowerPC                         533      cc       8   0.991   0.991   0.991   0.983   0.991
IBM PowerPC                         533      cc       8   0.991   1.000   1.000   0.991   1.000
Intel IA-64 (emulated on IA-32)     600     gcc       8   1.015   1.009   0.994   0.915   0.921
Intel IA-64 (emulated on IA-32)     600     gcc       8   1.015   1.011   0.998   0.917   0.923
Intel Pentium II                    450      cc       8   3.655   2.236   2.227   2.291   2.145
Intel Pentium II (Klamath)          300      cc       8   5.824   3.249   3.213   3.301   3.036
Intel Pentium III                  1266     gcc       8   5.852   3.268   3.232   3.317   3.056
Intel Pentium III (Katmai)          600     gcc       8   6.176   3.437   3.423   3.514   3.246
MIPS R10000                         180     c89       8   0.991   0.991   1.000  27.254   1.000
MIPS R10000                         195     c89       8   1.010   1.010   1.000  26.798   1.000
MIPS R4400                          150     c89       8  26.074  26.007   1.074  22.107   1.013
MIPS R4400                          175     c89       8  27.902  27.826   1.045  23.977   0.962
MIPS R5000                          180     c89       8   1.047   1.068   1.054  24.223   1.061
MIPS R5000                          180     c89       8   1.054   1.068   1.047  24.439   1.054
Sun UltraSPARC                      400     c89       8  14.527   1.015   1.053   1.008   1.015
Sun UltraSPARC II                   167     c89       8  12.586   1.006   0.994   1.000   1.006
Sun UltraSPARC II                   270     c89       8  14.640   0.995   1.000   0.995   1.000
Sun UltraSPARC II                   300     c89       8  13.068   1.000   1.000   1.011   1.000
Sun UltraSPARC II                   400     c89       8  12.818   1.008   1.000   1.000   1.000
Sun UltraSPARC II                   440     c89       8  13.198   1.008   1.016   1.016   1.000
Sun UltraSPARC IIe                  500     c89       8  13.179   1.000   1.000   1.009   1.000
Sun UltraSPARC III                  750     c89       8  11.223   1.000   0.995   1.000   1.000
TI SuperSPARC Viking                 40     gcc       8   0.996   1.000   0.984   0.988   0.984
TI SuperSPARC Viking                 40     gcc       8   1.016   1.012   1.000   1.000   1.000
TI SuperSPARC Viking/MXCC            50     gcc       8   1.005   1.000   0.990   0.990   0.995
TI SuperSPARC Viking/MXCC            50     gcc       8   1.010   1.010   1.000   1.000   0.995
AMD Athlon                         1400     gcc      12   1.007   1.000   1.000   1.007   1.013
Intel Pentium II                    450      cc      12   0.984   1.000   1.000   2.129   2.000
Intel Pentium II (Klamath)          300      cc      12   1.000   0.999   2.468   2.659   2.467
Intel Pentium III                  1266     gcc      12   1.010   1.010   1.000   2.588   2.402
Intel Pentium III (Katmai)          600     gcc      12   1.036   1.018   2.518   2.491   2.321
DEC Alpha 21264                     667     c89      16   0.986   1.014   -n/a-   -n/a-   -n/a-
DEC Alpha 21264                     667     c89      16   1.057   1.000   0.986   0.957   0.986
IBM PowerPC                         166      cc      16   1.009   1.009   0.991   0.991   0.991
MIPS R10000                         180     c89      16   1.134   1.134   1.127   0.606   0.606
MIPS R10000                         195     c89      16   1.113   1.113   1.120   0.624   0.632
MIPS R4400                          150     c89      16  31.128  10.701   1.137   8.493   0.531
MIPS R4400                          175     c89      16  33.945  11.522   1.132   9.495   0.533
MIPS R5000                          180     c89      16   1.206   1.198   1.222   0.532   0.540
MIPS R5000                          180     c89      16   1.222   1.198   1.230   0.532   0.540
Sun UltraSPARC                      400     c89      16   1.015   1.026   1.031   0.701   0.716
Sun UltraSPARC II                   167     c89      16   1.002   1.000   1.002   0.705   0.701
Sun UltraSPARC II                   270     c89      16   1.000   1.000   1.000   0.697   0.701
Sun UltraSPARC II                   300     c89      16   1.000   1.004   0.989   0.706   0.709
Sun UltraSPARC II                   400     c89      16   1.010   1.010   1.010   0.694   0.694
Sun UltraSPARC II                   440     c89      16   1.021   1.000   1.021   0.688   0.704
Sun UltraSPARC IIe                  500     c89      16   0.994   1.000   0.994   0.697   0.690
Sun UltraSPARC III                  750     c89      16   0.992   1.000   0.992   0.659   0.675
-----------------------------------------------------------------------------------------------