These test results were generated by running SpecialNumbers.exe. This program was designed to measure the worst case penalties for operations on NANs, infinities and denormal numbers, along with the costs for high precision divides and square roots, divide by zero, overflow, etc.

 

For more information see the overview article.

 

These results demonstrate the performance penalties incurred by various x86 compatible processors when handling 'special' numbers such as NANs, infinities, and denormals.

 

These are raw results that can be difficult to understand at first, so here's a quick guide. The test program was run on several machines and the results were pasted in, without any modification except inserting the type of CPU in parentheses in the heading and breaking the results into two sections. The results vary a bit from run to run - perhaps five percent - but not enough to be worrisome.

 

For each test the chart shows how many times that operation was performed. Fast operations were repeated more times than slow ones, so that every test runs long enough to be timed accurately.

 

The chart also shows how long that test took, in seconds - most of the tests take about a tenth of a second. The timing is done using the processor's rdtsc instruction, which tells you how many clock cycles have elapsed, together with the processor's measured clock speed (which is displayed in the heading and has been verified in each case to be correct).

 

Then, based on the count and the time, the chart displays how many million operations per second (Mops/sec) could be performed. It also displays approximately how many cycles each operation takes.
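As a rough illustration of how the columns relate, here is a minimal Python sketch. It is not the code from SpecialNumbers.exe; the function names `elapsed_seconds` and `throughput` are made up for this example, and it assumes the time column is in seconds and the clock speed in the heading is in MHz.

```python
def elapsed_seconds(cycles, clock_mhz):
    """Convert an rdtsc cycle count into seconds, given the clock speed in MHz."""
    return cycles / (clock_mhz * 1e6)


def throughput(count, seconds, clock_mhz):
    """Return (Mops/sec, cycles per operation) for one row of the chart."""
    mops_per_sec = count / seconds / 1e6
    cycles_per_op = clock_mhz / mops_per_sec
    return mops_per_sec, cycles_per_op


# "adding nrm to nrm" on the 166.0 MHz Pentium: 100000000 operations in 0.66707 s
mops, cycles = throughput(100_000_000, 0.66707, 166.0)
print(f"{mops:.3f} Mops/sec, {cycles:.3f} cycles/op")
```

Because the time column is rounded to five decimal places, values recomputed this way can differ from the chart in the last digit.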

 

Finally, the chart displays the 'slowdown'. This slowdown is not relative to another processor; it is relative to the first item in that block, for that same processor. So, when the chart says that "adding nan to nan" has a slowdown of 95.347 for the 166.0 MHz CPU, it means that adding NANs runs more than 95 times slower than "adding nrm to nrm" - adding normal numbers to other normal numbers.
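The slowdown column can be reproduced from the count and time columns alone. The following Python sketch shows the arithmetic under that assumption; the helper names are hypothetical and the real program may compute the figure differently (for example from unrounded cycle counts).

```python
def seconds_per_op(count, seconds):
    """Time spent on a single operation."""
    return seconds / count


def slowdown(row, baseline):
    """Ratio of per-operation time to the first row of the same block.

    Both arguments are (count, seconds) pairs taken from the same CPU.
    """
    return seconds_per_op(*row) / seconds_per_op(*baseline)


baseline = (100_000_000, 0.66707)  # adding nrm to nrm, 166.0 MHz Pentium
nan_add = (1_000_000, 0.63604)     # adding nan to nan, same block
print(round(slowdown(nan_add, baseline), 1))
```

For these two rows the result comes out at roughly 95.3, in line with the 95.347 shown in the chart.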

 

Note that the program used to generate these results is a benchmarking program, not a real app. It was designed to tease out performance problems and emphasize them as much as possible. Thus, it is unlikely that there is any real-world application where NANs would slow down performance 900 to 1. However, if a real-world application sees just 10% - or even 1% - of the slowdown that was found here, it could be pretty significant - and at least one person has seen exactly that. Some of the tests shown on this page show about a 100:1 slowdown caused by NANs and infinities, whereas a P4 using SSE2 or an Athlon handles the same values with essentially no slowdown.
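One way to get a feel for what this could mean in practice is a simple blended-cost estimate. The model below is an assumption made for this illustration, not something the test program measures: it supposes that some fraction of the operations involves special numbers, that each operation costs the cycle count from the chart, and that costs simply add up. The function name `blended_slowdown` is hypothetical.

```python
def blended_slowdown(special_fraction, normal_cycles, special_cycles):
    """Average cost per operation, relative to the all-normal case.

    Assumes costs add linearly, which ignores pipelining and other effects.
    """
    average = ((1.0 - special_fraction) * normal_cycles
               + special_fraction * special_cycles)
    return average / normal_cycles


# Pentium IV register adds from the chart: 1.010 cycles (nrm), 943.917 cycles (nan)
for fraction in (0.001, 0.01, 0.1):
    print(fraction, round(blended_slowdown(fraction, 1.010, 943.917), 1))
```

Under these assumptions, even one NAN operand in a hundred adds is enough to make the loop run about ten times slower.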

Special number test results

Analysis

All recent Intel x86 processors can do one add per clock cycle, but special numbers (in this case, infinities and nans) cause them some grief.

Pentium has 60-95 times penalty for adding special numbers.

Pentium III has 115-130 times penalty for adding special numbers.

Pentium IV has 850-930 times penalty for adding special numbers.
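For readers who want to see what these operand classes look like, here is a small Python sketch that labels a value with the same abbreviations the chart uses (nrm, nan, inf, den). It assumes Python floats are IEEE 754 doubles, which is the case on mainstream platforms; the `classify` helper is made up for this example and says nothing about how fast any processor handles each class.

```python
import math
import sys


def classify(x):
    """Label a float as nan, inf, den (denormal) or nrm (everything else)."""
    if math.isnan(x):
        return "nan"
    if math.isinf(x):
        return "inf"
    # sys.float_info.min is the smallest positive *normal* double
    if x != 0.0 and abs(x) < sys.float_info.min:
        return "den"
    return "nrm"  # zero is lumped in with normal numbers here


samples = [1.5, float("nan"), math.inf, 5e-324]
for value in samples:
    print(value, classify(value))

# Special values propagate through arithmetic, which is how the tests keep
# feeding them back into the adder.
print(classify(math.inf + 1.5), classify(float("nan") + math.inf))
```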

 

Pentium can do an fld/fst pair in 3.4 cycles.

Pentium III can do an fld/fst pair in 2.0 cycles.

Pentium IV can do an fld/fst pair in 3.7 cycles.

 

Pentium has 20 times penalty for loading nans (Pentium only).

Pentium has 47 times penalty for loading denormals.

Pentium has 3 times penalty for loading across QWORD boundaries.

Pentium has 3 times penalty for loading across cache line boundaries.

Pentium has 3 times penalty for loading across page boundaries.

Pentium has 6 times penalty for saving across QWORD boundaries.

Pentium has 6 times penalty for saving across cache line boundaries.

Pentium has 16 times penalty for saving across page boundaries.

 

Pentium III has 110 times penalty for loading denormals.

Pentium III has 0.7 times!!! ‘penalty’ for loading across QWORD boundaries.

Pentium III has 5.5 times penalty for loading across cache line boundaries.

Pentium III has 47 times penalty for loading across page boundaries.

Pentium III has 0.7 times!!! ‘penalty’ for saving across QWORD boundaries.

Pentium III has 4.5 times penalty for saving across cache line boundaries.

Pentium III has 64 times penalty for saving across page boundaries.

 

Pentium IV has 380 times penalty for loading denormals.

Pentium IV has no penalty for loading across QWORD boundaries.

Pentium IV has 5.5 times penalty for loading across cache line boundaries.

Pentium IV has 20 times penalty for loading across page boundaries.

Pentium IV has no penalty for saving across QWORD boundaries.

Pentium IV has 22 times penalty for saving across cache line boundaries.

Pentium IV has 23 times penalty for saving across page boundaries.
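To make the three kinds of boundary concrete, the following Python sketch reports which boundaries an 8-byte load or store straddles for a given start address. It is an illustration only: the helper name is hypothetical, and the default cache line size (64 bytes) and page size (4096 bytes) are assumptions, since the cache line size differs between the processors tested.

```python
def boundaries_crossed(address, size=8, cache_line=64, page=4096):
    """List the boundaries straddled by `size` bytes starting at `address`.

    cache_line and page are assumed sizes; pass the real values for a given CPU.
    """
    first, last = address, address + size - 1
    crossed = []
    for name, width in (("qword", 8), ("cache line", cache_line), ("page", page)):
        # The access straddles a boundary if its first and last bytes fall
        # into different blocks of that width.
        if first // width != last // width:
            crossed.append(name)
    return crossed


for addr in (0, 4, 60, 4092):
    print(addr, boundaries_crossed(addr))
```

An access that crosses a page boundary necessarily crosses a cache line and a QWORD boundary as well, so the penalties listed above should be read as applying to the largest boundary crossed.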

 

Athlon processors do all of the tested floating point operations with regular numbers at the same speed per clock or slightly faster than the Intel chips. The Athlon chip has essentially no penalties for dealing with denormals, infinities, NANs, and unaligned data. The one exception was in the load/store test, where there is a significant penalty on denormals, probably for storing the denormal to memory. This penalty, while large, is smaller than the equivalent penalty on most of the Intel chips.

Testing on 166.0 MHz CPU (Pentium, no MMX)

Identifier               count      time    Mops/sec  Cycles/op  Slowdown

 

        Register to register adds

adding nrm to nrm    , 100000000,  0.66707,  149.909,    1.107,    1.000

adding nan to nan    ,   1000000,  0.63604,    1.572,  105.582,   95.347

adding inf to inf    ,   1000000,  0.45547,    2.196,   75.607,   68.278

adding inf to nan    ,   1000000,  0.43608,    2.293,   72.388,   65.371

adding nrm to inf    ,   1000000,  0.41178,    2.429,   68.355,   61.729

adding inf to nrm    ,   1000000,  0.45432,    2.201,   75.418,   68.107

adding den to nrm    , 100000000,  0.66594,  150.164,    1.105,    0.998

 

        Memory to register adds

adding nrm to nrm    ,  50000000,  0.90880,   55.017,    3.017,    1.000

adding den to nrm    ,   1000000,  0.44808,    2.232,   74.381,   24.652

 

        Loads and stores - type tests

Loading nrm          ,  50000000,  1.03000,   48.543,    3.420,    1.000

Loading nan          ,   5000000,  2.06494,    2.421,   68.556,   20.048

Loading inf          ,  50000000,  1.02932,   48.576,    3.417,    0.999

Loading den          ,   1000000,  0.96876,    1.032,  160.814,   47.027

 

        Loads and stores - alignment tests

quad aligned         ,  50000000,  1.03167,   48.465,    3.425,    1.000

dword aligned src    ,   5000000,  0.30293,   16.505,   10.057,    2.936

byte aligned src     ,   5000000,  0.30275,   16.515,   10.051,    2.935

cache unaligned src  ,   2000000,  0.12124,   16.497,   10.063,    2.938

page unaligned src   ,   2000000,  0.12111,   16.514,   10.052,    2.935

dword aligned dst    ,   5000000,  0.61809,    8.089,   20.521,    5.991

byte aligned dst     ,   5000000,  0.61789,    8.092,   20.514,    5.989

cache unaligned dst  ,   2000000,  0.24734,    8.086,   20.529,    5.994

page unaligned dst   ,   2000000,  0.67550,    2.961,   56.066,   16.369

Testing on 698.0 MHz CPU (Pentium III, laptop)

Identifier               count      time    Mops/sec  Cycles/op  Slowdown

 

       Register to register adds

adding nrm to nrm    , 100000000,  0.14869,  672.561,    1.038,    1.000

adding nan to nan    ,   1000000,  0.19101,    5.235,  133.325,  128.466

adding inf to inf    ,   1000000,  0.17423,    5.740,  121.609,  117.177

adding inf to nan    ,   1000000,  0.17603,    5.681,  122.867,  118.389

adding nrm to inf    ,   1000000,  0.17755,    5.632,  123.928,  119.411

adding inf to nrm    ,   1000000,  0.17402,    5.747,  121.464,  117.037

adding den to nrm    , 100000000,  0.14937,  669.498,    1.043,    1.005

 

       Memory to register adds

adding nrm to nrm    ,  50000000,  0.22451,  222.705,    3.134,    1.000

adding den to nrm    ,   1000000,  0.18501,    5.405,  129.140,   41.204

 

       Loads and stores - type tests

Loading nrm          ,  50000000,  0.15132,  330.418,    2.112,    1.000

Loading nan          ,   5000000,  0.01461,  342.331,    2.039,    0.965

Loading inf          ,  50000000,  0.14986,  333.645,    2.092,    0.990

Loading den          ,   1000000,  0.35793,    2.794,  249.837,  118.267

 

       Loads and stores - alignment tests

quad aligned         ,  50000000,  0.14971,  333.984,    2.090,    1.000

dword aligned src    ,  50000000,  0.14936,  334.769,    2.085,    0.998

byte aligned src     ,  50000000,  0.15080,  331.563,    2.105,    1.007

cache unaligned src  ,   2000000,  0.03420,   58.481,   11.936,    5.711

page unaligned src   ,   2000000,  0.27447,    7.287,   95.790,   45.834

dword aligned dst    ,  50000000,  0.14881,  335.993,    2.077,    0.994

byte aligned dst     ,  50000000,  0.14903,  335.511,    2.080,    0.995

cache unaligned dst  ,   2000000,  0.02654,   75.369,    9.261,    4.431

page unaligned dst   ,   2000000,  0.36247,    5.518,  126.503,   60.530

Testing on 997.0 MHz CPU (Pentium III, laptop)

Identifier               count      time    Mops/sec  Cycles/op  Slowdown

 

       Register to register adds

adding nrm to nrm    , 100000000,  0.10285,  972.310,    1.025,    1.000

adding nan to nan    ,   1000000,  0.13444,    7.438,  134.037,  130.718

adding inf to inf    ,   1000000,  0.12093,    8.269,  120.571,  117.585

adding inf to nan    ,   1000000,  0.12290,    8.137,  122.531,  119.496

adding nrm to inf    ,   1000000,  0.12043,    8.303,  120.070,  117.097

adding inf to nrm    ,   1000000,  0.12074,    8.282,  120.377,  117.396

adding den to nrm    , 100000000,  0.10302,  970.656,    1.027,    1.002

 

       Memory to register adds

adding nrm to nrm    ,  50000000,  0.15501,  322.564,    3.091,    1.000

adding den to nrm    ,   1000000,  0.12830,    7.795,  127.910,   41.383

 

       Loads and stores - type tests

Loading nrm          ,  50000000,  0.10488,  476.718,    2.091,    1.000

Loading nan          ,   5000000,  0.01040,  480.653,    2.074,    0.992

Loading inf          ,  50000000,  0.10354,  482.900,    2.065,    0.987

Loading den          ,   1000000,  0.25692,    3.892,  256.153,  122.480

 

       Loads and stores - alignment tests

quad aligned         ,  50000000,  0.10371,  482.099,    2.068,    1.000

dword aligned src    ,  50000000,  0.10307,  485.119,    2.055,    0.994

byte aligned src     ,  50000000,  0.10314,  484.771,    2.057,    0.994

cache unaligned src  ,   2000000,  0.02356,   84.894,   11.744,    5.679

page unaligned src   ,   2000000,  0.18921,   10.570,   94.324,   45.610

dword aligned dst    ,  50000000,  0.10500,  476.205,    2.094,    1.012

byte aligned dst     ,  50000000,  0.10338,  483.634,    2.061,    0.997

cache unaligned dst  ,   2000000,  0.01855,  107.794,    9.249,    4.472

page unaligned dst   ,   2000000,  0.25033,    7.990,  124.788,   60.341

Testing on 2800.0 MHz CPU (Pentium IV)

Identifier               count      time    Mops/sec  Cycles/op  Slowdown

 

       Register to register adds

adding nrm to nrm    , 100000000,  0.03609, 2770.968,    1.010,    1.000

adding nan to nan    ,   1000000,  0.33711,    2.966,  943.917,  934.130

adding inf to inf    ,   1000000,  0.30770,    3.250,  861.567,  852.634

adding inf to nan    ,   1000000,  0.33289,    3.004,  932.101,  922.436

adding nrm to inf    ,   1000000,  0.30551,    3.273,  855.422,  846.553

adding inf to nrm    ,   1000000,  0.30793,    3.247,  862.216,  853.276

adding den to nrm    , 100000000,  0.03611, 2769.486,    1.011,    1.001

 

       Memory to register adds

adding nrm to nrm    ,  50000000,  0.10056,  497.225,    5.631,    1.000

adding den to nrm    ,   1000000,  0.37543,    2.664, 1051.196,  186.672

 

       Loads and stores - type tests

Loading nrm          ,  50000000,  0.06476,  772.029,    3.627,    1.000

Loading nan          ,   5000000,  0.00643,  777.771,    3.600,    0.993

Loading inf          ,  50000000,  0.06490,  770.447,    3.634,    1.002

Loading den          ,   1000000,  0.50423,    1.983, 1411.835,  389.278

 

       Loads and stores - alignment tests

quad aligned         ,  50000000,  0.03951, 1265.602,    2.212,    1.000

dword aligned src    ,  50000000,  0.04021, 1243.500,    2.252,    1.018

byte aligned src     ,  50000000,  0.04508, 1109.027,    2.525,    1.141

cache unaligned src  ,   2000000,  0.01441,  138.763,   20.178,    9.121

page unaligned src   ,   2000000,  0.05470,   36.565,   76.577,   34.613

dword aligned dst    ,  50000000,  0.03964, 1261.332,    2.220,    1.003

byte aligned dst     ,  50000000,  0.03938, 1269.640,    2.205,    0.997

cache unaligned dst  ,   2000000,  0.05760,   34.725,   80.634,   36.447

page unaligned dst   ,   2000000,  0.06204,   32.236,   86.860,   39.261

Testing on 1733.0 MHz CPU (Athlon 2100)

Identifier               count      time    Mops/sec  Cycles/op  Slowdown

 

       Register to register adds

adding nrm to nrm    , 100000000,  0.06112, 1636.098,    1.059,    1.000

adding nan to nan    ,   1000000,  0.00058, 1732.889,    1.000,    0.944

adding inf to inf    ,   1000000,  0.00058, 1732.842,    1.000,    0.944

adding inf to nan    ,   1000000,  0.00058, 1732.842,    1.000,    0.944

adding nrm to inf    ,   1000000,  0.00058, 1718.254,    1.009,    0.952

adding inf to nrm    ,   1000000,  0.00058, 1732.863,    1.000,    0.944

adding den to nrm    , 100000000,  0.09620, 1039.453,    1.667,    1.574

 

       Memory to register adds

adding nrm to nrm    ,  50000000,  0.11755,  425.341,    4.074,    1.000

adding den to nrm    ,   1000000,  0.00274,  365.608,    4.740,    1.163

 

       Loads and stores - type tests

Loading nrm          ,  50000000,  0.06750,  740.785,    2.339,    1.000

Loading nan          ,   5000000,  0.00693,  721.740,    2.401,    1.026

Loading inf          ,  50000000,  0.06797,  735.599,    2.356,    1.007

Loading den          ,   1000000,  0.09880,   10.122,  171.213,   73.186

 

       Loads and stores - alignment tests

quad aligned         ,  50000000,  0.06933,  721.147,    2.403,    1.000

dword aligned src    ,   5000000,  0.00656,  762.122,    2.274,    0.946

byte aligned src     ,   5000000,  0.00612,  816.519,    2.122,    0.883

cache unaligned src  ,   2000000,  0.00277,  722.040,    2.400,    0.999

page unaligned src   ,   2000000,  0.00463,  432.322,    4.009,    1.668

dword aligned dst    ,   5000000,  0.00817,  611.628,    2.833,    1.179

byte aligned dst     ,   5000000,  0.00812,  615.461,    2.816,    1.172

cache unaligned dst  ,   2000000,  0.00332,  602.360,    2.877,    1.197

page unaligned dst   ,   2000000,  0.00368,  543.530,    3.188,    1.327

Testing on 1694.0 MHz CPU (Pentium 4)

       SSE2 adds

adding nrm to nrm    , 100000000,  2.57211,   38.879,   43.572,    1.000

adding nan to nan    ,  10000000,  0.25681,   38.939,   43.504,    0.998

adding inf to inf    ,  10000000,  0.25628,   39.021,   43.413,    0.996

adding inf to nan    ,  10000000,  0.25656,   38.977,   43.461,    0.997

adding nrm to inf    ,  10000000,  0.25836,   38.706,   43.766,    1.004

adding inf to nrm    ,  10000000,  0.25733,   38.861,   43.591,    1.000

adding den to nrm    ,  10000000,  7.42158,    1.347, 1257.216,   28.854

General math test results

Analysis

More recent Intel processors are faster at square roots

Athlon processors are faster at divide and square root than Intel processors

Operations that produce overflows run more slowly than those that don’t

Dividing by a power of two is faster than dividing by other numbers

Divide and square root generally run faster if the processor is set to lower precision
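The following Python sketch illustrates two of the points above at the level of values rather than timings. It assumes Python floats are IEEE 754 doubles. Dividing by 16 leaves the mantissa untouched and only lowers the exponent, which is the kind of property a divider could take advantage of; whether a particular processor does so is not something this sketch can show. The second part shows what the overflow and underflow tests end up producing.

```python
import math
import sys

x = 1.2345

# frexp splits a float into a mantissa in [0.5, 1) and a power-of-two exponent.
m1, e1 = math.frexp(x)
m2, e2 = math.frexp(x / 16)
print(m1 == m2, e1 - e2)  # same mantissa, exponent lowered by 4

# Multiplying past the largest double overflows to infinity...
overflowed = sys.float_info.max * 2.0
# ...and dividing the smallest normal double underflows into a denormal.
underflowed = sys.float_info.min / 4.0
print(overflowed, underflowed)
```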

Testing on 166.0 MHz CPU (Pentium, no MMX)

Identifier               count      time    Mops/sec  Cycles/op  Slowdown

 

       Divide tests - float precision

dividing nrm by nrm  ,  10000000,  1.75820,    5.688,   29.186,    1.000

dividing nrm by 16   ,  10000000,  1.75492,    5.698,   29.132,    0.998

dividing nrm by zero ,   1000000,  0.58146,    1.720,   96.523,    3.307

 

       Divide tests - double precision

dividing nrm by nrm  ,  10000000,  2.60363,    3.841,   43.220,    1.481

dividing nrm by 16   ,  10000000,  2.78655,    3.589,   46.257,    1.585

dividing nrm by zero ,   1000000,  0.57505,    1.739,   95.459,    3.271

 

       Divide tests - extended precision

dividing nrm by nrm  ,  10000000,  2.96461,    3.373,   49.213,    1.686

dividing nrm by 16   ,  10000000,  2.96476,    3.373,   49.215,    1.686

dividing nrm by zero ,   1000000,  0.57494,    1.739,   95.440,    3.270

 

       Multiply tests - float precision

multiply nrm by nrm  ,  10000000,  0.78636,   12.717,   13.054,    1.000

multiply nrm by 16   ,  10000000,  0.96808,   10.330,   16.070,    1.231

multiply to overflow ,   1000000,  0.62912,    1.590,  104.434,    8.000

multiply to underflow,   1000000,  0.64137,    1.559,  106.467,    8.156

 

       Multiply tests - double precision

multiply nrm by nrm  ,  10000000,  0.78666,   12.712,   13.059,    1.000

multiply nrm by 16   ,  10000000,  0.96816,   10.329,   16.071,    1.231

multiply to overflow ,   1000000,  0.62944,    1.589,  104.486,    8.004

multiply to underflow,   1000000,  0.64139,    1.559,  106.470,    8.156

 

       Multiply tests - extended precision

multiply nrm by nrm  ,  10000000,  0.78643,   12.716,   13.055,    1.000

multiply nrm by 16   ,  10000000,  0.96804,   10.330,   16.070,    1.231

multiply to overflow ,   1000000,  0.63159,    1.583,  104.845,    8.032

multiply to underflow,   1000000,  0.64139,    1.559,  106.472,    8.157

 

       Square root tests - float precision

sqrt nrm             ,   1000000,  0.52406,    1.908,   86.994,    1.000

sqrt 4               ,   5000000,  2.42029,    2.066,   80.354,    0.924

sqrt 2               ,   1000000,  0.48393,    2.066,   80.332,    0.923

sqrt 9               ,   1000000,  0.51420,    1.945,   85.358,    0.981

sqrt negative        ,   1000000,  0.85300,    1.172,  141.597,    1.628

 

       Square root tests - double precision

sqrt nrm             ,   1000000,  0.48426,    2.065,   80.388,    0.924

sqrt 4               ,   5000000,  2.42105,    2.065,   80.379,    0.924

sqrt 2               ,   1000000,  0.48422,    2.065,   80.381,    0.924

sqrt 9               ,   1000000,  0.48372,    2.067,   80.298,    0.923

sqrt negative        ,   1000000,  0.86213,    1.160,  143.114,    1.645

 

       Square root tests - extended precision

sqrt nrm             ,   1000000,  0.52535,    1.904,   87.208,    1.002

sqrt 4               ,   5000000,  2.42056,    2.066,   80.362,    0.924

sqrt 2               ,   1000000,  0.48390,    2.067,   80.327,    0.923

sqrt 9               ,   1000000,  0.48396,    2.066,   80.338,    0.923

sqrt negative        ,   1000000,  0.85405,    1.171,  141.772,    1.630

Testing on 698.0 MHz CPU (Pentium III, laptop)

Identifier               count      time    Mops/sec  Cycles/op  Slowdown

 

       Divide tests - float precision

dividing nrm by nrm  ,  10000000,  0.28900,   34.603,   20.172,    1.000

dividing nrm by 16   ,  10000000,  0.13355,   74.880,    9.322,    0.462

dividing nrm by zero ,   1000000,  0.21059,    4.749,  146.990,    7.287

 

       Divide tests - double precision

dividing nrm by nrm  ,  10000000,  0.49940,   20.024,   34.858,    1.728

dividing nrm by 16   ,  10000000,  0.13491,   74.122,    9.417,    0.467

dividing nrm by zero ,   1000000,  0.20092,    4.977,  140.245,    6.952

 

       Divide tests - extended precision

dividing nrm by nrm  ,  10000000,  0.58864,   16.988,   41.087,    2.037

dividing nrm by 16   ,  10000000,  0.13394,   74.662,    9.349,    0.463

dividing nrm by zero ,   1000000,  0.20140,    4.965,  140.575,    6.969

 

       Multiply tests - float precision

multiply nrm by nrm  ,  10000000,  0.13447,   74.368,    9.386,    1.000

multiply nrm by 16   ,  10000000,  0.13665,   73.179,    9.538,    1.016

multiply to overflow ,   1000000,  0.19417,    5.150,  135.530,   14.440

multiply to underflow,   1000000,  0.20847,    4.797,  145.510,   15.503

 

       Multiply tests - double precision

multiply nrm by nrm  ,  10000000,  0.13386,   74.706,    9.343,    0.995

multiply nrm by 16   ,  10000000,  0.13445,   74.378,    9.385,    1.000

multiply to overflow ,   1000000,  0.19370,    5.163,  135.204,   14.405

multiply to underflow,   1000000,  0.21135,    4.732,  147.519,   15.717

 

       Multiply tests - extended precision

multiply nrm by nrm  ,  10000000,  0.13364,   74.828,    9.328,    0.994

multiply nrm by 16   ,  10000000,  0.13431,   74.455,    9.375,    0.999

multiply to overflow ,   1000000,  0.19341,    5.170,  135.002,   14.384

multiply to underflow,   1000000,  0.20898,    4.785,  145.870,   15.542

 

       Square root tests - float precision

sqrt nrm             ,   1000000,  0.04441,   22.516,   31.001,    1.000

sqrt 4               ,   5000000,  0.06707,   74.546,    9.363,    0.302

sqrt 2               ,   1000000,  0.04503,   22.210,   31.428,    1.014

sqrt 9               ,   1000000,  0.04557,   21.945,   31.807,    1.026

sqrt negative        ,   1000000,  0.17409,    5.744,  121.512,    3.920

 

       Square root tests - double precision

sqrt nrm             ,   1000000,  0.08880,   11.262,   61.980,    1.999

sqrt 4               ,   5000000,  0.06703,   74.589,    9.358,    0.302

sqrt 2               ,   1000000,  0.09143,   10.937,   63.818,    2.059

sqrt 9               ,   1000000,  0.08826,   11.331,   61.603,    1.987

sqrt negative        ,   1000000,  0.17327,    5.771,  120.940,    3.901

 

       Square root tests - extended precision

sqrt nrm             ,   1000000,  0.10371,    9.642,   72.392,    2.335

sqrt 4               ,   5000000,  0.06699,   74.638,    9.352,    0.302

sqrt 2               ,   1000000,  0.10674,    9.368,   74.508,    2.403

sqrt 9               ,   1000000,  0.10395,    9.620,   72.557,    2.341

sqrt negative        ,   1000000,  0.17500,    5.714,  122.153,    3.940

Testing on 997.0 MHz CPU (Pentium III, laptop)

Identifier               count      time    Mops/sec  Cycles/op  Slowdown

 

       Divide tests - float precision

dividing nrm by nrm  ,  10000000,  0.20179,   49.556,   20.119,    1.000

dividing nrm by 16   ,  10000000,  0.09307,  107.451,    9.279,    0.461

dividing nrm by zero ,   1000000,  0.14032,    7.126,  139.902,    6.954

 

       Divide tests - double precision

dividing nrm by nrm  ,  10000000,  0.35426,   28.228,   35.320,    1.756

dividing nrm by 16   ,  10000000,  0.09311,  107.404,    9.283,    0.461

dividing nrm by zero ,   1000000,  0.13949,    7.169,  139.072,    6.913

 

       Divide tests - extended precision

dividing nrm by nrm  ,  10000000,  0.41130,   24.313,   41.007,    2.038

dividing nrm by 16   ,  10000000,  0.09298,  107.550,    9.270,    0.461

dividing nrm by zero ,   1000000,  0.13949,    7.169,  139.069,    6.912

 

       Multiply tests - float precision

multiply nrm by nrm  ,  10000000,  0.09569,  104.500,    9.541,    1.000

multiply nrm by 16   ,  10000000,  0.09349,  106.967,    9.321,    0.977

multiply to overflow ,   1000000,  0.13649,    7.327,  136.081,   14.263

multiply to underflow,   1000000,  0.14458,    6.917,  144.147,   15.109

 

       Multiply tests - double precision

multiply nrm by nrm  ,  10000000,  0.09462,  105.687,    9.434,    0.989

multiply nrm by 16   ,  10000000,  0.09300,  107.531,    9.272,    0.972

multiply to overflow ,   1000000,  0.13560,    7.375,  135.192,   14.170

multiply to underflow,   1000000,  0.14506,    6.893,  144.629,   15.159

 

       Multiply tests - extended precision

multiply nrm by nrm  ,  10000000,  0.09281,  107.752,    9.253,    0.970

multiply nrm by 16   ,  10000000,  0.09314,  107.370,    9.286,    0.973

multiply to overflow ,   1000000,  0.13438,    7.442,  133.976,   14.043

multiply to underflow,   1000000,  0.14421,    6.934,  143.778,   15.070

 

       Square root tests - float precision

sqrt nrm             ,   1000000,  0.03167,   31.573,   31.577,    1.000

sqrt 4               ,   5000000,  0.04628,  108.043,    9.228,    0.292

sqrt 2               ,   1000000,  0.03141,   31.840,   31.313,    0.992

sqrt 9               ,   1000000,  0.03320,   30.117,   33.104,    1.048

sqrt negative        ,   1000000,  0.12069,    8.286,  120.326,    3.811

 

       Square root tests - double precision

sqrt nrm             ,   1000000,  0.06136,   16.297,   61.178,    1.937

sqrt 4               ,   5000000,  0.04666,  107.160,    9.304,    0.295

sqrt 2               ,   1000000,  0.06083,   16.440,   60.644,    1.920

sqrt 9               ,   1000000,  0.06086,   16.432,   60.674,    1.921

sqrt negative        ,   1000000,  0.11991,    8.339,  119.554,    3.786

 

       Square root tests - extended precision

sqrt nrm             ,   1000000,  0.07247,   13.799,   72.250,    2.288

sqrt 4               ,   5000000,  0.04657,  107.368,    9.286,    0.294

sqrt 2               ,   1000000,  0.07202,   13.884,   71.809,    2.274

sqrt 9               ,   1000000,  0.07241,   13.810,   72.195,    2.286

sqrt negative        ,   1000000,  0.11991,    8.340,  119.548,    3.786

Testing on 2800.0 MHz CPU (Pentium IV)

Identifier               count      time    Mops/sec  Cycles/op  Slowdown

 

       Divide tests - float precision

dividing nrm by nrm  ,  10000000,  0.08834,  113.205,   24.734,    1.000

dividing nrm by 16   ,  10000000,  0.08382,  119.309,   23.468,    0.949

dividing nrm by zero ,   1000000,  0.34291,    2.916,  960.147,   38.819

 

       Divide tests - double precision

dividing nrm by nrm  ,  10000000,  0.13623,   73.408,   38.143,    1.542

dividing nrm by 16   ,  10000000,  0.13644,   73.291,   38.204,    1.545

dividing nrm by zero ,   1000000,  0.35332,    2.830,  989.303,   39.998

 

       Divide tests - extended precision

dividing nrm by nrm  ,  10000000,  0.15428,   64.816,   43.199,    1.747

dividing nrm by 16   ,  10000000,  0.15457,   64.694,   43.280,    1.750

dividing nrm by zero ,   1000000,  0.34648,    2.886,  970.149,   39.223

 

       Multiply tests - float precision

multiply nrm by nrm  ,  10000000,  0.03588,  278.672,   10.048,    1.000

multiply nrm by 16   ,  10000000,  0.04919,  203.279,   13.774,    1.371

multiply to overflow ,   1000000,  0.29624,    3.376,  829.474,   82.554

multiply to underflow,   1000000,  0.29573,    3.381,  828.041,   82.411

 

       Multiply tests - double precision

multiply nrm by nrm  ,  10000000,  0.03591,  278.452,   10.056,    1.001

multiply nrm by 16   ,  10000000,  0.03652,  273.837,   10.225,    1.018

multiply to overflow ,   1000000,  0.29066,    3.441,  813.835,   80.998

multiply to underflow,   1000000,  0.30558,    3.272,  855.638,   85.158

 

       Multiply tests - extended precision

multiply nrm by nrm  ,  10000000,  0.03582,  279.154,   10.030,    0.998

multiply nrm by 16   ,  10000000,  0.03595,  278.150,   10.066,    1.002

multiply to overflow ,   1000000,  0.29020,    3.446,  812.546,   80.869

multiply to underflow,   1000000,  0.29553,    3.384,  827.486,   82.356

 

       Square root tests - float precision

sqrt nrm             ,   1000000,  0.00822,  121.624,   23.022,    1.000

sqrt 4               ,   5000000,  0.05234,   95.537,   29.308,    1.273

sqrt 2               ,   1000000,  0.00821,  121.739,   23.000,    0.999

sqrt 9               ,   1000000,  0.00826,  121.033,   23.134,    1.005

sqrt negative        ,   1000000,  0.34335,    2.912,  961.388,   41.760

 

       Square root tests - double precision

sqrt nrm             ,   1000000,  0.01366,   73.196,   38.253,    1.662

sqrt 4               ,   5000000,  0.06810,   73.422,   38.136,    1.657

sqrt 2               ,   1000000,  0.01365,   73.278,   38.211,    1.660

sqrt 9               ,   1000000,  0.01361,   73.494,   38.098,    1.655

sqrt negative        ,   1000000,  0.33747,    2.963,  944.906,   41.044

 

       Square root tests - extended precision

sqrt nrm             ,   1000000,  0.01542,   64.831,   43.189,    1.876

sqrt 4               ,   5000000,  0.07723,   64.744,   43.247,    1.879

sqrt 2               ,   1000000,  0.01539,   64.959,   43.104,    1.872

sqrt 9               ,   1000000,  0.01540,   64.924,   43.127,    1.873

sqrt negative        ,   1000000,  0.34904,    2.865,  977.309,   42.452

Testing on 1733.0 MHz CPU (Athlon 2100)

Identifier               count      time    Mops/sec  Cycles/op  Slowdown

 

       Divide tests - float precision

dividing nrm by nrm  ,  10000000,  0.07605,  131.493,   13.179,    1.000

dividing nrm by 16   ,  10000000,  0.04638,  215.617,    8.037,    0.610

dividing nrm by zero ,   1000000,  0.00462,  216.623,    8.000,    0.607

 

       Divide tests - double precision

dividing nrm by nrm  ,  10000000,  0.09926,  100.742,   17.202,    1.305

dividing nrm by 16   ,  10000000,  0.04665,  214.363,    8.084,    0.613

dividing nrm by zero ,   1000000,  0.00465,  215.169,    8.054,    0.611

 

       Divide tests - extended precision

dividing nrm by nrm  ,  10000000,  0.12411,   80.575,   21.508,    1.632

dividing nrm by 16   ,  10000000,  0.04630,  215.982,    8.024,    0.609

dividing nrm by zero ,   1000000,  0.00497,  201.092,    8.618,    0.654

 

       Multiply tests - float precision

multiply nrm by nrm  ,  10000000,  0.04117,  242.877,    7.135,    1.000

multiply nrm by 16   ,  10000000,  0.04053,  246.708,    7.024,    0.984

multiply to overflow ,   1000000,  0.00404,  247.567,    7.000,    0.981

multiply to underflow,   1000000,  0.06858,   14.581,  118.851,   16.657

 

       Multiply tests - double precision

multiply nrm by nrm  ,  10000000,  0.04048,  247.027,    7.015,    0.983

multiply nrm by 16   ,  10000000,  0.04111,  243.231,    7.125,    0.999

multiply to overflow ,   1000000,  0.00404,  247.567,    7.000,    0.981

multiply to underflow,   1000000,  0.06873,   14.550,  119.108,   16.693

 

       Multiply tests - extended precision

multiply nrm by nrm  ,  10000000,  0.04270,  234.208,    7.399,    1.037

multiply nrm by 16   ,  10000000,  0.04117,  242.908,    7.134,    1.000

multiply to overflow ,   1000000,  0.00405,  246.920,    7.018,    0.984

multiply to underflow,   1000000,  0.06965,   14.358,  120.699,   16.916

 

       Square root tests - float precision

sqrt nrm             ,   1000000,  0.00923,  108.312,   16.000,    1.000

sqrt 4               ,   5000000,  0.04634,  107.896,   16.062,    1.004

sqrt 2               ,   1000000,  0.00927,  107.863,   16.067,    1.004

sqrt 9               ,   1000000,  0.00923,  108.312,   16.000,    1.000

sqrt negative        ,   1000000,  0.00934,  107.042,   16.190,    1.012

 

       Square root tests - double precision

sqrt nrm             ,   1000000,  0.01408,   71.036,   24.396,    1.525

sqrt 4               ,   5000000,  0.06941,   72.036,   24.057,    1.504

sqrt 2               ,   1000000,  0.01389,   72.008,   24.067,    1.504

sqrt 9               ,   1000000,  0.01394,   71.747,   24.154,    1.510

sqrt negative        ,   1000000,  0.01436,   69.646,   24.883,    1.555

 

       Square root tests - extended precision

sqrt nrm             ,   1000000,  0.01864,   53.651,   32.301,    2.019

sqrt 4               ,   5000000,  0.09368,   53.372,   32.470,    2.029

sqrt 2               ,   1000000,  0.01873,   53.394,   32.457,    2.029

sqrt 9               ,   1000000,  0.02084,   47.981,   36.119,    2.257

sqrt negative        ,   1000000,  0.01879,   53.206,   32.572,    2.036

Testing on 1694.0 MHz CPU (Pentium 4)

       Multiply tests - SSE2

multiply nrm by nrm  ,  10000000,  0.25227,   39.640,   42.734,    4.265

multiply nrm by 16   ,  10000000,  0.26231,   38.122,   44.436,    4.435

multiply to overflow ,   1000000,  0.02533,   39.479,   42.909,    4.282

multiply to underflow,   1000000,  0.77311,    1.293, 1309.654,  130.699