Research and Advances

A statistical study of the accuracy of floating point number systems

This paper presents the statistical results of tests of the accuracy of certain arithmetic systems in evaluating sums, products and inner products, and analytic error estimates for some of the computations. The arithmetic systems studied are 6-digit hexadecimal and 22-digit binary floating point number representations combined with the usual chop and round modes of arithmetic with various numbers of guard digits, and with a modified round mode with guard digits. In a certain sense, arithmetic systems differing only in their use of binary or hexadecimal number representations are shown to be approximately statistically equivalent in accuracy. Further, the usual round mode with guard digits is shown to be statistically superior in accuracy to the usual chop mode in all cases save one. The modified round mode is found to be superior to the chop mode in all cases.
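To make the chop/round distinction concrete, here is a minimal sketch that emulates a normalized `digits`-digit, base-`base` floating-point system with exact rational arithmetic. It compares chopped and rounded running sums for 6-digit hexadecimal and 22-digit binary significands. The assumptions are stated loudly: every operation is computed exactly and then chopped or rounded, which corresponds to having enough guard digits. The sketch does not model a limited number of guard digits or the paper's modified round mode, and the helper names (`to_system`, `summation_error`, `mean_error`) are hypothetical. A useful fact when reading the results is that a normalized 6-digit hexadecimal significand carries between 21 and 24 significant bits, depending on its leading hex digit.

```python
import math
import random
from fractions import Fraction


def to_system(x: Fraction, base: int, digits: int, mode: str) -> Fraction:
    """Chop or round an exact rational x to a normalized digits-digit base-`base` significand.

    Assumption (hypothetical helper): the exact result is available before
    chopping/rounding, i.e. "enough" guard digits.
    """
    if x == 0:
        return Fraction(0)
    sign = -1 if x < 0 else 1
    a = abs(x)
    # Find e with base**(e-1) <= a < base**e, so a / base**e lies in [1/base, 1).
    e = 0
    while a >= Fraction(base) ** e:
        e += 1
    while a < Fraction(base) ** (e - 1):
        e -= 1
    scale = Fraction(base) ** (digits - e)  # a * scale has exactly `digits` integer digits
    m = a * scale
    if mode == "chop":
        q = math.floor(m)                   # truncate toward zero (a > 0 here)
    elif mode == "round":
        q = math.floor(m + Fraction(1, 2))  # round half up in magnitude
    else:
        raise ValueError(f"unknown mode: {mode}")
    return sign * Fraction(q) / scale


def summation_error(values, base: int, digits: int, mode: str) -> float:
    """Relative error of a recursively accumulated sum; assumes a nonzero exact sum."""
    acc = Fraction(0)
    exact = Fraction(0)
    for v in values:
        x = to_system(Fraction(v), base, digits, mode)  # input stored in the system
        acc = to_system(acc + x, base, digits, mode)    # each addition is chopped/rounded
        exact += x                                      # exact sum of the stored inputs
    return float(abs(acc - exact) / abs(exact))


def mean_error(base: int, digits: int, mode: str, trials: int = 10, n: int = 500, seed: int = 1) -> float:
    """Average relative error over several random trials (same data for every system)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        values = [rng.uniform(0.0, 1.0) for _ in range(n)]
        total += summation_error(values, base, digits, mode)
    return total / trials


if __name__ == "__main__":
    for base, digits in [(16, 6), (2, 22)]:
        for mode in ("chop", "round"):
            err = mean_error(base, digits, mode)
            print(f"base {base:2d}, {digits:2d} digits, {mode:5s}: mean relative error {err:.2e}")
```

Using the same seed for every system keeps the comparison paired: each arithmetic system sums exactly the same data. A handful of trials only illustrates the effect; the paper's conclusions rest on its own, much more extensive, statistical tests.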

Research and Advances

A note on computing approximations to the exponential function

Two methods are discussed that yield near-minimax rational approximations to the exponential function while retaining the desirable property that the approximation for a negative argument is the reciprocal of the approximation for the corresponding positive argument. These methods lead to approximations that are much superior to the commonly used convergents of the Gaussian continued fraction for the exponential. Coefficients and errors are given for the intervals [-(1/2) ln 2, (1/2) ln 2] and [-ln 2, ln 2].
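One way to guarantee the reciprocal property is to write the approximation as R(x) = (Q(x^2) + x S(x^2)) / (Q(x^2) - x S(x^2)). Replacing x by -x swaps numerator and denominator, so R(-x) = 1/R(x) exactly in rational arithmetic. The sketch below uses this form with the coefficients of the [3/3] Padé approximant of exp, which is one of the Gaussian continued-fraction convergents the note improves upon. These are NOT the note's near-minimax coefficients, which are not reproduced here. It also shows the usual reduction x = k ln 2 + r with k = round(x / ln 2), so that r lies in [-(1/2) ln 2, (1/2) ln 2] (up to rounding) and exp(x) = 2^k exp(r). The function names are hypothetical.

```python
import math

LN2 = math.log(2.0)


def exp_rational(r: float) -> float:
    """[3/3] Pade approximant of exp(r) in the form (Q + rS) / (Q - rS).

    Q(z) = 1 + z/10 and S(z) = 1/2 + z/120 with z = r*r; swapping the sign of r
    swaps numerator and denominator, so exp_rational(-r) == 1 / exp_rational(r)
    up to rounding. Coefficients are the Pade ones, not near-minimax ones.
    """
    z = r * r
    q = 1.0 + z / 10.0          # even part Q(r^2)
    rs = r * (0.5 + z / 120.0)  # odd part r * S(r^2)
    return (q + rs) / (q - rs)


def exp_reduced(x: float) -> float:
    """exp(x) = 2**k * exp(r), with r = x - k*ln 2 and |r| <= (ln 2)/2 (approximately)."""
    k = round(x / LN2)
    r = x - k * LN2
    return math.ldexp(exp_rational(r), k)


if __name__ == "__main__":
    # Largest relative error of the approximant on a grid over [-(ln 2)/2, (ln 2)/2].
    grid = [-LN2 / 2 + i * LN2 / 1000 for i in range(1001)]
    worst = max(abs(exp_rational(t) - math.exp(t)) / math.exp(t) for t in grid)
    print(f"max relative error on the reduced interval: {worst:.2e}")
```

Evaluating Q and S once and forming both the sum and the difference is the main economy of this form. A near-minimax choice of Q and S keeps the same structure, and with it the reciprocal property, while reducing the maximum error on the interval.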
Research and Advances

Double-precision square root for the CDC-3600

In January of 1960, the late Hans J. Maehly completed a summary of approximations to the elementary functions for the CDC-1604 computer. The approximations and techniques suggested by Maehly are equally applicable to the second large computer in the CDC line, the 3600. Unlike the 1604, however, the 3600 has built-in double-precision floating-point arithmetic. The present work, largely inspired by the successes of Maehly and his associates, concerns the extension of one of Maehly's ideas to a double-precision subroutine for the 3600.
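The abstract does not say which of Maehly's ideas is extended, so the following is only a generic sketch of the usual structure of a floating-point square-root routine, not the CDC-3600 subroutine. The structure has three parts: reduce the argument to a fixed interval using an even power of 2, form a cheap starting approximation, and refine it with Newton (Heron) iterations y <- (y + m/y)/2, each of which roughly doubles the number of correct digits. The starting value (the chord of sqrt through (1/4, 1/2) and (1, 1)), the iteration count, and the name `sqrt_newton` are illustrative assumptions. Python floats are IEEE double precision, which stands in here for the 3600's double-precision format.

```python
import math

NEWTON_STEPS = 4  # assumption: enough for the chord starting value below


def sqrt_newton(x: float) -> float:
    """Square root via argument reduction plus Newton (Heron) iteration (illustrative sketch)."""
    if x < 0.0:
        raise ValueError("square root of a negative number")
    if x == 0.0 or not math.isfinite(x):
        return x                    # 0, +inf and NaN pass through unchanged
    m, e = math.frexp(x)            # x = m * 2**e with 0.5 <= m < 1
    if e % 2:                       # make the exponent even so it halves exactly
        m *= 0.5
        e += 1                      # now 0.25 <= m < 1 and e is even
    # Chord of sqrt through (1/4, 1/2) and (1, 1): relative error at most about 5.6%.
    y = 1.0 / 3.0 + (2.0 / 3.0) * m
    for _ in range(NEWTON_STEPS):
        y = 0.5 * (y + m / y)       # Newton step for y*y - m = 0
    return math.ldexp(y, e // 2)    # sqrt(x) = sqrt(m) * 2**(e/2)


if __name__ == "__main__":
    for value in (2.0, 10.0, 1e-300, 1e300):
        print(value, sqrt_newton(value), math.sqrt(value))
```

From a starting relative error of about 5.6%, the error after successive steps is roughly 1.6e-3, 1.3e-6, 9e-13 and then below double-precision resolution, which is why four steps suffice here. A better starting approximation (for example, a minimax polynomial or rational function on the reduced interval) can save one or more iterations; choosing that starting approximation well is where most of the design effort in such a routine goes.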
