September 1993/Standard C

Columns

Standard C

Floating-Point C Extensions

P.J. Plauger

P.J. Plauger is senior editor of The C Users Journal. He is convenor of the ISO C standards committee, WG14, and active on the C++ committee, WG21. His latest books are The Standard C Library, published by Prentice-Hall, and ANSI and ISO Standard C (with Jim Brodie), published by Microsoft Press. You can reach him at pjp@plauger.com.

Introduction
Last month, I described the operations of the Numerical C Extensions Group, also known as NCEG or ANSI X3J11.1. (See "Standard C: The Numerical C Extensions Group," CUJ August 1993.) Under the chairmanship of Rex Jaeschke, that committee has been working for several years on a number of extensions to C. Some of those extensions are now starting to appear in the form of Technical Reports. The first three to appear (still in draft but available for wider review) are:

TR Part 1: Designated Initializers and Compound Literals

TR Part 2: Aliasing Control via Restricted Pointers

TR Part 3: Floating-Point C Extensions
I remind you yet again that these documents do not have the force of a standard. The idea is to publish a recommended form for extensions that many people are inclined to make. That way, more organizations are likely to gain experience with the same syntax and semantics. When the time comes to revise the C Standard (and that time is rapidly approaching), there will be more common "prior art" to evaluate for each possible extension. And that makes it easier to determine whether an idea is basically good enough to make a formal part of Standard C.
I covered TR Part 1 and TR Part 2 in my last column. This month, I review the much more extensive TR Part 3. Floating-point arithmetic is a boundless pit of subtleties and complexities. Floating-Point C Extensions is a measured attempt to bring to heel a few more of those subtleties and complexities within the compass of the C programming language.

IEEE Floating-Point Standards
You probably know that detailed standards already exist for floating-point arithmetic. Since 1985, the world has had ANSI/IEEE 754-1985, IEEE Standard for Binary Floating-Point Arithmetic. This document spelled out formats for binary floating-point data in several precisions. It also described in exquisite detail what operations could be performed on them, how accurate the results had to be, and what funny codes to produce for all sorts of error conditions. In short, IEEE 754 gave us chapter and verse about how best to do floating-point arithmetic.
That Standard was not lovingly received in all circles. You have to understand that, up to that time, each hardware vendor traditionally defined a unique floating-point format. The accompanying arithmetic was equally idiosyncratic. Some vendors favored speed over precision, some favored simplicity of implementation above all else. In fairness, all floating-point implementations had improved over the years, as better techniques became well known. But vendors still had a stake in continuing to support their proprietary approaches.
You also have to understand that floating-point arithmetic takes a lot of microcode. In fact, it typically takes about as many kilobytes to describe the floating-point instructions as it does all the rest of the instruction set put together. (And that's why floating-point was often optional on minicomputers, or is still sold separately as a coprocessor with microcomputers.)
So along comes this Standard that is perhaps twice as complex as your typical floating-point architecture of yore. It obsoletes everything that has gone before and causes conversion problems in the bargain. You can see why not everyone jumped aboard the bandwagon.
A few important players did, however. Most notable were Intel, with their 80X87 line of floating-point coprocessors, and Motorola, with their MC 68881. They turned a paper document into very real, and affordable, hardware. And they forced the world to give serious consideration to this new, more conscientious, approach to computing in floating-point.
Today, all but the oldest architectures generally offer some variant of IEEE 754 floating-point arithmetic. I say variant in part because the Standard does offer some latitude in the choice and representation of data representations, and in the range of operations supported. But I also mean that not everybody slavishly implements all of the requirements of the Standard. (I certainly didn't, in the several software versions I put on the market in years past.) And some people try to match the Standard, but ship products with various errors and inaccuracies. Don't think you're in safe territory just because a vendor offers IEEE 754 compliance.
I should also mention ANSI/IEEE 854-1987, IEEE Standard for Radix-Independent Floating-Point Arithmetic. That document extends the floating-point model to bases other than 2. While in some ways it is now a more fundamental document than IEEE 754, it has so far had less practical effect.

What C Standardized
Both of these IEEE floating-point standards were in place well before we froze the C Standard. That obliged us to take them into consideration when we described floating-point arithmetic in C. ANSI takes a dim view of technical committees who develop conflicting standards. They don't like turf wars either. Even if two standards say exactly the same thing about a given topic, the potential for future diversion is very real.
We on committee X3J11 listened to several presentations on the existing floating-point standards. We also got knowledgeable input from several vendors of alternate floating-point formats. Tom MacDonald of Cray Research did quite a bit of work along these lines for the committee. In the end, I believe we did far more than we had to, and far less than we could have.
We cribbed the abstract description of floating-point formats from the FORTRAN Standard. That was written in the heyday of proprietary floating-point standards, so it was quite general. We agreed on various minima for precisions and exponent ranges, then expressed them in terms of the FORTRAN model. Somewhere along the way, we decided to add the third floating-point type, long double. We chose, however, not to require that it be represented any differently than double.
Tom MacDonald championed the view that sophisticated numerical programmers needed to know about the model parameters chosen by each implementation. Yes, there are programs that play clever games with a computer long enough to figure out these parameters. But it seemed perverse to require such antics to get answers so easily supplied by an implementation.
Thus was born the header <float.h>. It can be subtitled, "Everything you always wanted to know about your floating-point processor, but were afraid to ask." From personal experience, as both a compiler implementor and a numerical programmer, I'd add the trailer, "and then some." I have found uses for only a handful of the parameters we chose to put in that header. Still, I'm glad they're there.
As for IEEE 754, we listened carefully, but we didn't feel an obligation to follow the bandwagon. (IBM, DEC, and Cray would have screamed rather loudly if we'd tried.) We aimed for reasonable compatibility with the basic representations, but we didn't mandate all the error codes or extended capabilities in that standard. It is fairest to say that we made the C Standard tolerant of IEEE 754 peculiarities. And that was enough to make ANSI happy.

Language Extensions
Floating-Point C Extensions thus picks up where the C Standard left off. It addresses several perceived needs:

to give more C language support for floating-point operations in general

to give reasonably complete C language support (called a "C binding" these days) for IEEE 754 floating-point operations in particular
I do not intend to describe in detail what's in this Technical Report. It would take far more than one installment of this column. Worse, it would bore most of you to tears. My more limited goal is to give you an overview of a rather complex subtopic in Standard C. You can file that superficial knowledge away against future need. If, at some later date, you have occasion to delve more deeply into some aspect of floating-point arithmetic, you'll know roughly where to look.

Pragmas
Committee X3J11 borrowed the concept of pragmas from Algol 68 as a general way to extend the C language in nonstandard ways. In C, they take the form of preprocessing directives that begin with #pragma. Floating-Point C Extensions defines five pragmas. You write them only outside of external declarations. That means you can't change the rules partway through a function definition.
Four of the pragmas loosen the usual C rules about honoring the types declared by the programmer:

#pragma fp_wide_function_returns { on off } #pragma fp_wide_function_parameters { on off } #pragma fp_wide_variables { on off } #pragma fp_contract { on off }
To conform to the C Standard, the default in each case is off. You turn them on to, respectively:

permit functions to return a wider floating-point type than declared

permit functions to receive (and not narrow down) arguments with wider floating-point types than declared

permit objects to be represented as wider floating-point types than declared

permit the use of "contracted" instructions, such as combined multiply-add, even when the result is slightly different than when not contracted
Why would you want to do these things? Because modern floating-point processors often do all their serious work in an extended precision (usually what is chosen for long double in C). It takes extra work — and causes more loss of precision, overflow, and underflow — to keep chopping down floating-point results.
Sounds good. So why would you not want to do this all the time? Because some programs actually depend on truncation of precision to behave properly. (This is generally more true when narrowing integers than when narrowing floating-point values, however.) And some programs have large arrays of floating-point values. They depend on trading precision for storage space to work properly in limited storage. Thus the pragmas.
The fifth pragma is part of the IEEE 754 support:

#pragma fenv_access { on off default }
It determies whether a function may use the library functions (described below) to alter the floating-point operating mode. The default mode may be either on or off.

Additions to <float.h>
The Technical Report adds two macros to <float.h>:

#define_MIN_EVAL_FORMAT {0 1 2} #define_WIDEST_NEED_EVAL{0 1}
These reveal two implementation decisions, which are, respectively:

whether the narrowest kind of floating-point arithmetic is performed in float (0), double (1), or long double (2)

whether the kind of floating-point arithmetic for a subexpression is determined by the wider of the two operands (0), or by the widest kind that will be needed for the containing expression (1)
Presumably, a very smart, and critical, floating-point calculation may require different forms of expression, depending on what these do to #if expressions.

Floating-Point Constants
We all want our constants to be "folded" at translation time. That's compiler-writers' jargon for doing something once up front instead of every time an expression is evaluated. Some constant expressions must be evaluated at translation time, or at least prior to program startup. Static initializers are an obvious example. But the rest can be deferred to runtime, at least in principle.
The C Standard makes a few blanket statements about writing floating-point constants and folding constant expressions. The Technical Report goes into considerably more detail about what's expected. In particular:

constant folding that can occur at runtime, and that causes a floating-point exception, must result in an expression that reports the exception at runtime

any number of "identities" of mathematics cannot be used in constant folding, because they are not identities in the real world of floating-point arithmetic
Yet another real-world problem is representing certain floating-point constants exactly. Decimal to binary conversion is notoriously uncertain in this regard. So the Technical Report adds hexadecimal floating-point constants. For example,

#define FLT_MAX 0x. 7F7FFFFF
is a reliable way to write the largest (IEEE 754) value of type float. Of course, such forms are extremely implementation dependent, but at least they're exact.

New Relational Operators
One problem with comparing floating-point values is the funny codes. Sure, +INF is larger than any finite value, but what about a NaN? The honest answer is to say that comparisons with NaNs (or even between NaNs) are unordered. The implementor gets to choose between trapping unordered comparisons, which can be slow and complex, or bulling ahead with the wrong answer, which is fast and unsafe.
The Technical Report offers another alternative. It adds a slew of new relational operators, and redefines an existing one, to handle the unordered case properly inline. The basic idea is that a ! can be put in front of any of the relational operators that don't already have one. Its presence now usually means, "also true if unordered." Thus, != is redefined to mean, "unordered, less, or greater." Its old meaning is now handled by the new operator <>. There is also a new operator <>= which means, "less, equal, or greater" (believe it or not).
I won't list all the operators, or discuss them in detail, because I get a headache every time I think about them. I doubt that I'll ever have occasion to use such creatures, given library functions that perform the same tests more readably.

Predefined Macros
The last addition to the language proper is a pair of predefined macro names:

__FPCE__ is defined if this Technical Report is implemented.

__FPCE_IEEE__ is defined if IEEE 754 compatibility is also present.

Library Extensions
The library extensions are easier to describe (glibly, at least), but in many ways more extensive than the language changes. Only a few apply to functions declared in existing headers.
The functions in <stdlib.h> that convert from text to floating-point types have to be smarter. First, the old warhorse atof is no longer excused from proper error checking. Then there are new functions strtof, to convert directly to float, and strtold, to convert to long double. And then there are new requirements on all these functions plus the older workhorse strtod. All must now accept text strings such as INF and NAN for the funny codes.
Their companions in <stdio.h> that read such numbers are, of course, the scan functions — fscanf, etc. These get smarter too, since their behavior is defined in terms of strtod anyway.
Their opposite numbers are the print functions — fprintf, etc. They are now expected to print the same codes for infinities and NaNs. They also get three new conversion specifiers:

%a, to print in the new hexadecimal floating-point format

%A, the same as %a, only using upper case letters instead

%F, the same as %f, only using upper case letters instead
None of this should come as a surprise. It is simply a natural follow through, to complement the language additions.

The Header <fp.h>
Listing 1 shows the new header <fp.h>. It adds a variety of new types, macros, and functions:

float_t is the most efficient (computational) floating-point type at least as wide as float.

double_t is the most efficient (computational) floating-point type at least as wide as double.

HUGE_VAL is the same as in <math.h>. (It can be infinity.)

INFITITY is assuredly infinity.

NAN is a NaN.

FP_NAN, etc. describe the values that can be generated by the floating-point classification macros

(fpclassify, etc.) that follow.

DECIMAL_DIG is the largest number of decimal digits that can be converted between text and long double without loss of precision.
The thumping great list that follows is an extensive set of mathematical functions, far too numerous to detail here. What's really adventurous is the set of functions whose return and argument types contain the vague bold italic T. These are generic functions, just like in FORTRAN, and not quite like templates in C++. Each represents a set of functions overloaded on the (floating-point) type of its argument(s), By some unspecified mechanism, C is extended to handle such generics.

The Header <fenv.h>
Listing 2 shows the new header <fenv.h>. It adds new types, macros, and functions for controlling the state of an IEEE 754 processor:

fenv_t describes an object that can hold the entire floating-point environment.

fexcept_t describes an object that can hold all the exception flags.

FE_INEXACT, etc. describe the various exception flags.

FE_TONEAREST, etc. describe the various rounding modes.

FE_DFL_ENV points at an object of type fenv_t that holds the default floating-point environment.
The rest of this header declares the various functions provided for manipulating the floating-point environment in various ways.

Conclusion
That's the end of my rather brief overview of Floating-Point C Extensions. I confess that I find most of this stuff rather specialized, for my needs at least. On the other hand, it was developed over several years by people with good credentials, and intentions to match. I don't dismiss it lightly.
I end with one of my favorite quotes from R.W. Hamming. He said it of an early book of floating-point arithmetic, but it applies at least as well here: "Nobody should have to know that much about floating-point arithmetic, but I'm afraid sometimes you might."