Readers Run Hot and Cold

Dr. Dobb's Journal August 1997

The missing-temperature problem a couple of months back sparked a lot of reader response. The problem, briefly: I'm maintaining a temperature database. Every day (presumably at midnight), I record the high and low temperatures for the preceding 24 hours from a high/low thermometer, then reset the thermometer. Oops; I miss a day. Now at the next midnight reading, the thermometer is telling me the high and low for the preceding 2-day period rather than for a 1-day period. Can I use this data? What do I do with it? How much information have I lost?

Jeff Ferris wrote to remind me that a stopped clock is more accurate than a clock that is one minute slow because the stopped clock is correct twice a day, which, in a way, is a fine response, but not the one I expected. Nor was Doug Renner's, which I will paraphrase: No one ever having observed a past moment or a future moment, we are empirically justified in speaking only of a single moment: the present moment when every event takes place. Therefore all possible or actual temperature observations are equal. O-kay. Moving right along...

The answer I expected, that the information content of the data depends on its intended use, came from many readers, in many forms. Russell Bornsch and Stephen Greif pointed out that, if all I'm going to do with the data is compute, say, the yearly or monthly high and low temperatures, I haven't lost anything. (But suppose the two-day period is December 31-January 1?)
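That parenthetical is the catch: a reading that spans two periods can't be credited to either one. Here is a minimal sketch, assuming each reading is stored with the first and last day it covers rather than with a single date. `Reading` and `period_extremes` are hypothetical names used for illustration. Readings that fall wholly inside a month or year contribute their highs and lows directly. A reading that straddles a boundary is skipped, and the result is flagged as incomplete.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Reading:
    start: date   # first day covered by this reading (hypothetical layout)
    end: date     # last day covered, inclusive
    high: float
    low: float

def period_extremes(readings, first, last):
    """Return (high, low, complete) for the inclusive period first..last.

    Readings wholly inside the period contribute directly. A reading that
    straddles a period boundary is skipped and complete is set to False,
    because its extremes may have occurred outside the period.
    """
    high = low = None
    complete = True
    for r in readings:
        if r.end < first or r.start > last:
            continue                      # entirely outside the period
        if r.start < first or r.end > last:
            complete = False              # straddles a boundary
            continue
        high = r.high if high is None else max(high, r.high)
        low = r.low if low is None else min(low, r.low)
    return high, low, complete

# Made-up readings; the middle one covers the missed New Year's Eve reset.
readings = [Reading(date(1996, 12, 30), date(1996, 12, 30), 40.0, 28.0),
            Reading(date(1996, 12, 31), date(1997, 1, 1), 45.0, 20.0),
            Reading(date(1997, 1, 2), date(1997, 1, 2), 38.0, 25.0)]
print(period_extremes(readings, date(1997, 1, 1), date(1997, 1, 31)))
# (38.0, 25.0, False)
```

A period that contains both days of the merged reading, such as December through January, comes back complete. Even a flagged result is not worthless: the straddling reading's high is still an upper bound on the true high for the days it overlaps.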

Bob Edgar demonstrated that, if what I really want to do is compute the average temperature spread over a two-day period (unlikely as that may be), then the two-day observation is perfectly valid data (although I have lost the data from two two-day intervals that overlap this one). But if I'm interested in anything involving relationships among readings, I'm in trouble.
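Bob Edgar's point about the two lost overlapping intervals can be made concrete. This is a minimal sketch, assuming days are numbered consecutively and each reading records the first and last day it covers; the names are hypothetical. A two-day window's spread can be computed from two adjacent one-day readings or from a single two-day reading, and from nothing else.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Reading:
    start: int    # first day covered (consecutive day numbers, hypothetical)
    end: int      # last day covered, inclusive
    high: float
    low: float

def two_day_spreads(readings):
    """Map day d to the high-low spread over days d and d+1, where knowable.

    A window is knowable only if readings cover it exactly: two adjacent
    one-day readings, or a single two-day reading.
    """
    by_start = {r.start: r for r in readings}
    spreads = {}
    for r in readings:
        if r.end == r.start + 1:
            # A two-day reading is itself one complete window.
            spreads[r.start] = r.high - r.low
        elif r.end == r.start:
            # A one-day reading pairs with a one-day reading for the next day.
            nxt = by_start.get(r.start + 1)
            if nxt is not None and nxt.end == nxt.start:
                spreads[r.start] = max(r.high, nxt.high) - min(r.low, nxt.low)
    return spreads

# Made-up data for days 0..4; days 2 and 3 share one merged reading.
readings = [Reading(0, 0, 50.0, 30.0), Reading(1, 1, 52.0, 31.0),
            Reading(2, 3, 60.0, 25.0), Reading(4, 4, 55.0, 33.0)]
print(two_day_spreads(readings))  # {0: 22.0, 2: 35.0}
```

The windows starting at days 1 and 3, the two that overlap the merged reading, are gone, while the merged reading itself survives as a valid window. Averaging the surviving values, for example `sum(s.values()) / len(s)`, gives the average two-day spread.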

Or the intended use could be forensic, with different standards of validity. As Anthony Castaldo pointed out, knowing the two-day low for yesterday and the day before won't necessarily convict or exonerate the launch crew of negligence in yesterday's launch. Yesterday's low might.

Another wrinkle is that the actual values of the data can affect how much information they carry. Bill Wade pointed out that, if the low and high temperatures for the two-day period happen to be equal, then we have lost no information. In fact, we have as much information as if we had been recording highs and lows every millisecond.
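Bill Wade's observation takes only a line of algebra to check. Write h_1, l_1 and h_2, l_2 for the unrecorded highs and lows of the two days, and H, L for the two-day reading. These symbols are introduced here for illustration. The reading fixes the extremes of the pair, and no day's low can exceed its own high:

```latex
\begin{aligned}
H &= \max(h_1, h_2), \qquad L = \min(l_1, l_2),\\
L &\le l_i \le h_i \le H \qquad (i = 1, 2),\\
H = L &\;\Longrightarrow\; h_1 = l_1 = h_2 = l_2 = H.
\end{aligned}
```

When H = L, the chain of inequalities collapses, so the temperature was constant for the whole two days. When H > L, each of the four daily numbers is known only to lie in [L, H], with at least one day at H and at least one at L. The width H - L is therefore one rough measure of what the missed reading cost.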

Okay, but let's say I don't know yet what I'm going to do with the data and would like to preserve the information in this reading if possible. How can I deal with this flawed but potentially useful data? Michael Schuster suggested that I "interpolate the currently recorded maximum (and minimum) with the day before yesterday's and, later on, with tomorrow's, assign the result to 'today' and 'yesterday' and also assign some probability to these values." Sounds good, but that "assign some probability" is a little vague. Is there some formal way of attaching uncertainties to my data values? Turns out there is.
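Here is one hypothetical way to read the interpolation suggestion for the daily highs; it is a sketch under stated assumptions, not Schuster's actual procedure. It draws a straight line between the known highs on either side of the missed pair. It then assumes that the day with the higher interpolated guess is the one that reached the two-day high, and it clamps the other guess into the range the merged reading allows. In place of a probability, each estimate carries a worst-case error bound. `estimate_missing_highs` is a made-up name.

```python
def estimate_missing_highs(h_before, two_day_high, two_day_low, h_after):
    """Estimate the highs for two days covered by one merged reading.

    Hypothetical sketch. h_before / h_after: highs recorded the day before
    and the day after the missed pair. two_day_high / two_day_low: the merged
    reading. Returns [(estimate, bound), (estimate, bound)], one per missed
    day, where bound is the worst-case error allowed by the merged reading.
    """
    if two_day_low > two_day_high:
        raise ValueError("two-day low exceeds two-day high")
    # Straight-line guesses for the two days between the known neighbours.
    step = (h_after - h_before) / 3
    guesses = [h_before + step, h_before + 2 * step]
    # One of the two days must have reached the two-day high; assume it was
    # the day with the higher guess (the earlier day on a tie).
    peak = 0 if guesses[0] >= guesses[1] else 1
    result = []
    for i, guess in enumerate(guesses):
        if i == peak:
            est = two_day_high
        else:
            # Each day's high lies in [two_day_low, two_day_high]; clamp into it.
            est = min(max(guess, two_day_low), two_day_high)
        # The true high also lies in that interval, so the error is at most
        # the distance from the estimate to the interval's farther end.
        bound = max(est - two_day_low, two_day_high - est)
        result.append((est, bound))
    return result

print(estimate_missing_highs(40.0, 50.0, 30.0, 46.0))
# [(42.0, 12.0), (50.0, 20.0)]
```

The bounds are deliberately pessimistic: the "peak" day gets the largest bound because the guess about which day peaked may be wrong. Lows can be handled by the same function applied to negated values. Call it with (-l_before, -L, -H, -l_after) and negate the returned estimates; the bounds stay as they are.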

"Most out-of-the-box programming languages," Todd Lewis wrote, "don't allow you to deal with values that have associated uncertainties without coding special calculations all over the place every time you perform some operations on your data. But you should look at Evan Manning's article 'Uncertainty Propagation in C++' (C/C++ Users Journal, March 1996). He has developed a C++ class that he calls UDouble. It lets you do 'normal' math operations on uncertain values. It's a much better example of how classes, operator overloading, etc. are Neat Things than is the tired example of implementing complex numbers."
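The idea is easy to sketch in any language with operator overloading. What follows is not Manning's UDouble, and its interface should not be taken as his. It is a minimal Python stand-in with the hypothetical class name `UValue`. It uses standard first-order error propagation and assumes that the errors in different operands are independent.

```python
import math

class UValue:
    """A value with a standard uncertainty, e.g. 20.5 +/- 0.3.

    Hypothetical sketch: first-order propagation, assuming every operand's
    error is independent of every other operand's error.
    """

    def __init__(self, value, sigma=0.0):
        if sigma < 0:
            raise ValueError("sigma must be non-negative")
        self.value = float(value)
        self.sigma = float(sigma)

    @staticmethod
    def _wrap(x):
        # Plain numbers are treated as exact (zero uncertainty).
        return x if isinstance(x, UValue) else UValue(x, 0.0)

    def __add__(self, other):
        o = UValue._wrap(other)
        # Independent errors add in quadrature.
        return UValue(self.value + o.value, math.hypot(self.sigma, o.sigma))

    __radd__ = __add__

    def __sub__(self, other):
        o = UValue._wrap(other)
        return UValue(self.value - o.value, math.hypot(self.sigma, o.sigma))

    def __rsub__(self, other):
        return UValue._wrap(other) - self

    def __mul__(self, other):
        o = UValue._wrap(other)
        # d(xy) = y dx + x dy, combined in quadrature.
        return UValue(self.value * o.value,
                      math.hypot(o.value * self.sigma, self.value * o.sigma))

    __rmul__ = __mul__

    def __truediv__(self, other):
        o = UValue._wrap(other)
        # d(x/y) = dx / y - x dy / y**2, combined in quadrature.
        return UValue(self.value / o.value,
                      math.hypot(self.sigma / o.value,
                                 self.value * o.sigma / o.value ** 2))

    def __rtruediv__(self, other):
        return UValue._wrap(other) / self

    def __repr__(self):
        return f"{self.value:g} +/- {self.sigma:g}"

# Averaging two uncertain readings of 50 +/- 2 and 54 +/- 2.
print((UValue(50, 2) + UValue(54, 2)) / 2)  # 52 +/- 1.41421
```

The independence assumption is the main limitation of this design. For example, `x - x` comes out with an uncertainty sqrt(2) times that of `x` rather than zero, because the class cannot tell that both operands are the same quantity. Handling correlated values requires more bookkeeping than this sketch does.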

Cool, Todd. I'll take one; wrap it up. Oh, but I'm, uh, allergic to C++. Does that come in Java, by any chance?

--Michael Swaine