November 1992/Q & A

Columns

Q & A

Generating Check Digits for Error Detection

Ken Pugh

Kenneth Pugh, a principal in Pugh-Killeen Associates, teaches C language courses for corporations. He is the author of C Language for Programmers and All On C, and was a member on the ANSI C committee. He also does custom C programming for communications, graphics, image databases, and hypertext. His address is 4201 University Dr., Suite 102, Durham, NC 27707. You may fax questions for Ken to (919) 489-5239. Ken also receives email at kpugh@dukemvs.ac.duke.edu (Internet).
Q
I have a question that I hope you can answer or suggest where to find the answers. How are check digits computed on a field (such as a Social Security number)? Also, I've heard from friends and associates that Social Security numbers already have some sort of coding built-in. They say that there are everything from check digits to codes indicating what area of the country a person is born in. I've never read anything in black and white that would indicate this, and these programmers invariably "heard it from someone else." Is there anything to these rumors?
Thanks for taking the time to help and for doing a great job with this column.
Brendan O'Haire
Cleveland, Ohio
A
Let me start with the question on check digits. A check digit (or digits) is appended to a number to provide a form of error detection and/or correction. An easy example is a check digit which is the sum of the digits of the original number modulo ten (the remainder when the sum is divided by ten). For example, with the number 12345, the sum is 15 and the modulus 5. So the number would be written as 12345-5. If an error occurred in transcribing a single digit, say 12349-5, then the check digit will show that an error occurred.
There are more elaborate formulas for creating this check digit. These formulas permit detection of transposition errors (e.g. 13245-5), which this easy example cannot detect. I cannot find my reference for this formula. (It was over twenty years ago when I was working with this problem.) Perhaps a reader who has more recent usage will have the formula at his or her fingertips.
There are several books on error detecting/correcting codes, of which the check digit is an example. EC codes permit a single-digit error to be corrected and/or multiple-digit errors to be detected. One rule of thumb in selecting error correcting codes is to require the least number of additional check digits for the most error correction.
Your Social Security question is intriguing. Oral tradition has distinct advantages over written history. If the information is wrong, one cannot determine who the original source was. When working with Social Security numbers a while back (make that a long while), I heard that one set of numbers (the first three digits, I think) designate either your birthplace or the place you obtained the card. I am not sure that is still the case, if it ever was in the first place.
When the government started requiring S.S. numbers for every dependent that receives income, that dramatically increased the size of the population that needed numbers. Limiting the possibilities by reserving the first digits cuts down on the total count of issuable numbers. The situation is similar to the area code problem that the telephone companies are now facing.
As to a check digit in the number, I consider that a mathematical impossibility. There are only nine digits in a S.S. number (123-45-6789). That permits 1 billion combinations. If one digit was used as a check digit, there would only be 100 million combinations. I would estimate that at least half of the 250 million people in the U.S. have an S.S. number, so at least 125 million combinations are needed.
To avoid some error, the Social Security Administration might only issue combinations whose digits can be manipulated in some way to meet a specific rule. For example, the sum of the digits might have to be an even number (like even parity in communications). However the requirement that at least one quarter of the potential combinations be available does not allow much error detection. For an error in any single digit, there are one or two replacements which could be equally valid.
For example, if 123-45-6789 were a valid number and an error was made in the first digit (1), at least one or two of the set: 223-, 323-, 432-, 523-, 623-, 723-, 823-, 923-, 023- also begin a valid S.S. number with the same following digits. Otherwise you wouldn't have enough possibilities.
Before I leave the subject of Social Security numbers, I would like to warn you of the abuse of the S.S. number. Since you are a computer programmers, you can imagine the potential. I highly recommend not giving your number to anyone who does not require it by law. Federal agencies such as the State Department (for passports) and the FAA (for pilots licenses) desire but cannot demand your S.S. number. I recently went to a doctor whose receptionist asked for my Social Security number. I asked why. She said they use the last four digits to file away my medical records. She explained that this keeps the folder more evenly distributed than if they are filed by name. I told her to make one up for me.

Screen Fonts
Q
In a recent CUJ article (two issues back, I believe) a question was asked about generating screen fonts. I have been looking into this myself since I found Chet in a News group.
I have since found all the information needed to generate my own fonts but have been unable to find out how to create the bp portion of the required address. I have both Borland C++ v3.0 and MSC v6.00 and use them both for various projects depending on the need.
After reading your response to that user, I took a grep through the .h files in both include directories and found bp in the Borland C header, dos.h.
Borland C++ has a structure called REGPACK which contains everything I need (along with the function intr) to generate screen fonts using the code you provided. Listing 1 contains the code I now use to generate those fonts. This method is Borland C++ specific since there is no equivalent function in MSC (6.00 at least) although there may be a different method that may be used.
Even though the information was not quite correct, it pointed me in the right direction and solved my problem. I hope the other guy was able to figure his problem out too.
Carl Schelin
A
Thanks for your code. My code was only an outline of how to proceed. I'm glad that it led you in the right direction. I also received comments from Jean-Marie Bind of Stamford, CT and David Devening of Brighton, CO which your letter addresses.
Let me note that it is much easier to do this in Turbo C than in Microsoft C. The bp register is part of the register structure for intr. It is not part of the structure for Microsoft's int86x function. That makes Microsoft's function much less useful for calling interrupts. If anyone knows why it was written this way, please let me know.

gotos Revisited
With regard to the discussion of whether to use gotos or not, as shown in the July 1992 issue of The C Users Journal, I would like to make the following contribution.
Basically, I agree with the reasons that you and Fernando Cabral present with regard to the use of goto versus return to handle error conditions. I have frequently had to handle such conditions, and there is an elegant way to handle them without resorting to gotos. Listing 2, Listing 3, and Listing 4 show how the three options stack up.
Note that when compiled, my code (Listing 4) has the effect of causing a goto to the final return statement, exactly as with the goto method, yet does not use gotos in the code. It avoids the need for thinking of a good label, and forces indentation to show where the break point will be. It has the benefit of allowing single-breakpoint placement as with the goto method, but not exhibited by the multiple return method.
In fact, this structure is so handy that I use it in nearly all routines. This brings consistency to the code and allows error/break conditions to be easily moved from routine to routine.
With regard to the truncate-file routine under examination, it could be re-coded as in Listing 5. (Only this single routine is coded here for demonstration.)
With a good compiler, this produces the same code as the goto method, but without having to keep track of labels, while preserving pure, non-goto, structured code. This saves some time since the maintenance of multiple labels can become a stumbling block; it requires more work to copy equivalent code to multiple routines.
I hope others can benefit from this simple and elegant solution to the goto on error dilemma. Thank you for your column.
Raymond Lutz
El Cajon, CA
Thank you for your input. It is an interesting compromise between Mr. Cabral's position and mine. (KP)

Databases
There are several packages out there for developing database applications in C. Some use dBase-compatible files. Others have their own proprietary file structure. The ones that I have seen all follow the basic structure that a record is a set of fields.
One with a different approach that I've come across recently is the Pinnacle File Manager. Its interface follows the relational model fairly closely. It is easy to create and modify the tables with function calls. For example, you add columns to the table (similar to adding a field to a record layout) with a single function call.
For those interested in investigating alternative approaches to database design, I suggest you investigate Pinnacle.

Printer Fonts
Q
I hope this letter will add to your answer to Gerrit M. de Wit in the June 1992 issue of C Users Journal.
Mr. Wit's letter provides little information on what he is trying to accomplish. Is he trying to take advantage of advanced laser-printer functions, like fonts? Using these functions requires generation of a laser-printer data stream. PCL and PostScript are the two best known laser-printer data streams.
The question of a laser printer ignoring line feed/carriage returns is a bit of a puzzle; I don't know if PostScript printers honor ASCII device control characters. If this is part of the problem, most PostScript printers have one or more modes that emulate non-laser printer data steams such as Epson or ProPrinter.
The question of not producing the ordered characters may be a symbol set problem. For example, when shipped, the LaserJet IIP is adjusted for a symbol set which provides accented letters instead of box drawing characters. This can be fixed by selecting a different symbol set. Most laser printers can change the symbol set from the printer control panel or by sending a control sequence from the computer.
Fonts can cause another class of problem. Each character in a proportional font has a different width chosen to approximate typeset quality output. This can cause surprising output unless accounted for when alignment is important as in spreadsheets or tables.
Your suggestion that Mr. Wit explore print-driver libraries such as the Slate package is right on. I have used Slate and it is a solid package and functionally very complete, supporting 750 printers including PCL and PostScript printers with support for proportional fonts. A couple of other packages Mr. Wit might want to look at, that were advertised in your June 1992 issue are "PostScript form C" from Barton Creek Software and "PrinterGraphics Libraries" from AnSoft.
If Mr. Wit is using Windows, it provides drivers for various laser printers. Finally, if Mr. Wit's operating environment is either MS-DOS or Windows, then he might want to check out the Microsoft Product Support Download service. It's a BBS at (206) 637-9009. The details are 1200, 2400, or 9600 baud, no parity, 8 data bits, 1 stop bit. This service includes device drivers for a variety of printers, displays and so on.
Nearly all laser printers, even PostScript printers, support PCL. There's a book called HP LaserJet Programming by Andrew Binstock, David P. Babcock, and Marv Luse. It is an excellent tutorial on PCL programming. The book is published by Addison-Wesley and contains several thousand lines of C source illustrating how it all works. It's a good tutorial on PCL and laser printing in general, including the no print boarder.
Hope it helps.
Harry K. Philips
Rochester, MN
A
Thanks for your letter. I had an interesting experience with screen fonts on the Macintosh. Normally one thinks of a fixed width such as Courier as having a single width. Variations of that font (such as underlined or italics) might be expected to have that same width. However the Bold option gives a slightly narrower font. This makes for much more elaborate programming to keep columns aligned on the screen. (KP)

Conditional Expression Operator
Q
The recent debate about the use of the conditional expression operator has left me a bit confused. I fail to understand why, because a particular portion of a language is abused, that it should be "banned."
Certainly, some of the examples offered tend to make unreadable code. Some of it even reminds me of the days I worked with 4K, BASIC, and a tape recorder. However, I offer the following:

(1) flag = (a b) ? woof() : foo();
as opposed to:

(2) if (a b) flag = woof(); else flag = foo();
In example (1) it is clear that flag is being set, most likely the important part of the statement. In example (2) the actual assignment of a value to flag is hidden in the if..else construct. Which is better depends on the real purpose of the assignment. Certainly, in (1) it is clear to a maintainer that something is being done to flag.
Another disadvantage of (2) is the possibility of an incorrect assignment being made. How many times have you done something like:

if(a b) t = woof(); else n = foo();
Of course, you meant to use n (or was it t?) both times. At least with the conditional operator you only have one opportunity to use the wrong variable!
Bob van der Poel
Porthill, ID
A
Thanks for your letter. The "ban" is sometimes applied for the same reasons that the "goto" ban is applicable. If one only goes to the end of a function with a goto, then its use may be reasonable. If one uses the conditional operator for simple expressions as you suggest, then it is reasonable. However if one starts to nest the operators as:

flag = (a b) ? (d e) ? ( ( j b)? woof() : foo() ) : whibby() ) : whaho());
then the code becomes a little bit harder to follow (even with the indentation). It also becomes slightly harder to add debug calls.
I might also suggest that it is a matter of consistency. If one already uses the if...else form most of the time, why change?

Source-Code Control Systems
A local firm here (Icarus Software (919) 821-2300 has introduced a product called SourceSafe. It's version control software that works not only with program source files, but also with normal documents. I'll be using it over the next few months and let you know how it works out.