This month marks the official launch of the Dr. Dobb's Handprinting Recognition Contest. If you've been following recent issues of Dr. Dobb's Journal, you'll recall that Ron Avitzur got the ball rolling in the April issue by presenting a Macintosh-based handprint recognizer, complete with an interactive data-collector application. Ron has since written a platform-independent harness for testing recognition engines. This harness works from stylus data stored in disk files rather than requiring interactive digitizing hardware, a pen computer, or a pen operating system.
Before delving into the technical details of the harness, here's a quick summary of the contest rules. For this first-ever competition, we're fortunate to be able to offer an extremely tasty first prize: a PowerBook 100 generously provided by Apple Computer. The contest begins on June 15th, when the official version of the DDJ test framework, the test data, and the contest entry blank become available electronically. The deadline for submissions is September 15th. We'll announce a winner in our December issue.
Your recognizer can run on any platform that supports the DDJ test harness. The harness code assumes only the C standard library. However, although you can run the harness on any platform that has a C compiler, we can test your code only on Macintosh or PC platforms. Assuming your code is portably written, this should not be a problem.
You must send in both source code and an executable. Any other written commentary or documentation is also welcome. Source code is intended for publication and can be written in C (or, on the PC, in any language that can be linked with the OBJ files of the DDJ test harness).
Submissions will be judged primarily on recognition accuracy. Speed is the second criterion, and the conciseness and elegance of your implementation is the third.
The test-harness package contains executable, source, object, make, and data files, as well as a sample recognizer by Ron Avitzur. The READ.ME file describes all of these in detail.
The DDJ test harness first reads all information from the character-data file into an in-memory data structure. The character-data file is in binary format. For each ASCII character, there can be a variable number of character prototypes (sample characters). Each character prototype, also known as a gesture, is composed of a variable number of strokes. Each stroke is composed of a variable number of points. The process of reading in the data therefore consists of several nested for loops.
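The exact binary layout of the character-data file is documented in the package's READ.ME and is not reproduced here. The following C sketch only illustrates the nested-loop structure under an assumed layout: a 16-bit character count; then, for each character, a one-byte ASCII code and a 16-bit prototype count; for each prototype, a 16-bit stroke count; and for each stroke, a 16-bit point count followed by 16-bit x and y coordinates. The type names, the charData[] array, and read_char_data() are hypothetical. Because a file written on a Macintosh may be read on a PC, the sketch assembles each value byte by byte in big-endian order instead of calling fread() on raw integers, so the result does not depend on the host's byte order.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical in-memory layout; the real harness defines its own types. */
typedef struct { int x, y; } Point;
typedef struct { int nPoints; Point *pts; } Stroke;
typedef struct { int nStrokes; Stroke *strokes; } Gesture;   /* one prototype */
typedef struct { int nProtos; Gesture *protos; } CharProtos;

static CharProtos charData[256];   /* indexed by ASCII code */

/* Read an unsigned 16-bit big-endian value one byte at a time, so the
   result is the same on big-endian (Mac) and little-endian (PC) hosts.
   Returns -1 at end of file. */
static long read_u16(FILE *f)
{
    int hi = getc(f);
    int lo = getc(f);
    if (hi == EOF || lo == EOF)
        return -1;
    return ((long)hi << 8) | lo;
}

/* Read the whole (assumed-format) file into charData[].
   Returns 0 on success, -1 on a truncated file or allocation failure.
   On error, partially allocated memory is not freed in this sketch. */
static int read_char_data(FILE *f)
{
    long nChars;
    assert(f != NULL);
    nChars = read_u16(f);
    if (nChars < 0) return -1;
    for (long c = 0; c < nChars; c++) {                    /* each character */
        int code = getc(f);
        long nProtos = read_u16(f);
        if (code == EOF || nProtos < 0) return -1;
        CharProtos *cp = &charData[code];
        cp->nProtos = (int)nProtos;
        cp->protos = calloc((size_t)nProtos, sizeof *cp->protos);
        if (nProtos && !cp->protos) return -1;
        for (long p = 0; p < nProtos; p++) {               /* each prototype */
            Gesture *g = &cp->protos[p];
            long nStrokes = read_u16(f);
            if (nStrokes < 0) return -1;
            g->nStrokes = (int)nStrokes;
            g->strokes = calloc((size_t)nStrokes, sizeof *g->strokes);
            if (nStrokes && !g->strokes) return -1;
            for (long s = 0; s < nStrokes; s++) {          /* each stroke */
                Stroke *st = &g->strokes[s];
                long nPoints = read_u16(f);
                if (nPoints < 0) return -1;
                st->nPoints = (int)nPoints;
                st->pts = calloc((size_t)nPoints, sizeof *st->pts);
                if (nPoints && !st->pts) return -1;
                for (long i = 0; i < nPoints; i++) {       /* each point */
                    long x = read_u16(f);
                    long y = read_u16(f);
                    if (x < 0 || y < 0) return -1;
                    st->pts[i].x = (int)x;
                    st->pts[i].y = (int)y;
                }
            }
        }
    }
    return 0;
}
```

A caller would open the file with fopen(name, "rb") and pass the FILE pointer to read_char_data(). A production reader should also free whatever it allocated before reporting an error.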
After reading in the data, the harness loops through the top-level CharData[] array, which contains pointers to lists of prototypes. During the training phase, characters are passed to your recognizer's Train() routine. From this data, your training routine should derive a set of features that will later be used in the recognition phase.
During the recognition phase, the test harness passes a different selection of characters to your recognizer's Guess() routine, which can return up to three guesses per character. Each guess must have an associated weight or confidence value.
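One common way to produce ranked guesses is to compare the unknown character's features against every stored training example and keep the closest matches. The sketch below assumes a distance has already been computed for each stored example (smaller means more similar). It keeps the best distance per character, returns up to three distinct characters in order, and converts each distance d into a weight proportional to 1/(1 + d), scaled so the returned weights sum to 1. The GuessResult type, top_three(), and this particular weighting are hypothetical choices; the harness defines its own Guess() interface.

```c
#include <assert.h>

/* Hypothetical result record: a guessed character and its confidence. */
typedef struct { int ch; double weight; } GuessResult;

/* labels[i] is the character of stored example i, dist[i] its distance
   (>= 0) from the unknown gesture. Writes up to three distinct
   characters, best first, into out[] and returns how many were written. */
int top_three(const int *labels, const double *dist, int n, GuessResult out[3])
{
    double best[256];
    int taken[256] = {0};
    double sum = 0.0;
    int c, i, k;

    for (c = 0; c < 256; c++)
        best[c] = -1.0;                       /* -1 means "no example seen" */

    /* Keep only the closest example of each character. */
    for (i = 0; i < n; i++) {
        int ch = labels[i] & 0xFF;
        assert(dist[i] >= 0.0);
        if (best[ch] < 0.0 || dist[i] < best[ch])
            best[ch] = dist[i];
    }

    /* Select the three smallest distances among distinct characters. */
    for (k = 0; k < 3; k++) {
        int pick = -1;
        for (c = 0; c < 256; c++)
            if (best[c] >= 0.0 && !taken[c] && (pick < 0 || best[c] < best[pick]))
                pick = c;
        if (pick < 0)
            break;                            /* fewer than three candidates */
        taken[pick] = 1;
        out[k].ch = pick;
        out[k].weight = 1.0 / (1.0 + best[pick]);
        sum += out[k].weight;
    }

    /* Scale so the returned weights sum to 1. */
    for (i = 0; i < k; i++)
        out[i].weight /= sum;
    return k;
}
```

Keeping only the best distance per character prevents three prototypes of the same letter from crowding the alternatives out of the guess list.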
Writing a general-purpose recognizer can be a large and daunting task. For purposes of the contest, we've constrained the problem in various ways. In the test data, segmentation of strokes into individual characters has already occurred. The sample recognizer works one character at a time rather than using context information (such as a word dictionary). The character set consists only of alphanumeric characters plus a few punctuation characters. Input data consists of stylus data points from pen-down to pen-up. There is no proximity information or velocity data, nor are there timestamps associated with point coordinates.
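Because the data carries no timestamps, point spacing reflects both writing speed and the digitizer's sampling rate: a slow stroke leaves a dense cluster of points, a fast one a sparse trail. One simple preprocessing step, offered here as a sketch rather than anything the sample recognizer is known to do, is to drop points that lie closer than some minimum distance to the last point kept. The thin_stroke() name and its minDist parameter are hypothetical. Distances are compared in squared form, so no square root is needed, and the arithmetic uses long long because squared 16-bit coordinate differences can exceed a 32-bit long.

```c
#include <assert.h>

typedef struct { int x, y; } Point;   /* hypothetical point type */

/* Drop points closer than minDist to the last kept point, so that slow
   pen movement (many samples bunched together) does not outweigh fast
   movement. Works in place and returns the new point count. The first
   (pen-down) and last (pen-up) points are always kept. */
int thin_stroke(Point *pts, int n, int minDist)
{
    long long min2 = (long long)minDist * minDist;
    int kept, i;

    assert(n >= 0 && minDist >= 0);
    if (n <= 2)
        return n;

    kept = 1;                                  /* pts[0] is always kept */
    for (i = 1; i < n - 1; i++) {
        long long dx = (long long)pts[i].x - pts[kept - 1].x;
        long long dy = (long long)pts[i].y - pts[kept - 1].y;
        if (dx * dx + dy * dy >= min2)
            pts[kept++] = pts[i];
    }
    pts[kept++] = pts[n - 1];                  /* keep the pen-up point */
    return kept;
}
```

A minDist of 0 keeps every point; larger values trade detail for more uniform spacing, so the threshold is worth tuning against the test data.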
The sample recognizer included with the test-harness package performs pretty well, with better than 90 percent accuracy on certain sample data. Nevertheless, it suffers from a number of limitations which you can improve upon:
As many researchers have discovered, writing good code is only part of the problem in building a recognition engine. The rest includes amassing a suitable collection of test data.
Our sample recognizer works well with our current set of data, but may stumble on other valid data that it has not previously encountered. For judging the contest, therefore, we will attempt to run all recognizers on as broad a data set as possible, including any data that you submit with your entry.
Copyright © 1992, Dr. Dobb's Journal
How the Harness Works
Hints for Contestants
The Importance of Data