PROGRAMMING PARADIGMS

Recognition: Ink, Speech, and Otherwise

Michael Swaine

Lexicus is a small entrepreneurial firm chartered to provide the most natural handwriting-recognition systems available. I talked with Lexicus president and cofounder Ronjon Nag about the company, the product, the technology, and the new data type of ink.

Since the company's founding about a year ago, Lexicus has produced a recognizer that handles cursive, print, and mixed handwriting. Two versions of the recognizer, in fact: one for PenPoint and one for Windows. Both are currently in beta, and Lexicus hopes to see them in shipping products early next year. The company is looking at other platforms as well.

One strength of Lexicus's recognizer is that it is a software-only solution. Among the design goals, all of them achieved, were that it run on an 80386 system with no hardware accelerators or other special hardware, and that it work at the level of the operating system, so that it can provide consistent, automatic handwriting recognition for all applications running under that operating system.

It's a young company, privately funded, based in Palo Alto, California. That puts it in Silicon Valley, but perhaps more significantly, it puts it within walking distance of Stanford University. Lexicus has strong academic roots.

The staff of fewer than ten people is a degree-heavy bunch of computer scientists, engineers, and psychologists. Just who fits in which category is hard to say: Most of these people have cross-disciplinary backgrounds and interests. They're all PhDs, from Harvard, MIT, Stanford, Cambridge, and Oxford, and they tend to have impressive academic credentials. There is one Rhodes scholar among them, and one Harkness scholar (a Harkness scholarship is like a Rhodes scholarship in reverse; it sends a Briton to America to study).

The Harkness scholar is Ronjon Nag. After earning a PhD in speech recognition from Cambridge University, he went on to work as a management consultant in London. He came to America three years ago, picked up an MBA at MIT's Sloan School of Management, and moved on to Stanford University's psychology department. It was here that he met Lexicus cofounder Chris Kortge, the prime inventor of Lexicus's handwriting-recognition technology.

DDJ: Your background is in speech recognition. But when you started a company, you focused on handwriting recognition. Why?

RN: Speech [is] the glamour area for pattern-recognition scientists. Most people doing recognition [are] involved in speech rather than handwriting, which has been regarded as rather a fringe subject in the last few years. But now, with the advent of pen computers, it's attracting a little more attention. We saw this as an area where we could get in very quickly and make a sensible product. We didn't have to develop the operating system. We didn't have to build the hardware. And, even though the market's been slow recently, we have less responsibility to create the market than the much larger players. This is not the case in speech.

DDJ: Most handwriting-recognition approaches require printed characters as input, but you went straight for cursive recognition. In fact, your system deals with mixed cursive and printed characters. Given that printing-only recognizers still don't do a perfect job, why did you start with this harder problem?

RN: Why we did cursive handwriting is that there seemed to be a sufficiently large number of players managing the print-recognition problem, and [that] problem is fairly well explored in the academic literature. If you look at the cursive-recognition problem, it's hardly been touched relative to print recognition or even speech recognition.

DDJ: Don't you sort of steer away from the term "cursive," though?

RN: Lexicus is trying to do what we call "natural recognition." First-generation recognizers could only do print. In fact, very early ones could only do block caps. We consider ourselves a second-generation recognizer company, trying to produce [a recognizer for] print, cursive, or a combination of both. Most people write as a combination of both. So that's what people want: a recognizer that recognizes their natural handwriting.

DDJ: Their own natural handwriting, or anyone's? Where do you stand on the writer-independent versus writer-dependent dimension?

RN: Our approach is to produce one that is as writer independent as possible, that works out of the box in the first instance. We'll be working on training to increase that accuracy for any particular user, but we've placed very high importance on it working straight away in the store or as soon as the person has opened the box.

DDJ: But isn't writer-dependent training the way to go to squeeze out the greatest accuracy possible?

RN: Training is definitely a way to go. But there may be environments where training is just not possible. Where a machine may be shared amongst a number of people, where people haven't got time to train, or where a machine may be stationary and people come up to it with no prior experience. For people who use it all the time, that's when you have to use training to get that extra few percent accuracy.

DDJ: I know you won't talk about your algorithms, but can you characterize them? Are they refinements of work we might find in the academic literature? Are they purely your invention?

RN: We have a number of recognition algorithms that we have very strong expertise in, within the members of our group. In general, any of the recognition algorithms that are out there in the literature, we have somebody who is an expert in it. And we have our own proprietary work as well.

DDJ: Well, we can talk about the algorithms that are in the academic literature, anyway. What kinds of generally known algorithms are there for handwriting recognition?

RN: The traditional techniques for doing ordinary handprint recognition that are in the literature revolve around neural networks, hidden Markov models, fuzzy logic, clustering algorithms; I've also seen dynamic programming approaches. Unfortunately, if you try to implement one of these published algorithms, they'll get you 75 percent accuracy or whatever on some sort of good data set. It takes a lot more effort to make a real product.

DDJ: Part of your approach is dictionary based. You use different techniques for recognizing cursive and printed characters. You bring expertise with different algorithms to bear. So would it be fair to characterize your approach as a hybrid?

RN: What we usually say is that we use multiple sources of information and multiple techniques to solve the problem. The problem is so difficult that you have to use whatever information you have. Some people ask us how much of the work is done by the dictionary and how much by the letter recognizer, and really, that assumes certain things about the way we're doing it. And, without telling people how we do it, it just doesn't work like that. It works in a very integrated way.

DDJ: Tell me a little about the business. There are fewer than ten of you and you're all pretty much straight out of academia. How does that affect the way you function as a business?

RN: We're not very like the typical start-up, where you have a finance guy, a marketing guy, a CEO, a technical guy, and a programmer, that kind of thing. We have a very strong academic collegiate base: Everyone's a PhD. We also stress people who have a multidisciplinary environment. I have a business background and also a technology background, at the PhD level. And that's where we differ a little from classic start-up companies. We sort of think from a systems point of view, an integrated point of view, and have cross-disciplinary working methods. So everyone gets involved in marketing decisions and technical decisions, and they can do that because they all have the capability to contribute at that level.

DDJ: So how does that work in practice? Do you sit around a table and make group decisions?

RN: Well, we have structured meetings and unstructured meetings. Doing things that are unstructured you typically sit around a table. But we have some structured methodologies of trying to brainstorm ideas out and those take a long time, the structured methodologies.

DDJ: Who besides you has a background in psychology?

RN: Chris Kortge. He also has a background in computer science. Although it's getting to the state where it's very difficult to distinguish between the disciplines. Computer science is entering into areas of psychology, and psychology is entering into areas of computer science via the AI community, mainly driven by activity in neural networks, I guess.

DDJ: Ah, yes, neural networks. Don't you have some connection with Rumelhart at Stanford?

RN: Right. Both Chris and I were affiliated with Dave Rumelhart at Stanford. I was a visiting scholar and Chris was a graduate student of Rumelhart's. Rumelhart has been a major driving force in pushing psychology to useful applications.

DDJ: Before we leave the academic connections, I'm curious how that is working out, coming from an academic background. In starting a business, are there problems with that?

RN: I think we're what's fashionably known as a "learning organization." I had some business background as a management consultant advising CEOs of large companies. But that's a very different thing from running a small company, where you have to do all the nuts and bolts yourself. Things that you think of as managed by somebody else, like phones, you have to have somebody within the company do. That sort of draws attention from developing creative products. In some ways it's difficult, and in other ways it's an advantage, because we can act very, very quickly, much more quickly than a more structured organization. But we've got a number of very talented advisors to help us in situations where there is difficulty.

DDJ: So you feel that you're a nimble company?

RN: The market is pretty dynamic and unpredictable at the moment. What we've gained is a nimbleness, an ability to adapt very quickly. We're not set in our ways. If a suitable opportunity came up, we would drop everything and go and do it, if it was a sufficient incentive.

DDJ: You're working with a number of much larger companies. What's that like?

RN: Usually when we roll into a large company, we get to a pretty high level pretty quickly because of the nature of the product we have, which is not typical with most startups, where you have to have a really hard sell just to get through the door. But if you are talking to a pen company, having cursive handwriting recognition is so unique that you get a lot of attention.

DDJ: Large companies are becoming more open to acquiring technology now, aren't they?

RN: It's something that larger companies have to face up to, that many of the innovative technologies are being done by very small companies and if they want to stay competitive they have to form strategic alliances with those companies that are coming up with the innovations. It's a lot of fun. We get sort of ego boosting when we visit these large companies.

DDJ: Do you find that different companies take different approaches--like wanting to buy things outright, say, versus saying, "We don't want to give you any money up front, but we'll talk royalties"?

RN: It depends on their own situation. Why do they want the technology? Do they want it because they think it's so good that nobody else can beat it? Do they want it because it's not very good but they need somewhere to start from and then they'll make it good? Or do they just want it for convenience? The cost structure of their product doesn't want to handle a royalty burden, they just want to have an up-front cost, and that's it? Mostly we're talking royalties at this point, rather than outright purchases. And that's usually what most companies want to do. One or two might prefer to buy it outright, but usually the sums are not large enough for us to consider it.

DDJ: Let's talk about ink as a data type.

RN: When you think about it, there are a number of dimensions to ink as data. First of all there are the actual characteristics of ink itself. At the simplest level it is just a bitmap, just points at particular coordinate positions. At the next level, you may have the thickness of the ink. The next dimension is the time information: In which order is the ink actually put onto the page or onto the notebook or the tablet. So those are obvious direct physical attributes. But then maybe there are other types of information that you can think of, which have [affected] how people think of ink.
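The physical dimensions Nag lists (position, thickness, and time order) can be sketched as a small data structure. This is only an illustration of the idea, not Lexicus's representation or any pen operating system's ink format; the names InkPoint, Stroke, and Ink and their fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class InkPoint:
    """One pen sample (hypothetical layout, for illustration only)."""
    x: int              # coordinate position on the tablet
    y: int
    t_ms: int           # when the point was sampled, in milliseconds
    width: float = 1.0  # thickness of the ink at this point


@dataclass
class Stroke:
    """Points laid down between one pen-down and the next pen-up."""
    points: List[InkPoint] = field(default_factory=list)


@dataclass
class Ink:
    strokes: List[Stroke] = field(default_factory=list)

    def bitmap_points(self) -> Set[Tuple[int, int]]:
        """Simplest view: just the set of inked coordinates."""
        return {(p.x, p.y) for s in self.strokes for p in s.points}

    def in_writing_order(self) -> List[InkPoint]:
        """Time view: every point in the order it was written."""
        all_points = (p for s in self.strokes for p in s.points)
        return sorted(all_points, key=lambda p: p.t_ms)
```

The bitmap view throws away thickness and timing, while the time-ordered view keeps the information that lets an on-line recognizer follow the pen.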

DDJ: Like what?

RN: There are a number of things that you can do with ink if you know that you can switch between ink and its interpretation and back again. One scenario is you have a page where [you have written a letter], and a program goes through and translates each word into its appropriate text translation. Now if you keep the ink, you can go back and see what the word actually looked like. You can imagine this as your computer secretary. Normally, you might scribble down a letter and hand it over to your secretary, who would type it up. But she can't read your writing, so she does her best....

DDJ: ...makes some guesses...

RN: ... makes some guesses, just like a recognition algorithm does. Now you go back to what you wrote. You still may not be able to read what you wrote, but it may give you some clues as to what it was. Another scenario is that you may write 25 pages of notes on some topic, and you may not want to have it all translated. But say you're looking for the note on Lexicus, for example. In this scenario, you get your program to translate all 25 pages. Now it won't get all 25 pages right, but hopefully you'll have written "Lexicus" one or two times in your notes, and you can call up that page. If you're thinking ahead, you may even have a keyword heading for each page of notes as a search item. And that brings us along to the other [aspect] of ink as a data type, and that is how ink data can be linked to other kinds of data.
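The two scenarios above share one idea: keep the ink next to its translation, so that a search can run on the text while the reader goes back to the original handwriting. A minimal sketch of that idea follows, assuming a recognizer has already supplied a best-guess word for each piece of ink; InkWord, Page, find_pages, and original_ink are hypothetical names, not part of any Lexicus product.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[int, int]


@dataclass
class InkWord:
    ink: List[Point]  # original pen samples, never thrown away
    text: str         # recognizer's best guess, which may be wrong


@dataclass
class Page:
    number: int
    words: List[InkWord]


def find_pages(pages: List[Page], keyword: str) -> List[int]:
    """Return the numbers of pages whose translation contains the keyword."""
    key = keyword.lower()
    return [p.number for p in pages
            if any(w.text.lower() == key for w in p.words)]


def original_ink(page: Page, index: int) -> List[Point]:
    """Switch from a translated word back to the ink it came from."""
    return page.words[index].ink
```

Even if most words on a page are misrecognized, one correct hit on the keyword is enough to call up the page, and the ink is still there to be read.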

DDJ: For example, by linking whole pages of uninterpreted ink by one or two keywords translated from the handwriting.

RN: You could also go to the next level. Linking leads you into language. Language will in the future be the way to get extremely high-accuracy recognition. At the moment, Lexicus uses a dictionary to increase the accuracy of its cursive-recognition system. That gets you so far. The next level would be language, where you can have grammars. This has been successfully applied to speech recognition, where it's quite common to get the accuracy up using a language model, trying to work out which word follows another word.
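To make "which word follows another word" concrete, here is a small sketch of a bigram language model rescoring a recognizer's candidate words. It illustrates the general technique known from speech recognition, not Lexicus's method: the function best_sentence, the probability tables, and the floor value for unseen word pairs are all assumptions made for this example.

```python
import math
from typing import Dict, List, Tuple


def best_sentence(candidates: List[List[Tuple[str, float]]],
                  bigram: Dict[Tuple[str, str], float],
                  floor: float = 1e-4) -> List[str]:
    """Pick the word sequence with the best combined score.

    candidates: for each word position, (word, shape probability) pairs
                as a recognizer might propose them (hypothetical input).
    bigram:     probability that the second word follows the first;
                unseen pairs get the small floor probability.
    """
    # best[w] = (log score, path) of the best sequence ending in word w
    best = {w: (math.log(p), [w]) for w, p in candidates[0]}
    for position in candidates[1:]:
        new_best = {}
        for w, p in position:
            # Choose the previous word that makes w most plausible.
            score, path = max(
                (prev_score + math.log(bigram.get((prev, w), floor)), prev_path)
                for prev, (prev_score, prev_path) in best.items()
            )
            new_best[w] = (score + math.log(p), path + [w])
        best = new_best
    return max(best.values())[1]
```

With a shape score that slightly favors "clog" over "dog" after "the", a bigram table that knows "the dog" is common can overturn the recognizer's first choice. The table of word pairs is also where the memory cost shows up, since it grows with the square of the vocabulary in the worst case.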

DDJ: Sounds good. What's the holdup?

RN: Unfortunately, these things take a lot of memory. It's unlikely [that we will] see this appearing until memory prices get even cheaper. But this is a natural progression for increasing the power of ink in applications.

DDJ: You're not talking about the power of the data type of ink per se, but about ink linked to other representations: text, pictorial, language.

RN: Rather than concentrate on ink as a pure data type, one should think of ink as a data type that is linked to all these other types of data. That's where people can make the most impact.