PROGRAMMING PARADIGMS

A Conversation with Robert Carr, Part II

Michael Swaine

In last month's column, Robert Carr and I discussed the design of PenPoint, GO Corporation's 32-bit, object-oriented, multitasking operating system. This month, we pick up where we left off, focusing on the PenPoint Notebook User Interface (NUI) and imaging model.

DDJ: There are three aspects of your current work that I'd like to ask you about. There are the opportunities for developers that PenPoint offers. There's the operating system itself. And then there's this new paradigm of using computers, pen-based computing. Of course, a new paradigm translates into new opportunities, but it's probably useful to step back and view it strictly as a paradigm and think about the fundamental differences it represents, such as what it means to use a computer without a cursor, before even trying to think about what this means in terms of markets.

RC: It certainly is a paradigm shift. Part of the shift has to do with the use of the pen, but some other parts have to do with other elements. With regard to the pen, you mentioned one of the good points, which is that the pen is a cursorless device. That's wonderful. Cursors are an artificial concept and users indeed have got to go through some learning and some motor skill development in order to manipulate them. Nevertheless, it takes some real work on the software side to overcome some of the disadvantages that come from throwing the cursor away.

DDJ: Such as?

RC: One of the advantages of the cursor is that it is an accurate pointing device, because as you are positioning it, it is giving you feedback as to exactly where you are pointing it, whereas with a pen, as you come close to the screen you still don't know where the software thinks you're pointing. It turns out that you can overcome nearly all of that pointing resolution loss, if you will, by putting more intelligence into your software. But that's just one of many examples of needing the right software for the pen versus for the cursor.

DDJ: So how do you do that? How do you solve this particular problem of pointing resolution loss?

RC: Intelligent targeting algorithms. "The user's trying to point to something. Let me look around and see what's close to this pixel that they probably wanted. The command they just drew is a command that always acts on words instead of characters; let me look for the word that's closest to here and not just the character that's closest." There are a lot of targeting heuristics that you can put in the user interface, and it can take a lot of testing to get those right. So loss of the cursor is part of the paradigm shift. Another part is the opportunity to support gestures, to invent a whole new way of controlling software. We found that gestures work very well with users because a well-designed gesture set tends to have some amount of intuitive obviousness to users.
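A targeting heuristic of the sort Carr describes might look roughly like this (an illustrative Python sketch only; PenPoint itself was written in ANSI C, and all names here are invented):

```python
from math import hypot

def nearest_target(pen_xy, targets, wanted_kind):
    """Pick the closest on-screen target of the kind the command acts on.

    targets: list of (kind, (x, y)) tuples, e.g. ("word", (10, 4)).
    Rather than taking whatever lies under the exact pixel, we search
    the neighborhood for the nearest target of the semantically right
    kind -- a word for word-scoped commands, not merely a character.
    """
    candidates = [(hypot(pen_xy[0] - x, pen_xy[1] - y), kind, (x, y))
                  for kind, (x, y) in targets if kind == wanted_kind]
    if not candidates:
        return None
    _, kind, pos = min(candidates)  # smallest distance wins
    return (kind, pos)
```

A real system would weight such heuristics by command type and tune them with user testing, as Carr notes.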

DDJ: Still, it's an awfully open-ended problem, isn't it? You're in the position of having to build a language that people will take to naturally, and there isn't anything universal to build on. Or is there?

RC: There is no standardized mark-up language in our society, and in fact most lay people have never learned any mark-up language. So you need to design a gesture set more to harmonize with their "collective unconscious" with regard to what mark-up commands to use. We found that users can learn a well-designed gesture set very quickly, both because it has strong mnemonic value visually and also because, since you're drawing gestures out with your fingers, what I call finger memory gets called into play.

DDJ: Motor memory.

RC: Right. And of course motor memory is a very real memory. So gestures are quickly learned, but they're also very efficient, since they collapse the two-step of selecting and then acting down into a single step.

DDJ: I hadn't thought of that.

RC: That's the technical reason why users also experience the "funness," or immediateness. Which brings me to the other major aspect of the paradigm shift. Good user interfaces have the attribute of transparency: The user tends to forget that there's a user interface mediating between them and the thing they're working on. It's extremely difficult for desktop computers, or even laptop computers, to ever have a very high degree of immediacy, for a variety of reasons. First of all, the mouse and the keyboard are remote-control devices: You're doing work down here to work on the screen up there. But then also the very nature of the device is that it tends to control you. You have to come to the device, you have to sit in a certain requisite position, you have to hold your hands out in front of you: It's dictating a whole lot of your behavior patterns. The fact is that we've all accepted this and we never think about it, but I believe at something of an unconscious level we tend to...resent is probably too strong a word, but we notice the friction and the costs that come from having to come to the computer. "Oh, gee; I have to do some work on my computer." What always goes through our minds is, "...and therefore I must go up to my home office and sit at the desk. I can't do it here on the couch in front of the Saturday afternoon football game," even though, perhaps from a concentration point of view, that would be just fine. So the computer forces us to come to it on its terms.


DDJ: Which is an area in which pen-based notebook computers have the advantage: You can take them with you. But if I understand you right, you're arguing that this advantage is more than just a matter of convenience, that it's a genuine paradigm shift in computer use. How is it a paradigm shift?

RC: With pen-based computers, in the mobile market at least, with devices that are physically rather small -- tablet-sized or smaller -- all of a sudden that equation has shifted for the first time. The human is in control again. The human can use the device wherever they like, in almost any posture that they like, and they can wave the device around, they can look down at it. And I think that that's the other half of this paradigm shift. There's the pen half, but there's also this physical relationship half, in which now you are dictating how you use the device, and it's much more inert and you are the active agent. Does that make sense?

DDJ: It does. I was wondering. A keyboard is obviously a discrete input device. There are only so many keys. Do you have a consciousness when using a pen-based system of it being more continuous? When you draw a circle around something do you have more of a sense of moving in a continuum?

RC: No, you don't think so much "circle," you think "edit," since that's the operation you're doing. And that's the transparency that starts occurring. A part of this transparency of gestures is that with our gesture set we were oftentimes able to arrive at gestures that made -- I'll call it physical common sense. Namely, that the direct motor actions you're taking with your pen tip have some kind of an analog to the semantics of the operation you're performing.

DDJ: You'll have to give me an example.

RC: The most obvious case is our scroll gestures, in which, if you want to scroll, you actually shove the window contents up and down, left and right, with the pen tip. Another good example would be that our gestures for inserting spaces and carriage returns actually tend to end with a movement in the direction that you're opening the space up in. That's part of this mnemonic value that makes them quick to learn, but it also contributes to the transparency in which you actually start thinking, "I'm opening up space here," or "I'm cutting this object in half."

DDJ: You also wrote your own imaging system for PenPoint. Tell me about Imagepoint.

RC: In the graphics area, like many areas, PenPoint is set up to be highly configurable, so in fact we can support multiple graphics subsystems being installed; but ImagePoint is the one that we've developed, and it's installed with PenPoint by default. And currently all of PenPoint's user interface elements use it, so PenPoint applications are typically all using ImagePoint, at least the early ones. But it is conceivable that we could install a Display Postscript graphics subsystem, and folks who wanted to use that could then talk to that.

DDJ: Is ImagePoint Postscript-like?

RC: In its architecture it is most similar to Display Postscript, but it's much lighter weight. Which is the reason we didn't use Postscript itself. ImagePoint runs in less than 200K of memory and displays pretty good performance on the 16-MHz 286s that we were showing you last winter. Like Postscript it unifies text as a graphics primitive along with all other graphics primitives, and all graphics primitives are translatable, rotatable, and scalable, which makes it very straightforward for applications to produce graphically rich user interfaces that have a rich mixture of text. The text in ImagePoint is based on outline font technology, so you use a minimum amount of memory to store the fonts, but we can display them on demand at any point size.
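The unified-primitives idea -- text transformed by the same machinery as any other graphic -- can be sketched like this (illustrative Python only; ImagePoint's actual API is not shown here, and all names are invented):

```python
import math

def transform(points, dx=0.0, dy=0.0, angle=0.0, scale=1.0):
    """Rotate, scale, then translate a list of (x, y) points."""
    c, s = math.cos(angle), math.sin(angle)
    return [((x * c - y * s) * scale + dx, (x * s + y * c) * scale + dy)
            for x, y in points]

class Primitive:
    """Any primitive -- a line, a polygon, or a run of outline-font
    text -- carries outline points and transforms the same way."""
    def __init__(self, kind, points):
        self.kind, self.points = kind, points

    def transformed(self, **kw):
        # Text is not a special case: the same affine transform applies.
        return Primitive(self.kind, transform(self.points, **kw))
```

Because text is stored as outlines rather than bitmaps, rendering at any point size is just a matter of the scale factor, which is the memory advantage Carr mentions.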

DDJ: How difficult would it be to put Display Postscript on a PenPoint-running machine?

RC: It's analogous to a lot of porting. Since PenPoint supports ANSI standard C, it's typically very straightforward to port what I'll call engine code; code that mostly talks to itself. Most applications come in two halves: the engine half and the UI half. The UI half makes numerous calls into the underlying operating system to present its user interface, and user interfaces are very difficult to port across different operating systems. Engines, however, by design, tend to be extremely portable. This is one of the reasons why you'll find a lot of applications coming over quite readily. Not by porting the user interface, because that actually needs to be rewritten and rethought for the pen. And that's true even if you work in an existing OS. You'll find that with Microsoft Windows for Pen: increasingly, Microsoft is not only admitting but arguing that, yes, Windows applications need to be rewritten for the pen. So once you're rewriting the UI for the pen, and if the engine is easily portable, shouldn't you be coming over to an OS that from the ground up was designed for mobility and for the pen? That's our basic argument.

DDJ: So to port Display Postscript...

RC: Postscript would be ported over to PenPoint by subclassing our imaging class, so that the new subclass responds to Postscript messages, whereas our ImagePoint -- technically it's actually called "class Sys-Graf" -- responds to its own imaging model. Supporting Postscript would be pretty straightforward; what would be difficult is getting PenPoint code in applications that are currently talking to ImagePoint to convert over to talking to Postscript. Although we're architecturally similar to Postscript, we certainly are not API-compatible, and we're not intended to be. Otherwise we would have licensed Postscript.
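Carr's message-based subclassing scheme can be sketched roughly as follows (an illustrative Python sketch; PenPoint itself was written in ANSI C, and the class and message names here are invented, not PenPoint's API):

```python
class SysGraf:
    """Stand-in for an imaging base class that dispatches on messages."""
    def handle(self, message, *args):
        method = getattr(self, "msg_" + message, None)
        if method is None:
            raise NotImplementedError(message)
        return method(*args)

    def msg_draw_line(self, x1, y1, x2, y2):
        return f"line ({x1},{y1})-({x2},{y2})"

class PostscriptGraf(SysGraf):
    """A Display Postscript port would subclass the imaging class and
    respond to its own, Postscript-flavored messages, while inherited
    behavior continues to work unchanged."""
    def msg_moveto(self, x, y):
        return f"{x} {y} moveto"

    def msg_lineto(self, x, y):
        return f"{x} {y} lineto"
```

Applications written against one message vocabulary would not automatically speak the other, which is the API-compatibility point Carr makes above.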

DDJ: But if someone were developing a dedicated machine and they wanted Postscript and they were responsible for their own applications--

RC: Yep.

DDJ: What opportunities do you see for software developers that fall out of the pen market?

RC: I think there are a wide variety of opportunities for developers in the pen market. I think the analogies with the PC market are very close. Looking back, I think that many people see that if they had been first or second in a given application category on the PC or Macintosh they could have done pretty well, and now that they're facing being the 27th entry in the category, it's going to be very difficult for them to come up with a product where they could compete or find a publisher to work with. The salient thing with pen computing is it's a brand-new market and it'll be quite a few years before it's "too late" to really be out there competing with other software companies. That's one point: It's a new market, and therefore you can compete on something of a level field with many of the established software companies. The second point is that, because of the embedding architecture of PenPoint, and because it's object oriented, it allows, in fact almost forces, applications to be smaller and more focused and not to be these large monoliths that do 27 things, which today's spreadsheets and word processors and presentation graphics packages are.

DDJ: We seem to be at a stage where every application needs to be a Framework, an integrated software package.

RC: They're drawing programs, they're typesetting programs, they're spreadsheets, they're graphics programs; they don't call them integrated software, but they are. No small software developer or company can compete with a Lotus or Microsoft on that playing field. Well, in PenPoint, because the OS does the integration for you, all of a sudden the status quo, the expected norm, is that your application does one thing and does it well. And all of a sudden a small development team can compete against, and will always be able to compete with, a large production organization.

DDJ: And a lot of these machines are going to be small machines, with limited resources.

RC: Right. So well-designed and well-crafted software will be highly valued. So the first resource is a new market, where most categories are still open, so there are good opportunities. Secondly, applications tend to be smaller and more focused. The third thing is that many application categories are waiting to be invented. There's both the opportunity and need, and also the satisfaction of tremendous creativity. We believe that most of the best-selling applications five years from now in pen computing are still to be invented. We have some ideas about what those are, but our basic faith is that it's the creativity of the application developer that'll drive the market growth, and also the invention of these new application categories.

DDJ: Do you believe in the concept of the Killer App?

RC: No, no one killer app. I don't think we'll ever again see something as striking as VisiCalc on the Apple II in terms of being so clearly identifiable as an application that helped to birth an industry or marketplace. We do have a killer data type in mind, which is ink. It's become real clear over the last couple of years that most pen-based applications will and should support ink as a data type, just as today they support, say, ASCII text and floating point as data types that they manipulate in various fields. So you'll see ink markup layers, acetate markup layers, ink annotations, ink Post-It notes, entire ink editors, which you might think of as a note-taker or something, where you are never really translating ink, but perhaps you are reformatting it and editing it.

DDJ: You'd better tell me what ink is.

RC: Oh, yeah; ink is really just the path that the pen followed, captured from a digitizer and stored as a sequence of strokes or polylines or curves, however the software wants to represent it. That data structure, that sequence of strokes, can then be displayed on the screen or printed out, ultimately. But like an object-oriented drawing program, an ink editor would let the user edit these strokes on the screen, perhaps cut them in half, delete some of them, maybe rescale some of them, move them around. So if you take a page of notes and then you want to go back to the middle of the page and add some more thoughts, on paper of course you'd have to write in the margin and draw a line. With an ink editor you'd simply give a quick gesture, a flick of the pen tip, and that would open up some white space, because it would have shoved all the strokes below the middle of the page further down. And you can start seeing some real benefits to adding the computer to your handwritten notes.
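The ink representation and the "open up white space" operation Carr describes can be sketched as follows (illustrative Python; the data layout is an assumption for this sketch, not GO's actual format):

```python
def open_space(strokes, at_y, gap):
    """Shove every stroke that lies below at_y further down by gap,
    opening white space in the middle of a page of ink.

    strokes: list of polylines, each a list of (x, y) points, with y
    growing downward. A stroke counts as "below" when its topmost
    point is at or below at_y; strokes above the gesture stay put.
    """
    def shifted(stroke):
        top = min(y for _, y in stroke)
        if top >= at_y:
            return [(x, y + gap) for x, y in stroke]
        return stroke
    return [shifted(s) for s in strokes]
```

Cutting, deleting, or rescaling strokes would be analogous list-of-points operations, which is why Carr compares an ink editor to an object-oriented drawing program.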

DDJ: So an ink editor wouldn't interpret the strokes, wouldn't treat them like letters the way a word processor or text editor would, but would just let you manipulate the marks as marks.

RC: Jerry Kaplan makes a good analogy. He points out that as the word processor was to the typewriter, in terms of letting us reformat typeset text, ink editors are to ink on paper. An ink editor or note-taker lets you reformat, edit, open up your handwritten notes or diagrams. That's an easy example of a new application category that does not exist on the desktop PCs--but will be a hotbed of innovation.

The PenPoint UI

Roland Alden and Tony Hoeber

Roland is one of the principal implementors of PenPoint. Tony is the Notebook UI architect. They can be contacted at GO, 950 Tower Lane, Suite 1400, Foster City, CA 94404.

The current generation of GUIs was designed to be used with a keyboard, mouse, and desk-bound workstation. In designing PenPoint we assumed that the primary input device is a pen, and the hardware form factor is a notebook or pocket-sized pad. Starting from these premises made a world of difference. In PenPoint the organizing metaphor is the notebook instead of the desktop, and the interaction style is gestural as well as graphical. The interface is also inherently scalable to accommodate different display sizes and resolutions.

Everyone is familiar with real-world notebooks, forms, sheets, pads, tabs, bookmarks, sticky notes, and so on. The elements of the PenPoint Notebook User Interface (NUI) all make sense within the context of these familiar objects. The user of a PenPoint machine begins by seeing a notebook on the screen that has a table of contents page. Instead of launching an application, the user simply turns to the desired page in the notebook. Instead of window-oriented control panels, data entry areas and radio buttons, the user deals with option sheets, writing pads, and checklists.

The notebook is a very flexible organizing model. Users can temporarily "unsnap" pages from the notebook to view multiple documents at once, or chunk their data into multiple notebooks. PenPoint also allows users to create compound documents by embedding one document within another, and to create hypertext-style buttons allowing quick navigation from one document to another.

One of the strengths of the pen as an input device is that it allows the user to indicate both the operand and the operation with a single gesture. This is often easier than first selecting the object, then locating the command on a pulldown menu. Gestures are thoroughly integrated into all aspects of the system. There is a core set of 11 gestures that work the same across all applications. These core functions include delete, insert, move/copy, edit, scroll, and so on. An example is the "flick" gesture -- a short line moving up, down, left, or right. Its primary function is to scroll, by moving the line of text being flicked to the top or bottom of the page. Flicks work in contexts other than scrolling. Flicking left or right on the title line of the notebook turns to the next or previous page; flicking up or down on the title line of a floating notebook zooms or unzooms; flicking on overlapping notebook tabs brings obscured tabs into view, and so on. In each case the same user model is maintained: to shove the object and bring more information into view. Note that PenPoint doesn't force gestures on the user. The system always offers a dual command path that allows either a gesture or a selection from a control to precipitate an action.
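The context-dependent flick behavior described above can be sketched as a dispatch table (an illustrative Python sketch; the context names and action strings are invented, not PenPoint's API):

```python
# One gesture, many contexts: the table maps (context, gesture) to an
# action, so a flick always means "shove this and bring more into
# view" but does different work depending on where it lands.
FLICK_ACTIONS = {
    ("document_body", "flick_up"):    "scroll line to top",
    ("document_body", "flick_down"):  "scroll line to bottom",
    ("title_line", "flick_left"):     "turn to next page",
    ("title_line", "flick_right"):    "turn to previous page",
    ("floating_title", "flick_up"):   "zoom",
    ("floating_title", "flick_down"): "unzoom",
    ("tab_stack", "flick_up"):        "reveal obscured tabs",
}

def dispatch_flick(context, gesture):
    """Resolve a flick against its context; unknown pairs do nothing."""
    return FLICK_ACTIONS.get((context, gesture), "no-op")
```

Keeping all the bindings in one place is one way to enforce the consistent "shove" model across applications.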

The PenPoint UI is completely scalable. There are a number of reasons for this. Because PenPoint-based computers will come in a wide variety of form factors and display types, the amount of text in a UI component that can be displayed without scrolling can vary widely. In addition, everyone has different preferences for viewing text on a screen, concerning both size and font style. These personal decisions can change from one time or place to the next. For instance, someone who normally reads menu text at 9 points may prefer to double the size to 18 points when trying to read the computer display in a moving vehicle, or walking from a bright room to a dimly lit one.

Everyone will make a different trade off between font style and font size to fit constraints of display resolution, lighting, and so on. Because of this, the user of PenPoint can control both the size and style of the "system" font used by most user-interface components to display text.

Further, different parts of the UI may need to be displayed in different formats. For instance, at low resolutions, Asian letterforms (Japanese Kanji and Korean Hangul) must be displayed at a slightly larger size than Latin letterforms. PenPoint therefore allows certain UI components to display text in sizes relative to (smaller or larger than) the reference size chosen by the user. Even the thickness of separator lines on menus can be relative to the size of the system font.
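The relative-sizing scheme can be sketched as a table of scale factors keyed off a single user-chosen reference size (a hypothetical Python sketch; the component names and factors here are invented, not PenPoint's):

```python
# Hypothetical relative scales; the user picks only the reference size,
# and every component derives its own size from it.
RELATIVE_SCALE = {
    "menu_latin": 1.0,
    "menu_kanji": 1.25,        # Asian letterforms need extra pixels at low res
    "separator_thickness": 0.1,  # even rule thickness tracks the system font
}

def effective_size(component, reference_pts):
    """Derive a component's size from the user's reference point size."""
    return round(RELATIVE_SCALE[component] * reference_pts, 1)
```

When the user doubles the reference size (say, 9 to 18 points for reading in a moving vehicle), every component scales coherently with one change.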

Another dynamically alterable variable is the display proportion: whether display hardware that is rectangular operates in "landscape" or "portrait" mode. PenPoint-based computers are small and portable so the user can easily physically reorient them.

It's beneficial to place these choices in the hands of the user; however, this can place a heavy burden on the user-interface toolkit, and it requires the application developer to abandon some ideas about user-interface design.

For instance, user-interface tools such as those found on the Macintosh or MS Windows platforms treat the user interface as a two-dimensional graphic design problem. The size and position of UI components as well as their spatial relationships are decided by the application programmer using tools that resemble a drawing program. We call this the "pictorial model" of user-interface construction.

PenPoint replaces this pictorial model with what could be called a layout model. An application developer describes the relationships between different UI components in general terms, including constraints regarding size and position. At runtime the system optimizes the layout to fit the prevailing display conditions.

The PenPoint layout model operates over trees of windows. Special window classes are used to provide layout behavior for their child windows. Every UI component (button, scrollbar, and so on) is a subclass of Class Window (clsWin) in PenPoint's object-based programming system. The Layout window classes do not draw anything (except borders); they simply organize child windows. Application programmers can create new types of Layout windows, and these can lay out standard component windows. Conversely, standard Layout windows can be used to lay out custom component windows. PenPoint provides two standard Layout classes: one that supports tables, and one that supports arbitrary window arrangements.
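The layout idea can be sketched in miniature (illustrative Python; clsWin is PenPoint's terminology, but this Window class and RowLayout are invented stand-ins, not GO's API):

```python
class Window:
    """Minimal stand-in for a clsWin-style window: a rectangle plus
    child windows, positioned by its parent."""
    def __init__(self, width, height, children=None):
        self.width, self.height = width, height
        self.x = self.y = 0
        self.children = children or []

class RowLayout(Window):
    """A layout window draws nothing itself; it only positions its
    children -- here, side by side in a row (a one-row 'table')."""
    def layout(self):
        x = 0
        for child in self.children:
            child.x, child.y = x, 0
            x += child.width
        # The layout window sizes itself to fit its children.
        self.width = x
        self.height = max((c.height for c in self.children), default=0)
```

Because positions are computed at runtime rather than fixed at design time, the same window tree can reflow for a different display size, orientation, or system font.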

Object-oriented programming is the key to building a flexible system such as this. There are two dimensions to this flexibility. The first, and arguably the most important, is subclassing. Through subclassing, a programmer can adapt an existing tool without having to reinvent complex behavior that does not need to be changed. And sharing as much behavior as possible makes programs smaller and more reliable, and their user interfaces more consistent. The other key to flexibility is keeping objects packaged as small units of functionality and allowing them to be combined into larger aggregate objects. Happily, this is a service naturally provided by the layout classes.


Copyright © 1991, Dr. Dobb's Journal