June 1990/Data Hiding And Abstraction

Software Design

Data Hiding And Abstraction

Bryan Glennon

Bryan Glennon has been designing and developing software professionally for the past nine years. He consults for companies such as McDonald's, Rockwell International and Ameritech Applied Technologies. For the last six years, he has worked almost exclusively in C, on various platforms. Bryan can be contacted at P.O. Box 841, Bensenville, IL 60106, (708) 595-6059.
When designing a library that will be available to a large number of applications or programmers, you should generally hide the library's internal structures and data representations from the user. Data hiding allows the library to evolve with minimal impact on the users of the library. However, the user must be able to access some or all of the data stored in the library. This article will describe the benefits of hiding the actual internal data structures of a library, and of providing abstract objects to library users. To illustrate my point, I'll present several of the abstractions in a forms management system.
Data hiding is the process by which a library's internal data elements and structures are made inaccessible to the users of that library. If a data item's internal representation is always hidden from the user, then changes to that representation will have minimal impact on the library's users. Hiding the individual data elements that a library uses increases the library's independence from the application code that relies on it. Variables are usually hidden by restricting their scope. In C, data hiding can be accomplished with static variables: a variable declared static at file scope (outside any function) has internal linkage, so it is visible only within the source file in which it is defined. Functions declared static are hidden the same way.
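Here is a minimal sketch of file-scope hiding. The file and function names are hypothetical and do not come from the forms library. Other source files can read or change the counter only through the three public functions, and the static helper is hidden from them as well.

```c
/* counter.c: a hypothetical library source file. */

/* File-scope static: internal linkage, so no other source file can
   name this variable, let alone read or modify it directly. */
static int open_count = 0;

/* Static helper function: also invisible outside this file. */
static int clamp_nonneg(int n)
{
    return n < 0 ? 0 : n;
}

/* Public functions: the only way other files can reach open_count. */
void lib_note_open(void)
{
    ++open_count;
}

void lib_note_close(void)
{
    open_count = clamp_nonneg(open_count - 1);
}

int lib_open_count(void)
{
    return open_count;
}
```

An application file would see only prototypes for lib_note_open(), lib_note_close() and lib_open_count(). The library can rename open_count, change its type, or replace it with a table without touching any caller.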
Data abstraction, on the other hand, is the process that allows the user to utilize the library's otherwise hidden data. Data abstraction creates a data type, or abstraction, from a group of individual data elements.
The elements that make up an abstraction are usually related in some way, and in C the programmer generally provides a function or set of functions to manipulate these abstract data items. This set of functions comprises the "operators" that can be applied to the abstract type. For example, in a graphics application you might define an abstract data type called a POINT, made up of the x- and y-coordinates of a location on the screen. The library user would "see" only these abstract data items. With this level of abstraction, the library is free to change the actual internal representation of the items, as long as the abstract types it presents remain unchanged.
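A common way to build such an abstraction in C is an opaque type. The header declares the structure without its members, and only the library's source file defines them. The sketch below shows a hypothetical POINT built this way. The names point_new(), point_x() and so on are assumptions for illustration. Both "files" appear together only for brevity.

```c
#include <stdlib.h>

/* ---- point.h: all the library user sees ---- */
typedef struct point POINT;          /* incomplete type: members hidden */

POINT *point_new(int x, int y);      /* returns NULL if out of memory */
void   point_free(POINT *p);
int    point_x(const POINT *p);
int    point_y(const POINT *p);
void   point_move(POINT *p, int dx, int dy);

/* ---- point.c: private to the library ---- */
struct point {
    int x;
    int y;
};

POINT *point_new(int x, int y)
{
    POINT *p = malloc(sizeof *p);
    if (p != NULL) {
        p->x = x;
        p->y = y;
    }
    return p;
}

void point_free(POINT *p) { free(p); }
int  point_x(const POINT *p) { return p->x; }
int  point_y(const POINT *p) { return p->y; }

/* An "operator" on the abstract type. */
void point_move(POINT *p, int dx, int dy)
{
    p->x += dx;
    p->y += dy;
}
```

Application files see only the incomplete type. They can hold POINT pointers and call the functions, but they cannot touch the members or take sizeof(POINT). That is what lets the library change the structure without requiring changes to their source.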
Abstractions also let library users work with more meaningful items. Again using a graphics library as an example, a circle could be an abstract data type, specified by a function that accepts its center and radius as arguments. Library users will invariably prefer calling such a function to manipulating the actual internal representation, a collection of points that make up the circumference.
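The following sketch of the circle abstraction rests on some assumptions. The library owns a hypothetical in-memory pixel grid in place of a real display, and draw_circle() and pixel_at() are illustrative names. The caller supplies only a center and a radius. The library computes the points on the circumference with the standard integer midpoint circle algorithm and writes them to storage the caller never sees.

```c
#define SCR_W 64
#define SCR_H 32

/* The pixel store is private to the library: static, file scope. */
static unsigned char pixels[SCR_H][SCR_W];

static void set_pixel(int x, int y)
{
    if (x >= 0 && x < SCR_W && y >= 0 && y < SCR_H)   /* clip off-screen points */
        pixels[y][x] = 1;
}

/* Query function: 1 if (x, y) is lit, 0 if not lit or off-screen. */
int pixel_at(int x, int y)
{
    if (x < 0 || x >= SCR_W || y < 0 || y >= SCR_H)
        return 0;
    return pixels[y][x];
}

/* Public abstraction: the caller gives only a center and a radius. */
void draw_circle(int cx, int cy, int r)
{
    int x = r, y = 0, err = 1 - r;

    while (x >= y) {
        /* Each point computed in one octant is mirrored into all eight. */
        set_pixel(cx + x, cy + y);  set_pixel(cx + y, cy + x);
        set_pixel(cx - y, cy + x);  set_pixel(cx - x, cy + y);
        set_pixel(cx - x, cy - y);  set_pixel(cx - y, cy - x);
        set_pixel(cx + y, cy - x);  set_pixel(cx + x, cy - y);

        ++y;
        if (err <= 0) {
            err += 2 * y + 1;
        } else {
            --x;
            err += 2 * (y - x) + 1;
        }
    }
}
```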
There are other benefits to incorporating data hiding and abstraction into your library: internal consistency is enhanced, you increase the isolation between your library and your application, and you tend to reduce maintenance costs.
Internal consistency improves because data structures can be changed throughout the library without worrying about external users. If applications accessed the library's primitive data items directly, the library would have to either use those representations throughout or convert them internally to whatever representation suits it best. With abstract types, however, the programmer is free to change the library's internal representation. The abstract types are still derived from that representation, so when it changes, only the library code that builds the abstract types has to change with it.
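Suppose a hypothetical point library exposes only point_new(), point_free(), point_x() and point_y() on an opaque POINT. A later release could replace two separate int members with a single packed value, as sketched below. The sketch assumes coordinates stay in the range 0 to 65535. Code that calls the functions needs no source changes.

```c
#include <stdlib.h>

/* ---- point.h: public interface, unchanged between releases ---- */
typedef struct point POINT;
POINT *point_new(int x, int y);
void   point_free(POINT *p);
int    point_x(const POINT *p);
int    point_y(const POINT *p);

/* ---- point.c, revised: both coordinates packed into one value.
   Assumes screen coordinates in 0..65535 (16 bits each). ---- */
struct point {
    unsigned long packed;   /* bits 0-15: x; bits 16-31: y */
};

POINT *point_new(int x, int y)
{
    POINT *p = malloc(sizeof *p);
    if (p != NULL)
        p->packed = ((unsigned long)(y & 0xFFFF) << 16)
                  | (unsigned long)(x & 0xFFFF);
    return p;
}

void point_free(POINT *p) { free(p); }
int  point_x(const POINT *p) { return (int)(p->packed & 0xFFFFUL); }
int  point_y(const POINT *p) { return (int)((p->packed >> 16) & 0xFFFFUL); }
```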
Your library's isolation is also increased since no external functions access any of the internal library data items. By isolating the library from the application, the library can evolve without impact on the application.
The increased consistency and isolation both help to reduce maintenance costs. The cost of maintaining the library drops because no external constraints are imposed on its internal data structures. The cost of maintaining the application drops because changes to the library's internals do not affect the application code. Furthermore, since the abstractions are derived types, it's easier to keep them constant than to keep the primitive data elements constant.

A Forms Management System
The following example shows how I used data hiding and abstraction to enhance a forms management system, which required:

providing a consistent look and feel to the ultimate end user,

providing a clean, easy interface for the application programmers,

hiding the details of screen manipulation and I/O from the application programmers, and

designing a library that would be easy to expand and maintain over the life of the project.
This design was based on the following main data types or abstractions:
Labels. These were static areas of text, not modifiable at runtime.
Fields. These were areas where the user could provide input and where values could be placed by the application program.
Screens. Also called forms, these were collections of labels and fields.
For this project, I created several tools to assist in the data hiding and abstraction process. The main tool was a forms editor that allowed the programmer/designer to create screens interactively. The programmer enters text onto the screen and specifies fields by pressing a function key that makes a data entry form appear. This form collects all of the information used in a FIELD structure. Once the designer is satisfied with the screen layout, placement, and types of fields, he or she can save the form to a disk file. The file will be processed by an open_screen() function call at runtime. I also created tools to test forms, change field access order, and generate header files from a form file.
At the application layer, the programmer manipulates screens, which are referenced by a screen id, a small positive integer. For finer control, the programmer can use any of a number of field-level routines, referencing a field either with a manifest constant (generated by a tool) or by its assigned field name. Labels are not accessible to the programmer at runtime. All other details are hidden in the library. For instance, the programmer does not need to know the type of data a field expects, because the library handles it.
The first abstraction is that of a LABEL. Since labels aren't accessible to the application programmer, I won't cover them in detail. I present the LABEL structure in Listing 1 only because it is referenced in the SCREEN structure. Actual labels are created by the forms editor tool, during form definition.
The second abstraction, and one that the programmer can manipulate, is the FIELD. A field is a fixed-length area on the screen where the user can enter data (if the field type allows it) and where the application can display data. Fields, too, are created with the forms editor when the form is defined.
The programmer can reference a field in two ways: by the name assigned during form creation, or by a field number. For simplicity, I created a tool to generate manifest constants for all field numbers. The forms editor tool also allows the programmer to change a field's location independently of the application code.
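Here is a sketch of both access paths. It assumes a generated header that defines one manifest constant per field, plus a library that keeps the field names recorded by the editor. NAME_FLD, AGE_FLD, CITY_FLD and field_number() are hypothetical names, not taken from the article's listings.

```c
#include <string.h>

/* What a generated header might contain: one manifest constant per
   field, numbered in field order. */
#define NAME_FLD  0
#define AGE_FLD   1
#define CITY_FLD  2

/* Library side: field names recorded by the forms editor, kept private. */
static const char *const field_names[] = { "name", "age", "city" };
#define FIELD_COUNT ((int)(sizeof field_names / sizeof field_names[0]))

/* Map a field name to its field number; returns -1 if there is none. */
int field_number(const char *name)
{
    int i;

    for (i = 0; i < FIELD_COUNT; i++)
        if (strcmp(field_names[i], name) == 0)
            return i;
    return -1;
}
```

Because a field's screen position is stored with the form rather than in these constants, moving a field in the editor leaves both the constants and the names unchanged.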
The programmer manipulates fields by using library functions. I created functions to get the data currently in a field, to place data into a field, to change certain field attributes, and to associate functions with a field for automatic execution by the library.
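The sketch below shows one possible shape for such a set of field functions. The FIELD layout, the fld_ function names and the age_edit() hook are all hypothetical. fld_user_entry() stands in for the place in the library's own input loop where a user's entry is accepted, which the article does not show.

```c
#include <string.h>

#define FLD_COUNT  3
#define FLD_MAXLEN 32

/* Application-supplied hook: return 0 to accept a value, nonzero to reject. */
typedef int (*EDIT_FN)(const char *value);

/* Library-private field record; applications never see this layout. */
typedef struct {
    int     length;                /* maximum characters the field holds */
    int     protect;               /* attribute: nonzero means display only */
    EDIT_FN edit;                  /* hook run automatically on user entry */
    char    value[FLD_MAXLEN + 1];
} FIELD;

/* Field 0: name (20 chars), field 1: age (3), field 2: status (10). */
static FIELD fields[FLD_COUNT] = { { 20 }, { 3 }, { 10 } };

static FIELD *lookup(int fld)
{
    return (fld >= 0 && fld < FLD_COUNT) ? &fields[fld] : NULL;
}

/* Copy at most length characters of src into dst and terminate it. */
static void truncate_copy(char *dst, const char *src, int length)
{
    strncpy(dst, src, (size_t)length);
    dst[length] = '\0';
}

/* Place data into a field; characters beyond its length are dropped. */
int fld_put(int fld, const char *data)
{
    FIELD *f = lookup(fld);
    if (f == NULL) return -1;
    truncate_copy(f->value, data, f->length);
    return 0;
}

/* Copy a field's current contents into buf, which holds size bytes. */
int fld_get(int fld, char *buf, size_t size)
{
    FIELD *f = lookup(fld);
    if (f == NULL || size == 0) return -1;
    strncpy(buf, f->value, size - 1);
    buf[size - 1] = '\0';
    return 0;
}

/* Change an attribute: make a field display-only or editable. */
int fld_set_protect(int fld, int on)
{
    FIELD *f = lookup(fld);
    if (f == NULL) return -1;
    f->protect = (on != 0);
    return 0;
}

/* Associate an edit function with a field (NULL removes it). */
int fld_set_edit(int fld, EDIT_FN fn)
{
    FIELD *f = lookup(fld);
    if (f == NULL) return -1;
    f->edit = fn;
    return 0;
}

/* Library-internal path for user input. Returns 0 if stored, 1 if the
   edit hook rejected it, -1 for a bad or protected field. */
int fld_user_entry(int fld, const char *typed)
{
    char tmp[FLD_MAXLEN + 1];
    FIELD *f = lookup(fld);
    if (f == NULL || f->protect) return -1;
    truncate_copy(tmp, typed, f->length);
    if (f->edit != NULL && f->edit(tmp) != 0) return 1;
    strcpy(f->value, tmp);
    return 0;
}

/* Application side: an edit function for the age field (digits, 0-120). */
int age_edit(const char *value)
{
    int n = 0;
    if (*value == '\0') return 1;
    for (; *value != '\0'; value++) {
        if (*value < '0' || *value > '9') return 1;
        n = n * 10 + (*value - '0');
        if (n > 120) return 1;
    }
    return 0;
}
```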
Without the abstraction of a field (see Listing 2), the programmer would have to know where to place the cursor when collecting data from each field, how long each field is, and what type of data is allowed (e.g., integer, floating point, or alphabetic). He or she would also be responsible for all input from the terminal device. As Listing 2 shows, the field abstraction hides a considerable amount of data.
The only way to access items in this structure is from the library. No direct programmer manipulation of these items is allowed; the data items are completely hidden.
There are several other advantages to hiding the internal representation and restricting programmer access. Since the library handles all input, all fields behave the same. This goes a long way towards satisfying the design specification of a consistent user interface. The application programmer simply can't change how data is accepted — except as allowed by the library. Furthermore, it's impossible to accept more input than the field can hold, since the library bases the input on the length hidden in the field structure.
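Here is a minimal sketch of how a library routine might enforce the length and type stored in a field. The accept_field() routine reads keystrokes from a string standing in for the terminal, and the three field types are assumptions for illustration. Characters of the wrong type are ignored, and input beyond the field's length is never stored.

```c
#include <ctype.h>

enum field_type { FT_ALPHA, FT_INTEGER, FT_FLOAT };

/* Does character c belong in a field of this type?  (Simplified:
   a minus sign or decimal point is accepted in any position.) */
static int char_ok(enum field_type type, int c)
{
    switch (type) {
    case FT_INTEGER: return isdigit(c) || c == '-';
    case FT_FLOAT:   return isdigit(c) || c == '-' || c == '.';
    case FT_ALPHA:   return isalpha(c) || c == ' ';
    }
    return 0;
}

/* Read keystrokes until Enter ('\n') or end of input. Only characters
   valid for the type are kept, and at most `length` of them, so `out`
   needs length + 1 bytes. Returns the number of characters stored. */
int accept_field(const char *keys, enum field_type type, int length, char *out)
{
    int n = 0;

    for (; *keys != '\0' && *keys != '\n'; keys++) {
        unsigned char c = (unsigned char)*keys;
        if (n < length && char_ok(type, c))
            out[n++] = (char)c;
        /* otherwise the keystroke is discarded (a real library might beep) */
    }
    out[n] = '\0';
    return n;
}
```

Since every field goes through the same routine, every field rejects bad keystrokes and stops at its length in exactly the same way.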
Even error messages and requests for help are handled consistently, since the library processes them too.
The third abstraction is a SCREEN (see Listing 3). A screen contains, among other things, a list of labels and a list of field abstractions. The application program opens, closes, hides (un-displays), populates, and reads screens through library function calls. The actual data associated with the screen (the window, the relative location of the fields, the static text, and the default field access order) is all hidden within the library. The programmer can read all fields with a single function call. Internally, the library routines that manipulate screen data are themselves built on the field abstraction and its associated library function calls.
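To show the layering, the sketch below keeps a SCREEN, with its fields in default access order, private to the library and implements a whole-screen read purely in terms of a field-level read. The structure members, the sample screen, and the names fld_read() and scr_read() are hypothetical. A real library would load the screen from its form file rather than from a static initializer.

```c
#include <string.h>

#define MAX_FIELDS 8
#define FLD_LEN    32

/* Private to the library. */
typedef struct {
    int  row, col;              /* position relative to the window */
    int  length;                /* maximum characters */
    char value[FLD_LEN + 1];
} FIELD;

typedef struct {
    const char *title;
    int   nfields;
    FIELD field[MAX_FIELDS];    /* stored in default access order */
    /* window handle and label list omitted from this sketch */
} SCREEN;

/* One screen, as if already loaded from a form file (sample data). */
static SCREEN screens[] = {
    { "Customer", 2, { { 2, 10, 20, "Ann Smith" }, { 3, 10, 3, "42" } } }
};
#define NSCREENS ((int)(sizeof screens / sizeof screens[0]))

/* Field-level operation: copy one field's value into buf (size bytes). */
int fld_read(int sd, int fld, char *buf, size_t size)
{
    const SCREEN *s;

    if (sd < 0 || sd >= NSCREENS || size == 0)
        return -1;
    s = &screens[sd];
    if (fld < 0 || fld >= s->nfields)
        return -1;
    strncpy(buf, s->field[fld].value, size - 1);
    buf[size - 1] = '\0';
    return 0;
}

/* Screen-level read, built only from the field-level call. Returns the
   number of fields copied into out, or -1 on error. */
int scr_read(int sd, char out[][FLD_LEN + 1], int max)
{
    int fld;

    if (sd < 0 || sd >= NSCREENS)
        return -1;
    for (fld = 0; fld < screens[sd].nfields && fld < max; fld++)
        if (fld_read(sd, fld, out[fld], FLD_LEN + 1) != 0)
            return -1;
    return fld;
}
```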
To display a screen, the application passes a screen name to a library routine. The library locates the screen definition file, opens the screen, and returns a screen descriptor, which the programmer uses to reference the screen. Listing 4 shows a sample application that opens a screen, associates an edit function with the age field on the screen, and collects data until the user signals completion. Listing 5 shows the screen layout, and Listing 6 shows the header file generated from the corresponding form definition.
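One way to implement the descriptor mechanism is a private table of open-screen slots. The descriptor is the slot index plus one, so that it is a small positive integer. In this sketch, a built-in list of form names stands in for locating definition files on disk. The names scr_open(), scr_close() and scr_name() are hypothetical and do not reproduce the article's open_screen() interface.

```c
#include <string.h>

#define MAX_OPEN 4

/* Stand-in for definition files on disk: known form names. */
static const char *const known_forms[] = { "customer", "order", "invoice" };
#define NFORMS ((int)(sizeof known_forms / sizeof known_forms[0]))

/* Private per-screen state; the application only ever sees descriptors. */
typedef struct {
    int in_use;
    int form;               /* index of the definition it was built from */
} SCREEN_SLOT;

static SCREEN_SLOT open_screens[MAX_OPEN];

/* Translate a descriptor into a slot index, or -1 if it is not open. */
static int slot_of(int sd)
{
    int slot = sd - 1;
    return (slot >= 0 && slot < MAX_OPEN && open_screens[slot].in_use) ? slot : -1;
}

/* Locate the definition by name and claim a free slot. Returns a
   descriptor in 1..MAX_OPEN, or -1 if the form is unknown or all
   slots are in use. */
int scr_open(const char *name)
{
    int form, slot;

    for (form = 0; form < NFORMS; form++)
        if (strcmp(known_forms[form], name) == 0)
            break;
    if (form == NFORMS)
        return -1;
    for (slot = 0; slot < MAX_OPEN; slot++)
        if (!open_screens[slot].in_use) {
            open_screens[slot].in_use = 1;
            open_screens[slot].form = form;
            return slot + 1;
        }
    return -1;
}

int scr_close(int sd)
{
    int slot = slot_of(sd);
    if (slot < 0)
        return -1;
    open_screens[slot].in_use = 0;
    return 0;
}

/* Name of the form behind a descriptor, or NULL if it is not open. */
const char *scr_name(int sd)
{
    int slot = slot_of(sd);
    return slot < 0 ? NULL : known_forms[open_screens[slot].form];
}
```

Because the application holds only an integer, the library can change everything behind it, from the slot layout to how definitions are found, without changing a line of application code.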
Designs that employ data hiding and abstraction can also save development time. Even while I was developing the library, application programmers were writing the code that used it. As long as the interface remained constant, I was free to change the internal structures. When an application program issues a get_field() or get_screen() function call, the appropriate data is returned without the application having to know about the screen layout or the internal library structures. As long as the application could reference fields and screens by name and/or id, I could change the library as needed.
To summarize, data hiding is the process of making certain information unavailable to application functions. Data abstraction is the process by which new data types, or abstractions, are created. These new data types are then manipulated through a well-defined set of function calls.
I have found the concepts of data hiding and abstraction very useful, not only in the application described here but in many others as well. A clearly defined library interface, along with a well-selected set of abstract data types, reduces the pain of library implementation and maintenance.