The Formulate Visual Programming Language

Dr. Dobb's Journal August 1999

Representing structured data

By Allen Ambler

Allen is a professor of computer science and director of the Design Lab at the University of Kansas. He can be contacted at ambler@eecs.ukans.edu.

Universally, programming languages struggle with a language representation for manipulating structured data. Even languages designed around specific data types, such as arrays in APL or lists in Lisp, struggle with the language representation for manipulating these types.

The problem is that textual representations generally index structures based on some origin. For instance, suppose you want to divide a vector (or list) into two pieces of approximately the same length. Using a textual representation, you would reference subvectors by indexing the full vector from 0 to lengthOfVector/2-1 and from lengthOfVector/2 to lengthOfVector. Such origin-based indexing often results in rather complex subscripted expressions that are not always obvious and often are a source of programming errors.

Visual representations for manipulating structured data, an offshoot of visual programming language research, help eliminate both complexity and errors in working with structured objects. In this article, I'll describe the Formulate visual programming language and its visual representation for manipulating structured data. We are currently working with the fifth major revision of the Formulate language, which was implemented to run on Apple MacOS using Digitool Macintosh Common Lisp, and (in a more limited form) under Windows and UNIX running Franz Allegro Common Lisp.

Although Formulate was designed to run in a distributed environment, the original Macintosh implementation did not take advantage of such design. The version we are now working on is a Lisp-Java implementation that runs over the Web. With this version, users can build Formulate programs from web pages, much as they would with spreadsheets. The programs they build are then stored at the Formulate server. Formulate web pages can then be made accessible as web pages to other users.

Several aspects of this system distinguish it from typical Java-developed web pages.

While computations are redone automatically whenever necessary, recomputing will often involve interactions with the server. Such server interactions will be carried out without requiring user interactions.
Computations on one form (page) may depend upon computations on another form, perhaps even supported by a different server. All necessary communication is carried out by the system to keep such computations up to date.
Most importantly, a spreadsheet-level knowledge is all that should be required.

Although the basic Formulate interpreter (available electronically at http://designlab.ukans.edu/~ambler/ddj/ and from DDJ; see "Resource Center," page 5) remains unchanged, it does require changes to support the web interface, which is being built on top of Java 1.2 Swing. This is the focus of much of our current work.

Research into public programming motivated the design and implementation of Formulate. The objective was and still is to find a programming language manageable by people other than formally trained programmers. Formulate is by no means a complete solution to public programming. Rather, it chooses as a baseline people who have at least a high-school algebra competency, but still no formal programming training. This includes users who have successfully used spreadsheets, the most successful public programming paradigm to date.

Formulate Basics

To build computations in Formulate, you create new instances of objects and define their values through expressions in much the same manner as spreadsheet computations are defined. Like spreadsheets, all values are computed immediately and updated as often as necessary to keep values current. Unlike spreadsheets, there is no grid.

A form is an arbitrary collection of objects called "cells." Generally, forms provide a visual organization for representing information and for interacting with the system. Conceptually, forms model any of the thousands of forms you encounter and interact with in daily life.

Objects have attributes. Each attribute has an expression and a value. An attribute's value is determined by evaluating its expression. An expression is a literal, a reference to another attribute's value, or a function applied to expressions. An attribute's value is obtained by evaluating this expression. (For more information, see "Formulate Solution to the Visual Programming Challenge," by Allen Ambler and Andrew Broman, Journal of Visual Languages and Computing, April 1998)

Figure 1 shows three cells representing the computation of the area of a circle. The image on the top left appears much as a user would view the form. Users would not see the grayed-out image with numbered cells at the top right. It appears here only to show the expressions associated with each cell (more on this will be discussed shortly). In Figure 1, the three cells are given the expressions radius=10, pi=3.14, and area=radius*radius*pi.

Function Definitions

Compared to spreadsheets, one of Formulate's distinctive features is the manner in which functions are defined. In traditional languages, the process of building functions requires that you must either develop abstract code, or perform the necessary abstraction from specific code. This abstraction process requires forethought and can be limiting to public programmers. To facilitate this process, Formulate does the work of abstracting from specific computations.

For example, consider building the function areaOfCircle directly from the form in Figure 1. To begin, you get a blank function definition form. You then edit the syntactic form (top part) to identify the input and output parameter lists. To identify these parameters, users click on the corresponding cells. In this case, RADIUS is the only input object and AREA is the output object.

After users have edited the syntactic representation, the system automatically builds a function. It does so by:

1. Copying the referenced objects to a new function form.

2. Abstracting input parameters by removing any prior expressions.

3. Adding all objects in the closure of the parameter expression references.

4. Performing optimizations to compress expressions and eliminate unnecessary objects.

Figure 2 shows the resulting function definition. The cell pi was first added to the function form as part of the closure, then removed by the optimization process, which replaces the reference to pi by its value.

Structured Data

Formulate supports three types of structured data:

Arrays (of any number of dimensions).
Lists.
Tables (database relations).

How Formulate users work with structured data is an interesting aspect of the language. In particular, Formulate makes possible working with structured objects without requiring indexing. The design objective has been that all manipulations on structured objects should be doable without low-level indexing operations that are not essential to the statement of the problem.

A "partition" is a logical division of a data structure. It exists solely for defining and accessing the data structure. The areas of a partition are called "regions." That is, a partition is a logical division of a data structure into regions. For a list you might want to talk about the first element and the rest of the elements, for instance. This would require that you partition the list into two regions, the first is simply the first element (for example, a region of length 1), while the second is whatever remains of the list.

For any structure, there is one base partition that contains the expressions that define the values of its regions and thus indirectly the values of the structure. In addition, there can be any number of view partitions that have no defining expressions, but rather are simply used to extract values from the structure as defined by the base partition. Regions of the base partition are referred to as "base regions;" regions of a view partition are referred to as "view regions."

Appending Two Arrays

In Figure 3, two arrays (A and B) are appended, yielding a third array (C). Array C is partitioned into two regions. The leftmost region is given the expression A and the rightmost region is given expression B.

Append was defined without any of the arrays having been given bounds. This situation could have resulted from there having been no bounds on the objects used to define the function, or as a result of the automated process of function building. The latter case results when the objects used to define the function have main expressions, but not bound expressions -- the bounds are derived from the main expressions. In this case, the abstracted functions will adapt to the bounds of supplied arguments. If there are explicit bound expressions associated with the objects used to define a function, then these expressions are retained. By this means, you can limit the bounds of a function.

When this function is applied, Formulate computes the resulting array, including its dimensions. A potentially interesting case is when a two-dimensional array A has a different number of rows than another two-dimensional array B. What happens in this case is determined by other information supplied by you. If users specify nothing regarding this situation, the system constructs an array with the number of rows equal to the larger number of rows in the two arrays A and B. This array will have a hole -- a portion of the array where there are no values. On the other hand, users can stipulate by way of a constraint that the number of rows in A must equal the number of rows in B. Formulate even supplies some intelligent, automated assistance in recognizing and coping with this situation. (For more information, see "Applicability Checking in Visual Programming Languages," by Guijun Wang and Allen Ambler, Proceedings of IEEE Symposium on Visual Languages, 1994.)

Partitioning Data Structures

The array in Figure 4(a) has a single region that is 3×5. To form new regions, you split existing regions. One way to do this is to click on and drag a region boundary line. For instance, in Figure 4(b), the left border has been dragged right by two columns. The effect is to partition the array into two regions as shown. Continuing in this way, you arrive at Figure 4(c), which now has six regions. Existing regions can be resized by first selecting a region and then dragging a border.

Regions can be deleted by dragging boundaries as well. In Figure 4(d), for instance, the left border of the selected region is dragged right by one column producing Figure 4(e). This has the effect of deleting the region that was adjacent to the stretched region. There are now only five regions.

Figure 5 shows an array with an accompanying view. For any array there is only one base partition, but there can be arbitrarily many view partitions. The distinction is that the base partition's regions have expressions that define the values of the array. A view partition's regions do not have such expressions. The primary usage of a view is to allow a different partitioning of an array.

Summing a List

Consider summing the elements of a list (or vector) L. Figure 6 provides a solution to this problem. The essential idea is to construct a second list, S, where each element of S is the partial sum of the corresponding element of L and all prior elements of L; that is, the ith element of S is the sum of elements 1 through i of L. You partition a view L' of L into the first element [2] and the rest [3]. You construct a similar base partition for S. The first (one element) region of S [4] is set to the first (one element) region of L' [2]. Before proceeding you construct another view partition of S, call it S'. S' is also partitioned into two regions, the second [7] being only the last element and the first [6] being all but the last element. You now fill in the expression for the second base region of S [5] to be the sum of the second region of L' [3] and the first view region of S' [6]; that is, using the regions number in Figure 6, [5]=[3]+[6]. That this equation works may require some explanation. The way to think about it is that i,jth elements of regions [3] and [6] will be added, giving the i,jth element of region [5]. As the ith element of [5] is the (i+1)st element of S, the ith element of [3] is the (i+1)st element of L, and the ith element of [6] is the ith element of S, then, for all i>1, the (i+1)st of S=(i+1)st of L+ith of S, with the 1st of S=1st of L. There is an implied order in which this evaluation must proceed if it is to produce the correct result. The system guarantees that if such an evaluation ordering exists, it will find it. Finally, to complete the example, the result SUM [8] is simply the second region of S' [7].

Computing the Binomial Coefficients

Next consider the problem of generating the binomial coefficients for a polynomial of power n. (Figure 7 shows a solution.) This example takes a single input parameter [1], the power of the polynomial. Using this number, the dimensions of the array in the upper left are determined as n×n. The top region [3] has an expression 0; that is, every value of this region is set to 0. Likewise, the left region [2] is set to 1. To define the remaining (n-1)×(n-1) region [4], you use the two views on the bottom, the [4]=[5]+[9]. These two view partitions each have an (n-1)×(n-1) region as well. If you consider any slot in region [4], then the same cell is at a different relative position in region [5], and at yet a different relative position in region [9]. In particular, if you consider the value at relative position i,j in region [4], then the corresponding slot in region [5] is one over and one up, or relative to region [4] it is at position i-1,j-1. Similarly, the corresponding slot in region [9] is one up at position i-1,j. Essentially, these two views allow you to indicate that for any i and j, a[i,j]=a[i-1,j-1]+a[i-1,j], which is the definition of the binomial coefficients. To complete the definition, the result parameter [11] references the bottom region [10] of the bottom-right view. Particularly interesting about this example is that:

There are no indexed expressions.
There are no loops or, indeed, any control structures at all.
It's completely flexible in that it will work for any polynomial degree n.
The expressions are simple and intuitive.

Eight Queens Problem

Finally, consider the eight queens problem, where the objective is to place eight chess queens on a chessboard in such a way that none are under attack. As queens can attack at any distance on any diagonal, column, or row originating from their current position, the objective is essentially to place the eight queens such that no two are in the same column, same row, or same diagonal.

A typical solution is to recursively place one queen in each column starting from the first column. In each column, you try to place the queen in some row such that it does not attack any previous queen. If you find such a place, then you try to place the rest of the queens in succeeding columns. If you are successful, you leave the queen where it is; otherwise, you look for another row, until you either succeed or run out of possibilities and fail, in which case some previously placed queen will have to be removed and placed again.

The solution requires a main function, called place-queens, and three predicate functions to check if a newly placed queen causes jeopardy to some previously placed queen on the diagonal up and left, the horizontal left, or the diagonal down and left. You begin with the main function.

The main function, place-queens ([1], [2]), expects that [2].col queens have already been placed and attempts to place a queen in column [2].col+1. The value of [1] will determine where you attempt to place the new queen. As you continue, you will either continue to try to place a queen in the current column, recursively incrementing [1], or to place a new queen in the next column restarting with [1] being 1. You partition a view of [2] into [3], [4], and [5] using the value of [1] to set the bound equations. You create a new column array and partition it likewise, giving [6], [7], and [8], with composite view [9]. You then create a new array and partition it in into two vertical regions, [10] and [11], with composite view [12]. By assigning [10] to be [2] and [11] to be [9], this new array is automatically sized and becomes the append of [2] and [9]. Next, you create a new array [13] and assign it a value. This value is conditionally the result of recursively calling place-queens, provided the just placed queen is safe (that is, it is not threatening a previously placed queen); unless all queens have now been placed, in which case [13] assumes the value of [12]. If the just placed queen is not safe, the value becomes the null array (a 0x0 array). Finally, if [13] failed to compute a nonnull value, then [14] takes the value of recursively calling place-queens with [1] incremented by 1, unless all row positions have already been tried. This recursive call may fail, in which case the value of [14] is null. The final [14] value is the result of the function.

Either it will be a completed solution or null, in which case an enclosing nested recursive call will have to try a different solution. The solution can be seen in Figure 8. Figure 9 finishes the problem with the solution to the three required predicate functions. In each case, an array is input. This array is partitioned to create a region of exactly one cell. This one cell is compared for containing a queen. If present, the predicate returns False; otherwise, a recursive call is made until an empty array results.

This demonstrates that not all problems can be expressed completely without indexing dependencies, but that unnecessary indexing considerations can be eliminated. In this problem, attempting to place a queen in a particular row is a necessary part of the problem solution. In particular, recursion is based on setting this value and incrementing it. On the other hand, all indexing normally associated with writing the predicate safe functions has been eliminated, yielding rather elegant solutions.

Conclusion

By employing a visual representation that allows identifying parts based on patterns, program notation is greatly simplified. In addition, it becomes possible to express many algorithms on whole regions, thereby eliminating much control code in favor of leaving it to the systems to organize control to accomplish the specified operations. I encourage you to contemplate your favorite language with similar visual representation.

Acknowledgments

This work was supported in part by grants from Kansas Technology Enterprise Corporation (KTEC) and by the following National Science Foundation grants: NSF Grant CISE-CDA-9401021 and NSF Grant CISE-IRI-9616242.

DDJ