March 2000/Taming the 3-D Perspective Transform

Features

Taming the 3-D Perspective Transform

Alex Telford

Showing objects in perspective requires only simple math, but a little not-so-simple strategy makes for a friendlier display.

Some of the most spectacular computer-generated images are of imaginary three-dimensional worlds, with parallel lines disappearing to a vanishing point on the horizon. The math required to generate these views is surprisingly simple: multiply the 3-D point in space by a single 4x4 matrix, rescale, and out pops a 2-D point suitable for plotting on-screen. Nowhere else does a little investment in understanding the math pay off so well.

The book Computer Graphics: Mathematical First Steps [1] is a good place to start. It represents an innovation in the teaching of the math required for rendering 3-D scenes. This book assumes almost no prerequisite math knowledge, but takes the reader to a working knowledge of the single 4x4 matrix transform that translates a 3-D position point into a 2-D point in a perspective view.

I have incorporated this transform into a Windows program called Scen, an instrumented workbench for 3-D perspectives. This program illustrates the working of the single matrix transform. The program allows the user to vary the eight parameters that specify a view. It then shows the user the resulting view, the viewport coordinates, and the transform matrix. Scen uses only one graphics display device function, SetPixel(x, y, color). Scen.exe is available for download from the CUJ ftp site (see p. 3 for downloading instructions).

Some years ago, working from a rather terse description of the math involved, I found that changing the view parameters often caused the image to go off the screen. More recently, the above-mentioned book has helped take much of the mystery out of the single matrix transform. I see now that the key to a successful display is understanding the way the viewport moves when any of the view parameters are changed. In this article I explain how the 3-D perspective transform works, with the hope that I can help readers who are experimenting with 3-D graphics.

World and Homogeneous Coordinates

The ideal system for programmers working in 3-D is one that allows them to represent objects in space in a natural way: each point that makes up the object has x, y, and z coordinates, unrestricted in any way. In a typical use of such a system, programmers might specify objects whose points each have coordinates in the range -25.0 to 25.0, with the point (0.0, 0.0, 0.0) representing the origin, which would normally lie at the center of the object world. This is a description of a world coordinate system. In one common layout, the x-axis is horizontal, with left corresponding to negative x values and right corresponding to positive; the y-axis is vertical, with up corresponding to positive; and the z-axis has positive values coming out of the screen and negative values going back into the screen toward the horizon. The horizon is then parallel to the x-axis and is at a z coordinate of minus infinity.

The single matrix perspective transform provides this natural 3-D representation, but adds a mathematical trick that simplifies the transform operation to a single matrix multiply. The trick consists of adding a fourth coordinate to (x, y, z) that is always equal to 1.0. So the 3-D point (x, y, z) becomes (x, y, z, 1). This makes the math for perspective transforms much simpler.

The matrix that performs the perspective transform must match in size what is now a four-element point in 3-D. This matrix is square, of size 4x4. To find the screen position of the world point (x, y, z, 1), multiply it by the transform matrix to produce a transformed point (u, v, 0, w). Rescale this point so that w is made equal to 1.0: (u/w, v/w, 0, 1). The point (u/w, v/w) is where the original 3-D point maps onto the viewport plane, which is at z = 0. This new 2-D point must be offset and scaled to fit the display device viewport.
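The multiply-and-rescale step can be sketched in a few lines. This is a minimal illustration rather than the code from Scen: the type names Vec4 and Mat4 and the function names Multiply and Rescale are hypothetical, and the point is assumed to be a row vector that multiplies the matrix from the left.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<double, 4>;
using Mat4 = std::array<std::array<double, 4>, 4>;

// Row vector times matrix: out[j] is the sum over i of p[i] * m[i][j].
Vec4 Multiply(const Vec4& p, const Mat4& m)
{
    Vec4 out{};  // value-initialized, so every element starts at 0.0
    for (int j = 0; j < 4; ++j)
        for (int i = 0; i < 4; ++i)
            out[j] += p[i] * m[i][j];
    return out;
}

// Rescale (u, v, 0, w) so that w becomes 1.0, giving (u/w, v/w, 0, 1).
Vec4 Rescale(const Vec4& q)
{
    assert(q[3] != 0.0);  // a point with w == 0 has no finite image
    return { q[0] / q[3], q[1] / q[3], q[2] / q[3], 1.0 };
}
```

For example, with a matrix whose only non-trivial entries are a zeroed z column and r = -0.1 in the fourth column of the z row, the point (2, 4, -10, 1) multiplies to (2, 4, 0, 2) and rescales to (1, 2, 0, 1).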

The elements of the 4x4 perspective matrix are built from just two quantities: the viewpoint distance and the rotation angle. Figure 1 illustrates the elements of the perspective transform matrix. This matrix corresponds to a world of objects that have been rotated through an angle a (in radians) about the y-axis. The value of r is equal to -1/dz, where dz is the positive distance of the eye or camera along the z-axis away from the viewport.
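Since Figure 1 is not reproduced here, the following is one way such a matrix could be assembled, assuming row-vector points and a rotation about the y-axis followed by a projection onto z = 0. The function name MakePerspectiveMatrix is hypothetical, and the sign of the sine terms depends on the rotation direction chosen, so it may differ from the figure.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat4 = std::array<std::array<double, 4>, 4>;

// Rotation through angle a (radians) about the y-axis, combined with a
// perspective projection onto the z = 0 plane, for row vectors (x, y, z, 1).
// dz is the positive eye distance along +z, and r = -1/dz as in the text.
// ASSUMPTION: the rotation direction (sign of the sine terms) is a guess.
Mat4 MakePerspectiveMatrix(double a, double dz)
{
    assert(dz > 0.0);
    const double r = -1.0 / dz;
    const double c = std::cos(a);
    const double s = std::sin(a);
    Mat4 m = {{
        {{ c,   0.0, 0.0, -r * s }},
        {{ 0.0, 1.0, 0.0,  0.0   }},
        {{ s,   0.0, 0.0,  r * c }},  // z column is zero: the result lies in z = 0
        {{ 0.0, 0.0, 0.0,  1.0   }}
    }};
    return m;
}
```

With a = 0 the matrix reduces to the identity except for a zeroed z column and the single entry r, so a point (x, y, z, 1) maps to (x, y, 0, 1 - z/dz).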

Camera and Object Worlds

The transform matrix in Figure 1 represents the perspective view seen by a camera or eye positioned on the z-axis at a distance dz along the positive side of the z-axis. The contents of this matrix are also based on the assumption that the Object world is being projected onto a screen, or viewport, in the x-y plane at z = 0. The camera and the projection screen are in a coordinate system called the Camera world. This is the coordinate system where the transform matrix is working. To get a satisfactory image of the Object world projected onto the screen at z = 0, the Object world must be positioned on the other side of this screen — that is, all of its z values should be negative. This situation is depicted in Figure 2. The camera will see good perspective views when the Object world is as far in the negative z direction from the screen as the camera is in the positive z direction.

There are limits as to which of the view parameters can be safely incorporated into the perspective matrix while preserving its simplicity. For example, it is best to avoid specifying the z-offset of the Object world within the perspective matrix. The offsets of the Object world coordinates, which are required to place it in a position where reasonable views are possible, are best made to each object's position point as shown in Figure 3, just before the matrix transform is applied.

Each of the elements of m_ScnObjCoords represents a coordinate in the Camera world coordinate system. Each coordinate is the sum of two variables: an Object world coordinate and a variable that offsets the entire Object world to a place that is viewable by the camera. Once the Object world offsets are set up, the programmer can mostly forget about these values and work in a natural Object world coordinate system with the origin (0, 0, 0) at the center.

Surprising as it may seem, this transform matrix and the equations that set up the input 3-D point are all that are necessary to define a perspective view of the point. Only five parameters are required: the z view distance and the angle of rotation of the Object world are contained in the matrix; the three offset variables for x, y, and z are in the setup equations for the point to be transformed. If you multiply the offset 3-D point by the transform matrix and rescale, the result is a 2-D point on the viewport screen in the z = 0 plane. All this work is done in floating-point variables. Conversion to integers is done last, for the pixel display device.
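The whole chain from Object world point to Camera world viewport can be sketched with the multiply written out term by term. The names ViewParams, Point2, and ToCameraViewport are hypothetical and not taken from Scen, and the rotation direction is an assumption because Figure 1 is not reproduced here.

```cpp
#include <cassert>
#include <cmath>

// The five perspective parameters named in the text (names are illustrative).
struct ViewParams {
    double dz;       // positive eye distance along the z-axis
    double angle;    // rotation of the Object world about the y-axis, radians
    double xOffset;  // offsets that place the Object world in the Camera world
    double yOffset;
    double zOffset;
};

struct Point2 { double s, t; };

// Offset the Object world point, rotate it about the y-axis, and project it
// onto the z = 0 plane. ASSUMPTION: the rotation direction is one possible choice.
Point2 ToCameraViewport(const ViewParams& v, double x, double y, double z)
{
    assert(v.dz > 0.0);
    const double xc = x + v.xOffset;
    const double yc = y + v.yOffset;
    const double zc = z + v.zOffset;

    const double c  = std::cos(v.angle);
    const double s  = std::sin(v.angle);
    const double u  = xc * c + zc * s;    // rotated x
    const double zr = -xc * s + zc * c;   // rotated z
    const double w  = 1.0 - zr / v.dz;    // same as 1 + r*zr with r = -1/dz
    assert(w != 0.0);
    return { u / w, yc / w };             // rescale so that w becomes 1
}
```

With a view distance of 10, no rotation, and offsets of (-3, -6, -10), the Object world origin lands at (-1.5, -3) on the Camera world viewport.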

Of course, this 2-D point is still really in the Camera world coordinate system. Its range of possible values is influenced by all five perspective parameters plus the sizes of the objects and their distances from the Object world origin. The Camera world viewport must now be scaled and offset to fit the display device viewport. For example, in the Scen program window a large rectangle is set aside for the viewport; it occupies most of the application window, which is used at full-screen. The rest of the window is for control buttons.

Taming the View with Negative Feedback

At this stage of the process an engineering trick is needed; otherwise programmers will find that their displays are often blank. The problem is that the Camera world viewport jumps about in Camera space, apparently unpredictably, as the view parameters are varied by even small amounts. Generally, the Camera world coordinates and the Object world sizes are in the range -50.0 to +50.0 as a matter of convenience. Reasonable perspective views shrink the Object world to an image half or less of the Object world size. The program Scen assumes full-screen is 1024x768 pixels. As a result, a screen Gain parameter is required, which will typically be about 100, but may range up to 800 or more. These high gains amplify the viewport shifts.

An engineer faced with a black box with seemingly unpredictable output will use a technique known as negative feedback to tame the output. This technique applied to the perspective transform solves the problem of viewport shift. I decide on a point in my Object space that I wish always to appear on-screen. I want to force this point to display at a fixed point of my display device viewport. Say this point is to be shown at coordinate (Xo, Yo) in the device viewport, independent of the screen gain. A useful Object world point to fix on-screen is the origin (0, 0, 0). Once all five parameters of the perspective transform are set, the input 3-D point (0, 0, 0) will transform to a 2-D point on the Camera world viewport at (So, To). My transform from Camera world viewport to device viewport is:
Ix = Gain*(Sx - So) + Xo;
Iy = Gain*(Ty - To) + Yo;
where Ix and Iy are device coordinates, (Sx, Ty) is the Camera world point of a transformed 3-D point, and the Object world origin (0, 0, 0) is always at (Xo, Yo) independent of the value of Gain.
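The two equations above translate directly into code. In this sketch the struct and function names (DeviceMap, Pixel, ToDevice) are hypothetical, and rounding to the nearest pixel with std::lround is an assumption; the article states only that conversion to integers happens last.

```cpp
#include <cassert>
#include <cmath>

// Parameters of the Camera-viewport-to-device mapping (names follow the text).
struct DeviceMap {
    double gain;    // screen Gain
    double so, to;  // Camera world viewport image of the Object world origin
    double xo, yo;  // device point where that origin must appear
};

struct Pixel { int x, y; };

// Ix = Gain*(Sx - So) + Xo and Iy = Gain*(Ty - To) + Yo,
// then rounded to the nearest pixel (ASSUMPTION: rounding mode).
Pixel ToDevice(const DeviceMap& m, double sx, double ty)
{
    assert(m.gain > 0.0);
    const double ix = m.gain * (sx - m.so) + m.xo;
    const double iy = m.gain * (ty - m.to) + m.yo;
    return { static_cast<int>(std::lround(ix)),
             static_cast<int>(std::lround(iy)) };
}
```

Because (So, To) is subtracted before the gain is applied, feeding (So, To) itself into ToDevice returns (Xo, Yo) whether Gain is 100 or 800. That is the feedback at work: the gain scales distances from the pinned point, not the position of the pinned point.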

Initializing a View Set

A view set refers to all the parameters that describe the placement and orientation of the camera, the placement and orientation of the Object world, and the object sizes, with respect to the x-y plane at z = 0. The class CScenView, shown in Figure 4, encapsulates the important parameters of a view set, and provides a function PerspectiveTr to transform Object world coordinates to display device coordinates.

An example of a reasonable view set is View Set A from the program Scen (see Figure 5). Four objects exist in this world: a set of axes along the x, y, and z directions, a grid in the x-z plane, a box, and a tank. This entire world sits on the grid object, which has its front left corner at the Object world origin (0, 0, 0) and its back right corner at (12, 0, -24). The box object is 1.0 units on a side and has its lower left front corner at (5, 0, -10). A cylindrical tank with its center at (2, 0, -5) has radius 1.0 units and height 0.5 units. You can think of the units as feet, yards, or meters according to choice; the math is concerned only with the ratios of the various parameters.

Perhaps the most important parameter to decide on after the object sizes are fixed is the "y drop." This parameter drops the whole Object world down in the y direction so that the tops of the objects can be seen by the camera. For this view set, y drop is set to -6. Next in importance is the pair of parameters z view point and z offset. The view point here is 10 units positive along the z-axis toward the device screen. Then the whole Object world is pushed back from the viewport screen by the same distance in the negative z direction: -10 units. In this simple view set, the world angle is set to zero; that is, the entire world experiences no rotation. To get lines converging to a vanishing point on the horizon, the whole world is also shifted left by making the x offset equal to -3 units. The screen gain is set to 100 for my display device, and the Object world origin is fixed on-screen at a point on the lower left of the screen, at 20% of the x and y device viewport sizes. Notice that the camera stays in a fixed position but the whole world is moved (along all the axes) relative to it.

At this point in the initialization process the transform matrix can be generated. To get the all-important viewport location all I do is feed the Object world origin (0, 0, 0) into the PerspectiveTr function and pick up the values of S and T, the corresponding Camera world viewport coordinates. This operation takes place within CScenView's member function PerspectiveInit (Figure 6). I set So and To to these values, which guarantees that the Object world origin will always be fixed on the device screen at 20% of the device viewport extents in x and y.

Establishing the Horizon

Any parallel lines in the Object world that recede into the distance converge, if long enough, on the horizon. A horizontal horizon is easy to find. Put the Object point (0, 0, -1000000) through the PerspectiveTr function (Figure 7) and store the device viewport pixel y value. If this value is within the device viewport, then a horizon is visible for that view set. The Scen program generates a "sky" by displaying a filled rectangle, the top of which is the same as the device viewport's top and the bottom of which is the horizon y value just obtained from PerspectiveTr.
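The horizon test can be sketched without the full class. The code below is an illustration under stated assumptions, not the Scen implementation: it assumes a world angle of zero, a device viewport whose rows run from 0 to height - 1, and the device mapping Iy = Gain*(Ty - To) + Yo given earlier. The function names HorizonDeviceY and HorizonVisible are hypothetical.

```cpp
#include <cassert>

// Device y value of the horizon for an unrotated view, found by pushing the
// far point (0, 0, -1000000) through the projection as the text describes.
double HorizonDeviceY(double dz, double yOffset, double zOffset,
                      double gain, double to, double yo)
{
    assert(dz > 0.0);
    const double farZ = -1.0e6;
    const double yc = 0.0 + yOffset;   // Camera world y of the far point
    const double zc = farZ + zOffset;  // Camera world z of the far point
    const double w  = 1.0 - zc / dz;   // large and positive for a far point
    const double t  = yc / w;          // approaches 0 as the point recedes
    return gain * (t - to) + yo;
}

// True when the horizon row falls inside a device viewport of the given
// height. ASSUMPTION: viewport rows are numbered 0 .. height - 1.
bool HorizonVisible(double horizonY, int viewportHeight)
{
    return horizonY >= 0.0 && horizonY < viewportHeight;
}
```

With the View Set A values (view distance 10, y drop -6, z offset -10, gain 100) the origin's viewport image has To = -3. If Yo is taken as 20% of a 768-row viewport, about 153.6, the horizon comes out near device row 453.6, roughly 300 rows from the pinned origin.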

The routines PerspectiveInit and PerspectiveTr of class CScenView are all you need to transform a 3-D point (x, y, z) to a 2-D point on the display device. Figure 6 (a partial listing of persinit.cpp) shows how the transform matrix is set up. Typical views require several million pixels to be rendered, so wherever I can, I avoid passing parameters on the stack to gain speed. The variables required are publicly accessible class members.

The perspective transform function PerspectiveTr (Figure 7) is called for every point in the scene. This function performs one matrix multiply operation in two nested loops of four iterations each. Since many of the matrix elements are zero, this operation could be sped up by unrolling the loops and dropping the terms that are always zero. Objects to be displayed are generated by sweeping real parameters through the range 0.0 to 1.0, with a step value small enough to paint all the pixels that will display.
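The parameter sweep can be illustrated with a horizontal circle such as the rim of the tank. This is a sketch only; Point3 and SweepCircle are hypothetical names, and an integer loop counter is used instead of repeatedly adding the step so that rounding error does not accumulate.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Point3 { double x, y, z; };

// Points on a horizontal circle, generated by sweeping t from 0.0 to 1.0.
// 'step' should be small enough that neighbouring points land on
// neighbouring pixels once the points have been transformed.
std::vector<Point3> SweepCircle(double cx, double cy, double cz,
                                double radius, double step)
{
    assert(step > 0.0 && step <= 1.0);
    const double twoPi = 6.283185307179586;
    const int n = static_cast<int>(std::ceil(1.0 / step));
    std::vector<Point3> pts;
    for (int i = 0; i <= n; ++i) {
        const double t = (i < n) ? i * step : 1.0;  // final sample is exactly 1.0
        const double angle = twoPi * t;
        pts.push_back({ cx + radius * std::cos(angle),
                        cy,
                        cz + radius * std::sin(angle) });
    }
    return pts;
}
```

Each generated point would then be passed through the perspective transform and plotted. As a rough guide, a circle of radius 1.0 at a Gain of 100 covers at most about 628 pixels of circumference before perspective shrinks it, so a step near 0.001 is a plausible starting point.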

Understanding the View

After you've written a program to use these routines, you must take some care in interpreting the effects of changing the view parameters. An initially surprising result, for example, is the effect of reducing the view distance of the camera on the z-axis. This puts the camera nearer to the view screen, and leaves all other parameters the same. The image gets smaller, which is perhaps not the expected behavior. Figure 8 shows that this behavior is correct; the projection of the Object world should get smaller to stay within the cone of vision. (Note that in this respect the Camera and Object worlds do not accurately model reality. If a real camera were in use, the viewport would be the part of the Camera world the camera was focusing on and its image would be projected onto the camera's film plane.)
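A few numbers make the shrinking effect concrete. The helper below is a hypothetical one-line version of the unrotated projection, with r = -1/dz written out. It is meant only for trying values, not as part of Scen.

```cpp
#include <cassert>

// Camera viewport x coordinate of a Camera world point (xc, zc) seen by an
// eye at distance dz on the +z axis, with no rotation: S = xc / (1 - zc/dz).
double ProjectX(double xc, double zc, double dz)
{
    assert(dz > 0.0);
    const double w = 1.0 - zc / dz;
    assert(w != 0.0);
    return xc / w;
}
```

For a point at xc = 1, zc = -10, a view distance of 10 gives S = 0.5, while halving the view distance to 5 gives S = 1/3. Doubling it to 20 gives S = 2/3. Moving the eye toward the screen therefore shrinks the projected image, as the text describes.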

Reference

[1] Patricia Egerton and William Hall. Computer Graphics: Mathematical First Steps (Prentice-Hall, 1998). ISBN 0-13-599572-8.

Alex Telford graduated from Edinburgh with a BSc in Physics. He worked on early microcomputers in the field of psychometric testing, and later worked on medical imaging in a Unix environment. He currently operates Solarix Software, a company that offers systems developed in C++ on Windows for a select group of clients. He may be reached at a.telford@dial.pipex.com.