August 1995/Image Processing in C, Part 13: Geometric Operations

Graphics

Image Processing in C, Part 13: Geometric Operations

Dwayne Phillips

Dwayne Phillips works as a computer and electronics engineer with the U.S. Department of Defense. He has a PhD in Electrical and Computer Engineering from Louisiana State University. His interests include computer vision, artificial intelligence, software engineering, and programming languages.

Introduction to the Series of Articles
This article is the thirteenth in a series of articles on images and image processing. Previous articles discussed reading, writing, displaying, and printing images (TIFF format), histograms, edge detection, spatial frequency filtering, sundry image operations, image segmentation, working with shapes, and Boolean operations (see sidebar for brief index). This article discusses geometric operations.
A previous article in this series [1] discussed image operations that included simple 90-degree rotation and basic image zooming and shrinking. This article covers more powerful forms of these operations.
You'll encounter several equations in this article, but do not be dismayed. I've included them here for people who really get into math. (All equations are shown in the box "Equations for Geometric Operations.") If math is not your specialty, you need only refer to the figures and images to understand the subject.

Geometric Operations
Geometric operations change the spatial relationships between objects in an image, by moving objects around and changing their size and shape. Geometric operations can rearrange an image to help us see what we want to see a little better.
The three basic geometric operations are displacement, stretching, and rotation. A fourth, less common operation is the cross product (included here to show how to distort an image using higher order terms).
Displacement simply moves an image as a whole in the vertical and horizontal directions. Stretching enlarges or reduces an image in the vertical and horizontal directions. Rotation rotates an image by any angle. Figure 1 illustrates these three operations.
Equations (1) and (2) show the mathematics behind these operations [2]. The first two terms in each equation perform the rotation by any angle θ. The x_displace and y_displace terms perform displacement. They shift the image in the horizontal and vertical directions, respectively (for example, left for x_displace > 0 and right for x_displace < 0). The x • x_stretch term enlarges or shrinks the image in the horizontal direction while y • y_stretch does the same in the vertical direction. The x_cross and y_cross terms distort the image. The example in Photograph 1 illustrates this last operation.
Setting x_cross and y_cross to anything but 0.0 introduces nonlinearities (curves), because equations (1) and (2) multiply these terms by both x and y. In Photograph 1 the input image appears on the left side and the output on the right. I created this image with x_cross and y_cross = 0.01. Values much bigger than this distort the output image to almost nothing.
Equations (1) and (2) are powerful because they can do all four operations at once. Assigning values to the terms θ, x_displace, y_displace, x_stretch, y_stretch, x_cross, and y_cross determines the extent of modification that these operations will perform throughout the image.
Using higher order terms in equations (1) and (2) can cause greater distortion to the input. You can add a third order term to equation (1) (x*x*y*x_double-cross) and equation (2) (y*y*x*y_double-cross). Try this for homework. It will be easy given the source code below.
Listing 1 shows the geometry routine that implements these operations. geometry has the same form as the other image processing operators in this series. The operator's parameters come directly from equations (1) and (2). The first section of code converts the input angle theta from degrees to radians and calculates the sine and cosine. The next section modifies the stretch terms if necessary to prevent division by zero.
The heart of geometry is a pair of (nested) for loops, referred to here as "the loop." The loop essentially functions as a glorified pixel copier, with an added feature to eliminate "jaggies." For each output pixel location (i, j) the loop calculates a corresponding location in the input image (tmpx, tmpy) from which to grab a pixel. Chances are that tmpx and tmpy will not be integer values, so the loop must either interpolate or truncate to the nearest pixel.
If variable bilinear == 1, the loop calls the bilinear interpolation function described below. Bilinear interpolation eliminates jaggies. If bilinear == 0, the loop performs an implicit truncation (new_j = tmpx; new_i = tmpy;) and sets the output image directly. In this case the compound if statement checks if the new points are inside the ROWSxCOLS array. If they are not, the output pixel gets the FILL value (this fills in vacant areas).
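The truncating branch of the loop can be sketched as follows. To keep the example short it applies displacement only. The function name displace_truncate, the small ROWS and COLS values, and the FILL gray level are assumptions made for illustration, not values taken from Listing 1.

```c
#define ROWS 8      /* small sizes chosen for illustration        */
#define COLS 8
#define FILL 150    /* assumed gray level for vacant output areas */

/* For each output pixel (i, j), find a source location (tmpx, tmpy),
   truncate it to a whole pixel, and copy that pixel. Locations that
   fall outside the ROWS x COLS array get the FILL value. */
void displace_truncate(short in[ROWS][COLS], short out[ROWS][COLS],
                       double x_displace, double y_displace)
{
    int i, j;

    for (i = 0; i < ROWS; i++) {
        for (j = 0; j < COLS; j++) {
            double tmpx = (double)j + x_displace;
            double tmpy = (double)i + y_displace;

            /* Test the doubles before casting, so that a value such
               as -0.5 is not truncated to 0 and treated as inside. */
            if (tmpx < 0.0 || tmpx >= (double)COLS ||
                tmpy < 0.0 || tmpy >= (double)ROWS) {
                out[i][j] = FILL;
            } else {
                int new_j = (int)tmpx;   /* truncation */
                int new_i = (int)tmpy;
                out[i][j] = in[new_i][new_j];
            }
        }
    }
}
```

Calling displace_truncate(in, out, 2.4, 0.0) copies input column 2 into output column 0, so the picture moves left, and the rightmost output columns, which have no source pixels, receive FILL.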

Rotation About Any Point
The geometric operations above can rotate an image, but only about the origin (upper left-hand corner). Another type of rotation allows any point (m, n) in the image to be the center of rotation. Equations (3) and (4) describe this operation [3].
Figure 2 illustrates how the input image (the rectangle) revolves about the point (m, n). Almost anything is possible by combining the basic geometric operations shown earlier with this type of rotation. For example, you can displace and stretch an image using the earlier operations and rotate that result about any point.
Listing 2 shows the routine arotate that performs rotation about any point (m, n). After creating the output image (if needed), arotate converts the angle of rotation from degrees to radians and calculates the sine and cosine. arotate loops through the image and calculates the new coordinates tmpx and tmpy using equations (3) and (4). Like geometry, arotate then copies pixels from input to output using either bi-linear interpolation (bilinear == 1, smooth output, coming up next) or truncation (bilinear == 0, jaggies).

What is Bi-Linear Interpolation?
Bi-linear interpolation is a process that calculates an estimated gray level for a location between two or more pixels. In other words, it fills in holes with gray levels that make sense [4, 5]. Bi-linear interpolation is a component of almost any visual production that requires image processing, including commercials, music videos, and movies. It's critical here to making the results of geometric operations look good.
The bent lines in Photograph 2 show why bi-linear interpolation is important. The left side represents the results of simple truncation, with no interpolation, and is replete with jagged lines. The smooth bent lines on the right side result from bi-linear interpolation.
What causes jagged lines? In geometric operations, which copy pixels from a source location to a destination, the source location often lies somewhere between input pixels. These source coordinates could be, say, x = 25.38 and y = 47.83. Which gray level should be assigned to that location? If the operation truncates, it will copy the gray level from the pixel at x = 25 and y = 47. (That is what happens in Listing 1 and Listing 2 when the parameter bilinear == 0.) Truncation produces the jagged lines.
Bi-linear interpolation removes jagged lines by finding a sensible gray level between pixels. (Interpolation finds values between pixels in one direction; bi-linear interpolation finds values between pixels in two directions, hence the prefix bi.)
Figure 3 shows how to perform bi-linear interpolation. Point P3 (x, y) falls somewhere between the pixels at the four corners. The four corners are at integer pixel locations (x = 25, x = 26, y = 47, y = 48). Equations (5), (6), and (7) find a reasonable level for point P3. (In these equations x and y represent fractional values between 0 and 1.) For the pixel location x = 25.38 and y = 47.83, x and y as shown in the equation will be 0.38 and 0.83, respectively. Equation (5) finds the gray level of point P1 by interpolating between the two upper corners. Equation (6) finds the gray level of point P2 by interpolating between the two lower corners. Equation (7) finally finds the gray level of P3 by interpolating between points P1 and P2.
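The three interpolation steps can be sketched in C. This is a minimal sketch of the standard bi-linear formula, assuming it matches equations (5), (6), and (7). The name bilinear_sketch, the ROWS, COLS, and FILL values, and the choice to return FILL near the image border are all assumptions made for illustration.

```c
#define ROWS 64
#define COLS 64
#define FILL 150   /* assumed gray level when the location is unusable */

/* Estimate the gray level at a fractional location (x, y), where x is
   the column and y is the row. */
short bilinear_sketch(short img[ROWS][COLS], double x, double y)
{
    int    x0, y0;
    double fx, fy, p1, p2, p3;

    /* All four surrounding corners must lie inside the image. */
    if (x < 0.0 || y < 0.0 ||
        x >= (double)(COLS - 1) || y >= (double)(ROWS - 1))
        return FILL;

    x0 = (int)x;          /* upper left corner        */
    y0 = (int)y;
    fx = x - x0;          /* fractions between 0 and 1 */
    fy = y - y0;

    /* P1: between the two upper corners. */
    p1 = (1.0 - fx) * img[y0][x0]     + fx * img[y0][x0 + 1];
    /* P2: between the two lower corners. */
    p2 = (1.0 - fx) * img[y0 + 1][x0] + fx * img[y0 + 1][x0 + 1];
    /* P3: between P1 and P2. */
    p3 = (1.0 - fy) * p1 + fy * p2;

    return (short)(p3 + 0.5);   /* round to the nearest gray level */
}
```

As an illustration with made-up gray levels, suppose the upper corners hold 100 and 200 and the lower corners hold 60 and 160. For x = 25.38 and y = 47.83, P1 = 0.62 * 100 + 0.38 * 200 = 138, P2 = 0.62 * 60 + 0.38 * 160 = 98, and P3 = 0.17 * 138 + 0.83 * 98 = 104.8, which rounds to 105.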
Listing 3 shows the routine bilinear_interpolate that implements these equations. The input parameters are the entire image array the_image and the location (x, y). bilinear_interpolate returns an estimated gray level for location x, y. This routine contains slow, double-precision floating-point math, clearly showing the trade-off between truncation and interpolation — speed vs. good looks.
Bi-linear interpolation is a simple idea, uses a simple routine, and makes a world of difference in the output image. I recommend the truncation method for quick experiments and bi-linear interpolation for final presentations.
Listing 4 shows a stand-alone program that allows the user to work on entire images at a time. This program produced the images shown in this article.

A Stretching Program
I've included a useful stretch utility on the code disk (not listed here) that will enlarge and shrink an entire image. The utility has many uses, including fitting an image to a display screen for printing or photographing and making two images nearly equal in size for comparisons. An earlier article in this series [1] showed how to perform a very simple, limited form of enlarging and shrinking. This form would only enlarge or shrink by an integer factor. The stretching and bi-linear interpolation tools now available permit general stretching, by non-integral scale factors.
Photograph 3 shows sample output from the stretch program. It demonstrates how stretch can enlarge in one direction while shrinking in another.
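Stretching by non-integral factors can be sketched in a few lines. This is not the stretch utility from the code disk. The function name stretch_truncate, the array sizes, the FILL value, and the convention that a factor above 1.0 enlarges are assumptions made for illustration, and the sketch uses truncation to stay short.

```c
#define ROWS 8      /* small sizes chosen for illustration        */
#define COLS 8
#define FILL 150    /* assumed gray level for vacant output areas */

/* Stretch in[][] into out[][] by x_stretch horizontally and y_stretch
   vertically. Returns 0 on success, or -1 if a factor is not positive
   (which would mean dividing by zero or flipping the image). */
int stretch_truncate(short in[ROWS][COLS], short out[ROWS][COLS],
                     double x_stretch, double y_stretch)
{
    int i, j;

    if (x_stretch <= 0.0 || y_stretch <= 0.0)
        return -1;

    for (i = 0; i < ROWS; i++) {
        for (j = 0; j < COLS; j++) {
            /* Dividing by the factor finds the source location. */
            double tmpx = (double)j / x_stretch;
            double tmpy = (double)i / y_stretch;

            if (tmpx >= (double)COLS || tmpy >= (double)ROWS)
                out[i][j] = FILL;
            else
                out[i][j] = in[(int)tmpy][(int)tmpx];
        }
    }
    return 0;
}
```

Calling stretch_truncate(in, out, 1.5, 0.5) enlarges horizontally by one half while shrinking vertically to half height, leaving the lower half of the output filled. Replacing the truncating copy with a bi-linear interpolation call would give the smoother result.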

Summary
This installment in the series discussed geometric operations. These powerful and flexible operations change the relationships, size, and shape of objects in images. They allow you to manipulate images for better display, comparison, or just for fun. Keep them handy in your collection of tools.

References
[1] Dwayne Phillips. "Image Processing, Part 8: Image Operations," The C Users Journal, November 1992, pp. 89-116.
[2] Kenneth R. Castleman. Digital Image Processing (Prentice-Hall, 1979).
[3] David F. Rogers and J. Alan Adams. Mathematical Elements for Computer Graphics (McGraw-Hill, 1976).
[4] John C. Russ. The Image Processing Handbook (CRC Press, 1992).
[5] Christopher Watkins, Alberto Sadun, and Stephen Marenka. Modern Image Processing (Academic Press, Cambridge, 1993).