Rainer, who studied at the University of Stuttgart, received his PhD for his dissertation, "Algorithms and Architectures for the Discrete Fourier Transform for the Fast Convolution of Real-Valued Signals." Currently, he is a postdoctoral fellow at the International Computer Science Institute in Berkeley, CA and can be contacted at storn@icsi.berkeley.edu or rainer.storn@zfe.siemens.de.
Optimization problems appear in virtually every field of analysis, design, and production. For example, an electronic circuit contains numerous components (resistors, capacitors, and the like), the properties of which dither around a certain nominal value. This dithering is a direct result of the production process of these components and cannot be kept below a certain bound. Due to these imperfections, a certain percentage of electronic circuits don't meet the specifications. Here, the goal of optimization is to determine the nominal value of the critical components such that the production yield is maximized.
Another example might involve a city wanting to set up a data network that connects the computer centers of many companies and universities. The budget to build such a data network may not exceed a certain amount and depends, among other factors, upon the topology of the network, the data load introduced by the different computer sites, and the capacity of the connecting lines. The goal in this case might be to maximize the throughput in the network and minimize the average delay of a message, given the budget constraints.
Yet another example is an electric motor subjected to geometric constraints in order to fulfill certain company standards. There are some degrees of freedom to alter the shape of the ring magnet. Here, the goal is to alter the ring-magnet geometry in such a way that the efficiency of the motor is maximized.
In all these examples, an optimum must be found while operating within certain constraints. These kinds of problems fall under the heading of "constrained optimization." To optimize such problems using the computer, you must first formulate them mathematically. Many algorithms are capable of tackling constrained-optimization problems. Most require the problem to be formulated by means of an objective (or "cost") function that has to be minimized. Objective functions are a playground for deterministic algorithms, which often make use of gradient computations. Such algorithms, however, run into problems if the objective function is nondifferentiable. Furthermore, they usually look for an improvement after each step they perform and hence are liable to get stuck in a local optimum. Apart from that, devising a good objective function can cause a lot of headaches, especially if you have more than one objective.
For example, consider an integrated circuit for which both the power consumption and the signal delay have to be minimized. Minimizing the power consumption, however, often means minimizing the current, which in turn may increase the signal delay. In such a case, it is difficult to weight the different objectives properly.
Other optimization problems don't have anything to minimize or to maximize. They merely look for a set of parameters that fulfill certain constraints. With such problems, an artificial objective function must be formulated; minimizing this artificial function fulfills the constraints. An example is the magnitude of the transfer function of an electronic filter, which has to fit inside a tolerance scheme but can adopt any shape inside the tolerance scheme. Contriving a good objective function is far from trivial and more often than not requires considerable expertise in the details of the optimization problem. But what if you are no expert and your boss still wants you to do the optimization? The algorithm I propose here doesn't require the formulation of an objective function. You just formulate the constraints that must not be violated (must-constraints) and the properties that have to be maximized or minimized (may-constraints).
The algorithm is based on some new, as-yet-unpublished ideas. It belongs to the so-called Monte Carlo methods, which lavishly utilize random numbers. Due to the statistical nature of these methods, the chances are quite good for finding the global optimum, as opposed to getting stuck in a local one. The price is an enlarged computational effort in comparison to deterministic optimization procedures. In light of ever-increasing computing power, however, this disadvantage is vanishing for many practical problems. If large-scale optimization problems have to be dealt with, it is advantageous to distribute the calculations among many computers; an example is a high-speed computer network, to which the proposed optimization algorithm lends itself perfectly.
Consider a system with n parameters, p1, p2, ..., pn, which influence whether the system meets its imposed requirements. The parameters pi can be interpreted as the elements of an n-dimensional parameter vector P. If the parameters are chosen appropriately, the system will meet the requirements, which can be visualized as the parameter vector P pointing inside a "region of acceptability" (ROA) in the parameter space. This scenario is illustrated in Figures 1 and 2 for an example with two parameters.
In Figure 1, the shaded areas constitute the must-constraints. The function f(x,p1,p2) is not allowed to touch, let alone run through, those areas. Choosing the parameters p1=p11 and p2=p21 makes the function fulfill the constraints. If you choose p1=p10 and p2=p20, the function violates the constraints. A may-constraint could be the requirement that f(x,p1,p2) be as linear as possible for x in the range from 0 to 15000. For the sake of simplicity, I'll disregard may-constraints for the moment. The situation in Figure 1 can also be viewed in the parameter space defined by the coordinates p1 and p2 in Figure 2. The ROA in Figure 2 corresponds to the blank area in Figure 1. The parameter vector (p10,p20) causes the function in Figure 1 to violate the imposed constraints (the ROA is not hit). The parameter vector (p11,p21) causes the function in Figure 1 to satisfy the imposed constraints (the ROA is hit). If f(x,p1,p2) fulfills the constraints, the parameter vector P=(p1,p2) lies inside the ROA. If f(x,p1,p2) violates the constraints, the parameter vector is located outside the ROA.
The ROA of a system can be quite complicated and usually isn't known in advance. If it were, it would be trivial to choose the right parameter values for the dimensioning of the system. The optimization process can be interpreted as a way of finding a parameter vector, P, which lies inside the ROA. The task now is to estimate (at least parts of) the ROA's shape by producing a cloud of statistically distributed parameter vectors in the neighborhood of your starting (or "nominal") vector. Vectors which hit the ROA provide the desired estimate of the ROA's shape (at least partly); see Figure 3.
Initially, the vector may be so far from pointing anywhere near the ROA that even hours of computing time will not produce a hit. To escape this trap, you initially relax the constraints just enough to allow your nominal vector to meet them. Now, Gaussian-distributed random deviations from the nominal vector are generated. They produce a cloud of new parameter vectors that gathers around the nominal vector, as in Figure 3.
The algorithm I'm proposing tries to generate 3*dim hits when dim parameters make up a parameter vector (in our example, dim=2). If this can't be achieved, the current iteration will stop after mxvecs tries. Setting mxvecs=20*dim is a reasonable starting point. If the optimization gets stuck in a local optimum or no hits are found, increase mxvecs. At the end of an iteration the mean value of the hit vectors is calculated and defined to be the new nominal vector for the next iteration. Before the next iteration is started, the constraints are tightened, but only to the point that the new nominal vector lies on the rim of the new ROA. The optimization algorithm continues with this mechanism until the original constraints are met. The entire optimization process will be accompanied by a successive shrinking of the ROA; see Figure 4. Note that the ROA need not maintain its general shape, as might be suggested by Figure 4. In fact, the shape can change significantly, and this sometimes hinders the algorithm in finding the desired optimum.
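The averaging step at the end of an iteration can be sketched as follows; new_nominal() is an illustrative name, and the hit vectors are assumed to be stored back to back in one flat array:

```c
/* Average the hit vectors of the current iteration; the mean becomes
   the nominal vector for the next iteration. Element i of hit k is
   assumed to sit at hits[k*dim + i]. */
void new_nominal(int dim, int nhits, const double hits[], double nominal[])
{
    int i, k;
    for (i = 0; i < dim; i++) {
        double sum = 0.0;
        for (k = 0; k < nhits; k++)
            sum += hits[k * dim + i];
        nominal[i] = sum / nhits;   /* centroid of the hits */
    }
}
```

Because the centroid of the hits lies, loosely speaking, "deeper" inside the estimated ROA than the old nominal vector, this pulls the search toward the region where the constraints are easiest to satisfy.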
Although the idea behind the proposed optimization procedure is fairly simple, you need a bag of tricks to make it actually work. A central problem is finding the proper standard deviations to drive your multivariate Gaussian random-number generator. It is good to start the program with large standard deviations; for example, three times the corresponding parameter value itself (unless the value is 0, of course). During an iteration of the optimization process, the maximum distances in each coordinate direction between the current nominal vector and the hit vectors have yielded good values for the standard deviations of the next iteration. In Monte Carlo optimization, a major issue is preventing the standard deviations from becoming too small. Otherwise, the program will run into the next-best local optimum as quickly as it can. Therefore, my proposed algorithm tries to increase the maximum deviations by reusing the difference vector between the nominal vector and a hit vector. Such a "successful" difference vector is applied as often as possible--as long as new hits can be generated. At the first occurrence of a no-hit, the mechanism stops and the maximum deviation in each coordinate direction is evaluated. The reuse strategy shown in Figure 5 is implemented via a linear-search strategy (an input file that illustrates this is available electronically; see "Availability," page 3). Of course, the search strategy could be improved to get closer to the rim of the ROA faster, but this has been left out of the current implementation for the sake of simplicity.
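The reuse of a successful difference vector amounts to a simple linear search along that direction. The following sketch uses illustrative names (reuse_direction(), inside()); the half-space test stands in for the problem-specific hit test that accept() performs in mco.c:

```c
#include <string.h>

#define MAXDIM 32  /* illustrative bound, not taken from the original code */

/* Step repeatedly along the "successful" difference vector
   d = hit - nominal, updating hit in place, until a trial point falls
   outside the ROA. Returns the number of extra steps that still
   produced hits. */
int reuse_direction(int dim, const double nominal[], double hit[],
                    int (*inside)(int dim, const double v[]))
{
    double d[MAXDIM], trial[MAXDIM];
    int i, steps = 0;
    for (i = 0; i < dim; i++)
        d[i] = hit[i] - nominal[i];
    for (;;) {
        for (i = 0; i < dim; i++)
            trial[i] = hit[i] + d[i];
        if (!inside(dim, trial))
            return steps;            /* first no-hit ends the reuse */
        memcpy(hit, trial, dim * sizeof(double));
        steps++;
    }
}

/* Example ROA for demonstration only: the half-space x0 < 10. */
int inside_halfspace(int dim, const double v[])
{
    (void)dim;
    return v[0] < 10.0;
}
```

The final position of hit[] then feeds the maximum-deviation bookkeeping that sets the standard deviations of the next iteration.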
Although the standard deviations should remain large enough to increase the chance of leaving local optima, the algorithm will produce no hit at all if the deviations are too large. Hence if no hit occurs, the standard deviations are reduced by a factor of 0.7 for the next iteration. The factor 0.7 is not mandatory; you can experiment with it.
On one hand, the control strategy just described helps the program reduce its standard deviations when it has to ooze through a narrow valley of the ROA. On the other hand, the strategy expands them again when the valley has been passed. It is this control strategy that is new, at least compared to existing Monte Carlo optimization methods; control is crucial for the robustness of the algorithm.
There are other pitfalls to be considered. The larger the problem size (that is, the more parameters you want to include in your optimization), the more difficult it becomes to generate hits because there are so many different vectors to choose from. To alleviate this general problem, it helps to keep a record of successful search directions from the current iteration. These directions are the first ones to be tested in a new iteration before the random-number generator takes over. This assumes that the shape of the ROA doesn't change too much from one iteration to the next; hence, the desired number of hits can be generated faster by "looking in the right direction." In fact, this assumption has turned out to be true in many cases.
Now let's look at must-constraints and may-constraints. Try to meet your must-constraints first. After that you can start to work on the may-constraints while maintaining the must-constraints. There is a little problem concerning the criterion for bringing the whole optimization to an end. If the nominal vector doesn't change much over several iterations, you could stop the program. This, however, can result in aborting the optimization during a stagnation phase. A sure-fire criterion is to stop the program after a prescribed maximum number of iterations and see if you are satisfied with the results. But then you might be satisfied with results that could have been surpassed significantly had you just waited long enough. As a third variant, you could set up a fixed goal for the may-constraints and stop when this goal is reached. However, you often don't know whether this goal can be reached at all. Maybe it's impossible to satisfy even the must-constraints. From the previous discussion, it is clear that each criterion has its drawbacks. As a compromise, the last two criteria are implemented in Listing One, leaving the decision up to you whether you go for one or more restarts to improve your results.
I'll now examine some practical examples, starting with one taken from electronic-filter theory. Consider a polynomial p(x) of degree 4 that must fit the tolerance scheme in Figure 6 such that the polynomial does not run through any parts of the shaded area. The polynomial itself can take on arbitrary shapes inside the tolerance scheme, which makes the whole problem exhibit only must-constraints but no may-constraints. An arbitrary starting solution for p(x) is p0(x)=1.0+1.0*x+0.2*x^2+0.0*x^3+0.0*x^4 (which, by the way, is not a "solution" because it violates the tolerance scheme). After several iterations, the statistical-optimization program arrives at the true solution, which can also be seen in Figure 6.
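Listing One evaluates such polynomials with Horner's scheme. The same recurrence, assuming (as in par_vec[]) that coefficients are stored lowest order first; the function name horner() is illustrative:

```c
/* Evaluate the polynomial c[0] + c[1]*x + ... + c[dim-1]*x^(dim-1)
   with Horner's scheme, the recurrence used inside accept():
   y = (...((c[dim-1])*x + c[dim-2])*x + ...)*x + c[0]. */
double horner(int dim, const double c[], double x)
{
    double y = 0.0;
    int n;
    for (n = dim - 1; n > 0; n--)
        y = (y + c[n]) * x;     /* fold in one coefficient per step */
    return y + c[0];
}
```

For the degree-4 filter polynomial, dim is 5; the scheme needs only dim-1 multiplications per evaluation, which matters because accept() samples the polynomial at 101 passband points per trial vector.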
Now consider an example from operations research, which uses the variables x and y with the constraints: maximize x+y while keeping (x-3)^2+(y-2)^2 <= 16 and x*y >= 14, as well as x,y>0. The maximization constitutes a may-constraint, while the others are must-constraints. The optimization should render the result x=7 and y=2.
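For this problem, the hit test reduces to a few comparisons; feasible() is an illustrative name, not part of mco.c:

```c
/* Must-constraints of the operations-research example:
   (x-3)^2 + (y-2)^2 <= 16,  x*y >= 14,  x > 0,  y > 0.
   The may-constraint, x + y, is what the optimizer maximizes over
   the feasible set. */
int feasible(double x, double y)
{
    return (x - 3.0) * (x - 3.0) + (y - 2.0) * (y - 2.0) <= 16.0
        && x * y >= 14.0
        && x > 0.0 && y > 0.0;
}
```

At the reported optimum (7,2), both must-constraints are active: (7-3)^2+(2-2)^2 = 16 and 7*2 = 14, which is typical for a constrained maximum sitting on the rim of the ROA.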
In the next example (from computer science), the task is to approximate the function f(x)=sin(x) by a polynomial in the range x from 0 to π/2; see Figure 7. The approximation should be equally good over the entire range of x. Because of speed requirements, the degree of the polynomial is restricted to 3. Most mathematicians will tell you to perform a Chebyshev instead of a Taylor approximation. In fact, you don't need to because the optimization finds the pertinent polynomial coefficients for you. The mean squared error (mse), taken at, say, 100 discrete points in the range x from 0 to π/2, provides a sensible measure for the quality of the curve fit. Actually, the mse is nothing but an objective function that one tries to minimize. Although our optimization algorithm doesn't need objective functions, it can deal with them by regarding them simply as a may-constraint. Since you usually don't know which value the may-constraint will take on when the global optimum has been found, it makes sense to choose a "goal value" for the may-constraint that can never be reached and to control the termination of the optimization via the number of allowed iterations.
The last example is a capacity-assignment problem taken from computer networking. In Figure 8, programmable concentrators in each of the five boroughs of New York City are to be connected to a large computer in lower Manhattan. Teletype terminals are in turn connected to the concentrators. Each terminal transmits at a rate of 100 bits per second (bps). Messages are, on the average, 1000 bits long and exhibit exponential length distribution. A typical terminal transmits a message on the average of once a minute according to a Poisson process. The number of terminals in each borough is: Bronx, 10; Brooklyn, 15; Manhattan, 20; Queens, 10; and Richmond, 5. Given that the minimum average time delay should not exceed 4 seconds, the task is to find the optimum capacity assignment for each link that yields minimum cost. The underlying cost function, K, is assumed to be the equation in Example 1(a), where Ci is the line capacity that connects site i with the central computer and li is the corresponding line length. The partial cost, Ki, is defined as Example 1(b) with Example 1(c) denoting the step function.
The step functions in Ki(Ci,li) model fixed costs, repeater costs that appear after 12 km of line length, and a cost increase due to a change of the line type for line capacities greater than 400 bps. (Queuing is the basic theory required for solving this problem; see Schwartz's Computer Communication Network Design and Analysis.)
There are many constraints in this nonlinear optimization problem--the Ci and the queuing delays must all be positive, and the minimum average time delay must not exceed four seconds--which makes my proposed algorithm suitable. The result after rounding off to integer values is:
C1=792 bps C2=397 bps C3=576 bps C4=400 bps C5=248 bps
Listing One is an excerpt from mco.c, the program that implements the Monte Carlo optimization. While the complete program includes the functions rnd_uni(), rnd_gauss(), accept(), and main(), this excerpt illustrates only the accept() function, which describes your optimization problem and must be rewritten each time a new problem has to be solved. rnd_uni() is a basic random-number generator which yields equally distributed random numbers in the range between 0 and 1. rnd_gauss() makes use of rnd_uni() to construct a vector of dim elements with each element being a random number stemming from a Gaussian distribution with zero mean and unit standard deviation. Finally, the function main(), which performs the general optimization strategy, reads the input file and writes the results to the output file. Listing Two is the input file for the filter design in Figure 6, while Listing Three is the output file. The complete source code, executables, and sample data files are available electronically.
The input file reads the variable itermax, which defines the maximum number of iterations. It also contains the variable mxvecs, which defines the maximum allowable number of parameter vectors to be generated in one iteration. As mentioned earlier, it is reasonable to choose mxvecs=20*dim as a default value with dim being the dimension of the parameter vector--the number of parameters that you want to consider in your optimization problem. dim is the third item in the input file, followed by the variable reduct, the factor by which all standard deviations are multiplied if no hit is achieved during the current iteration. The standard value of reduct is 0.7. The next items are the dim starting values of your initial parameter vector, followed by the dim corresponding standard deviations. There is no generic requirement for choosing those values except that the standard deviations shouldn't be set to 0. If the optimization problem is not too difficult, the program will find the solution from virtually any starting point. Nevertheless it helps to decrease computing time if the starting vector is as close as possible to the expected vector and if the standard deviations are not too small. Choose standard deviations in the range of the corresponding parameter values for sound initial operating conditions. To call the program, type: mco <input-file> <output-file>.
The optimization program introduced here is suited for many constrained-optimization problems. You just have to provide a mathematical formulation of your constraints, but you don't need to worry about devising an appropriate objective function. You can then let the program run and do the work for you. Of course, there are problems for which virtually no optimization strategy will find an easy answer; when the ROA consists of many disjoint islands, for instance. The only chance then is to try different starting vectors to find the optimum.
Lueder, E. "Optimization of Circuits with a Large Number of Parameters." AEÜ, Band 44, Heft 2, 1990.
Kjellström, G. and L. Taxen. "Stochastic Optimization in System Design." IEEE Trans. CAS (July 1981).
Kreutzer, H. "Entwurfszentrierung zur Erhöhung der Ausbeute und zur Verbesserung der Eigenschaften von Schaltungen." Dissertation, University of Stuttgart, 1984.
Eckstein, T. "Statistische Optimierung von Systemen mit einer hohen Zahl von Parametern." Dissertation, University of Stuttgart, 1989.
Moebus, D. "Algorithmen zur Optimierung von Schaltungen und zur Lösung nichtlinearer Differentialgleichungen." Dissertation, University of Stuttgart, 1989.
Press, W.H. et al. Numerical Recipes in C. Cambridge: Cambridge University Press, 1992.
Schwartz, M. Computer Communication Network Design and Analysis. Englewood Cliffs, NJ: Prentice Hall, 1977.
Figure 1 The goal for function f(x,p1,p2) is to stay apart from the blue-shaded areas that constitute the must-constraints.
Figure 2 The situation in Figure 1 can also be viewed in the parameter space, the coordinates of which are p1 and p2.
Figure 3 Estimation of the ROA by a cloud of random parameter vectors which gather around the nominal vector.
Figure 4 With each iteration of the optimization process, the ROA shrinks until the final constraints are met.
Figure 5 The reuse of successful directions helps investigate the ROA more thoroughly and better estimate the standard deviations of your random-number generator.
Figure 6 Polynomial of degree 4 before and after optimization.
Figure 7 Best-fit approximation of f(x)=sin(x) in the interval [0, π/2] by a polynomial of the third degree.
Figure 8 Topology of the data network in New York.
Example 1 (a) Underlying cost function K; (b) partial cost Ki; (c) step function used in 1(b).
int accept(int dim,float par_vec[],int adapt,int *finish_ptr,
float constraint[],int *nocont)
/**C*F************************************************************************
** SRC-FUNCTION :accept() **
** LONG_NAME :accept **
** AUTHOR :Dr. Rainer Storn **
** DESCRIPTION :accept() tests whether a parameter vector par_vec[] **
** falls into the region of acceptability (ROA). If it does **
** hit=1 is returned, otherwise hit=0. **
** FUNCTIONS :none **
** GLOBALS :none **
** PARAMETERS :dim number of vector elements. **
** par_vec[] contains the vector with dim **
** gaussian distributed variables. **
** adapt control variable. **
** 0 : no adaptation of ROA **
** 1 : adaptation of ROA permitted (shrinkage) **
** 2 : adaptation of ROA permitted **
** (relaxation, expansion) **
** finish_ptr indicates meeting of requirements. **
** 0 : goals not yet reached. **
** 1 : goal has been reached. **
** constraint[] contains current constraints of ROA. **
** *nocont contains number of constraints. **
** PRECONDITIONS :par_vec[] must contain valid parameter vector. **
** POSTCONDITIONS :Elements of constraint[] will probably be altered. **
** nocont returns number of constraints to assist **
** printing routine of main(). **
***C*F*E**********************************************************************/
{
int hit, i, n;
float goal[NARY], z0, z1, x, y;
/*------Set up constraints----------------------------------------------*/
/*------goal[0] : Maximum absolute value of polynomial for x in [-1,1]----
--------goal[1] : Minimum ordinate value of polynomial at x = +/- 1.2---*/
goal[0] = 1.001; /* must-constraint */
goal[1] = 5.9; /* must-constraint */
*nocont = 2; /* two constraints */
/*------Calculate function values and initializations-------------------*/
/*------Passband. Compute maximum magnitude of ordinate value.----------*/
z0 = 0.;
for (i=0; i<=100; i++)
{
y = 0.0;
x = -1.0 + (float)i/50;
for (n=dim-1; n>0; n=n-1)
{
y = (y + par_vec[n])*x;
}
y = y + par_vec[0];
if (fabs(y) > z0) z0 = fabs(y); /* z0 contains maximum magnitude */
}
/*--------Stopband. Compute ordinate value at the edges x = (+/- 1.2).--*/
/*--------Save the lowest ordinate value.-------------------------------*/
y = 0.0;
x = 1.2;
for (n=dim-1; n>0; n=n-1)
{
y = (y + par_vec[n])*x;
}
y = y + par_vec[0];
z1 = y;
y = 0.0;
x = -1.2;
for (n=dim-1; n>0; n=n-1)
{
y = (y + par_vec[n])*x;
}
y = y + par_vec[0];
if (y < z1) z1 = y;
hit = 1; /* Preset hit-flag to "hit" */
/*------Relax constraints if adapt equals 2.---------------------------------*/
if (adapt == 2)
{
for (i=0; i<=1; i++) /* Initialize constraints to goal values */
{
constraint[i] = goal[i];
}
/*--Relax must-constraints as required---------------------------------------*/
if (z0 > constraint[0]) constraint[0] = z0;
if (z1 < constraint[1]) constraint[1] = z1;
}
else if (adapt == 1) /*--------adapt must-constraints (shrinkage)-------*/
{
if (z0 <= constraint[0])
{
if (z0 > goal[0]) constraint[0] = z0; /* adapt must-constraint only if */
else constraint[0] = goal[0]; /* goal is not reached yet. */
}
else
{
hit = 0;
}
if (z1 >= constraint[1]) /* adapt must-constraint only if */
{ /* goal is not reached yet. */
if (z1 < goal[1]) constraint[1] = z1;
else constraint[1] = goal[1];
}
else
{
hit = 0;
}
}
else
{
if (z0 > constraint[0]) hit = 0;
if (z1 < constraint[1]) hit = 0;
}
*finish_ptr = 0;
if ((constraint[0] <= goal[0]) && (constraint[1] >= goal[1])) *finish_ptr=1;
return(hit);
}
200 100 5 0.7 10.00000 10.00000 -6.00000 10.00000 80.00000 3.00100 3.00100 3.00100 3.00100 3.00100
Iteration: 134
---------------
xnominal[0] = 0.993562
xnominal[1] = -0.043474
xnominal[2] = -7.870154
xnominal[3] = 0.050184
xnominal[4] = 7.859358
sigma[0] = 0.012818
sigma[1] = 0.009166
sigma[2] = 0.035221
sigma[3] = 0.014853
sigma[4] = 0.040397
constraint[0] = 1.001000
constraint[1] = 5.900000
Number of random points = 100
Number of hits = 3
Yield in percent = 3.000000
Copyright © 1995, Dr. Dobb's Journal