Study on Optimization Methodology


Optimization in engineering refers to the process of finding the best possible values for a set of variables for a system while satisfying various constraints. The term "best" indicates that there are one or more design objectives that the decision maker wishes to optimize by either minimizing or maximizing them. For example, one might want to design a product by maximizing its reliability while minimizing its weight and cost. In an optimization process, variables are selected to describe the system such as size, shape, material type, and operational characteristics. An objective refers to a quantity that the decision maker wants to be made as high (a maximum) or as low (a minimum) as possible. A constraint refers to a quantity that indicates a restriction or limitation on an aspect of the system's capabilities. [1]

Generally speaking, an optimization problem involves minimizing one or more objective functions subject to some constraints, and is stated as: [1]

$$\min_{x \in D} \; \{ f_1(x), f_2(x), \ldots, f_M(x) \} \qquad (4\text{-}1)$$

where f_i, i = 1, ..., M, are each a scalar objective function that maps a vector x into the objective space. The n-dimensional design (or decision) variable vector x is constrained to lie in a region D, called the feasible domain.

An optimization problem in which the objective and constraint functions are linear functions of their variables is referred to as a linear programming problem. On the other hand, if at least one of the objective or constraint functions is nonlinear, then it is referred to as a nonlinear programming problem. [1]

4.2- Unconstrained Nonlinear Optimization

There are four general categories of optimization solvers: [1]

Minimizers

This group of solvers attempts to find a local minimum of the objective function near a starting point x0. They address problems of unconstrained optimization, linear programming, quadratic programming, and general nonlinear programming. [1]

Multiobjective minimizers

This group of solvers attempts to either minimize the maximum value of a set of functions, or to find a location where a collection of functions is below some prespecified values.[1]

Equation solvers

This group of solvers attempts to find a solution to a scalar- or vector-valued nonlinear equation f(x) = 0 near a starting point x0. Equation-solving can be considered a form of optimization because it is equivalent to finding the minimum norm of f(x) near x0.

Least-Squares (curve-fitting) solvers

This group of solvers attempts to minimize a sum of squares. This type of problem frequently arises in fitting a model to data. The solvers address problems of finding nonnegative solutions, bounded or linearly constrained solutions, and fitting parameterized nonlinear models to data.[1]
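As a small illustration of the least-squares category, the sketch below fits a straight line y ≈ a·x + b by minimizing the sum of squared residuals in closed form (the normal equations). It is written in plain Python rather than with the toolbox solvers, and the function names and sample data are made up for the example.

```python
def fit_line(xs, ys):
    """Least-squares fit of y ~ a*x + b via the normal equations."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx * sx          # zero only if all x values coincide
    a = (n * sxy - sx * sy) / denom
    b = (sy - a * sx) / n
    return a, b


def sum_of_squares(a, b, xs, ys):
    """Objective that the fit minimizes: sum of squared residuals."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, roughly following y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]
a, b = fit_line(xs, ys)
best = sum_of_squares(a, b, xs, ys)
```

Any small change to a or b should give a larger value of sum_of_squares than best, which is a quick way to check that the fit is a minimizer.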

Unconstrained minimization is the problem of finding a vector x that is a local minimum of a scalar function f(x):

$$\min_x f(x)$$


The term unconstrained means that no restriction is placed on the range of x.

4.4-Large Scale (fminunc) Algorithm

Trust-Region Methods for Nonlinear Minimization [1]

Many of the methods used in Optimization Toolbox solvers are based on trust regions, a simple yet powerful concept in optimization. [1]

To understand the trust-region approach to optimization, consider the unconstrained minimization problem, minimize f(x), where the function takes vector arguments and returns scalars. Suppose you are at a point x in n-space and you want to improve, i.e., move to a point with a lower function value. The basic idea is to approximate f with a simpler function q, which reasonably reflects the behavior of function f in a neighborhood N around the point x. This neighborhood is the trust region. A trial step s is computed by minimizing (or approximately minimizing) over N.

$$\min_s \{ q(s), \ s \in N \}$$ [1]


The current point is updated to be x + s if f(x + s) < f(x); otherwise, the current point remains unchanged and N, the region of trust, is shrunk and the trial step computation is repeated.

The key questions in defining a specific trust-region approach to minimizing f(x) are how to choose and compute the approximation q (defined at the current point x), how to choose and modify the trust region N, and how accurately to solve the trust-region subproblem.

In the standard trust-region method ([48]), the quadratic approximation q is defined by the first two terms of the Taylor approximation to f at x; the neighborhood N is usually spherical or ellipsoidal in shape. Mathematically, the trust-region subproblem is typically stated as

$$\min_s \left\{ \tfrac{1}{2} s^T H s + s^T g \ \ \text{such that} \ \ \|D s\| \le \Delta \right\} \qquad (4\text{-}3)$$


where g is the gradient of f at the current point x, H is the Hessian matrix (the symmetric matrix of second derivatives), D is a diagonal scaling matrix, Δ is a positive scalar, and || || is the 2-norm.

Good algorithms exist for solving Equation 4-3, and they provide an accurate solution. However, they require time proportional to several factorizations of H. Therefore, for large-scale problems, several approximation and heuristic strategies based on Equation 4-3 are needed. The approximation approach followed is to restrict the trust-region subproblem to a two-dimensional subspace S ([9] and [2]). Once the subspace S has been computed, the work to solve Equation 4-3 is trivial even if full eigenvalue/eigenvector information is needed (since in the subspace, the problem is only two-dimensional). The dominant work has now shifted to the determination of the subspace. [1]

The two-dimensional subspace S is determined with the aid of a preconditioned conjugate gradient process. The solver defines S as the linear space spanned by s1 and s2, where s1 is in the direction of the gradient g, and s2 is either an approximate Newton direction, i.e., a solution to

$$H \cdot s_2 = -g$$ [1]

or a direction of negative curvature,

$$s_2^T \cdot H \cdot s_2 < 0$$


The philosophy behind this choice of S is to force global convergence (via the steepest descent direction or negative curvature direction) and achieve fast local convergence (via the Newton step, when it exists).

A step of unconstrained minimization using trust-region ideas is now easy to give:

1- Formulate the two-dimensional trust-region subproblem.

2- Solve Equation 4-3 to determine the trial step s.

3- If f(x + s) < f(x), then x = x + s.

4- Adjust Δ. [1]

These four steps are repeated until convergence. The trust-region dimension Δ is adjusted according to standard rules. In particular, it is decreased if the trial step is not accepted, i.e., f(x + s) ≥ f(x) [6], [9].
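The accept/shrink loop described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the scaling matrix D is taken to be the identity, the two-dimensional subspace subproblem is replaced by the simpler Cauchy-point step (the minimizer of the quadratic model along -g inside the region), and the rules "double Δ on success, halve it on failure" are assumed rather than taken from the toolbox. All function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(H, v):
    return [dot(row, v) for row in H]


def cauchy_step(g, H, delta):
    """Minimize q(s) = 0.5 s^T H s + s^T g along -g subject to ||s|| <= delta."""
    gnorm = math.sqrt(dot(g, g))
    gHg = dot(g, matvec(H, g))
    if gHg <= 0.0:
        tau = 1.0                      # model keeps decreasing: go to the boundary
    else:
        tau = min(1.0, gnorm ** 3 / (delta * gHg))
    return [-tau * delta / gnorm * gi for gi in g]


def trust_region_minimize(f, grad, hess, x, delta=1.0, tol=1e-8, max_iter=500):
    x = list(x)
    for _ in range(max_iter):
        g = grad(x)
        if math.sqrt(dot(g, g)) < tol:
            break
        s = cauchy_step(g, hess(x), delta)
        trial = [xi + si for xi, si in zip(x, s)]
        if f(trial) < f(x):
            x = trial                  # accept the trial step
            delta *= 2.0               # assumed rule: enlarge after success
        else:
            delta *= 0.5               # shrink the trust region and retry
    return x


# Hypothetical test problem with minimum at (1, -0.5)
def f(x):
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 0.5) ** 2

def grad(x):
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 0.5)]

def hess(x):
    return [[2.0, 0.0], [0.0, 4.0]]

x_min = trust_region_minimize(f, grad, hess, [3.0, 2.0])
```

The Cauchy step only gives steepest-descent-like convergence; the subspace approach in the text adds the Newton or negative-curvature direction to obtain faster local convergence.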

4.5-Preconditioned Conjugate Gradient Method

A popular way to solve large symmetric positive definite systems of linear equations (Hp = -g) is the method of Preconditioned Conjugate Gradients (PCG). [1] This iterative approach requires the ability to calculate matrix-vector products of the form H·v, where v is an arbitrary vector. The symmetric positive definite matrix M is a preconditioner for H. That is, M = C^2, where C^-1 H C^-1 is a well-conditioned matrix or a matrix with clustered eigenvalues. [1]

In a minimization context, the Hessian matrix H is symmetric. However, H is guaranteed to be positive definite only in the neighborhood of a strong minimizer. Algorithm PCG exits when a direction of negative (or zero) curvature is encountered, i.e., d^T H d ≤ 0. The PCG output direction, p, is either a direction of negative curvature or an approximate solution to the Newton system Hp = -g. In either case, p is used to help define the two-dimensional subspace used in the trust-region approach.
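A minimal Python sketch of this behavior follows. It assumes a simple diagonal (Jacobi) preconditioner built from the absolute values of the diagonal of H, which is a choice made for the example and not necessarily what the toolbox uses. The function returns either an approximate Newton step or the first direction with non-positive curvature; the names and return labels are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(H, v):
    return [dot(row, v) for row in H]


def pcg(H, g, tol=1e-10, max_iter=50):
    """Approximately solve H p = -g with Jacobi-preconditioned CG.

    Returns (p, "newton") for an approximate Newton step, or
    (d, "negative_curvature") as soon as a direction with d^T H d <= 0 appears.
    """
    n = len(g)
    # Assumed preconditioner: M = diag(|H_ii|), applied through its inverse.
    m_inv = [1.0 / abs(H[i][i]) if H[i][i] != 0 else 1.0 for i in range(n)]
    p = [0.0] * n
    r = [-gi for gi in g]                      # residual of H p = -g at p = 0
    if dot(r, r) ** 0.5 < tol:
        return p, "newton"
    z = [mi * ri for mi, ri in zip(m_inv, r)]
    d = z[:]
    rz = dot(r, z)
    for _ in range(max_iter):
        Hd = matvec(H, d)
        curvature = dot(d, Hd)
        if curvature <= 0.0:
            return d, "negative_curvature"
        alpha = rz / curvature
        p = [pi + alpha * di for pi, di in zip(p, d)]
        r = [ri - alpha * hi for ri, hi in zip(r, Hd)]
        if dot(r, r) ** 0.5 < tol:
            break
        z = [mi * ri for mi, ri in zip(m_inv, r)]
        rz_new = dot(r, z)
        d = [zi + (rz_new / rz) * di for zi, di in zip(z, d)]
        rz = rz_new
    return p, "newton"


# Positive definite case: p approximates the Newton step.
p_newton, kind_newton = pcg([[4.0, 1.0], [1.0, 3.0]], [-1.0, -2.0])

# Indefinite case: the iteration stops with a negative-curvature direction.
H_indef = [[1.0, 0.0], [0.0, -2.0]]
d_neg, kind_neg = pcg(H_indef, [1.0, 1.0])
```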

4.6-Medium Scale (fminunc) Algorithm

Basics of Unconstrained Optimization [1]

Although a wide spectrum of methods exists for unconstrained optimization, methods can be broadly categorized in terms of the derivative information that is, or is not, used. Search methods that use only function evaluations (e.g., the simplex search of Nelder and Mead [3]) are most suitable for problems that are not smooth or have a number of discontinuities. Gradient methods are generally more efficient when the function to be minimized is continuous in its first derivative. Higher order methods, such as Newton's method, are only really suitable when the second-order information is readily and easily calculated, because calculation of second-order information, using numerical differentiation, is computationally expensive.

Gradient methods use information about the slope of the function to dictate a direction of search where the minimum is thought to lie. The simplest of these is the method of steepest descent, in which a search is performed in the direction -∇f(x), where ∇f(x) is the gradient of the objective function.
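For illustration, a bare-bones steepest descent iteration might look like the following Python sketch. The step length is chosen by simple backtracking with a sufficient-decrease test, which is an assumption made for the example; the function names, tolerances, and test problem are likewise hypothetical.

```python
def steepest_descent(f, grad, x, tol=1e-8, max_iter=1000, c1=1e-4):
    """Minimize f by repeatedly stepping along the negative gradient."""
    x = list(x)
    for _ in range(max_iter):
        g = grad(x)
        gg = sum(gi * gi for gi in g)
        if gg ** 0.5 < tol:
            break
        fx = f(x)
        t = 1.0
        # Backtrack: halve t until the step gives a sufficient decrease.
        while t > 1e-12:
            trial = [xi - t * gi for xi, gi in zip(x, g)]
            if f(trial) <= fx - c1 * t * gg:
                x = trial
                break
            t *= 0.5
        else:
            break                      # no acceptable step length found
    return x


# Hypothetical test problem with minimum at (1, -1)
def f(x):
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 1.0) ** 2

def grad(x):
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 1.0)]

x_min = steepest_descent(f, grad, [4.0, 1.0])
```

On badly conditioned problems this iteration zigzags and converges slowly, which is one motivation for the quasi-Newton methods.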

Quasi-Newton Methods [1]

Of the methods that use gradient information, the most favored are the quasi-Newton methods. These methods build up curvature information at each iteration to formulate a quadratic model problem of the form

$$\min_x \ \tfrac{1}{2} x^T H x + c^T x + b$$

where the Hessian matrix, H, is a positive definite symmetric matrix, c is a constant vector, and b is a constant. The optimal solution for this problem occurs when the partial derivatives with respect to x go to zero, i.e.,

$$\nabla f(x^*) = H x^* + c = 0$$

The optimal solution point, x*, can be written as

$$x^* = -H^{-1} c$$

Newton-type methods (as opposed to quasi-Newton methods) calculate H directly and proceed in a direction of descent to locate the minimum after a number of iterations. Calculating H numerically involves a large amount of computation. Quasi-Newton methods avoid this by using the observed behavior of f(x) and ∇f(x) to build up curvature information and approximate H with an appropriate updating technique. [1]

A large number of Hessian updating methods have been developed. However, the formula of Broyden [3], Fletcher [12], Goldfarb [20], and Shanno [37] (BFGS) is thought to be the most effective for use in a general purpose method.

The formula given by BFGS is

$$H_{k+1} = H_k + \frac{q_k q_k^T}{q_k^T s_k} - \frac{H_k s_k s_k^T H_k^T}{s_k^T H_k s_k} \qquad (4\text{-}10)$$

where

$$s_k = x_{k+1} - x_k, \qquad q_k = \nabla f(x_{k+1}) - \nabla f(x_k)$$

As a starting point, H0 can be set to any symmetric positive definite matrix. To avoid inverting the Hessian H, one can derive an updating method that instead approximates the inverse Hessian H^-1 at each update. A well-known procedure is the DFP formula of Davidon [7], Fletcher, and Powell [14]. This uses the same formula as the BFGS method (Equation 4-10) except that qk is substituted for sk. [1]
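To make the BFGS update concrete, here is a small Python sketch of the formula H + q q^T/(q^T s) - H s s^T H/(s^T H s) for dense matrices stored as nested lists, with s the change in x and q the change in gradient. The function name and the sample vectors are hypothetical. A useful property to check is the secant condition: the updated matrix maps s to q.

```python
def bfgs_update(H, s, q):
    """Return the BFGS update of a symmetric matrix H.

    s is the change in x and q is the change in gradient between iterates.
    """
    n = len(s)
    Hs = [sum(H[i][j] * s[j] for j in range(n)) for i in range(n)]
    qs = sum(qi * si for qi, si in zip(q, s))
    sHs = sum(si * hi for si, hi in zip(s, Hs))
    if qs <= 0.0:
        raise ValueError("q^T s must be positive to keep H positive definite")
    # H is symmetric, so H s s^T H^T equals the outer product of Hs with itself.
    return [[H[i][j] + q[i] * q[j] / qs - Hs[i] * Hs[j] / sHs
             for j in range(n)] for i in range(n)]


# Hypothetical data: for a quadratic with true Hessian diag(2, 4),
# a step s = (1, 0.5) changes the gradient by q = (2, 2).
H0 = [[1.0, 0.0], [0.0, 1.0]]
s = [1.0, 0.5]
q = [2.0, 2.0]
H1 = bfgs_update(H0, s, q)
H1s = [sum(H1[i][j] * s[j] for j in range(2)) for i in range(2)]
```

After the update, H1s should agree with q up to rounding, and H1 stays symmetric.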

The gradient information is either supplied through analytically calculated gradients, or derived by partial derivatives using a numerical differentiation method via finite differences. This involves perturbing each of the design variables, x, in turn and calculating the rate of change in the objective function.
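The finite-difference idea can be sketched as follows in Python, using forward differences with a fixed perturbation h. The step size and the function names are assumptions chosen for the example; production solvers typically scale the perturbation to each variable.

```python
def fd_gradient(f, x, h=1e-6):
    """Approximate the gradient of f at x by forward differences."""
    fx = f(x)
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += h                     # perturb one design variable at a time
        grad.append((f(xp) - fx) / h)  # rate of change of the objective
    return grad


# Hypothetical objective; its exact gradient is (2*x0 + 3*x1, 3*x0).
def f(x):
    return x[0] ** 2 + 3.0 * x[0] * x[1]

g = fd_gradient(f, [1.0, 2.0])
```

Each gradient estimate costs n extra function evaluations, which is why supplying analytic gradients usually pays off.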

At each major iteration, k, a line search is performed in the direction

$$d = -H_k^{-1} \cdot \nabla f(x_k)$$ [1]


Line Search

Line search is a search method that is used as part of a larger optimization algorithm. At each step of the main algorithm, the line-search method searches along the line containing the current point, xk, parallel to the search direction, which is a vector determined by the main algorithm. That is, the method finds the next iterate xk+1 of the form [1]

$$x_{k+1} = x_k + \alpha d_k$$

where xk denotes the current iterate, dk is the search direction, and α is a scalar step length parameter.

The line search method attempts to decrease the objective function along the line xk + αdk by repeatedly minimizing polynomial interpolation models of the objective function. The line search procedure has two main steps:

The bracketing phase determines the range of points on the line xk+1 = xk + αdk to be searched. The bracket corresponds to an interval specifying the range of values of α.

The sectioning step divides the bracket into subintervals, on which the minimum of the objective function is approximated by polynomial interpolation.

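The resulting step length α satisfies the Wolfe conditions: [1]

$$f(x_k + \alpha d_k) \le f(x_k) + c_1 \alpha \nabla f_k^T d_k \qquad (4\text{-}13)$$

$$\nabla f(x_k + \alpha d_k)^T d_k \ge c_2 \nabla f_k^T d_k \qquad (4\text{-}14)$$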

where c1 and c2 are constants with 0 < c1 < c2 < 1.

The first condition (Equation 4-13) requires that αk sufficiently decreases the objective function. The second condition (Equation 4-14) ensures that the step length is not too small. Points that satisfy both conditions (Equation 4-13 and Equation 4-14) are called acceptable points.
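A short Python sketch shows how the two conditions filter candidate step lengths: the sufficient-decrease test f(xk + αdk) ≤ f(xk) + c1·α·∇f(xk)^T dk rejects steps that are too long, and the curvature test ∇f(xk + αdk)^T dk ≥ c2·∇f(xk)^T dk rejects steps that are too short. The constants c1 = 1e-4 and c2 = 0.9 are commonly quoted values and are assumed here; the function names and the one-dimensional example are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def satisfies_wolfe(f, grad, x, d, alpha, c1=1e-4, c2=0.9):
    """Check the sufficient-decrease and curvature conditions for step alpha."""
    slope0 = dot(grad(x), d)                       # directional derivative at x_k
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    sufficient_decrease = f(x_new) <= f(x) + c1 * alpha * slope0
    curvature = dot(grad(x_new), d) >= c2 * slope0
    return sufficient_decrease and curvature


# Hypothetical example: f(x) = x^2 starting at x = 1 along d = -f'(1) = -2.
def f(x):
    return x[0] ** 2

def grad(x):
    return [2.0 * x[0]]

x_k, d_k = [1.0], [-2.0]
ok_exact = satisfies_wolfe(f, grad, x_k, d_k, 0.5)     # lands on the minimizer
ok_tiny = satisfies_wolfe(f, grad, x_k, d_k, 1e-3)     # step too short
ok_long = satisfies_wolfe(f, grad, x_k, d_k, 1.0)      # no decrease at all
```

Here only the step α = 0.5 is an acceptable point; the tiny step fails the curvature condition and the long step fails the sufficient-decrease condition.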

Many of the optimization functions determine the direction of search by updating the Hessian matrix at each iteration, using the BFGS method (Equation 4-10). The function fminunc also provides an option to use the DFP method given in Quasi-Newton Methods. The Hessian, H, is always maintained to be positive definite so that the direction of search, d, is always a descent direction. This means that for some arbitrarily small step α in the direction d, the objective function decreases. Positive definiteness of H is achieved by ensuring that H is initialized to be positive definite and that thereafter qk^T sk (from Equation 4-15) is always positive. The term qk^T sk is a product of the line search step length parameter αk and a combination of the search direction d with past and present gradient evaluations,

$$q_k^T s_k = \alpha_k \left( \nabla f(x_{k+1})^T d - \nabla f(x_k)^T d \right) \qquad (4\text{-}15)$$

The condition that qk^T sk is positive is achieved by performing a sufficiently accurate line search. This is because the search direction, d, is a descent direction, so that αk and the negative gradient term -∇f(xk)^T d are always positive. Thus, the possibly negative term ∇f(xk+1)^T d can be made as small in magnitude as required by increasing the accuracy of the line search.

fminsearch Algorithm [1]

fminsearch uses the Nelder-Mead simplex algorithm as described in Lagarias et al. [57]. This algorithm uses a simplex of n + 1 points for n-dimensional vectors x. The algorithm first makes a simplex around the initial guess x0 by adding 5% of each component x0(i) to x0, and using these n vectors as elements of the simplex in addition to x0. (It uses 0.00025 as component i if x0(i) = 0.) Then, the algorithm modifies the simplex repeatedly according to the following procedure.


1- Let x(i) denote the list of points in the current simplex, i = 1,...,n+1.

2- Order the points in the simplex from lowest function value f(x(1)) to highest f(x(n+1)). At each step in the iteration, the algorithm discards the current worst point x(n+1), and accepts another point into the simplex. [Or, in the case of step 7 below, it changes all n points with values above f(x(1)).]

3- Generate the reflected point

r = 2m - x(n+1),

where

m = Σx(i)/n, i = 1...n,

and calculate f(r).

4- If f(x(1)) ≤ f(r) < f(x(n)), accept r and terminate this iteration. (Reflect) [1]

5- If f(r) < f(x(1)), calculate the expansion point s

s = m + 2(m - x(n+1)),

and calculate f(s).

a. If f(s) < f(r), accept s and terminate the iteration. (Expand)

b. Otherwise, accept r and terminate the iteration. (Reflect)

6- If f(r) ≥ f(x(n)), perform a contraction between m and the better of x(n+1) and r:

a. If f(r) < f(x(n+1)) (i.e., r is better than x(n+1)), calculate

c = m + (r - m)/2

and calculate f(c). If f(c) < f(r), accept c and terminate the iteration. (Contract outside) Otherwise, continue with Step 7.

b. If f(r) ≥ f(x(n+1)), calculate

cc = m + (x(n+1) - m)/2

and calculate f(cc). If f(cc) < f(x(n+1)), accept cc and terminate the iteration. (Contract inside) Otherwise, continue with Step 7.

7- Calculate the n points

v(i) = x(1) + (x(i) - x(1))/2

and calculate f(v(i)), i = 2,...,n+1. The simplex at the next iteration is x(1), v(2),...,v(n+1).

The iterations proceed until they meet a stopping criterion.

[Figure: the points that fminsearch might calculate in the procedure, along with each possible new simplex. The original simplex has a bold outline.]
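The simplex procedure described above translates fairly directly into code. The Python sketch below follows the same reflect, expand, contract, and shrink steps and the same 5% rule for the initial simplex. The stopping test (simplex size and function-value spread both below a tolerance) is an assumption made for the example, since the text does not spell out the criterion, and the function names are hypothetical.

```python
def nelder_mead(f, x0, xtol=1e-8, ftol=1e-8, max_iter=2000):
    """Minimize f with the simplex steps described in the text."""
    n = len(x0)
    # Initial simplex: x0 plus n points, each with one component raised by 5%
    # (or set to 0.00025 when that component is zero).
    simplex = [list(x0)]
    for i in range(n):
        p = list(x0)
        p[i] = p[i] * 1.05 if p[i] != 0 else 0.00025
        simplex.append(p)

    for _ in range(max_iter):
        simplex.sort(key=f)                                   # order best to worst
        best, second_worst, worst = simplex[0], simplex[-2], simplex[-1]
        size = max(abs(p[j] - best[j]) for p in simplex[1:] for j in range(n))
        if size < xtol and f(worst) - f(best) < ftol:
            break                                             # assumed stopping test
        m = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        r = [2 * mj - wj for mj, wj in zip(m, worst)]         # reflected point
        fr = f(r)
        if f(best) <= fr < f(second_worst):                   # reflect
            simplex[-1] = r
            continue
        if fr < f(best):                                      # expand or reflect
            s = [mj + 2 * (mj - wj) for mj, wj in zip(m, worst)]
            simplex[-1] = s if f(s) < fr else r
            continue
        if fr < f(worst):                                     # contract outside
            c = [mj + (rj - mj) / 2 for mj, rj in zip(m, r)]
            if f(c) < fr:
                simplex[-1] = c
                continue
        else:                                                 # contract inside
            cc = [mj + (wj - mj) / 2 for mj, wj in zip(m, worst)]
            if f(cc) < f(worst):
                simplex[-1] = cc
                continue
        # Shrink every point halfway toward the best one.
        simplex = [best] + [[bj + (pj - bj) / 2 for bj, pj in zip(best, p)]
                            for p in simplex[1:]]
    return min(simplex, key=f)


# Hypothetical test problem with minimum at (1, 2)
def bowl(x):
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2

x_min = nelder_mead(bowl, [0.5, 0.5])
```

The sketch re-evaluates f more often than necessary to keep the code short; a practical implementation would cache the function values of the simplex points.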

4.7-Description of fminunc

Because fminunc is the function used for large-scale nonlinear minimization, it is described in detail here.


fminunc finds the minimum of an unconstrained multivariable function, that is, the minimum of a problem specified by

$$\min_x f(x)$$

where x is a vector and f(x) is a function that returns a scalar.

Syntax

x = fminunc(fun,x0)

x = fminunc(fun,x0,options)

x = fminunc(problem)

[x,fval] = fminunc(...)

[x,fval,exitflag] = fminunc(...)

[x,fval,exitflag,output] = fminunc(...)

[x,fval,exitflag,output,grad] = fminunc(...)

[x,fval,exitflag,output,grad,hessian] = fminunc(...)


Description

fminunc attempts to find a minimum of a scalar function of several variables, starting at an initial estimate. This is generally referred to as unconstrained nonlinear optimization.

x = fminunc(fun,x0) starts at the point x0 and attempts to find a local minimum x of the function described in fun. x0 can be a scalar, vector, or matrix.

x = fminunc(fun,x0,options) minimizes with the optimization options specified in the structure options. Use optimset to set these options.

x = fminunc(problem) finds the minimum for problem, where problem is a structure described in Input Arguments.

[x,fval] = fminunc(...) returns in fval the value of the objective function fun at the solution x.

[x,fval,exitflag] = fminunc(...) returns a value exitflag that describes the exit condition.

[x,fval,exitflag,output] = fminunc(...) returns a structure output that contains information about the optimization.

[x,fval,exitflag,output,grad] = fminunc(...) returns in grad the value of the gradient of fun at the solution x.

[x,fval,exitflag,output,grad,hessian] = fminunc(...) returns in hessian the value of the Hessian of the objective function fun at the solution x.

Input Arguments

Function Arguments contains general descriptions of arguments passed into fminunc. This section provides function-specific details for fun, options, and problem:


fun

The function to be minimized. fun is a function that accepts a vector x and returns a scalar f, the objective function evaluated at x. The function fun can be specified as a function handle for a file

x = fminunc(@myfun,x0)

where myfun is a MATLAB function such as

function f = myfun(x)

f = ... % Compute function value at x

fun can also be a function handle for an anonymous function.

x = fminunc(@(x)norm(x)^2,x0);

If the gradient of fun can also be computed and the GradObj option is 'on', as set by

options = optimset('GradObj','on')

then the function fun must return, in the second output argument, the gradient value g, a vector, at x. The gradient is the partial derivatives ∂f/∂xi of f at the point x. That is, the ith component of g is the partial derivative of f with respect to the ith component of x.

If the Hessian matrix can also be computed and the Hessian option is 'on', i.e., options = optimset('Hessian','on'), then the function fun must return the Hessian value H, a symmetric matrix, at x in a third output argument. The Hessian matrix is the matrix of second partial derivatives of f at the point x. That is, the (i,j)th component of H is the second partial derivative of f with respect to xi and xj, ∂²f/∂xi∂xj. The Hessian is by definition a symmetric matrix.

Writing Objective Functions explains how to "conditionalize" the gradients and Hessians for use in solvers that do not accept them. Passing Extra Parameters explains how to parameterize fun, if necessary.
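The idea of an objective that returns the gradient and Hessian only when they are requested can be illustrated outside MATLAB as well. The Python sketch below is an analogue, not toolbox code: the nargout parameter is a hypothetical stand-in for MATLAB's count of requested outputs, and Rosenbrock's function is used purely as a familiar example.

```python
def rosenbrock(x, nargout=1):
    """Return f, (f, g) or (f, g, H) depending on how many outputs are wanted."""
    x1, x2 = x
    f = 100.0 * (x2 - x1 ** 2) ** 2 + (1.0 - x1) ** 2
    if nargout == 1:
        return f
    # Gradient: partial derivatives of f with respect to x1 and x2.
    g = [-400.0 * x1 * (x2 - x1 ** 2) - 2.0 * (1.0 - x1),
         200.0 * (x2 - x1 ** 2)]
    if nargout == 2:
        return f, g
    # Hessian: symmetric matrix of second partial derivatives.
    H = [[1200.0 * x1 ** 2 - 400.0 * x2 + 2.0, -400.0 * x1],
         [-400.0 * x1, 200.0]]
    return f, g, H


f_min, g_min, H_min = rosenbrock([1.0, 1.0], nargout=3)
f_start = rosenbrock([-1.2, 1.0])
```

At the minimizer (1, 1) the function value and both gradient components are zero, and the Hessian is symmetric, as the text requires.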





options

Options provides the function-specific details for the options values.

problem

objective: Objective function

x0: Initial point for x

solver: 'fminunc'

options: Options structure created with optimset


Output Arguments

Function Arguments contains general descriptions of arguments returned by fminunc. This section provides function-specific details for exitflag and output:


exitflag

Integer identifying the reason the algorithm terminated. The following lists the values of exitflag and the corresponding reasons the algorithm terminated.

1: Magnitude of gradient smaller than the TolFun tolerance.

2: Change in x was smaller than the TolX tolerance.

3: Change in the objective function value was less than the TolFun tolerance.

5: Predicted decrease in the objective function was less than the TolFun tolerance.

0: Number of iterations exceeded options.MaxIter or number of function evaluations exceeded options.MaxFunEvals.

-1: Algorithm was terminated by the output function.


grad: Gradient at x

hessian: Hessian at x

output

Structure containing information about the optimization. The fields of the structure are

iterations: Number of iterations taken

funcCount: Number of function evaluations

firstorderopt: Measure of first-order optimality

algorithm: Optimization algorithm used

cgiterations: Total number of PCG iterations (large-scale algorithm only)

stepsize: Final displacement in x (medium-scale algorithm only)

message: Exit message


fminunc computes the output argument hessian as follows:

When using the medium-scale algorithm, the function computes a finite-difference approximation to the Hessian at x using

The gradient grad if you supply it

The objective function fun if you do not supply the gradient

When using the large-scale algorithm, the function uses

options.Hessian, if you supply it, to compute the Hessian at x

A finite-difference approximation to the Hessian at x, if you supply only the gradient