Simplex Method

See Also: Constrained Optimization; Linear Programming

Introduction

The simplex method generates a sequence of feasible iterates by repeatedly moving from one vertex of the feasible set to an adjacent vertex with a lower value of the objective function \(c^T x\). When it is not possible to find an adjoining vertex with a lower value of \(c^T x\), the current vertex must be optimal, and termination occurs.

After its development by Dantzig in the 1940s, the simplex method was unrivaled as a practical tool for solving linear programming problems until the late 1980s. Although essentially never observed on practical problems, the poor worst-case behavior of the algorithm (the number of iterations may be exponential in the number of unknowns) led to an ongoing search for algorithms with better computational complexity. This search bore fruit in the late 1970s, when the first polynomial-time algorithm for linear programming, Khachiyan's ellipsoid method, was developed; that method, however, is not competitive with the simplex method in practice. Most interior-point methods also have polynomial complexity.

Algorithm Steps

Algebraically speaking, suppose the linear program is in standard form: minimize \(c^T x\) subject to \(Ax = b\) and \(x \geq 0\), where \(A\) is an \(m \times n\) matrix with \(m \leq n\) and full row rank. The simplex method is based on the observation that at least \(n-m\) of the components of \(x\) are zero if \(x\) is a vertex of the feasible set. Accordingly, the components of \(x\) can be partitioned at each vertex into a set of \(m\) basic variables (all nonnegative) and a set of \(n-m\) nonbasic variables (all zero). If we gather the basic variables into a subvector, \(x_B \in R^m\), and the nonbasic variables into another subvector, \(x_N \in R^{n-m}\), we can partition the columns of \(A\) as \([B|N]\), where \(B\) contains the \(m\) columns that correspond to \(x_B\). (Note that \(B\) is a square matrix.)
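
For illustration, the short Python sketch below sets up a small standard-form problem and extracts the basis matrix \(B\) and the nonbasic columns \(N\) for one choice of basic variables. The data and variable names are invented for this example (they are not taken from any particular solver); the later sketches in this article continue from it.

```python
import numpy as np

# Standard-form LP data (made up for illustration): minimize c^T x s.t. A x = b, x >= 0.
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [2.0, 0.5, 0.0, 1.0]])   # m = 2 constraints, n = 4 variables
b = np.array([5.0, 8.0])
c = np.array([-3.0, -2.0, 0.0, 0.0])

basic = [2, 3]      # indices of the m basic variables (here, the two slack columns)
nonbasic = [0, 1]   # indices of the n - m nonbasic variables (held at zero)

B = A[:, basic]     # square m-by-m basis matrix
N = A[:, nonbasic]  # remaining columns of A
x_B = np.linalg.solve(B, b)   # basic components: B x_B = b, nonbasic components are zero
print(x_B)          # [5. 8.] -- the vertex x = (0, 0, 5, 8)
```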

At each iteration of the simplex method, a basic variable (a component of \(x_B\)) is reclassified as nonbasic and vice versa. In other words, \(x_B\) and \(x_N\) exchange a component. Geometrically, this swapping process corresponds to a move from one vertex of the feasible set to an adjacent vertex. We therefore need to choose which component of \(x_N\) should enter \(x_B\) (that is, be allowed to move off its zero bound) and which component of \(x_B\) should enter \(x_N\) (that is, be driven to zero). In fact, we need only make the first of these choices, since the second is implied by the feasibility constraints \(Ax = b\) and \(x\geq 0\). In selecting the entering component, we note that \(c^T x\) can be expressed as a function of \(x_N\) alone. We can express \(x_B\) in terms of \(x_N\) by noting that \(Ax = b\) implies that

\[x_B = B^{-1} (b - N x_N).\]

Hence, partitioning \(c\) into \(c_B\) and \(c_N\) in the obvious way, we have
\[c^T x = c_B^T x_B + c_N^T x_N = c_B^T B^{-1} b + \left( c_N - N^T B^{-T} c_B \right)^T x_N.\]

The vector
\[d_N = c_N - N^T B^{-T} c_B\] is the "reduced-cost vector". If all components of \(x_B\) are strictly positive and some component (say, the \(i^{th}\) component) of \(d_N\) is negative, we can decrease the value of \(c^T x\) by allowing component \(i\) of \(x_N\) to become positive while adjusting \(x_B\) to maintain feasibility. Unless there exist feasible points \(x\) that make \(c^T x\) arbitrarily negative, the requirement \(x_B \geq 0\) imposes an upper bound on \(x_{N_i}\), component \(i\) of \(x_N\).
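
Continuing the sketch above, the reduced-cost vector is obtained from one linear solve with \(B^T\) and a matrix-vector product (a dense solve is used here purely for brevity):

```python
# Continuation of the earlier sketch: reduced costs of the nonbasic variables.
y = np.linalg.solve(B.T, c[basic])   # simplex multipliers y = B^{-T} c_B
d_N = c[nonbasic] - N.T @ y          # reduced-cost vector d_N = c_N - N^T B^{-T} c_B
print(d_N)                           # [-3. -2.]: both negative, so this vertex is not optimal
```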

In principle, we can choose any component \(x_{N_i}\) with \(d_{N_i} < 0\) as the entering variable. If there are no negative entries in \(d_N\), the current point \(x\) is optimal. If there is more than one negative entry, we would ideally pick the component that leads to the largest reduction in \(c^T x\) on the current iteration. Heuristics for making this selection are discussed below.

It follows from \(Ax = b\) and the fact that the remaining elements of \(x_N\) are held at zero that

\[x_B = B^{-1} (b - N_i x_{N_i}),\]

where \(N_i\) denotes the column of \(N\) that corresponds to \(x_{N_i}\). We choose the new value of \(x_{N_i}\) to be the largest value that maintains \(x_B \geq 0\). Requiring each component of \(x_B\) in the previous equation to remain nonnegative yields this value explicitly:
\[x_{N_i} = \min \left\{ \frac{(B^{-1} b)_j}{(B^{-1} N_i)_j} \; : \; (B^{-1}N_i)_j > 0\right\}.\]

The index \(j\) that achieves the minimum in this formula indicates the basic variable \(x_{B_j}\) that is to become nonbasic. If more than one index achieves the minimum, the one with the largest value of \((B^{-1}N_i)_j\) is usually selected.
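
Continuing the same sketch, a full basis exchange then looks as follows: the entering variable is chosen by the most negative reduced cost and the leaving variable by the ratio test above (tolerances and the tie-breaking rule are simplified for illustration).

```python
# Continuation of the earlier sketch: one simplex pivot.
i = int(np.argmin(d_N))              # entering index: most negative reduced cost
enter = nonbasic[i]                  # entering variable (here x_0)
u = np.linalg.solve(B, A[:, enter])  # u = B^{-1} N_i

mask = u > 1e-12                     # only positive entries of u bound the step length
ratios = x_B[mask] / u[mask]
step = ratios.min()                  # new value of the entering variable
j = int(np.where(mask)[0][np.argmin(ratios)])   # leaving basic variable x_{B_j}
leave = basic[j]

basic[j], nonbasic[i] = enter, leave # exchange the two variables
B = A[:, basic]
x_B = np.linalg.solve(B, b)          # basic components at the adjacent vertex
print(enter, leave, step)            # 0 3 4.0
print(c[basic] @ x_B)                # objective drops from 0.0 to -12.0
```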

Matrix Operations

Most of the computational cost in simplex algorithms arises from the need to compute the vectors \(B^{-T} c_B\) and \(B^{-1} N_i\) and the need to keep track of the changes in \(B\) and \(B^{-1}\) resulting from the changes in the basis at each iteration. We could simply recompute and store \(B^{-1}\) explicitly after each step. This strategy is undesirable for two reasons. First, the matrix \(B^{-1}\) is usually dense even though the original \(B\) is sparse. Hence, explicit calculation of \(B^{-1}\) requires prohibitive amounts of computing time and storage. Second, since \(B\) changes only slightly from one iteration to the next, we should be able to update information about \(B\) and \(B^{-1}\) rather than recompute it anew at every step.

A technique that is used in many commercial codes is to store an \(LU\) factorization of \(B\); that is, to keep track of matrices \(P, Q, L, U\) such that
\[B = P L U Q,\] where \(P\) and \(Q\) are permutation matrices (identity matrices whose rows have been reordered), while \(L\) and \(U\) are lower- and upper-triangular matrices, respectively. Since \(P^T P = I\) and \(Q^T Q = I\), we can calculate \(z = B^{-T} c_B\) by performing the following sequence of operations:
\[U^T z_1 = Q c_B, \quad L^T z_2 = z_1, \quad z = P z_2.\]

The first two operations are forward- and back-substitutions with triangular matrices, which can be performed efficiently, while the final operation is a simple rearrangement of the elements of \(z_2\). Calculation of \(B^{-1} N_i\) proceeds similarly. The permutation matrices \(P\) and \(Q\) are chosen so that the factorization is reasonably stable and the factors \(L\) and \(U\) are not too much denser than \(B\) itself.
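
As a rough illustration of this solve sequence, the sketch below uses SciPy's dense, row-pivoted LU factorization (\(B = PLU\), with no column permutation \(Q\)) as a stand-in for the sparse factorization described above; the `trans=1` option carries out the transposed triangular solves needed for \(z = B^{-T} c_B\).

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))       # stand-in for a (normally sparse) basis matrix
c_B = rng.standard_normal(5)

lu, piv = lu_factor(B)                # factor B once, reuse for many right-hand sides
z = lu_solve((lu, piv), c_B, trans=1) # solve B^T z = c_B via two triangular solves
print(np.allclose(B.T @ z, c_B))      # True
```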

When \(B\) is changed by a single column, the factorization can be updated by applying a number of elementary transformations; that is, additions of multiples of one row of the matrix to another row. Rather than applying these transformations explicitly to the existing factors, they are usually stored in a compact form. When the storage occupied by the elementary transformations becomes excessive, they are discarded, and the current basis matrix \(B\) is refactored from scratch.
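
A closely related idea is the classical product-form ("eta") update, sketched below purely for illustration (modern codes update the \(LU\) factors themselves, but the storage-versus-refactorization trade-off is the same): the column change is stored as a single vector, the old factorization is reused, and each subsequent solve costs only \(O(m)\) extra work.

```python
import numpy as np

rng = np.random.default_rng(1)
m, j = 4, 2
B_old = rng.standard_normal((m, m))
a_new = rng.standard_normal(m)       # new column replacing column j of the basis

u = np.linalg.solve(B_old, a_new)    # u = B_old^{-1} a_new, computed once per basis change
                                     # (the pivot element u[j] is nonzero in a valid exchange)

def solve_updated(v):
    """Solve B_new x = v, where B_new is B_old with column j replaced by a_new."""
    w = np.linalg.solve(B_old, v)    # reuse the factorization of the old basis
    x_j = w[j] / u[j]                # apply the stored "eta" update: O(m) extra work
    x = w - u * x_j
    x[j] = x_j
    return x

B_new = B_old.copy()
B_new[:, j] = a_new
v = rng.standard_normal(m)
print(np.allclose(B_new @ solve_updated(v), v))   # True
```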

Pricing Strategies

We return to strategies for choosing the component \(i\) of \(x_N\) to enter the basis, an operation that is known as "pricing" in linear programming parlance. The simplest strategy is to choose \(i\) to correspond to the most negative component of the reduced-cost vector \(d_N\). This approach, known as Dantzig's rule, gives the fastest decrease in the objective function per unit increase in the entering variable. However, a unit increase in the entering variable is not a good measure of how far we actually move: a small change in the entering component may correspond to a huge step along the corresponding edge of the feasible polytope, so the decrease in the objective per unit of distance traveled may be poor, and we may have to move a long way to obtain the benefit promised by Dantzig's rule. This observation is the motivation behind the "steepest-edge" strategy, in which we choose the edge along which the objective function decreases most rapidly per unit of distance along the edge. The extra computation needed to identify the steepest edge is often more than offset by a reduction in the number of iterations, and this strategy is an option in many LP solvers.
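
The sketch below contrasts the two rules. It computes \(B^{-1} N_i\) explicitly for every candidate column, which is far too expensive for real problems (practical implementations maintain edge norms recursively), but it makes clear what each rule measures.

```python
import numpy as np

def price(B, N, d_N, rule="dantzig"):
    """Return the index (into the nonbasic set) of the entering variable,
    or None if no nonbasic variable has a negative reduced cost."""
    candidates = np.where(d_N < -1e-9)[0]
    if candidates.size == 0:
        return None                  # no negative reduced cost: the vertex is optimal
    if rule == "dantzig":
        # Fastest decrease per unit increase of the entering variable.
        return int(candidates[np.argmin(d_N[candidates])])
    # Naive steepest edge: decrease per unit of distance along the edge.
    slopes = []
    for i in candidates:
        u = np.linalg.solve(B, N[:, i])                # B^{-1} N_i: edge direction in x_B
        slopes.append(d_N[i] / np.sqrt(1.0 + u @ u))   # slope of c^T x along the edge
    return int(candidates[int(np.argmin(slopes))])
```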

When the linear program is too large for its data to be stored in core memory, computing the complete reduced-cost vector \(d_N\) at each iteration may require too much traffic with secondary storage and take too long. In this situation, a "partial pricing" strategy may be appropriate. This strategy computes only a subvector of \(d_N\) and chooses the entering variable from among those components that are actually computed. Of course, the subset of indices that defines the subvector of \(d_N\) should be changed frequently.
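
A minimal sketch of partial pricing, under the same illustrative conventions as the earlier snippets: only a rotating window of nonbasic columns is priced on each call, and a return value of `None` means only that the current window contains no candidate, not that the vertex is optimal.

```python
import numpy as np

def partial_price(B, N, c_B, c_N, start, block=100):
    """Price only a window of `block` nonbasic columns beginning at `start`.
    Returns (entering index into the nonbasic set or None, start of the next window)."""
    y = np.linalg.solve(B.T, c_B)                      # simplex multipliers B^{-T} c_B
    n_N = N.shape[1]
    idx = (start + np.arange(min(block, n_N))) % n_N   # window of columns, wrapping around
    d = c_N[idx] - N[:, idx].T @ y                     # reduced costs for this window only
    nxt = (start + block) % n_N
    neg = np.where(d < -1e-9)[0]
    if neg.size == 0:
        return None, nxt                               # no candidate in this window; advance it
    return int(idx[neg[np.argmin(d[neg])]]), nxt
```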

A problem with all of these strategies is that they do not predict the actual decrease in the objective function \(c^T x\) that will occur on this iteration. It may happen that we are able to move only a short distance along the chosen edge before encountering another vertex, so the reduction may be minimal. A "multiple pricing" strategy selects a small group of columns with negative reduced costs and computes the actual reduction that would be achieved if any one of the corresponding variables entered the basis. This process is expensive, since it requires the calculation of \(B^{-1} N_i\) for each candidate \(i\). One of the candidates is chosen, and the remainder are retained as candidates for the next iteration, since the marginal cost of updating the column \(B^{-1} N_i\) to correspond to the new basis matrix is not too high. Of course, the candidate list must be refreshed frequently.

Simplex Method Tools

There are a number of interactive tools available that allow users to step through iterations of the simplex method.