What is Linear Programming?
Or what do they mean by “solve for x”?
We all remember our introduction to basic algebra in grade or middle school. Our teachers would refer to them as “word problems” so we wouldn’t panic at the ambiguity of not simply applying a set of rehearsed instructions, as we had done up to that point with long addition. The first questions only involved a single unknown (often labeled x) that we had to “solve for”. For example, if a teacher brings 33 apples to class and gives each student 3, how many students are in the class? Then we moved on to more advanced problems involving multiple unknowns. We quickly learned that 2 unknowns require as many “facts” to yield a unique solution. And soon enough, without even realizing it, we started studying linear algebra.
Did you ever wonder what happens when we break the rule that N unknowns require N independent equations for a unique solution? There are only two cases to consider:
- more equations than unknowns (overdetermined system)
- fewer equations than unknowns (underdetermined system)
An overdetermined system generally has no solution. It’s common practice in that case to pick the least bad approximation. This involves measuring how much our hypothetical solution “misses” each equation, then searching for the solution that “misses” the least. Specialists call this approach minimizing the L₂ norm, and the associated procedure Ordinary Least Squares (OLS). This technique is very important when trying to extract trends from data. However, OLS is not the focus of this post. We might return to it in a later post on statistical modeling.
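In symbols, if we stack the equations as \(Ax = b\), OLS picks the \(x\) that minimizes the total squared miss:
\[\min_x \lVert Ax - b \rVert_2\]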
This post will focus on underdetermined systems. Underdetermined systems have either no solution or infinitely many solutions. These solutions form a convex set in ℝⁿ. That geometric interpretation is helpful, so let us take a moment to understand how it comes about. First, we remember that \(a x + b y + c z + d = 0\) (where \(a\), \(b\), \(c\), and \(d\) represent known coefficients whereas \(x\), \(y\), and \(z\) represent unknowns) defines the two-dimensional plane in three dimensions which is orthogonal to the vector \((a, b, c)\). As a consequence, we can interpret the solution set of an underdetermined system of equations as the intersection of such hyperplanes, one per equation. We struggle to interpret geometric objects in more than 2 or 3 dimensions. Fortunately, we can project them to lower dimensions. Those projections are equivalent to dropping one of the dimensions (say \(z\)) and turning the equality into an inequality. The same concept lets us go the other way (turning inequalities into equalities) by introducing slack variables.
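For example, if \(z\) is constrained to be non-negative, the equality \(x + y + z = 10\) projects down to the inequality \(x + y \le 10\); conversely, the inequality \(x + y \le 10\) becomes the equality \(x + y + s = 10\) once we introduce a slack variable \(s \ge 0\).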
Recapping: having more unknowns than equations leads to infinitely many solutions. We often want to choose one preferred solution out of that infinite feasible set. This can be achieved by defining an objective function over the set of solutions. In other words, optimization (which is what we call choosing the best objective value out of the feasible set) is complementary to regression (where we find the least bad approximate solution). The next obstacle is that linear optimization solvers expect the problem in matrix form. Take for example the wheat and barley example from the previous link in algebraic form:
\[\begin{eqnarray} \mathrm{maximize}\ 3x_1+4x_2 \\ \mathrm{subject\ to}\ \begin{array}{rcc} x_1 + x_2 &\le& 10 \\ 3x_1 + 6x_2 &\le& 48 \\ 4x_1 + 2x_2 &\le& 32 \\ \end{array} \end{eqnarray}\]
The corresponding matrix form reads
\[\begin{eqnarray} \mathrm{maximize}\ (3\ 4)^T x \\ \mathrm{subject\ to}\ \left( \begin{array}{cc} 1 & 1 \\ 3 & 6 \\ 4 & 2 \end{array} \right) x \le \left( \begin{array}{c} 10 \\ 48 \\ 32 \end{array} \right) \end{eqnarray}\]
Typography highlights the equivalence, but how do we make the computer “see” it? First, we need objects that let us represent symbolic expressions. It’s not hard to roll those by hand if we’re only interested in this specific problem, but that’s hardly the python way. Much easier to install the standard python package for symbolic manipulations: sympy. Next, we are going to write a simple function that lets us write our sample problem as
from sympy.abc import x, y

maximum = maximize(
    3 * x + 4 * y,
    x + y <= 10,
    3 * x + 6 * y <= 48,
    4 * x + 2 * y <= 32,
)
Our function is very similar to sympy.linear_eq_to_matrix, except it also accepts an objective function and doesn’t ask you to provide symbols but infers them for you.
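To make the snippets below easier to follow, assume they all live inside a function with roughly this signature (just a sketch for orientation; the body is what we assemble step by step in the rest of the post):

def maximize(objective, *constraints):
    # objective: a sympy expression such as 3 * x + 4 * y
    # constraints: sympy inequalities such as x + y <= 10
    ...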
We start by collecting matrix coefficients:
from operator import attrgetter, methodcaller

lhs = map(attrgetter("lhs"), constraints)  # constraints are the *args of maximize
rhs = map(attrgetter("rhs"), constraints)  # right-hand sides, used for the b vector below
coefficients = list(map(methodcaller("as_coefficients_dict"), lhs))
This representation as a list of dictionaries reminds us of a sparse matrix in Compressed Sparse Row (CSR) format. CSR keeps the data flat thanks to an extra array (indptr) which tells us where rows begin and end. That’s easy:
import numpy as np
from itertools import accumulate

indptr = np.fromiter(
    accumulate(map(len, coefficients), initial=0),
    dtype=int,
    count=len(constraints) + 1,
)
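For our sample problem each constraint carries two coefficients, so indptr comes out as [0, 2, 4, 6].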
The data vector is equally trivial: we just need to flatten the .values() of coefficients:
from itertools import chain

data = np.fromiter(
    chain.from_iterable(map(methodcaller("values"), coefficients)),
    dtype=float,
    count=indptr[-1],  # total number of stored coefficients
)
(Please forgive me if you’re finding operator and itertools hard to grok. List comprehensions would have been equally suitable to express this logic.)
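For the sample problem, data flattens the six constraint coefficients into something like [1, 1, 3, 6, 4, 2], with the order within each row depending on how sympy happens to order the dictionary keys.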
indices is a bit trickier. It’s precisely what np.unique means by return_inverse. Unfortunately, np.unique requires comparable types because it invokes np.argsort under the hood. Fortunately, pd.factorize does precisely what we need (and only requires hashable types):
import pandas as pd
from scipy import sparse

# factorize maps every symbol to an integer code (indices) and
# returns the unique symbols in order of first appearance (variables)
indices, variables = pd.factorize(list(chain.from_iterable(coefficients)))
A = sparse.csr_array(
    (data, indices, indptr),
    shape=(len(constraints), len(variables)),
)
b = np.fromiter(
    map(methodcaller("evalf"), rhs), dtype=float, count=len(constraints)
)
c = np.vectorize(objective.coeff, otypes=[float])(variables)
pd.factorize also returned the unique variables, which we can pass to sympy.Expr.coeff to define the c vector.
And there you have it: just learning the absolute minimum about sympy’s rich API lets us convert a linear optimization problem to canonical form.
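For completeness, here is one way the pieces above might be assembled into a working function. This is a sketch under a few assumptions: every constraint is a “less than or equal” inequality, the variables are non-negative (scipy’s default bounds), and scipy is recent enough that scipy.optimize.linprog accepts the sparse A directly:

from itertools import accumulate, chain
from operator import attrgetter, methodcaller

import numpy as np
import pandas as pd
from scipy import sparse
from scipy.optimize import linprog


def maximize(objective, *constraints):
    lhs = map(attrgetter("lhs"), constraints)
    rhs = map(attrgetter("rhs"), constraints)
    coefficients = list(map(methodcaller("as_coefficients_dict"), lhs))
    indptr = np.fromiter(
        accumulate(map(len, coefficients), initial=0),
        dtype=int,
        count=len(constraints) + 1,
    )
    indices, variables = pd.factorize(list(chain.from_iterable(coefficients)))
    data = np.fromiter(
        chain.from_iterable(map(methodcaller("values"), coefficients)),
        dtype=float,
        count=indptr[-1],
    )
    A = sparse.csr_array(
        (data, indices, indptr), shape=(len(constraints), len(variables))
    )
    b = np.fromiter(
        map(methodcaller("evalf"), rhs), dtype=float, count=len(constraints)
    )
    c = np.vectorize(objective.coeff, otypes=[float])(variables)
    # linprog minimizes by convention, so maximizing means minimizing -c @ x
    return -linprog(-c, A_ub=A, b_ub=b).fun

On the sample problem this sketch returns 36.0, attained at \(x = 4\) and \(y = 6\), which is the value we get by checking the vertices of the feasible region by hand.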