The purpose of this post is to introduce you to some of the basics of control theory and to introduce the Linear-Quadratic Regulator, an extremely good hammer for solving stabilization problems.

To start with, what do we mean by a control problem? We mean that we have some system with dynamics described by an equation of the form

$\dot{x} = Ax,$

where $x$ is the state of the system and $A$ is some matrix (which itself is allowed to depend on $x$). For example, we could have an object that is constrained to move in a line along a frictionless surface. In this case, the system dynamics would be

$\left[ \begin{array}{c} \dot{q} \\ \ddot{q} \end{array} \right] = \left[ \begin{array}{cc} 0 & 1 \\ 0 & 0 \end{array} \right]\left[ \begin{array}{c} q \\ \dot{q} \end{array} \right]. $

Here $q$ represents the position of the object, and $\dot{q}$ represents the velocity (which is a relevant component of the state, since we need it to fully determine the future behaviour of the system). If there were drag, then we could instead have the following equation of motion:

$\left[ \begin{array}{c} \dot{q} \\ \ddot{q} \end{array} \right] = \left[ \begin{array}{cc} 0 & 1 \\ 0 & -b \end{array} \right]\left[ \begin{array}{c} q \\ \dot{q} \end{array} \right], $

where $b$ is the coefficient of drag.
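To make the form $\dot{x} = Ax$ concrete, here is a minimal sketch that integrates both examples with forward Euler. The drag value $b = 0.5$, the step size and the function name `simulate_linear` are all illustrative assumptions, not part of the post.

```python
import numpy as np

def simulate_linear(A, x0, dt=1e-3, t_final=5.0):
    """Integrate x_dot = A x with forward Euler and return the state at t_final."""
    x = np.array(x0, dtype=float)
    for _ in range(int(round(t_final / dt))):
        x = x + dt * (A @ x)
    return x

b = 0.5  # hypothetical drag coefficient, chosen only for illustration

A_free = np.array([[0.0, 1.0],
                   [0.0, 0.0]])   # frictionless: q_ddot = 0
A_drag = np.array([[0.0, 1.0],
                   [0.0, -b]])    # linear drag: q_ddot = -b q_dot

x0 = [0.0, 1.0]  # start at q = 0 with velocity q_dot = 1
print(simulate_linear(A_free, x0))  # velocity stays at 1, so q grows linearly with time
print(simulate_linear(A_drag, x0))  # velocity decays roughly like exp(-b t)
```

With drag, the exact solution is $\dot{q}(t) = \dot{q}(0)e^{-bt}$ and $q(t) = q(0) + \dot{q}(0)(1 - e^{-bt})/b$, which the Euler result approaches as the step size shrinks.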

If you think a bit about the form of these equations, you will realize that it is both redundant and not fully general. The form is redundant because $A$ can be an arbitrary function of $x$, yet it also acts on $x$ as an argument, so the equation $\ddot{q} = q\dot{q}$, for example, could be written as

$\left[ \begin{array}{c} \dot{q} \\ \ddot{q} \end{array} \right] = \left[ \begin{array}{cc} 0 & 1 \\ \alpha \dot{q} & (1-\alpha) q \end{array} \right] \left[ \begin{array}{c} q \\ \dot{q} \end{array} \right]$

for any choice of $\alpha$. On the other hand, this form is also not fully general, since $x = 0$ will always be a fixed point of the system. (We could in principle fix this by making $\dot{x}$ affine, rather than linear, in $x$, but for now we'll use the form given here.)
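The redundancy is easy to check numerically: the second row gives $\alpha \dot{q} \cdot q + (1-\alpha) q \cdot \dot{q} = q\dot{q}$ for every $\alpha$. A small sketch (the helper name `A_of_x` and the sample state are hypothetical):

```python
import numpy as np

def A_of_x(x, alpha):
    """State-dependent matrix for q_ddot = q * q_dot, split with weight alpha."""
    q, q_dot = x
    return np.array([[0.0, 1.0],
                     [alpha * q_dot, (1.0 - alpha) * q]])

x = np.array([0.7, -1.3])                  # an arbitrary state (q, q_dot)
target = np.array([x[1], x[0] * x[1]])     # (q_dot, q * q_dot) computed directly

for alpha in [0.0, 0.25, 1.0, 3.0]:
    # Every choice of alpha produces the same x_dot, so A(x) is not unique.
    assert np.allclose(A_of_x(x, alpha) @ x, target)
```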

So, if this representation doesn't uniquely describe most systems, and can't describe other systems, why do we use it? The answer is that, for most systems arising in classical mechanics, the equations naturally take on this form (I think there is a deeper reason for this coming from Lagrangian mechanics, but I don't yet understand it).

Another thing you might notice is that in both of the examples above, $x$ was of the form $\left[ \begin{array}{c} q \\ \dot{q} \end{array} \right]$. This is another common phenomenon (although $q$ and $\dot{q}$ may be vectors instead of scalars in general), owing to the fact that Newtonian mechanics produces second-order systems, and so we care about both the position and velocity of the system.

So, now we have a mathematical formulation, as well as some notation, for what we mean by the equations of motion of a system. We still haven't gotten to what we mean by control. What we mean is that we assume that, in addition to the system state $x$, we have a control input $u$ (usually we can choose $u$ independently from $x$), such that the actual equations of motion satisfy

$\dot{x} = Ax+Bu,$

where again, $A$ and $B$ can both depend on $x$. What this really means physically is that, for any configuration of the system, we can choose a control input $u$, and $u$ will affect the instantaneous change in state in a linear manner. We normally call each of the entries of $u$ a torque.

The assumption of linearity might seem strong, but it is again true for most systems, in the sense that a linear increase in a given torque will induce a linear response in the kinematics of the system. But note that this is only true once we talk about mechanical torques. If we think of a control input as an electrical signal, then the system will usually respond non-linearly with respect to the signal. This is simply because the actuator itself provides a force that is non-linear in its electrical input.

We can deal with this either by saying that we only care about a local model, and the actuator response is locally linear in its input; or, we can say that the problem of controlling the actuator itself is a separate problem that we will let someone else worry about. In either case, I will shamelessly use the assumption that the system response is linear in the control input.

So, now we have a general form for equations of motion with a control input. The general goal of a control problem is to pick a function $f(x,t)$ such that if we let $u = f(x,t)$ then the trajectory $X(t)$ induced by the equation $\dot{x} = Ax+Bf(x,t)$ minimizes some objective function $J(X,f)$. Sometimes our goals are more modest and we really just want to get to some final state, in which case we can make $J$ just be a function of the final state that assigns a score based on how close we end up to the target state. We might also have hard constraints on $u$ (because our actuators can only produce a finite amount of torque), in which case we can make $J$ assign an infinite penalty to any $f$ that violates these constraints.
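The definition above can be sketched as a rollout: simulate $\dot{x} = Ax + Bf(x,t)$, accumulate a running cost, and return $\infty$ if the input ever exceeds a limit. The name `rollout_cost`, the Euler integration, the finite horizon and the example policy are all illustrative assumptions.

```python
import numpy as np

def rollout_cost(A, B, policy, x0, running_cost, u_max=np.inf, dt=1e-3, t_final=10.0):
    """Simulate x_dot = A x + B u with u = policy(x, t) and return J.

    J is the time integral of running_cost(x, u); any input with |u| > u_max
    makes J infinite, mirroring a hard-constraint penalty.
    """
    x = np.array(x0, dtype=float)
    J = 0.0
    for k in range(int(round(t_final / dt))):
        t = k * dt
        u = np.atleast_1d(policy(x, t))
        if np.any(np.abs(u) > u_max):
            return np.inf
        J += running_cost(x, u) * dt            # left Riemann sum of the running cost
        x = x + dt * (A @ x + B @ u)            # forward-Euler step
    return J

A = np.array([[0.0, 1.0], [0.0, 0.0]])          # object on a frictionless line
B = np.array([[0.0], [1.0]])

def pd_policy(x, t):
    """Hypothetical hand-picked feedback u = -q - 2 q_dot."""
    return -1.0 * x[0] - 2.0 * x[1]

def q_squared(x, u):
    """Penalize deviation of q from 0."""
    return x[0] ** 2

print(rollout_cost(A, B, pd_policy, [1.0, 0.0], q_squared, u_max=2.0))
print(rollout_cost(A, B, pd_policy, [1.0, 0.0], q_squared, u_max=0.5))  # limit violated at t = 0
```

For this policy the continuous-time trajectory is $q(t) = (1+t)e^{-t}$, so the first cost should come out close to $\int_0^\infty (1+t)^2 e^{-2t}\,dt = 1.25$.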

As an example, let's return to the object moving in a straight line. This time we will say that $\left[ \begin{array}{c} \dot{q} \\ \ddot{q} \end{array} \right] = \left[ \begin{array}{cc} 0 & 1 \\ 0 & 0 \end{array} \right] \left[ \begin{array}{c} q \\ \dot{q} \end{array} \right]+\left[ \begin{array}{c} 0 \\ 1 \end{array} \right]u$, with the constraint that $|u| \leq u_{\max}$ (we write $u_{\max}$ rather than reusing $A$, which already names the dynamics matrix). We want to get to $x = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]$ as quickly as possible, meaning we want to get to $q = 0$ and then stay there. We could have $J(X,f)$ just be the amount of time it takes to get to the desired endpoint, with a cost of infinity on any $f$ that violates the torque limits. However, this is a bad idea, for two reasons.

The first reason is that, numerically, you will never really end up at exactly $\left[ \begin{array}{c} 0 \ 0 \end{array} \right]$, just very close to it. So if we try to use this function on a computer, unless we are particularly clever we will assign a cost of $\infty$ to every single control policy.

However, we could instead have $J(X,f)$ be the amount of time it takes to get close to the desired endpoint. I personally still think this is a bad idea, and this brings me to my second reason.

Once you come up with an objective function, you need to somehow come up with a controller (that is, a choice of $f$) that minimizes that objective function, or at the very least performs reasonably well as measured by the objective function. You could do this by being clever and constructing such a controller by hand, but in many cases you would much rather have a computer find the optimal controller. If you are going to have a computer search for a good controller, you want to make the search problem as easy as possible, or at least reasonable. This means that, if we think of $J$ as a function on the space of control policies, we would like to make the problem of optimizing $J$ tractable. I don't know how to make this precise, but there are a few properties we would like $J$ to satisfy: there aren't too many local minima, and the minima aren't approached too steeply (meaning that there is a reasonably large neighbourhood of small values around each local minimum). If we choose an objective function that assigns a value of $\infty$ to almost everything, then we will end up spending most of our time wading through a sea of infinities without any direction (because all directions will just yield more values of $\infty$). So a very strict objective function will be very hard to optimize. Ideally, we would like a different choice of $J$ that has its minimum at the same location but that decreases gradually to that minimum, so that we can solve the problem using gradient descent or some similar method.
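A small experiment illustrates both reasons. It sweeps a hypothetical family of saturated feedback policies $u = \mathrm{clip}(-kq - 2\sqrt{k}\,\dot{q})$ with $|u| \le 1$, and scores each rollout two ways: "time to reach exactly $x = 0$" (infinite if never reached) and a Riemann-sum approximation of $\int q^2\,dt$. The policy family, limits and horizon are assumptions chosen for illustration.

```python
import numpy as np

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([0.0, 1.0])
U_MAX, DT, T_FINAL = 1.0, 1e-3, 20.0

def simulate(k, x0=(1.0, 0.0)):
    """Roll out u = clip(-k q - 2 sqrt(k) q_dot, -U_MAX, U_MAX) with forward Euler."""
    x = np.array(x0, dtype=float)
    states = []
    for _ in range(int(round(T_FINAL / DT))):
        u = np.clip(-k * x[0] - 2.0 * np.sqrt(k) * x[1], -U_MAX, U_MAX)
        x = x + DT * (A @ x + B * u)
        states.append(x.copy())
    return np.array(states)

def strict_cost(states):
    """Time of the first step landing exactly on x = 0, or inf if that never happens."""
    hits = np.flatnonzero(np.all(states == 0.0, axis=1))
    return (hits[0] + 1) * DT if hits.size else np.inf

def smooth_cost(states):
    """Riemann-sum approximation of the integral of q(t)^2."""
    return float(np.sum(states[:, 0] ** 2) * DT)

for k in [0.25, 0.5, 1.0, 2.0, 4.0]:
    s = simulate(k)
    print(f"k={k}: strict={strict_cost(s)}, smooth={smooth_cost(s):.4f}")
```

In floating point the state decays towards the origin without ever landing on it exactly, so the strict cost is flat at $\infty$ across the whole sweep, while the smooth cost varies with $k$ and gives an optimizer something to follow.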

In practice, we might have to settle for an objective function that is only trying to minimize the same thing qualitatively, rather than in any precise manner. For example, instead of the choice of $J$ discussed above for the object moving in a straight line, we could choose

$J(X,f) = \int_{0}^{T} \|q(t)\|^2 dt,$

where $T$ is some arbitrary final time. In this form, we are trying to minimize the time-integral of the squared deviation of $q$ from $0$. With a little bit of work, we can deduce that, for large enough $T$, the optimal controller is a bang-bang controller: it accelerates towards $0$ at the greatest rate possible until accelerating any more would cause the object to overshoot $q = 0$, at which point it decelerates at the greatest rate possible (there are some additional cases for when the object will overshoot the origin no matter what, but this is the basic idea). Strictly speaking, that switching rule is the minimum-time controller; for this quadratic cost the optimal control is still bang-bang, but it switches somewhat earlier and, as in Fuller's classic problem, infinitely often as the state approaches the origin.
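Here is a sketch of the switching rule described above, assuming a scalar input with $|u| \le u_{\max}$. It uses the standard minimum-time switching curve $q = -\dot{q}|\dot{q}|/(2u_{\max})$: on one side of the curve it applies full negative acceleration, on the other full positive acceleration. The function names, step size and horizon are illustrative assumptions.

```python
import numpy as np

def bang_bang(x, u_max=1.0):
    """Switching-curve rule for q_ddot = u with |u| <= u_max.

    s > 0 means the state lies above the curve q = -q_dot |q_dot| / (2 u_max),
    so full negative acceleration is applied; s < 0 gives full positive acceleration.
    """
    q, q_dot = x
    s = q + q_dot * abs(q_dot) / (2.0 * u_max)
    return -u_max * np.sign(s)

def simulate(x0, u_max=1.0, dt=1e-4, t_final=3.0):
    """Forward-Euler rollout under the bang-bang rule; returns every visited state."""
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(int(round(t_final / dt))):
        u = bang_bang(x, u_max)
        x = x + dt * np.array([x[1], u])   # q_dot' = u
        traj.append(x.copy())
    return np.array(traj)

traj = simulate([1.0, 0.0])
print(traj[-1])  # near the origin, up to small chattering from the discrete switching
```

Starting at rest at $q = 1$ with $u_{\max} = 1$, the continuous-time controller accelerates until $t = 1$, then decelerates and arrives at the origin at $t = 2$; the discrete simulation follows this closely and then chatters in a tiny neighbourhood of the origin.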

This brings us to my original intention in making this post, which is LQR (linear-quadratic regulator) control. In this case, we assume that $A$ and $B$ are both constant and that our cost function is of the form

$J(X,f) = \int_{0}^{\infty} \left( X(t)^{T}QX(t) + f(X(t),t)^{T}Rf(X(t),t) \right) dt,$

where the superscript $T$ means transpose (not the final time used earlier), and $Q$ and $R$ are both positive definite matrices. In other words, we assume that our goal is to get to $x = 0$, and we penalize both our distance from $x = 0$ and the amount of torque we apply at each point in time. If we have a cost function of this form (and the pair $(A, B)$ is stabilizable), then we can actually solve analytically for the optimal control policy $f$. The solution involves solving the Hamilton-Jacobi-Bellman equation, and I won't go into the details, but when the smoke clears we end up with a linear feedback policy $u = -Kx$, where $K = R^{-1}B^{T}P$, and $P$ is the unique positive definite solution of the algebraic Riccati equation

$A^TP+PA-PBR^{-1}B^TP+Q=0.$
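As a worked illustration, take the frictionless object with $A = \left[ \begin{array}{cc} 0 & 1 \\ 0 & 0 \end{array} \right]$, $B = \left[ \begin{array}{c} 0 \\ 1 \end{array} \right]$, $Q = I$ and $R = 1$, and write $P = \left[ \begin{array}{cc} p_1 & p_2 \\ p_2 & p_3 \end{array} \right]$. The three independent entries of the Riccati equation are

$1 - p_2^2 = 0, \qquad p_1 - p_2 p_3 = 0, \qquad 2p_2 - p_3^2 + 1 = 0.$

The first gives $p_2 = \pm 1$, and $p_2 = -1$ would turn the third into $p_3^2 = -1$, so $p_2 = 1$. Positive definiteness then selects $p_3 = \sqrt{3}$, and hence $p_1 = \sqrt{3}$. The gain is

$K = R^{-1}B^TP = \left[ \begin{array}{cc} 1 & \sqrt{3} \end{array} \right], \qquad u = -q - \sqrt{3}\,\dot{q},$

and the closed-loop matrix $A - BK = \left[ \begin{array}{cc} 0 & 1 \\ -1 & -\sqrt{3} \end{array} \right]$ has characteristic polynomial $s^2 + \sqrt{3}s + 1$, whose roots $(-\sqrt{3} \pm i)/2$ both have negative real part.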

What's even better is that MATLAB's Control System Toolbox has a function called lqr that will set up and solve the Riccati equation automatically.
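Outside MATLAB, the same computation can be sketched in a few lines of NumPy. This is not a reimplementation of MATLAB's function: `solve_care` and `lqr` are hypothetical names, and the method used here (keeping the eigenvectors of the Hamiltonian matrix whose eigenvalues have negative real part) assumes that matrix is diagonalizable with no eigenvalues on the imaginary axis. For real work, SciPy's `scipy.linalg.solve_continuous_are(a, b, q, r)` solves the same equation with a more robust method.

```python
import numpy as np

def solve_care(A, B, Q, R):
    """Stabilizing solution P of A^T P + P A - P B R^{-1} B^T P + Q = 0.

    Sketch only: stacks the stable eigenvectors of the Hamiltonian
    [[A, -B R^{-1} B^T], [-Q, -A^T]] as [X1; X2] and returns P = X2 X1^{-1}.
    """
    n = A.shape[0]
    G = B @ np.linalg.solve(R, B.T)            # B R^{-1} B^T
    H = np.block([[A, -G], [-Q, -A.T]])
    eigvals, eigvecs = np.linalg.eig(H)
    stable = eigvecs[:, eigvals.real < 0]
    if stable.shape[1] != n:
        raise ValueError("expected exactly n stable eigenvalues")
    X1, X2 = stable[:n], stable[n:]
    P = np.linalg.solve(X1.T, X2.T).T.real     # X2 X1^{-1}; imaginary parts are round-off
    return (P + P.T) / 2                       # symmetrize

def lqr(A, B, Q, R):
    """Hypothetical helper: returns (K, P) for the feedback u = -K x."""
    P = solve_care(A, B, Q, R)
    K = np.linalg.solve(R, B.T @ P)            # K = R^{-1} B^T P
    return K, P

# Frictionless object on a line with Q = I and R = 1.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
K, P = lqr(A, B, np.eye(2), np.eye(1))
print(K)
print(np.linalg.eigvals(A - B @ K))   # closed-loop poles; both should have negative real part
```

Solving the Riccati equation by hand for this example gives $K = [1 \;\; \sqrt{3}]$, which the printed gain should match.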

You might have noticed that we had to assume that both $A$ and $B$ are constant, which is a fairly strong assumption, as it implies that we have an LTI (linear time-invariant) system. So what is LQR control actually good for? The answer is stabilization. If we want to design a controller that will stabilize a system about a point, we can shift coordinates so that the point is at the origin, then take a linear approximation about the origin. As long as we have a moderately accurate linear model for the system about that point, the LQR controller will successfully stabilize the system to that point within some basin of attraction. More technically, the LQR controller will make the system locally asymptotically stable, and the optimal cost-to-go of the linear system, $V(x) = x^TPx$, will be a valid local Lyapunov function.
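Here is a sketch of that recipe on a hypothetical inverted pendulum, $\ddot{\theta} = (g/l)\sin\theta + u$, with $\theta$ measured from upright and $u$ treated as a direct angular-acceleration input (both modelling choices, and $g/l = 9.81$, are assumptions for illustration). We linearize about $\theta = 0$ using $\sin\theta \approx \theta$, compute an LQR gain with a small Hamiltonian-eigenvector Riccati solver (which assumes a diagonalizable Hamiltonian), and then apply that gain to the *nonlinear* model.

```python
import numpy as np

G_OVER_L = 9.81  # hypothetical pendulum with g = 9.81 m/s^2 and length l = 1 m

def solve_care(A, B, Q, R):
    """Stabilizing P = X2 X1^{-1} from the stable eigenvectors of the Hamiltonian."""
    n = A.shape[0]
    H = np.block([[A, -B @ np.linalg.solve(R, B.T)], [-Q, -A.T]])
    vals, vecs = np.linalg.eig(H)
    stable = vecs[:, vals.real < 0]
    P = np.linalg.solve(stable[:n].T, stable[n:].T).T.real
    return (P + P.T) / 2

def pendulum(x, u):
    """Nonlinear dynamics: theta is measured from upright, u is an angular acceleration."""
    theta, omega = x
    return np.array([omega, G_OVER_L * np.sin(theta) + u])

# Linearize about the upright equilibrium x = 0 using sin(theta) ~ theta.
A = np.array([[0.0, 1.0], [G_OVER_L, 0.0]])
B = np.array([[0.0], [1.0]])
R = np.eye(1)
P = solve_care(A, B, np.eye(2), R)
K = np.linalg.solve(R, B.T @ P)   # u = -K x, shape (1, 2)

def simulate(x0, K=None, dt=1e-3, t_final=10.0):
    """Forward-Euler rollout of the nonlinear pendulum, optionally under u = -K x."""
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(int(round(t_final / dt))):
        u = 0.0 if K is None else -float(K[0] @ x)
        x = x + dt * pendulum(x, u)
        traj.append(x.copy())
    return np.array(traj)

controlled = simulate([0.3, 0.0], K)   # starts 0.3 rad from upright
uncontrolled = simulate([0.3, 0.0])    # same start, no feedback
print(controlled[-1], np.max(np.abs(uncontrolled[:, 0])))
```

Without feedback the pendulum falls away from upright, while the LQR gain designed on the linear model brings the nonlinear system back to $\theta = 0$ from this nearby start. Much larger initial angles may fall outside the basin of attraction.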

Really, the best reason to make use of LQR controllers is that they are a solution to stabilization problems that work out of the box. Many controllers that work in theory will actually require a ton of tuning in practice; this isn't the case for an LQR controller. As long as you can identify a linear system about the desired stabilization point, even if your identification isn't perfect, you will end up with a pretty good controller.

I was thinking of also going into techniques for linear system identification, but I think I'll save that for a future post. The short answer is that you find a least-squares fit of the data you collect. I'll also go over how this all applies to the underwater cart-pole in a future post.
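To give a flavour of that short answer, here is a minimal sketch, not necessarily the method a future post would use. It simulates data from the drag example (with a made-up $b = 0.5$ and random inputs), finite-differences the states to estimate $\dot{x}$, and regresses $\dot{x}$ on $[x; u]$ with `np.linalg.lstsq` to recover $A$ and $B$. Because the data come from a noise-free Euler simulation, the fit is essentially exact; with real measurements, finite differencing amplifies noise and the estimates will be rougher.

```python
import numpy as np

rng = np.random.default_rng(0)

# "True" system to identify: object on a line with linear drag (b = 0.5 is a made-up value).
A_true = np.array([[0.0, 1.0], [0.0, -0.5]])
B_true = np.array([[0.0], [1.0]])
dt, n_steps = 1e-2, 500

# Collect data by driving the system with random inputs (forward-Euler simulation).
X = np.zeros((n_steps + 1, 2))
U = rng.uniform(-1.0, 1.0, size=(n_steps, 1))
for k in range(n_steps):
    X[k + 1] = X[k] + dt * (A_true @ X[k] + B_true @ U[k])

# Finite-difference estimate of x_dot, then solve x_dot ~ [A B] [x; u] in the least-squares sense.
X_dot = (X[1:] - X[:-1]) / dt
Z = np.hstack([X[:-1], U])                           # regressors, one row per sample
Theta, *_ = np.linalg.lstsq(Z, X_dot, rcond=None)    # Theta = [A B]^T
A_hat, B_hat = Theta[:2].T, Theta[2:].T
print(A_hat)
print(B_hat)
```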