Lecture notes from CMU 16–745 Optimal Control (Spring 2023).
Three Ways to Solve LQR
- LQR formulation: see the problem statement written out after this list.
- LQR variants:
- finite horizon vs infinite horizon
- time-varying dynamics vs time-invariant dynamics
- deterministic vs stochastic
- Shooting method for discrete-time deterministic optimal control (a minimal sketch follows this list):
- Initialize the action sequence with random guesses.
- Do a forward rollout to get the state sequence.
- Make a backward pass to propagate the Lagrange multipliers and compute the gradient with respect to each action, then update the action sequence.
- Repeat the process until convergence.
- Quadratic programming (QP) (the KKT system is written out after this list):
- Without inequality constraints, this would be overkill for LQR.
- It only returns a single open-loop solution, instead of a feedback policy.
- But this is the general way to go for convex MPC.
- Dynamic programming (DP) (a Riccati-recursion sketch follows this list):
- For LQR, the action-value function and the cost-to-go are both quadratic, and the resulting optimal feedback policy is linear in the state.
- Solve the backward Riccati recurrence for a feedback policy and then perform a forward rollout from the initial state.
- The feedback policy also works for the stochastic form with independent process noise.
- Without a compact cost-to-go representation (quadratic in the LQR case), DP is intractable for high-dimensional state spaces.
- The simple Riccati solution also breaks down in the presence of inequality constraints.
- Condition for controllability (the controllability matrix is written out after this list):
- The rank of the controllability matrix equals the dimension of the state space.
- Discretization of continuous-time linear dynamics under zero-order-hold control inputs: see the matrix-exponential formula after this list.
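One standard way to write the finite-horizon, discrete-time LQR problem from the "LQR formulation" bullet (generic notation, not necessarily the lecture's):

$$
\min_{x_{2:N},\,u_{1:N-1}} \;\; \frac{1}{2} x_N^\top Q_N x_N + \sum_{k=1}^{N-1} \left( \frac{1}{2} x_k^\top Q\, x_k + \frac{1}{2} u_k^\top R\, u_k \right)
\quad \text{s.t.} \quad x_{k+1} = A\, x_k + B\, u_k,
$$

with $x_1$ given, $Q \succeq 0$, $Q_N \succeq 0$, and $R \succ 0$.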
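For the shooting-method bullets, a minimal numerical sketch, assuming time-invariant $A, B, Q, R$; the step size, iteration count, and tolerance are arbitrary choices for illustration:

```python
import numpy as np

def shooting_lqr(A, B, Q, R, QN, x1, N, iters=2000, alpha=1e-2, tol=1e-6):
    """Indirect shooting for finite-horizon LQR via gradient descent on the actions."""
    m = B.shape[1]
    rng = np.random.default_rng(0)
    u = [0.1 * rng.standard_normal(m) for _ in range(N - 1)]  # random initial guess
    for _ in range(iters):
        # Forward rollout: simulate the dynamics with the current action sequence.
        x = [x1]
        for k in range(N - 1):
            x.append(A @ x[k] + B @ u[k])
        # Backward pass: propagate the Lagrange multipliers and form dL/du_k.
        lam = QN @ x[-1]                      # terminal multiplier
        grad = [None] * (N - 1)
        for k in reversed(range(N - 1)):
            grad[k] = R @ u[k] + B.T @ lam    # gradient w.r.t. action k
            lam = Q @ x[k] + A.T @ lam        # multiplier at step k
        # Update the action sequence and repeat until the gradient is small.
        u = [u[k] - alpha * grad[k] for k in range(N - 1)]
        if max(np.linalg.norm(g) for g in grad) < tol:
            break
    return x, u
```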
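For the QP bullets: with the initial state fixed, LQR is an equality-constrained QP in the stacked variable $z = (u_1, x_2, u_2, \dots, x_N)$, and the solution follows from a single KKT system (generic form):

$$
\min_z \;\tfrac{1}{2} z^\top H z \quad \text{s.t.} \quad C z = d
\qquad\Longleftrightarrow\qquad
\begin{bmatrix} H & C^\top \\ C & 0 \end{bmatrix}
\begin{bmatrix} z \\ \lambda \end{bmatrix}
=
\begin{bmatrix} 0 \\ d \end{bmatrix},
$$

where $H$ is block diagonal with $R$, $Q$, and $Q_N$ blocks, $C$ stacks the dynamics constraints $x_{k+1} - A x_k - B u_k = 0$, and $d$ carries the $A x_1$ term from the known initial state. Adding inequality constraints (e.g., input limits) is what turns this into the convex-MPC QP mentioned above.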
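For the DP bullets, a sketch of the backward Riccati recursion and the forward rollout with the resulting feedback policy $u_k = -K_k x_k$ (same time-invariant assumptions as above):

```python
import numpy as np

def lqr_backward(A, B, Q, R, QN, N):
    """Backward Riccati recursion; returns the time-varying gains K_1..K_{N-1}."""
    P = QN                                   # Hessian of the terminal cost-to-go
    K = [None] * (N - 1)
    for k in reversed(range(N - 1)):
        K[k] = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K[k])     # cost-to-go Hessian one step earlier
    return K

def lqr_forward(A, B, K, x1):
    """Forward rollout from the initial state using the feedback policy."""
    xs, us = [x1], []
    for Kk in K:
        us.append(-Kk @ xs[-1])
        xs.append(A @ xs[-1] + B @ us[-1])
    return xs, us
```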
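The rank condition from the controllability bullet, written out for $x \in \mathbb{R}^n$:

$$
\operatorname{rank}\begin{bmatrix} B & AB & A^2 B & \cdots & A^{n-1} B \end{bmatrix} = n .
$$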
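For the zero-order-hold discretization bullet: holding $u$ constant over each sample period $h$ gives exact discrete dynamics $x_{k+1} = A_d x_k + B_d u_k$ with

$$
A_d = e^{A h}, \qquad B_d = \left( \int_0^h e^{A \tau}\, d\tau \right) B,
$$

and both blocks can be read off from one matrix exponential:

$$
\exp\!\left( \begin{bmatrix} A & B \\ 0 & 0 \end{bmatrix} h \right)
= \begin{bmatrix} A_d & B_d \\ 0 & I \end{bmatrix}.
$$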
Related Concepts in Later Lectures
- Differential dynamic programming (DDP) (backward-pass equations are sketched after this list):
- This is a variant of the shooting method for general unconstrained trajectory optimization: forward pass for trajectory rollout + backward pass for action update.
- Use the 2nd-order Taylor expansion to approximate the cost-to-go function locally.
- The control policy includes both a feedback term and a feedforward term, since Δx is measured relative to a nominal trajectory that is generally not at an equilibrium point.
- As in Newton’s method, regularization needs to be added (to make sure the relevant matrices are positive definite) before calculating K and d in the backward pass.
- To guarantee descent in the cost function, the forward pass should use a line search in which the feedforward terms at all time steps are scaled by the same factor, accepted via an Armijo condition.
- iLQR, which iteratively linearizes the dynamics and quadraticizes the cost function, is a Gauss-Newton version of DDP.
- Direct collocation (a Hermite-Simpson example follows this list):
- This is an alternative to the shooting method in which states and actions are optimized together as decision variables, without a forward rollout.
- Use cubic splines to represent the state trajectory between consecutive knot points.
- Enforce the dynamics, as well as any inequality constraints, at the midpoint (collocation point) of each spline segment.
- Implicit methods, such as backward Euler, tend to have better stability and require fewer total dynamics evaluations than explicit methods.
- Sequential quadratic programming (SQP) (the QP subproblem is written out after this list):
- At each iteration, approximate the (generally non-convex) problem with a QP by taking the 2nd-order Taylor expansion of the Lagrangian for the objective and linearizing the constraints.
- The QP solution gives a search direction along which a line search updates both the primal and dual variables.
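For the DDP/iLQR bullets above: at each step of the backward pass, expand the action-value function to 2nd order around the nominal $(\bar{x}_k, \bar{u}_k)$ and minimize over $\Delta u$. In the iLQR (Gauss-Newton) case, with dynamics Jacobians $f_x, f_u$, stage-cost derivatives $\ell_{(\cdot)}$, and the next step's cost-to-go gradient/Hessian $V_x', V_{xx}'$:

$$
\begin{aligned}
Q_x &= \ell_x + f_x^\top V_x', & Q_u &= \ell_u + f_u^\top V_x',\\
Q_{xx} &= \ell_{xx} + f_x^\top V_{xx}' f_x, & Q_{uu} &= \ell_{uu} + f_u^\top V_{xx}' f_u, \\
Q_{ux} &= \ell_{ux} + f_u^\top V_{xx}' f_x, & &
\end{aligned}
$$

$$
d = -Q_{uu}^{-1} Q_u \;(\text{feedforward}), \qquad K = -Q_{uu}^{-1} Q_{ux} \;(\text{feedback}), \qquad \Delta u = d + K\, \Delta x,
$$

$$
V_x \leftarrow Q_x - Q_{ux}^\top Q_{uu}^{-1} Q_u, \qquad V_{xx} \leftarrow Q_{xx} - Q_{ux}^\top Q_{uu}^{-1} Q_{ux}.
$$

Full DDP adds second-derivative (tensor) terms of the dynamics to the $Q_{xx}$, $Q_{uu}$, and $Q_{ux}$ blocks; the forward pass applies $u_k = \bar{u}_k + \alpha d_k + K_k (x_k - \bar{x}_k)$ with the line-search factor $\alpha$ shared across all time steps.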
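For the direct-collocation bullets, one common concrete choice (Hermite-Simpson; an example rather than anything specific to the lecture): with knot points $(x_k, u_k)$, step $h$, and $f_k = f(x_k, u_k)$, the cubic segment's midpoint is

$$
x_m = \tfrac{1}{2}(x_k + x_{k+1}) + \tfrac{h}{8}\,(f_k - f_{k+1}), \qquad u_m = \tfrac{1}{2}(u_k + u_{k+1}),
$$

and enforcing the dynamics at that midpoint is equivalent to the Simpson-quadrature constraint

$$
x_{k+1} - x_k = \tfrac{h}{6}\,\bigl(f_k + 4 f(x_m, u_m) + f_{k+1}\bigr).
$$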
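For the SQP bullets: at the current iterate $(x, \lambda)$ of $\min_x f(x)$ s.t. $c(x) = 0$ (inequalities are linearized the same way), the QP subproblem in the step $p$ is

$$
\min_{p} \;\; \nabla f(x)^\top p + \tfrac{1}{2}\, p^\top \nabla_{xx}^2 \mathcal{L}(x, \lambda)\, p
\qquad \text{s.t.} \qquad c(x) + \frac{\partial c}{\partial x}(x)\, p = 0 ,
$$

and its primal solution and multipliers define the direction used for the line search on the primal and dual variables.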