Lecture notes from CMU 16-745 Optimal Control (Spring 2023).

## Three Ways to Solve LQR

- LQR formulation:
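  - In its standard discrete-time form (with $Q \succeq 0$, $R \succ 0$):

    $$\min_{x_{1:N},\, u_{1:N-1}} \;\; \frac{1}{2} x_N^\top Q_N x_N + \sum_{k=1}^{N-1} \left( \frac{1}{2} x_k^\top Q\, x_k + \frac{1}{2} u_k^\top R\, u_k \right) \quad \text{s.t.} \quad x_{k+1} = A x_k + B u_k$$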

- LQR variants:

  - finite horizon vs infinite horizon

  - time-varying dynamics vs time-invariant dynamics

  - deterministic vs stochastic

- Shooting method for discrete-time deterministic optimal control (a minimal sketch follows this list):

  - Initialize the action sequence with a random guess.

  - Do a forward rollout to get the state sequence.

  - Do a backward pass to propagate the Lagrange multipliers (costates) and form the gradient used to update the action sequence.

  - Repeat the process until convergence.
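A minimal numpy sketch of the shooting steps above, applied to an LQR problem; the double-integrator matrices, step size, and iteration budget are illustrative assumptions, not from the lecture:

```python
import numpy as np

# Illustrative double-integrator LQR data (assumed, not from the notes).
h = 0.1
A = np.array([[1.0, h], [0.0, 1.0]])
B = np.array([[0.5 * h**2], [h]])
Q, R, QN = np.eye(2), 0.1 * np.eye(1), np.eye(2)
N, x0 = 20, np.array([1.0, 0.0])

rng = np.random.default_rng(0)
u = [0.01 * rng.standard_normal(1) for _ in range(N - 1)]  # random initial guess

for _ in range(2000):
    # Forward rollout: simulate the state sequence under the current actions.
    x = [x0]
    for k in range(N - 1):
        x.append(A @ x[k] + B @ u[k])

    # Backward pass: propagate the costates (Lagrange multipliers) and
    # form the gradient of the total cost with respect to each action.
    lam = QN @ x[-1]
    grad = [None] * (N - 1)
    for k in reversed(range(N - 1)):
        grad[k] = R @ u[k] + B.T @ lam
        lam = Q @ x[k] + A.T @ lam

    if max(np.linalg.norm(g) for g in grad) < 1e-8:
        break  # converged
    u = [uk - 0.1 * gk for uk, gk in zip(u, grad)]  # gradient step on actions
```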

- Quadratic programming (QP):

  - Without inequality constraints, this is overkill for LQR.

  - It yields only a single open-loop solution, not a feedback policy.

  - But this is the general way to go for convex MPC (a minimal sketch follows this list).
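A minimal numpy sketch of the QP view on the same illustrative problem as above: stack all actions and states into one variable z, encode the dynamics as equality constraints Cz = d, and solve the resulting KKT system directly:

```python
import numpy as np
from scipy.linalg import block_diag

# Same illustrative double-integrator LQR data as above (assumed).
h = 0.1
A = np.array([[1.0, h], [0.0, 1.0]])
B = np.array([[0.5 * h**2], [h]])
Q, R, QN = np.eye(2), 0.1 * np.eye(1), np.eye(2)
N, x0 = 20, np.array([1.0, 0.0])
n, m = 2, 1

# Decision variable z = [u_0, x_1, u_1, x_2, ..., u_{N-2}, x_{N-1}].
blocks = []
for k in range(N - 1):
    blocks += [R, QN if k == N - 2 else Q]
H = block_diag(*blocks)

# Equality constraints C z = d encoding x_{k+1} = A x_k + B u_k.
nz = (n + m) * (N - 1)
C = np.zeros((n * (N - 1), nz))
d = np.zeros(n * (N - 1))
for k in range(N - 1):
    rows = slice(n * k, n * (k + 1))
    C[rows, (n + m) * k : (n + m) * k + m] = B                 # B u_k
    C[rows, (n + m) * k + m : (n + m) * (k + 1)] = -np.eye(n)  # -x_{k+1}
    if k == 0:
        d[rows] = -A @ x0  # x_0 is fixed, so it moves to the right-hand side
    else:
        C[rows, (n + m) * (k - 1) + m : (n + m) * k] = A       # A x_k

# KKT system for min 0.5 z^T H z  s.t.  C z = d (no linear cost term here).
KKT = np.block([[H, C.T], [C, np.zeros((C.shape[0], C.shape[0]))]])
z = np.linalg.solve(KKT, np.concatenate([np.zeros(nz), d]))[:nz]
u0 = z[:m]  # first action of the single open-loop solution
```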

- Dynamic programming (DP):

  - The cost-to-go (value) function is quadratic in the state, and the optimal feedback policy is linear.

  - Solve the backward Riccati recursion for a feedback policy, then perform a forward rollout from the initial state (a minimal sketch follows this list).

  - The feedback policy also works for the stochastic form with independent process noise.

  - Without a simple cost-to-go approximation, this would be intractable for high-dimensional states.

  - This would also break in the presence of inequality constraints.
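A minimal numpy sketch of the DP approach on the same illustrative problem as above: a backward Riccati recursion for the feedback gains, then a forward rollout under the resulting policy:

```python
import numpy as np

# Same illustrative double-integrator LQR data as above (assumed).
h = 0.1
A = np.array([[1.0, h], [0.0, 1.0]])
B = np.array([[0.5 * h**2], [h]])
Q, R, QN = np.eye(2), 0.1 * np.eye(1), np.eye(2)
N, x0 = 20, np.array([1.0, 0.0])

# Backward Riccati recursion: P parameterizes the quadratic cost-to-go
# V_k(x) = 0.5 x^T P_k x, and K[k] is the linear feedback gain.
P = QN
K = [None] * (N - 1)
for k in reversed(range(N - 1)):
    K[k] = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K[k])

# Forward rollout under the feedback policy u_k = -K_k x_k.
x = x0
for k in range(N - 1):
    u = -K[k] @ x
    x = A @ x + B @ u
```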

- Condition for controllability:

  - The rank of the controllability matrix equals the dimension of the state space.
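  - Concretely, for $x_{k+1} = A x_k + B u_k$ with $x \in \mathbb{R}^n$:

    $$\operatorname{rank}\begin{bmatrix} B & AB & A^2 B & \cdots & A^{n-1} B \end{bmatrix} = n$$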

- Discretization of continuous-time linear dynamics under zero-order hold control inputs:
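  - The exact result is $x_{k+1} = A_d x_k + B_d u_k$ with $A_d = e^{Ah}$ and $B_d = \int_0^h e^{A\tau}\, d\tau \, B$, both of which fall out of a single matrix exponential. A sketch using scipy follows (the function name is illustrative):

```python
import numpy as np
from scipy.linalg import expm

def zoh_discretize(A, B, h):
    """Exact ZOH discretization via the block-matrix exponential:
    expm([[A, B], [0, 0]] * h) = [[A_d, B_d], [0, I]]."""
    n, m = A.shape[0], B.shape[1]
    M = np.zeros((n + m, n + m))
    M[:n, :n], M[:n, n:] = A, B
    E = expm(M * h)
    return E[:n, :n], E[:n, n:]  # A_d, B_d
```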

## Related Concepts in Later Lectures

- Differential dynamic programming (DDP):

  - This is a variant of the shooting method for general unconstrained trajectory optimization: a forward pass for the trajectory rollout plus a backward pass for the action update.

  - Use a 2nd-order Taylor expansion to approximate the cost-to-go function locally.

  - The control policy includes both a feedback term and a feedforward term, since *Δx* can be taken around a non-equilibrium point.

  - As in Newton's method, regularization needs to be added (to make sure the relevant matrices are positive definite) before calculating *K* and *d* in the backward pass.

  - To guarantee sufficient decrease in the cost function, the forward pass should use a line search in which all time steps share the same Armijo scaling factor (a minimal sketch follows this list).

  - iLQR, which iteratively linearizes the dynamics and takes a quadratic approximation of the cost, is a Gauss-Newton version of DDP.
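A minimal sketch of that forward pass with a single shared line-search factor. The callables `f` (dynamics) and `cost`, the gains `K`, feedforward terms `d`, and expected decrease `delta_J` are assumed inputs produced by a backward pass not shown here:

```python
def forward_pass(f, cost, x_bar, u_bar, K, d, delta_J, beta=0.5, c=1e-4):
    """Roll out u_k = u_bar_k - alpha * d_k - K_k (x_k - x_bar_k), shrinking a
    single alpha shared by all time steps until the Armijo test passes.
    (Sign conventions for K and d vary; these are assumptions.)"""
    J0 = cost(x_bar, u_bar)
    alpha = 1.0
    while alpha > 1e-8:
        xs, us = [x_bar[0]], []
        for k in range(len(u_bar)):
            # One shared alpha scales every feedforward term at once.
            u = u_bar[k] - alpha * d[k] - K[k] @ (xs[k] - x_bar[k])
            us.append(u)
            xs.append(f(xs[k], u))
        # Armijo sufficient-decrease test on the true trajectory cost.
        if cost(xs, us) <= J0 - c * alpha * delta_J:
            return xs, us
        alpha *= beta  # backtrack and retry the whole rollout
    return x_bar, u_bar  # line search failed; keep the nominal trajectory
```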

- Direct collocation:

  - This is an alternative to the shooting method in which states and actions are optimized together, with no forward rollout.

  - Use cubic splines to represent the state evolution over time between consecutive knot points.

  - Enforce the dynamics, as well as inequality constraints, at the midpoint of each spline segment (see the constraint after this list).

  - Implicit methods, such as backward Euler, tend to have better stability and also save overall calls to the dynamics compared with explicit methods.
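  - For the cubic splines above, the midpoint (Hermite-Simpson) collocation constraint takes the standard form, with $f_k = f(x_k, u_k)$ and step length $h$:

    $$x_m = \frac{1}{2}\left(x_k + x_{k+1}\right) + \frac{h}{8}\left(f_k - f_{k+1}\right), \qquad -\frac{3}{2h}\left(x_k - x_{k+1}\right) - \frac{1}{4}\left(f_k + f_{k+1}\right) = f(x_m, u_m)$$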

- Sequential quadratic programming (SQP):

  - At each iteration, approximate the non-convex problem as a QP by taking the 2nd-order Taylor expansion of the Lagrangian and linearizing the constraints (the standard subproblem is below).

  - The QP solution gives a search direction along which a line search moves both the primal and dual variables.
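  - Concretely, at iterate $(x, \lambda)$, with objective $f$, equality constraints $c(x) = 0$, and Lagrangian $\mathcal{L}$, the standard QP subproblem in the step $p$ is

    $$\min_{p} \;\; \nabla f(x)^\top p + \frac{1}{2} p^\top \nabla^2_{xx} \mathcal{L}(x, \lambda)\, p \quad \text{s.t.} \quad c(x) + \nabla c(x)\, p = 0$$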