Lecture notes from CMU 16–745 Optimal Control (Spring 2023).
Three Ways to Solve LQR
- LQR formulation: see the problem statement written out after this list.
- LQR variants:
- finite horizon vs infinite horizon
- time-varying dynamics vs time-invariant dynamics
- deterministic vs stochastic
- Shooting method for discrete-time deterministic optimal control (a minimal sketch follows this list):
- Initialize the action sequence with random guesses.
- Do a forward rollout to get the state sequence.
- Make a backward pass to propagate the Lagrange multipliers and compute the gradient with respect to each action, then update the action sequence.
- Repeat the process until convergence.
- Quadratic programming (QP) (the KKT system is written out after this list):
- Without inequality constraints, this would be overkill for LQR.
- It only returns a single open-loop solution, instead of a feedback policy.
- But this is the general way to go for convex MPC.
- Dynamic programming (DP) (a Riccati-recursion sketch follows this list):
- For LQR, the action-value function and the cost-to-go are both quadratic, and the resulting optimal feedback policy is linear in the state.
- Solve the backward Riccati recurrence for a feedback policy and then perform a forward rollout from the initial state.
- The feedback policy also works for the stochastic form with independent process noise.
- Without a compact cost-to-go representation (quadratic in the LQR case), DP is intractable for high-dimensional state spaces.
- The simple Riccati solution also breaks down in the presence of inequality constraints.
- Condition for controllability (the controllability matrix is written out after this list):
- The rank of the controllability matrix equals the dimension of the state space.
- Discretization of continuous-time linear dynamics under zero-order-hold control inputs: see the matrix-exponential formula after this list.
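One standard way to write the finite-horizon, discrete-time LQR problem from the "LQR formulation" bullet (generic notation, not necessarily the lecture's):

$$
\min_{x_{2:N},\,u_{1:N-1}} \;\; \frac{1}{2} x_N^\top Q_N x_N + \sum_{k=1}^{N-1} \left( \frac{1}{2} x_k^\top Q\, x_k + \frac{1}{2} u_k^\top R\, u_k \right)
\quad \text{s.t.} \quad x_{k+1} = A\, x_k + B\, u_k,
$$

with $x_1$ given, $Q \succeq 0$, $Q_N \succeq 0$, and $R \succ 0$.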
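For the shooting-method bullets, a minimal numerical sketch, assuming time-invariant $A, B, Q, R$; the step size, iteration count, and tolerance are arbitrary choices for illustration:

```python
import numpy as np

def shooting_lqr(A, B, Q, R, QN, x1, N, iters=2000, alpha=1e-2, tol=1e-6):
    """Indirect shooting for finite-horizon LQR via gradient descent on the actions."""
    m = B.shape[1]
    rng = np.random.default_rng(0)
    u = [0.1 * rng.standard_normal(m) for _ in range(N - 1)]  # random initial guess
    for _ in range(iters):
        # Forward rollout: simulate the dynamics with the current action sequence.
        x = [x1]
        for k in range(N - 1):
            x.append(A @ x[k] + B @ u[k])
        # Backward pass: propagate the Lagrange multipliers and form dL/du_k.
        lam = QN @ x[-1]                      # terminal multiplier
        grad = [None] * (N - 1)
        for k in reversed(range(N - 1)):
            grad[k] = R @ u[k] + B.T @ lam    # gradient w.r.t. action k
            lam = Q @ x[k] + A.T @ lam        # multiplier at step k
        # Update the action sequence and repeat until the gradient is small.
        u = [u[k] - alpha * grad[k] for k in range(N - 1)]
        if max(np.linalg.norm(g) for g in grad) < tol:
            break
    return x, u
```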
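For the QP bullets: with the initial state fixed, LQR is an equality-constrained QP in the stacked variable $z = (u_1, x_2, u_2, \dots, x_N)$, and the solution follows from a single KKT system (generic form):

$$
\min_z \;\tfrac{1}{2} z^\top H z \quad \text{s.t.} \quad C z = d
\qquad\Longleftrightarrow\qquad
\begin{bmatrix} H & C^\top \\ C & 0 \end{bmatrix}
\begin{bmatrix} z \\ \lambda \end{bmatrix}
=
\begin{bmatrix} 0 \\ d \end{bmatrix},
$$

where $H$ is block diagonal with $R$, $Q$, and $Q_N$ blocks, $C$ stacks the dynamics constraints $x_{k+1} - A x_k - B u_k = 0$, and $d$ carries the $A x_1$ term from the known initial state. Adding inequality constraints (e.g., input limits) is what turns this into the convex-MPC QP mentioned above.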
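For the DP bullets, a sketch of the backward Riccati recursion and the forward rollout with the resulting feedback policy $u_k = -K_k x_k$ (same time-invariant assumptions as above):

```python
import numpy as np

def lqr_backward(A, B, Q, R, QN, N):
    """Backward Riccati recursion; returns the time-varying gains K_1..K_{N-1}."""
    P = QN                                   # Hessian of the terminal cost-to-go
    K = [None] * (N - 1)
    for k in reversed(range(N - 1)):
        K[k] = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K[k])     # cost-to-go Hessian one step earlier
    return K

def lqr_forward(A, B, K, x1):
    """Forward rollout from the initial state using the feedback policy."""
    xs, us = [x1], []
    for Kk in K:
        us.append(-Kk @ xs[-1])
        xs.append(A @ xs[-1] + B @ us[-1])
    return xs, us
```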
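The rank condition from the controllability bullet, written out for $x \in \mathbb{R}^n$:

$$
\operatorname{rank}\begin{bmatrix} B & AB & A^2 B & \cdots & A^{n-1} B \end{bmatrix} = n .
$$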
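For the zero-order-hold discretization bullet: holding $u$ constant over each sample period $h$ gives exact discrete dynamics $x_{k+1} = A_d x_k + B_d u_k$ with

$$
A_d = e^{A h}, \qquad B_d = \left( \int_0^h e^{A \tau}\, d\tau \right) B,
$$

and both blocks can be read off from one matrix exponential:

$$
\exp\!\left( \begin{bmatrix} A & B \\ 0 & 0 \end{bmatrix} h \right)
= \begin{bmatrix} A_d & B_d \\ 0 & I \end{bmatrix}.
$$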
Related Concepts in Later Lectures
- Differential dynamic programming (DDP) (backward-pass equations are sketched after this list):
- This is a variant of the shooting method for general unconstrained trajectory optimization: forward pass for trajectory rollout + backward pass for action update.
- Use the 2nd-order Taylor expansion to approximate the cost-to-go function locally.
- The control policy includes both a feedback term and a feedforward term, since Δx is measured relative to a nominal trajectory that is generally not at an equilibrium point.
- As in Newton’s method, regularization needs to be added (to make sure the relevant matrices are positive definite) before calculating K and d in the backward pass.
- To guarantee descent in the cost function, the forward pass should use a line search in which the feedforward terms at all time steps are scaled by the same factor, accepted via an Armijo condition.
- iLQR, which iteratively linearizes the dynamics and quadraticizes the cost function, is a Gauss-Newton version of DDP.
- Direct collocation (a Hermite-Simpson example follows this list):
- This is an alternative to the shooting method in which states and actions are optimized together as decision variables, without a forward rollout.
- Use cubic splines to represent the state trajectory between consecutive knot points.
- Enforce the dynamics, as well as any inequality constraints, at the midpoint (collocation point) of each spline segment.
- Implicit methods, such as backward Euler, tend to have better stability and require fewer total dynamics evaluations than explicit methods.
- Sequential quadratic programming (SQP) (the QP subproblem is written out after this list):
- At each iteration, approximate the (generally non-convex) problem with a QP by taking the 2nd-order Taylor expansion of the Lagrangian for the objective and linearizing the constraints.
- The QP solution gives a search direction along which a line search updates both the primal and dual variables.
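For the DDP/iLQR bullets above: at each step of the backward pass, expand the action-value function to 2nd order around the nominal $(\bar{x}_k, \bar{u}_k)$ and minimize over $\Delta u$. In the iLQR (Gauss-Newton) case, with dynamics Jacobians $f_x, f_u$, stage-cost derivatives $\ell_{(\cdot)}$, and the next step's cost-to-go gradient/Hessian $V_x', V_{xx}'$:

$$
\begin{aligned}
Q_x &= \ell_x + f_x^\top V_x', & Q_u &= \ell_u + f_u^\top V_x',\\
Q_{xx} &= \ell_{xx} + f_x^\top V_{xx}' f_x, & Q_{uu} &= \ell_{uu} + f_u^\top V_{xx}' f_u, \\
Q_{ux} &= \ell_{ux} + f_u^\top V_{xx}' f_x, & &
\end{aligned}
$$

$$
d = -Q_{uu}^{-1} Q_u \;(\text{feedforward}), \qquad K = -Q_{uu}^{-1} Q_{ux} \;(\text{feedback}), \qquad \Delta u = d + K\, \Delta x,
$$

$$
V_x \leftarrow Q_x - Q_{ux}^\top Q_{uu}^{-1} Q_u, \qquad V_{xx} \leftarrow Q_{xx} - Q_{ux}^\top Q_{uu}^{-1} Q_{ux}.
$$

Full DDP adds second-derivative (tensor) terms of the dynamics to the $Q_{xx}$, $Q_{uu}$, and $Q_{ux}$ blocks; the forward pass applies $u_k = \bar{u}_k + \alpha d_k + K_k (x_k - \bar{x}_k)$ with the line-search factor $\alpha$ shared across all time steps.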
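For the direct-collocation bullets, one common concrete choice (Hermite-Simpson; an example rather than anything specific to the lecture): with knot points $(x_k, u_k)$, step $h$, and $f_k = f(x_k, u_k)$, the cubic segment's midpoint is

$$
x_m = \tfrac{1}{2}(x_k + x_{k+1}) + \tfrac{h}{8}\,(f_k - f_{k+1}), \qquad u_m = \tfrac{1}{2}(u_k + u_{k+1}),
$$

and enforcing the dynamics at that midpoint is equivalent to the Simpson-quadrature constraint

$$
x_{k+1} - x_k = \tfrac{h}{6}\,\bigl(f_k + 4 f(x_m, u_m) + f_{k+1}\bigr).
$$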
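For the SQP bullets: at the current iterate $(x, \lambda)$ of $\min_x f(x)$ s.t. $c(x) = 0$ (inequalities are linearized the same way), the QP subproblem in the step $p$ is

$$
\min_{p} \;\; \nabla f(x)^\top p + \tfrac{1}{2}\, p^\top \nabla_{xx}^2 \mathcal{L}(x, \lambda)\, p
\qquad \text{s.t.} \qquad c(x) + \frac{\partial c}{\partial x}(x)\, p = 0 ,
$$

and its primal solution and multipliers define the direction used for the line search on the primal and dual variables.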