[Lecture Notes] Linear Quadratic Regulator (LQR)

Devin Z
Aug 5, 2023

--

Lecture notes from CMU 16-745 Optimal Control (Spring 2023).

Los Angeles, May 29, 2023

Three Ways to Solve LQR

Lecture 7 The Linear Quadratic Regulator: Three Ways
  • LQR formulation (a standard statement of the problem is written out after this list):
  • LQR variants:
    - finite horizon vs infinite horizon
    - time-varying dynamics vs time-invariant dynamics
    - deterministic vs stochastic
  • Shooting method for discrete-time deterministic optimal control (a minimal code sketch follows this list):
    - Initialize the action sequence with random guesses.
    - Do a forward rollout to get the state sequence.
    - Make a backward pass to propagate the Lagrange multipliers and update the action sequence.
    - Repeat the process until convergence.
Shooting method for LQR
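
For reference, the standard discrete-time, finite-horizon LQR problem is the following (notation is my own, not necessarily the lecture's; time-invariant matrices are shown for simplicity):

$$
\min_{u_{0:N-1}}\ \frac{1}{2}x_N^\top Q_N x_N + \sum_{k=0}^{N-1}\left(\frac{1}{2}x_k^\top Q x_k + \frac{1}{2}u_k^\top R u_k\right)
\quad \text{s.t.}\ \ x_{k+1} = A x_k + B u_k,\ \ x_0\ \text{given},
$$

with $Q, Q_N \succeq 0$ and $R \succ 0$.

Below is a minimal NumPy sketch of the shooting method just described. It is my own reconstruction, not the course code: the backward pass propagates the co-states (Lagrange multipliers), and the actions are then updated with a plain gradient step, which is one of several possible update rules.

```python
import numpy as np

def shooting_lqr(A, B, Q, R, Qf, x0, N, iters=2000, alpha=1e-2, tol=1e-6):
    """Finite-horizon LQR by shooting: forward rollout + adjoint backward pass."""
    m = B.shape[1]
    U = np.zeros((N, m))                      # initial guess for the action sequence
    for _ in range(iters):
        # forward rollout: propagate the state with the current actions
        X = [x0]
        for k in range(N):
            X.append(A @ X[k] + B @ U[k])
        # backward pass: propagate the Lagrange multipliers (co-states)
        lam = Qf @ X[N]                       # lambda_N from the terminal cost
        grad = np.zeros_like(U)
        for k in reversed(range(N)):
            grad[k] = R @ U[k] + B.T @ lam    # dJ/du_k, using lambda_{k+1}
            lam = Q @ X[k] + A.T @ lam        # co-state recursion gives lambda_k
        U -= alpha * grad                     # simple gradient step on the actions
        if np.linalg.norm(grad) < tol:
            break
    return U
```
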
  • Quadratic programming (QP):
    - Without inequality constraints, this would be overkill for LQR.
    - It only yields a single open-loop solution (an action sequence from one initial state), instead of a feedback policy.
    - But this is the general way to go for convex MPC.
LQR as QP
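
One common way to write LQR as a QP (my notation, assumed) is to stack all decision variables into one vector and solve a single equality-constrained problem:

$$
z = \begin{bmatrix} u_0^\top & x_1^\top & u_1^\top & \cdots & x_N^\top \end{bmatrix}^\top,\qquad
\min_z\ \frac{1}{2} z^\top H z \quad \text{s.t.}\quad C z = d,
$$

with $H = \mathrm{blkdiag}(R, Q, R, Q, \dots, Q_N)$ and $Cz = d$ encoding the dynamics $x_{k+1} = A x_k + B u_k$ with $x_0$ fixed. The KKT conditions form one linear system, so the equality-constrained case is a single factorization; adding inequality constraints (e.g. input limits) yields the convex QP solved at every step in MPC.
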
  • Dynamic programming (DP):
    - The action-value function and the cost-to-go are both quadratic in the state, and the resulting feedback policy is linear.
    - Solve the backward Riccati recurrence for a feedback policy and then perform a forward rollout from the initial state.
    - The feedback policy also works for the stochastic form with additive, zero-mean, independent process noise.
    - Without a simple cost-to-go approximation, this would be intractable for high-dimensional states.
    - This would also be broken in the presence of inequality constraints.
Riccati Recurrence
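
A minimal NumPy sketch of the backward Riccati recursion and the forward rollout described above (my own reconstruction; the names are assumptions):

```python
import numpy as np

def riccati_gains(A, B, Q, R, Qf, N):
    """Backward Riccati recursion; returns the time-varying gains K_0, ..., K_{N-1}."""
    P = Qf
    K = [None] * N
    for k in reversed(range(N)):
        # K_k = (R + B' P_{k+1} B)^{-1} B' P_{k+1} A
        K[k] = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # P_k = Q + A' P_{k+1} (A - B K_k)
        P = Q + A.T @ P @ (A - B @ K[k])
    return K

def rollout(A, B, K, x0):
    """Forward rollout under the feedback policy u_k = -K_k x_k."""
    X, U = [x0], []
    for Kk in K:
        U.append(-Kk @ X[-1])
        X.append(A @ X[-1] + B @ U[-1])
    return np.array(X), np.array(U)
```

Because the policy is pure state feedback, the same gains can be reused from any initial state, which is exactly what the one-shot QP solution above does not give.
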
  • Condition for controllability:
    - The rank of the controllability matrix equals the dimension of the state space.
Controllability Matrix of Linear Time-Invariant Systems
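
A quick numerical version of that rank test, assuming an LTI pair (A, B):

```python
import numpy as np

def is_controllable(A, B):
    """Rank test on the controllability matrix [B, AB, A^2 B, ..., A^(n-1) B]."""
    n = A.shape[0]
    blocks, M = [B], B
    for _ in range(n - 1):
        M = A @ M
        blocks.append(M)
    return np.linalg.matrix_rank(np.hstack(blocks)) == n
```
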
  • Discretization of continuous-time linear dynamics under zero-order hold control inputs:
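
For a continuous-time system $\dot{x} = Ax + Bu$ sampled with period $h$, the standard zero-order-hold discretization is

$$
x_{k+1} = A_d\, x_k + B_d\, u_k,\qquad A_d = e^{Ah},\qquad B_d = \left(\int_0^h e^{A\tau}\, d\tau\right) B,
$$

and both matrices can be read off a single matrix exponential of an augmented matrix:

$$
\exp\!\left(\begin{bmatrix} A & B \\ 0 & 0 \end{bmatrix} h\right) = \begin{bmatrix} A_d & B_d \\ 0 & I \end{bmatrix}.
$$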

Related Concepts in Later Lectures

  • Differential dynamic programming (DDP):
    - This is a variant of the shooting method for general unconstrained trajectory optimization: forward pass for trajectory rollout + backward pass for action update.
    - Use the 2nd-order Taylor expansion to approximate the cost-to-go function locally.
    - The control policy includes both a feedback term and a feedforward term, since Δx is measured relative to a nominal trajectory point rather than an equilibrium.
    - As in Newton’s method, regularization needs to be added (to make sure the relevant matrices are positive definite) before calculating K and d in the backward pass.
    - To guarantee sufficient decrease in the cost function, the forward pass should use a line search in which all time steps share the same step-size (Armijo) factor.
    - iLQR, which iteratively linearizes the dynamics and quadraticizes the cost function, is a Gauss-Newton version of DDP.
DDP
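
For concreteness, the Gauss-Newton (iLQR) backward pass around a nominal trajectory $(\bar{x}_k, \bar{u}_k)$ is usually written as follows; these are the standard formulas in my notation, and the lecture's conventions may differ slightly:

$$
\begin{aligned}
Q_x &= \ell_x + A_k^\top V_x', &\quad Q_u &= \ell_u + B_k^\top V_x',\\
Q_{xx} &= \ell_{xx} + A_k^\top V_{xx}' A_k, &\quad Q_{uu} &= \ell_{uu} + B_k^\top V_{xx}' B_k,\\
Q_{ux} &= \ell_{ux} + B_k^\top V_{xx}' A_k, &\quad &\\
d &= Q_{uu}^{-1} Q_u, &\quad K &= Q_{uu}^{-1} Q_{ux},\\
V_x &= Q_x - K^\top Q_{uu}\, d, &\quad V_{xx} &= Q_{xx} - K^\top Q_{uu} K,
\end{aligned}
$$

where $V_x', V_{xx}'$ come from step $k{+}1$, and in practice $Q_{uu}$ is regularized ($Q_{uu} + \rho I$) before inverting, as noted above. The forward pass then applies $u_k = \bar{u}_k - \alpha\, d - K\,(x_k - \bar{x}_k)$ with the shared line-search parameter $\alpha$.
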
  • Direct collocation:
    - This is an alternative to the shooting method in which states and actions are optimized together as decision variables, without a forward rollout.
    - Use cubic splines to represent the state evolution over time between consecutive knot points.
    - Enforce the dynamics, as well as inequality constraints, at the midpoint of each spline segment.
    - Implicit methods, such as backward Euler, tend to have better stability and can also reduce the total number of dynamics calls compared with explicit methods.
Direct Collocation
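
The construction described is Hermite-Simpson collocation (assuming that is the scheme the notes refer to). With knot points $(x_k, u_k)$, step length $h$, and dynamics $\dot{x} = f(x, u)$, the midpoint state and the collocation constraint on each segment are

$$
x_{k+\frac12} = \frac{1}{2}\bigl(x_k + x_{k+1}\bigr) + \frac{h}{8}\bigl(f(x_k, u_k) - f(x_{k+1}, u_{k+1})\bigr),
$$

$$
x_{k+1} - x_k = \frac{h}{6}\bigl(f(x_k, u_k) + 4\, f(x_{k+\frac12}, u_{k+\frac12}) + f(x_{k+1}, u_{k+1})\bigr),
$$

with $u_{k+\frac12}$ typically taken as the average of $u_k$ and $u_{k+1}$ (or as an extra decision variable). These equalities, together with any path constraints, go straight into the NLP over all $(x_k, u_k)$.
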
  • Sequential quadratic programming (SQP):
    - At each iteration, approximate the non-convex problem as a QP by taking a 2nd-order Taylor expansion of the Lagrangian and linearizing the constraints.
    - The QP solution gives a direction along which a line search is performed to update the primal and dual variables.
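
Written out for the equality-constrained case (standard form, my notation), the QP subproblem at an iterate $(x, \lambda)$ is

$$
\min_{\Delta x}\ \frac{1}{2}\,\Delta x^\top \nabla^2_{xx}\mathcal{L}(x, \lambda)\,\Delta x + \nabla f(x)^\top \Delta x
\quad \text{s.t.}\quad c(x) + \frac{\partial c}{\partial x}(x)\,\Delta x = 0,
$$

whose solution and multipliers give the step for the primal and dual variables; a line search on a merit function then sets the step length. Inequality constraints are handled by linearizing them into the QP as well.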
