Notes on Convex Optimization

Devin Z
6 min read · Jan 30, 2023

Takeaways from related lecture slides and notes.

Milpitas, January 23, 2023
  • An affine set is one that is closed under affine combinations.
    - The line passing through any two distinct points in the set is also in the set.
  • A convex set is one that is closed under convex combinations.
    - The line segment between any two distinct points in the set is also in the set.
    - The convex hull of a set contains all the convex combinations of the points in the set.
    - There exists a supporting hyperplane at every boundary point of a convex set.
  • Operations that preserve set convexity:
    - The intersection of convex sets is convex.
    - The image of a convex set under an affine function is convex.
  • A differentiable function with a convex domain is convex iff the first-order approximation at any point in the domain is a global under-estimator.
  • A twice differentiable function with a convex domain is convex iff the Hessian matrix at every point in the domain is positive semi-definite (see the numeric spot-check after this list).
  • Jensen’s inequality: for a convex function f, f(E[z]) ≤ E[f(z)]
  • Operations that preserve function convexity:
    - Nonnegative weighted sum
    - Pointwise maximum
    - Composition with affine functions (e.g. f(Ax+b))
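As a quick numeric spot-check of the second-order condition above (a minimal sketch; the function and the sample grid are arbitrary choices for illustration):

```python
import numpy as np

# f(x, y) = x**4 + y**2 has Hessian [[12x^2, 0], [0, 2]].
def hessian(x, y):
    return np.array([[12 * x**2, 0.0],
                     [0.0, 2.0]])

# Check positive semi-definiteness on a grid of sample points.
# (Evidence, not a proof: convexity requires PSD on the whole domain.)
points = [(x, y) for x in np.linspace(-2, 2, 9) for y in np.linspace(-2, 2, 9)]
psd = all(np.linalg.eigvalsh(hessian(x, y)).min() >= -1e-12 for x, y in points)
print("Hessian PSD at all sampled points:", psd)  # True
```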
  • A convex optimization is the minimization of a convex function over a convex feasible set.
  • A standard optimization problem is a convex optimization iff
    - The objective function is convex.
    - The inequality constraint functions are convex.
    - The equality constraint functions are affine.
Standard Optimization Problem (cited from [1])
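For concreteness, such a standard-form convex problem could be posed with the CVXPY modeling library (a minimal sketch; the data and constraints below are invented for illustration):

```python
import cvxpy as cp
import numpy as np

# Invented problem data.
np.random.seed(0)
A = np.random.randn(20, 5)
b = np.random.randn(20)

x = cp.Variable(5)
objective = cp.Minimize(cp.sum_squares(A @ x - b))  # convex objective
constraints = [cp.norm(x, 2) <= 2,                  # convex inequality
               cp.sum(x) == 1]                      # affine equality
problem = cp.Problem(objective, constraints)
problem.solve()
print("optimal value:", problem.value)
```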
  • Any locally optimal point of a convex optimization is also globally optimal.
  • A linear program (LP) is a convex optimization with affine objective and constraint functions.
  • A quadratic program (QP) is an optimization with a quadratic objective function and affine constraint functions.
    - A QP is convex iff the quadratic term in the objective function is convex (positive semi-definite).
  • The Lagrangian of an optimization problem (not necessarily convex) is a weighted sum of the objective and constraint functions.
    - The λs and νs are dual variables (a.k.a. Lagrange multipliers), which indicate how sensitive the optimum is to the corresponding constraints.
    - The primal objective function can be viewed as the supremum of the Lagrangian over the nonnegative dual variables, so it goes to infinity whenever the primal variables leave the feasible set.
Lagrangian (cited from [1])
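The referenced figure is not reproduced here; in the standard notation (objective f0, inequality constraints fi(x) ≤ 0, equality constraints hj(x) = 0), the Lagrangian is:

```latex
L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{j=1}^{p} \nu_j h_j(x)
```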
  • The Lagrange dual form is the infimum of the Lagrangian over the primal optimization variables.
    - The dual form is always concave.
    - When the λs are all nonnegative, the dual form is a parameterized lower bound on the primal optimum.
Lagrange dual form (cited from [1])
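In the same notation, the dual form and its lower-bound property read:

```latex
g(\lambda, \nu) = \inf_{x} L(x, \lambda, \nu),
\qquad
g(\lambda, \nu) \le p^\star \quad \text{whenever } \lambda \succeq 0
```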
  • The Lagrange dual problem is maximization of the Lagrange dual form under the constraints that the λs are all nonnegative.
    - The dual problem is always a convex optimization.
    - The primal optimum is lower bounded by the dual optimum, a fact known as weak duality.
    - Strong duality refers to the situation where the primal optimum and the dual optimum are equal.
    - Transformation of a primal problem may lead to wildly different forms of dual problems.
  • Slater’s constraint qualification: strong duality holds for a convex optimization as long as it is strictly feasible.
    - Strong duality holds for LPs and convex QPs as long as they are feasible.
  • Karush-Kuhn-Tucker (KKT) conditions:
    - The primal constraints hold.
    - The dual constraints hold (λs are all nonnegative).
    - Complementary slackness holds (for each inequality constraint, either its multiplier λᵢ is zero or the constraint is active, i.e. λᵢfᵢ(x) = 0).
    - The gradient of the Lagrangian w.r.t. the primal variable vanishes (i.e. the gradient of the primal objective function is zero in directions that are “free”).
The 4th KKT condition (cited from [1])
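In the same notation, this stationarity condition at a primal-dual solution (x*, λ*, ν*) reads:

```latex
\nabla f_0(x^\star) + \sum_{i=1}^{m} \lambda_i^\star \nabla f_i(x^\star) + \sum_{j=1}^{p} \nu_j^\star \nabla h_j(x^\star) = 0
```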
  • KKT conditions are necessary conditions for strong duality of general optimizations (with differentiable objective and constraint functions).
    - For unconstrained optimizations, this is equivalent to the stationarity of the primal objective function.
    - For equality-constrained optimizations, this is a set of equations (in x and ν) known as the KKT system (see the sketch below).
  • KKT conditions are necessary and sufficient conditions for strong duality when it comes to convex optimizations.
    - If a set of primal and dual variables satisfies the KKT conditions, they are primal and dual optimal for the convex optimization.
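For an equality-constrained convex QP (minimize ½xᵀPx + qᵀx subject to Ax = b), the KKT system reduces to a single linear solve. A minimal sketch with invented problem data:

```python
import numpy as np

# minimize 0.5 x'Px + q'x  subject to  Ax = b, with P positive definite.
P = np.array([[4.0, 1.0], [1.0, 3.0]])
q = np.array([1.0, -2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

# KKT system: [P A'; A 0] [x; nu] = [-q; b]
n, m = P.shape[0], A.shape[0]
K = np.block([[P, A.T], [A, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([-q, b]))
x_opt, nu_opt = sol[:n], sol[n:]
print("x* =", x_opt, " nu* =", nu_opt)
```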
  • General thoughts on solving optimizations:
    - Approximate a non-linear function as a linear function for root finding.
    - Reduce an unconstrained optimization to root-finding for the gradient.
    - Reduce an optimization with equality constraints to an unconstrained optimization for the Lagrangian.
    - Reduce an optimization with inequality constraints to either (1) an optimization with a subset of active constraints or (2) an unconstrained optimization augmented with barrier functions.
    - Reduce a non-convex optimization to approximate convex optimizations.
  • Newton’s method, as a root-finding algorithm, simply seeks a nearby zero of the gradient.
    - It approximates the objective function as a quadratic function and thus the gradient as a linear function.
    - Without convexity guarantees, it may get stuck at local minima or even go uphill toward maxima.
    - A cheap workaround is regularization, i.e. repeatedly adding a multiple of the identity matrix to the Hessian until it is positive definite.
    - Another problem is overshooting, where the step size is too large for the quadratic approximation to hold.
    - One simple solution to that is tuning the step size while keeping the step direction unchanged (i.e. line search); a sketch combining both fixes follows this list.
    - Gauss-Newton: the constraint curvature is often ignored in the Hessian of the Lagrangian to trade more iterations for less computation per iteration.
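A toy sketch of Newton's method with the two fixes discussed above, Hessian regularization and backtracking line search (the test function is an arbitrary smooth convex choice):

```python
import numpy as np

def newton(f, grad, hess, x0, tol=1e-8, max_iter=50):
    x = x0.astype(float)
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        if np.linalg.norm(g) < tol:
            break
        # Regularize: add multiples of I until the Hessian is positive definite.
        tau = 0.0
        while np.linalg.eigvalsh(H + tau * np.eye(len(x))).min() <= 0:
            tau = max(2 * tau, 1e-6)
        step = np.linalg.solve(H + tau * np.eye(len(x)), -g)
        # Backtracking line search to avoid overshooting.
        t = 1.0
        while f(x + t * step) > f(x) + 1e-4 * t * (g @ step):
            t *= 0.5
        x = x + t * step
    return x

f = lambda x: np.log(np.exp(x[0]) + np.exp(-x[0])) + x[1]**2
grad = lambda x: np.array([np.tanh(x[0]), 2 * x[1]])
hess = lambda x: np.diag([1 / np.cosh(x[0])**2, 2.0])
print(newton(f, grad, hess, np.array([3.0, -1.0])))  # -> near [0, 0]
```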
  • Unconstrained optimization methods:
    - Line search methods: at each iteration, choose a descent direction based on a local approximation of the objective function and then optimize the step size along that direction.
    - Trust region methods: at each iteration, define a region in which the objective function can be approximated by a model function and attempt to get sufficient descent in the value of the model function.
  • Constrained optimization methods:
    - Active-set methods: given a set of active constraints at each iteration, determine the direction of the next step by solving an equality-constrained optimization (i.e. the KKT system) and supplement the active set with blocking constraints.
    - Interior-point methods: augment the objective function with weighted barrier functions that approximate the inequality constraints and iteratively solve equality-constrained problems as the weight decays.
Use of Logarithmic Barriers (weighted by 1/t) (cited from [1])
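To make the barrier idea concrete, a minimal sketch on a one-dimensional toy problem (the problem and the schedule for t are invented for illustration):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# minimize x^2  subject to  x >= 1.  True optimum: x* = 1.
f = lambda x: x**2
barrier = lambda x: -np.log(x - 1)   # log barrier for the constraint x - 1 >= 0

x, t = 2.0, 1.0                      # strictly feasible start
for _ in range(25):
    # Centering step: minimize f + (1/t) * barrier over the interior x > 1.
    res = minimize_scalar(lambda y: f(y) + barrier(y) / t,
                          bounds=(1.0 + 1e-9, 10.0), method='bounded')
    x = res.x
    t *= 2.0                         # the barrier weight 1/t decays
print("x ~", x)                      # approaches the boundary optimum x* = 1
```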
  • Augmented Lagrangian:
    - Assume all the constraints are inequality constraints.
    - Augment the Lagrangian with penalty terms that kick in only once the constraints are violated (as opposed to barrier functions, which act from inside the feasible set).
    - At each iteration, hold the dual variables fixed and solve the primal variables to minimize the augmented Lagrangian.
    - Update the dual variables by absorbing the scaled violations of the active constraints, e.g. λ ← max(0, λ + μg(x)).
    - Exponentially increase the weight of the penalty term.
    - At some point, the dual variables converge to their optimal values, so the weight of the penalty term no longer needs to increase.
Augmented Lagrangian
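A minimal sketch of the scheme on a one-dimensional toy problem, using the Powell-Hestenes-Rockafellar form of the augmented Lagrangian for an inequality constraint (problem data and schedules are invented):

```python
from scipy.optimize import minimize_scalar

# minimize x^2  subject to  g(x) = 1 - x <= 0.  KKT optimum: x* = 1, lambda* = 2.
f = lambda x: x**2
g = lambda x: 1 - x

lam, mu = 0.0, 1.0
for _ in range(20):
    # Augmented Lagrangian: penalizes only (near-)violated constraints,
    # unlike a barrier, which acts from inside the feasible set.
    aug = lambda x: f(x) + (max(0.0, lam + mu * g(x))**2 - lam**2) / (2 * mu)
    x = minimize_scalar(aug, bounds=(-10, 10), method='bounded').x
    lam = max(0.0, lam + mu * g(x))   # dual update absorbs the violation
    mu *= 2.0                         # grow the penalty weight
print("x ~", x, " lambda ~", lam)
```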
  • Sequential convex optimization:
    - Maintain an estimate of the solution and a trust region around it.
    - Make convex approximation of the objective function and the inequality constraints in the trust region.
    - Make affine approximation of the equality constraints in the trust region.
    - Solve the approximate convex optimization to get a new estimate.
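A toy sketch of this loop on a one-dimensional nonconvex objective (the function, the curvature floor, and the trust-region schedule are invented for illustration):

```python
import numpy as np

f   = lambda x: x**4 - 3 * x**2 + x
df  = lambda x: 4 * x**3 - 6 * x + 1
d2f = lambda x: 12 * x**2 - 6

x, delta = 2.0, 1.0                  # initial estimate and trust-region radius
for _ in range(50):
    # Convex approximation: quadratic model with curvature floored above 0.
    H = max(d2f(x), 1e-3)
    step = np.clip(-df(x) / H, -delta, delta)  # minimize model in trust region
    if f(x + step) < f(x):           # sufficient descent: accept, grow region
        x, delta = x + step, min(2 * delta, 1.0)
    else:                            # poor model: shrink the trust region
        delta *= 0.5
print("x ~", x)                      # a local minimum of f
```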
