Convex Conditions for Strong Convexity

An important concept in online learning and convex optimization is that of strong convexity: a twice-differentiable function $f$ is said to be strongly convex with respect to a norm $\|\cdot\|$ if

$z^T\frac{\partial^2 f}{\partial x^2}z \geq \|z\|^2$

for all $z$ (for functions that are not twice-differentiable, there is an analogous criterion in terms of the Bregman divergence). To check strong convexity, then, we basically need to check a condition on the Hessian, namely that $z^THz \geq \|z\|^2$. So, under what conditions does this hold?

For the $l^2$ norm, the answer is easy: $z^THz \geq \|z\|_2^2$ if and only if $H \succeq I$ (i.e., $H-I$ is positive semidefinite). This can be shown in many ways, perhaps the easiest is by noting that $z^THz-\|z\|_2^2 = z^T(H-I)z$.

For the $l^{\infty}$ norm, the answer is a bit trickier but still not too complicated. Recall that we want necessary and sufficient conditions under which $z^THz \geq \|z\|_{\infty}^2$. Note that this is equivalent to asking that $z^THz \geq (z_i)^2$ for each coordinate $i$ of $z$, which in turn is equivalent to $H \succeq e_ie_i^T$ for each coordinate vector $e_i$ (these are the vectors that are 1 in the $i$th coordinate and 0 everywhere else).

More generally, for any norm $\|\cdot\|$, there exists a dual norm $\|\cdot\|*$ which satisfies, among other properties, the relationship $\|z\| = \sup{\|w\|* = 1} w^Tz$. So, in general, $z^THz \geq \|z\|^2$ is equivalent to asking that $z^THz \geq (w^Tz)^2$ for all $w$ with $\|w\|* = 1$. But this is in turn equivalent to asking that

$H \succeq ww^T$ for all $w$ such that $\|w\|_* = 1$.

In fact, it suffices to pick a subset of the $w$ such that the convex hull consists of all $w$ with $\|w\|_* \leq 1$; this is why we were able to obtain such a clean formulation in the $l^{\infty}$ case: the dual norm to $l^{\infty}$ is $l^1$, whose unit ball is the simplex, which is a polytope with only $2n$ vertices (namely, each of the signed unit vectors $\pm e_i$).

We can also derive a simple (but computationally expensive) criterion for $l^1$ strong convexity: here the dual norm is $l^{\infty}$, whose unit ball is the $n$-dimensional hypercube, with vertices given by all $2^n$ vectors of the form $[ \pm 1 \ \cdots \ \pm 1]$. Thus $z^THz \geq \|z\|_1^2$ if and only if $H \succeq ss^T$ for all $2^n$ sign vectors $s$.

Finally, we re-examine the $l^2$ case; even though the $l^2$-ball is not a polytope, we were still able to obtain a very simple expression. This was because the condition $H \succeq I$ manages to capture simultaneously all dual vectors such that $w^Tw \leq 1$. We thus have the general criterion:

Theorem. $H \succeq M_jM_j^T$ for $j = 1,\ldots,m$ if and only if $H$ is strongly convex with respect to the norm $\|\cdot\|$ whose dual unit ball is the convex hull of the transformed unit balls $M_j\mathcal{B}_j$, $j = 1, \ldots, m$, where $\mathcal{B}_j$ is the $l^2$ unit ball whose dimension matches the number of columns of $M_j$.

Proof. $H \succeq M_jM_j^T$ if and only if $z^THz \geq \max_{j=1}^m \|M_j^Tz\|_2^2$. Now note that $\|M_j^Tz\|2 = \sup{w \in \mathcal{B}j} w^TM_j^Tz = \sup{w' \in M_j\mathcal{B}j} (w')^Tz$. If we define $\|z\| = \max{j=1}^m \|M_j^Tz\|_2$, it is then apparent that the dual norm unit ball is the convex hull of the $M_j\mathcal{B}_j$.

Jacob Steinhardt

Jacob Steinhardt


Comments

Sign in to join the conversation.