An important concept in online learning and convex optimization is that of strong convexity: a twice-differentiable function $f$ is said to be strongly convex with respect to a norm $\|\cdot\|$ if $z^T \nabla^2 f(x) z \geq \|z\|^2$ for all $x$ and $z$.
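Two standard examples, as illustration: for $f(x) = \frac{1}{2}\|x\|_2^2$ we have $\nabla^2 f(x) = I$, so $z^T \nabla^2 f(x) z = \|z\|_2^2$ and $f$ is strongly convex with respect to the $\ell_2$ norm; the negative entropy $f(x) = \sum_i x_i \log x_i$ plays the analogous role for the $\ell_1$ norm on the probability simplex.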
Here's a fun counterexample: a function $\mathbb{R}^n \to \mathbb{R}$ that is jointly convex in any $n-1$ of the variables, but not in all variables at once. One such function is sketched below.
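One construction with this property (offered here only as an illustration; there are others) is the quadratic form $f(x) = x^T M x$ with $M = (n - \tfrac{1}{2}) I - \mathbf{1}\mathbf{1}^T$: every $(n-1) \times (n-1)$ principal submatrix of $M$ has smallest eigenvalue $\tfrac{1}{2}$, so $f$ is convex in any $n-1$ of the variables with the remaining one held fixed, while $M$ itself has eigenvalue $-\tfrac{1}{2}$ in the all-ones direction, so $f$ is not jointly convex. A quick numerical check of this construction:

```python
import numpy as np

n = 4  # illustrative dimension; the construction works for any n >= 2
M = (n - 0.5) * np.eye(n) - np.ones((n, n))  # f(x) = x^T M x

# Not jointly convex: M has a negative eigenvalue (-0.5, all-ones direction).
print(np.linalg.eigvalsh(M).min())

# Convex in any n-1 of the variables: every (n-1)x(n-1) principal
# submatrix of M is positive definite (smallest eigenvalue 0.5).
for i in range(n):
    keep = [j for j in range(n) if j != i]
    print(i, np.linalg.eigvalsh(M[np.ix_(keep, keep)]).min())
```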
I spent the last several hours trying to come up with an efficient algorithm for the following problem: Problem: Suppose that we have a sequence of $l$ pairs of non-negative numbers $(a_1, b_1), \ldots, (a_l, b_l)$.
While grading homeworks today, I came across the following bound: Theorem 1: If A and B are symmetric $n\times n$ matrices with eigenvalues $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_n$
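One classical bound involving exactly these quantities (offered here only as an illustration, and not necessarily the bound in question) is the trace inequality $\mathrm{tr}(AB) \leq \sum_{i=1}^{n} \lambda_i \mu_i$, where $\mu_1 \geq \ldots \geq \mu_n$ are the eigenvalues of $B$. A quick numerical sanity check of that inequality on random symmetric matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Random symmetric matrices A and B.
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
B = rng.standard_normal((n, n)); B = (B + B.T) / 2

# Eigenvalues of each matrix, sorted in decreasing order.
lam = np.sort(np.linalg.eigvalsh(A))[::-1]
mu = np.sort(np.linalg.eigvalsh(B))[::-1]

# Trace inequality: tr(AB) <= sum_i lambda_i * mu_i.
print(np.trace(A @ B), "<=", np.sum(lam * mu))
```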
The KL divergence is an important tool for studying the distance between two probability distributions. Formally, given two distributions $p$ and $q$, the KL divergence is defined as $KL(p || q) := \int p(x) \log \frac{p(x)}{q(x)} \, dx$.
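For concreteness, here is a minimal sketch of the definition in its discrete form (a sum in place of the integral) on two small example distributions; note that swapping $p$ and $q$ changes the value, so the KL divergence is not symmetric:

```python
import numpy as np

# Two discrete distributions on the same three-point support.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

# Discrete KL: sum_x p(x) * log(p(x) / q(x)).
kl_pq = np.sum(p * np.log(p / q))
kl_qp = np.sum(q * np.log(q / p))

print(kl_pq)  # KL(p || q)
print(kl_qp)  # KL(q || p) -- a different value: KL is not symmetric
```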