While grading homeworks today, I came across the following bound:

**Theorem 1:** If A and B are symmetric $n\times n$ matrices with eigenvalues $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_n$ and $\mu_1 \geq \mu_2 \geq \ldots \geq \mu_n$ respectively, then $Trace(A^TB) \leq \sum_{i=1}^n \lambda_i \mu_i$.

For such a natural-looking statement, this was surprisingly hard to prove. However, I finally came up with a proof, and it was cool enough that I felt the need to share. To prove this, we actually need two ingredients. The first is the Cauchy Interlacing Theorem:

**Theorem 2:** If A is an $n\times n$ symmetric matrix and B is an $(n-k) \times (n-k)$ principle submatrix of A, then $\lambda_{i-k}(A) \leq \lambda_i(B) \leq \lambda_i(A)$, where $\lambda_i(X)$ is the ith largest eigenvalue of X.

As a corollary we have:

**Corollary 1:** For any symmetric matrix X, $\sum_{i=1}^k X_{ii} \leq \sum_{i=1}^k \lambda_i(X)$.

**Proof:** The left-hand-side is just the trace of the upper-left $k\times k$ principle submatrix of X, whose eigenvalues are by Theorem 2 bounded by the k largest eigenvalues of X. $\square$

The final ingredient we will need is a sort of "majorization" inequality based on Abel summation:

**Theorem 3:** If $x_1,\ldots,x_n$ and $y_1,\ldots,y_n$ are such that $\sum_{i=1}^k x_i \leq \sum_{i=1}^k y_i$ for all k (with equality when $k=n$), and $c_1 \geq c_2 \geq \ldots \geq c_n$, then $\sum_{i=1}^n c_ix_i \leq \sum_{i=1}^n c_iy_i$.

**Proof:** We have:

$\sum_{i=1}^n c_ix_i = c_n(x_1+\cdots+x_n) + \sum_{i=1}^{n-1} (c_i-c_{i+1})(x_1+\cdots+x_i) \leq c_n(y_1+\cdots+y_n) + \sum_{i=1}^{n-1} (c_i-c_{i+1})(y_1+\cdots+y_i) = \sum_{i=1}^n c_iy_i$

where the equalities come from the Abel summation method. $\square$

Now, we are finally ready to prove the original theorem:

**Proof of Theorem 1:** First note that since the trace is invariant under similarity transforms, we can without loss of generality assume that A is diagonal, in which case we want to prove that $\sum_{i=1}^n \lambda_i B_{ii} \leq \sum_{i=1}^n \lambda_i \mu_i$. But by Corollary 1, we also know that $\sum_{i=1}^k B_{ii} \leq \sum_{i=1}^k \mu_i$ for all k. Since by assumption the $\lambda_i$ are a decreasing sequence, Theorem 3 then implies that $\sum_{i=1}^n \lambda_i B_{ii} \leq \sum_{i=1}^n \lambda_i \mu_i$, which is what we wanted to show. $\square$