Suppose that we have a random variable $X \in \mathbb{R}^d$ such that $\mathbb{E}[XX^\top] = I_{d \times d}$. Now take $k$ independent Gaussian random variables $Z_1, \ldots, Z_k \sim \mathcal{N}(0, I_{d \times d})$, and let $J$ be the argmax (over $j \in \{1, \ldots, k\}$) of $Z_j^\top X$.
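To make the setup concrete, here is a minimal simulation sketch in Python. The distribution of $X$ is left open above; purely for illustration, the sketch takes $X$ uniform on the sphere of radius $\sqrt{d}$, which is one convenient choice satisfying $\mathbb{E}[XX^\top] = I_{d \times d}$ (the values of `d`, `k`, and the sample count `n` are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 100, 10, 5000  # dimension, number of Gaussians, number of draws of X

# Illustrative choice of X: uniform on the sphere of radius sqrt(d),
# which satisfies E[X X^T] = I_{d x d}.
X = rng.standard_normal((n, d))
X *= np.sqrt(d) / np.linalg.norm(X, axis=1, keepdims=True)

Z = rng.standard_normal((k, d))         # Z_1, ..., Z_k ~ N(0, I_{d x d})
J = np.argmax(X @ Z.T, axis=1)          # J = argmax_j Z_j^T X, for each draw of X
print(np.bincount(J, minlength=k) / n)  # for a typical draw of Z, each label
                                        # occurs with frequency close to 1/k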
It seems that it should be very hard to predict $J$ well, in the following sense: for any fixed function $q(j \mid x)$, the expectation $\mathbb{E}_x[q(J \mid x)]$ should with high probability be very close to $\frac{1}{k}$ (where the "high probability" is over the randomness in $Z$). In fact, Alex Zhai and I think that the probability of the expectation exceeding $\frac{1}{k} + \epsilon$ should be at most $\exp(-C(\epsilon/k)^2 d)$ for some constant $C$. (We can already show this to be true if we replace $(\epsilon/k)^2$ with $(\epsilon/k)^4$.) I will not sketch a proof here, but the idea is pretty cool: it basically uses Lipschitz concentration of Gaussian random variables.
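For intuition, here is a rough Monte Carlo check of the claimed concentration, reusing the spherical choice of $X$ from the sketch above together with a fixed, purely hypothetical predictor $q$ (a softmax of a random linear map): over independent redraws of $Z$, the estimates of $\mathbb{E}_x[q(J \mid x)]$ should cluster tightly around $\frac{1}{k}$.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, n, trials = 100, 10, 2000, 200  # all sizes are arbitrary illustration values

def sample_X(n, d):
    # Same illustrative choice as above: X uniform on the sphere of radius sqrt(d).
    X = rng.standard_normal((n, d))
    return X * (np.sqrt(d) / np.linalg.norm(X, axis=1, keepdims=True))

# A fixed, hypothetical predictor q(j | x): softmax of a random linear map W x.
W = rng.standard_normal((k, d))
def q(X):
    logits = X @ W.T
    logits -= logits.max(axis=1, keepdims=True)  # for numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

estimates = []
for _ in range(trials):
    Z = rng.standard_normal((k, d))                  # redraw Z_1, ..., Z_k each trial
    X = sample_X(n, d)
    J = np.argmax(X @ Z.T, axis=1)                   # realized argmax labels
    estimates.append(q(X)[np.arange(n), J].mean())   # Monte Carlo E_x[q(J | x)]

print(np.mean(estimates), np.std(estimates), 1 / k)  # estimates hug 1/k
```

Note that $q$ must be fixed before $Z$ is drawn: a predictor allowed to depend on $Z$, such as $q(j \mid x) = \mathbf{1}[j = \operatorname{argmax}_{j'} Z_{j'}^\top x]$, predicts $J$ perfectly.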
I'm mainly posting this problem because I think it's pretty interesting, in case anyone else is inspired to work on it. It is closely related to the covering number of exponential families under the KL divergence, where we are interested in coverings at relatively large radii ($\log(k) - \epsilon$ rather than $\epsilon$).