The conditional min-entropy, discussed e.g. in these notes by Watrous, as well as in this other post, can be defined as $$\mathsf{H}_{\rm min}(\mathsf{X} \mid \mathsf{Y})_{\rho}\equiv -\inf_{\sigma \in \mathsf{D}(\mathcal Y)} \mathsf{D}_{\rm max}\left(\rho \,\|\, \mathbb{1}_{\cal X} \otimes \sigma\right), \\ \mathsf D_{\rm max}(\rho \| Q)\equiv \inf \left\{\lambda \in \mathbb{R}: \rho \leq 2^{\lambda} Q\right\}. $$ Among other things, it admits a rather direct operational interpretation, at least for classical-quantum states $\rho=\sum_a p_a |a\rangle\!\langle a|\otimes\xi_a$: it equals $-\log p_{\rm opt}$, with $p_{\rm opt}$ the optimal probability of correctly guessing $a$ when discriminating between the elements of the ensemble $a\mapsto (p_a,\xi_a)$.
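For concreteness, this definition can be probed numerically: substituting $\hat\sigma\equiv 2^\lambda\sigma$ collapses the double optimisation into the single SDP $2^{-\mathsf H_{\rm min}(\mathsf X|\mathsf Y)_\rho}=\min\{\operatorname{Tr}\hat\sigma:\ \mathbb 1_{\cal X}\otimes\hat\sigma\ge\rho\}$. Here is a minimal sketch checking the guessing-probability interpretation on a two-element ensemble, assuming cvxpy with its default solver (the function name `hmin_sdp` is mine):

```python
import numpy as np
import cvxpy as cp

def hmin_sdp(rho, dx, dy):
    # 2^{-H_min(X|Y)} = min{ Tr(sig) : 1_X (x) sig >= rho };
    # sig >= 0 is implied because rho >= 0.
    sig = cp.Variable((dy, dy), hermitian=True)
    prob = cp.Problem(cp.Minimize(cp.real(cp.trace(sig))),
                      [cp.kron(np.eye(dx), sig) >> rho])
    prob.solve()
    return -np.log2(prob.value)

# Classical-quantum test state rho = sum_a p_a |a><a| (x) xi_a
p = [0.6, 0.4]
xis = [np.array([[1.0, 0.0], [0.0, 0.0]]),        # |0><0|
       np.array([[0.5, 0.5], [0.5, 0.5]])]        # |+><+|
rho = sum(p[a] * np.kron(np.diag(np.eye(2)[a]), xis[a]) for a in range(2))

# Helstrom: for two hypotheses, p_opt = (1 + ||p0 xi0 - p1 xi1||_1) / 2
p_opt = 0.5 * (1 + np.abs(np.linalg.eigvalsh(p[0]*xis[0] - p[1]*xis[1])).sum())

print(hmin_sdp(rho, 2, 2), -np.log2(p_opt))       # the two numbers should agree
```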
What do these quantities look like for diagonal matrices? For the max-relative entropy I get $$\mathsf D_{\rm max}(P\|Q)=\max_i \log\frac{p_i}{q_i},$$ with $p_i\equiv P_{ii}$ and $q_i\equiv Q_{ii}$. I'm less sure, however, about how to compute $\mathsf H_{\rm min}(\mathsf X|\mathsf Y)_\rho$: the minimisation is defined over all possible states $\sigma$, not just diagonal ones. For the quantity to apply seamlessly to classical distributions, I would guess that the $\inf$ is attained by diagonal states $\sigma$. Even assuming this is the case (it would need to be shown anyway), I get $$\mathsf H_{\rm min}(\mathsf X|\mathsf Y)_P = -\inf_{\vec q}\log \max_{a,b} \frac{p_{a,b}}{q_b},$$ where $P$ is a bipartite probability distribution and the $\inf$ is taken over all probability distributions $\vec q$ on the second system.
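As a sanity check of both guesses (that diagonal $\sigma$ suffice, and the resulting expression), one can embed a classical joint distribution as a diagonal $\rho$ and compare the SDP value with a direct numerical minimisation of the expression above over the simplex. A sketch reusing `hmin_sdp` from the previous block; `hmin_classical` and the softmax parametrisation are my own choices, assuming scipy is available:

```python
import numpy as np
from scipy.optimize import minimize

def hmin_classical(P, restarts=20, seed=0):
    # Minimise max_{a,b} P[a,b]/q_b over the simplex, parametrising q by a
    # softmax and keeping the best of a few random Nelder-Mead restarts.
    rng = np.random.default_rng(seed)
    def objective(z):
        q = np.exp(z - z.max())
        q /= q.sum()
        return np.max(P / q)          # q broadcasts over the second index b
    best = min(minimize(objective, rng.standard_normal(P.shape[1]),
                        method="Nelder-Mead").fun for _ in range(restarts))
    return -np.log2(best)

P = np.array([[0.3, 0.1],
              [0.2, 0.4]])            # joint distribution p_{a,b}
rho = np.diag(P.flatten())            # diagonal embedding, X (x) Y ordering
print(hmin_classical(P), hmin_sdp(rho, 2, 2))
```

If the two printed values agree, that is at least consistent with the diagonal-$\sigma$ guess, though of course not a proof of it.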
Assuming these expressions are correct in the first place, is there a simpler approach leading to nicer expressions? Or, say, to expressions that would look more natural in a purely classical context?