
Yao's celebrated minimax principle relates distributional complexity and randomized complexity. Let $P$ be a problem with a finite set $\mathcal{X}$ of inputs and a finite set $\mathcal{A}$ of deterministic algorithms that solve $P$. Also let $\mathcal{D}$ denote a probability distribution on the inputs and let $\mathcal{R}$ denote a probability distribution on $\mathcal{A}$. Then the principle states that $$\min_{A\in\mathcal{A}}\quad\mathbb{E} cost(A,\mathcal{D}) \leq \max_{x\in\mathcal{X}}\quad\mathbb{E} cost(\mathcal{R},x) \quad\quad\text{for all $\mathcal{D}$ and $\mathcal{R}$}.$$ The proof follows directly from von Neumann's minimax theorem for zero-sum games.
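
As a quick sanity check of what the inequality says, it can be verified numerically on a toy instance. The cost matrix and the `random_dist` helper below are made up purely for illustration; this is a sketch, not anything from Yao's paper.

```python
import random

# Hypothetical toy instance: rows are deterministic algorithms, columns are inputs;
# cost[a][x] is the cost of algorithm a on input x (made-up numbers).
cost = [[1, 4, 2, 3],
        [3, 1, 5, 2],
        [2, 3, 1, 4]]
n_alg, n_inp = len(cost), len(cost[0])

def random_dist(n):
    """Return a random probability vector of length n."""
    w = [random.random() for _ in range(n)]
    s = sum(w)
    return [v / s for v in w]

for _ in range(1000):
    D = random_dist(n_inp)  # distribution on inputs
    R = random_dist(n_alg)  # distribution on algorithms
    # left side: best deterministic algorithm against D
    lhs = min(sum(D[x] * cost[a][x] for x in range(n_inp)) for a in range(n_alg))
    # right side: worst input for the randomized algorithm R
    rhs = max(sum(R[a] * cost[a][x] for a in range(n_alg)) for x in range(n_inp))
    assert lhs <= rhs + 1e-9
```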

Usually Yao's principle is stated for Las Vegas algorithms only, but it can be generalized to Monte Carlo algorithms as follows: $$\frac12 \min_{A\in\mathcal{A}}\quad\mathbb{E} cost_{2\epsilon}(A,\mathcal{D}) \leq \max_{x\in\mathcal{X}}\quad\mathbb{E} cost_{\epsilon}(\mathcal{R},x)\quad\quad\text{for all $\mathcal{D}$, $\mathcal{R}$ and $\epsilon\in [0,1/2]$},$$ where $cost_\epsilon(\cdot,\cdot)$ denotes the cost of a Monte Carlo algorithm that errs with probability at most $\epsilon$.
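
Under one common reading of this statement (which matches the definitions in the answers below), the left-hand side minimizes over deterministic algorithms whose error under $\mathcal{D}$ is at most $2\epsilon$, and $\epsilon$ is the worst-case error of $\mathcal{R}$. With that reading, the same kind of toy check goes through; again, the matrices and helper names are made up for illustration.

```python
import random

# Hypothetical toy instance: cost[a][x] is the cost of deterministic algorithm a on
# input x, and err[a][x] = 1 iff a gives a wrong answer on x (made-up numbers).
cost = [[1, 4, 2, 3],
        [3, 1, 5, 2],
        [2, 3, 1, 4]]
err  = [[0, 1, 0, 0],
        [1, 0, 0, 1],
        [0, 0, 1, 0]]
n_alg, n_inp = len(cost), len(cost[0])

def random_dist(n):
    """Return a random probability vector of length n."""
    w = [random.random() for _ in range(n)]
    s = sum(w)
    return [v / s for v in w]

for _ in range(1000):
    D = random_dist(n_inp)  # distribution on inputs
    R = random_dist(n_alg)  # distribution on algorithms
    # eps = worst-case error probability of the randomized algorithm R
    eps = max(sum(R[a] * err[a][x] for a in range(n_alg)) for x in range(n_inp))
    if eps > 0.5:
        continue  # the statement only covers eps in [0, 1/2]
    # deterministic algorithms whose error probability under D is at most 2*eps
    good = [a for a in range(n_alg)
            if sum(D[x] * err[a][x] for x in range(n_inp)) <= 2 * eps]
    lhs = 0.5 * min(sum(D[x] * cost[a][x] for x in range(n_inp)) for a in good)
    rhs = max(sum(R[a] * cost[a][x] for a in range(n_alg)) for x in range(n_inp))
    assert lhs <= rhs + 1e-9
```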

In Yao's original paper, the relation for Monte Carlo algorithms is stated as Theorem 3 without proof. Any hints for proving it?

R B

2 Answers


I'll give it a try. I'm going to use Yao's original notation, so it will be easier to compare with his paper and his definitions.

Let $\mathcal{I}$ be a finite set of inputs, and let $\mathcal{A}_0$ be a finite set of deterministic algorithms that may fail to give a correct answer for some inputs. Let $\epsilon(A,x)=0$ if $A$ gives the correct answer for $x$, and $\epsilon(A,x)=1$ otherwise. Also denote by $r(A,x)$ the number of queries made by $A$ on input $x$ or, equivalently, the depth of the leaf reached by $x$ in $A$'s decision tree.

Average Cost: Given a probability distribution $d$ on $\mathcal{I}$, the average cost of an algorithm $A\in \mathcal{A}_0$ is $C(A,d)=\sum_{x\in\mathcal{I}} d(x)\cdot r(A,x)$.

Distributional Complexity: Let $\lambda\in[0,1]$. For any distribution $d$ on the inputs, let $\beta(\lambda)$ be the subset of $\mathcal{A}_0$ given by $\beta(\lambda)=\{A : A\in \mathcal{A}_0, \sum_{x\in\mathcal{I}} d(x)\cdot \epsilon(A,x)\leq \lambda\}$. The distributional complexity with error $\lambda$ for a computational problem $P$ is defined as $F_{1,\lambda}(P)=\max_{d} \min_{A\in \beta(\lambda)} C(A,d)$.

$\lambda$-tolerance: A distribution $q$ on the family $\mathcal{A}_0$ is $\lambda$-tolerant if $\max_{x\in \mathcal{I}} \sum_{A\in\mathcal{A}_0} q(A)\cdot \epsilon(A,x)\leq \lambda$.

Expected Cost: For a randomized algorithm $R$, let $q$ be a probability distribution that is $\lambda$-tolerant on $\mathcal{A}_0$. The expected cost of $R$ for a given input $x$ is $E(R,x)=\sum_{A\in \mathcal{A}_0} q(A)\cdot r(A,x)$.

Randomized Complexity: Let $\lambda\in[0,1]$. The randomized complexity with error $\lambda$ is $F_{2,\lambda}=\min_R \max_{x\in\mathcal{I}} E(R,x)$.
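
To keep the notation straight, here is a small sketch of these definitions on a made-up instance (the matrices, numbers, and helper names are mine, purely for illustration, and not from Yao's paper):

```python
# r[A][x] = number of queries of algorithm A on input x; eps[A][x] = 1 iff A errs on x.
r   = [[2, 3, 1],
       [1, 2, 3],
       [3, 1, 2]]
eps = [[0, 1, 0],
       [0, 0, 1],
       [1, 0, 0]]
algs, inputs = range(len(r)), range(len(r[0]))

def C(A, d):
    """Average cost C(A, d) of a deterministic algorithm A under input distribution d."""
    return sum(d[x] * r[A][x] for x in inputs)

def beta(lam, d):
    """The set beta(lambda): algorithms whose error under d is at most lambda."""
    return [A for A in algs if sum(d[x] * eps[A][x] for x in inputs) <= lam]

def E(q, x):
    """Expected cost E(R, x) of the randomized algorithm R given by the distribution q."""
    return sum(q[A] * r[A][x] for A in algs)

def is_tolerant(q, lam):
    """Check whether the distribution q on algorithms is lambda-tolerant."""
    return max(sum(q[A] * eps[A][x] for A in algs) for x in inputs) <= lam

d = [1/3, 1/3, 1/3]    # a distribution on inputs
q = [1/3, 1/3, 1/3]    # a distribution on algorithms (a randomized algorithm R)
lam = 1/3
print(beta(2 * lam, d))              # [0, 1, 2]: every algorithm has error 1/3 <= 2/3 under d
print(is_tolerant(q, lam))           # True: on every input, exactly one algorithm errs
print(max(E(q, x) for x in inputs))  # 2.0, the worst-case expected cost of R
```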

Now we are ready to get down to business. What we want to prove is that, given a distribution $d$ on the inputs and a randomized algorithm $R$ (i.e., a distribution $q$ on $\mathcal{A}_0$):

Yao's Minimax Principle for Monte Carlo Algorithms. \begin{equation}\max_{x\in\mathcal{I}} E(R,x)\geq \frac{1}{2}\min_{A\in \beta(2\lambda)} C(A,d) \end{equation} for $\lambda\in[0,1/2]$.

I will follow an approach given by Fich, Meyer auf der Heide, Ragde and Wigderson (see Lemma 4). Their approach does not yield a characterization for Las Vegas algorithms (only the lower bound), but it is sufficient for our purposes. From their proof, it is easy to see that for any $\mathcal{A}_0$ and $\mathcal{I}$

Claim 1. $\max_{x\in \mathcal{I}} E(R,x)\geq \min_{A\in \mathcal{A}_0} C(A,d)$.

To get the correct numbers there, we'll do something similar. Given that the probability distribution $q$ given by the randomized algorithm $R$ is $\lambda$-tolerant on $\mathcal{A}_0$, we have that \begin{align*} \lambda &\geq \max_{x\in \mathcal{I}}\left\{ \sum_{A\in\mathcal{A}_0} q(A)\cdot \epsilon(A,x) \right\}\\ &\geq \sum_{x\in\mathcal{I}} d(x) \sum_{A\in \mathcal{A}_0} q(A)\cdot \epsilon(A,x)\\ &= \sum_{A\in \mathcal{A}_0} q(A)\sum_{x\in\mathcal{I}} d(x) \cdot \epsilon(A,x)\\ &\geq \min_{A\in \mathcal{A}_0}\left\{ \sum_{x\in\mathcal{I}} d(x) \cdot \epsilon(A,x) \right\}. \end{align*} If we replace the family $\mathcal{A}_0$ with $\beta(2\lambda)$ we see that

\begin{align*} \lambda &\geq \max_{x\in \mathcal{I}}\left\{ \sum_{A\in\mathcal{A}_0} q(A)\cdot \epsilon(A,x) \right\}\\ &\geq \max_{x\in \mathcal{I}}\left\{ \sum_{A\in\beta(2\lambda)} q(A)\cdot \epsilon(A,x) \right\}\\ &\geq \sum_{x\in\mathcal{I}} d(x) \sum_{A\in \beta(2\lambda)} q(A)\cdot \epsilon(A,x)\\ &= \sum_{A\in \beta(2\lambda)} q(A)\sum_{x\in\mathcal{I}} d(x) \cdot \epsilon(A,x)\\ &\geq \min_{A\in \beta(2\lambda)}\left\{ \frac{1}{2}\sum_{x\in\mathcal{I}} d(x) \cdot \epsilon(A,x) \right\}, \end{align*}

where the second inequality follows because $\beta(2\lambda) \subseteq \mathcal{A}_0$, and the last inequality follows from the definition of $\beta(2\lambda)$: for every $A\in\beta(2\lambda)$, the summation divided by $2$ is at most $\lambda$. Hence, \begin{equation}\max_{x\in \mathcal{I}}\left\{ \sum_{A\in\mathcal{A}_0} q(A)\cdot \epsilon(A,x) \right\}\geq\frac{1}{2} \min_{A\in \beta(2\lambda)}\left\{ \sum_{x\in\mathcal{I}} d(x) \cdot \epsilon(A,x) \right\}. \end{equation}

Noting that $\epsilon$ maps to $\{0,1\}$, that $r$ maps to $\mathbb{N}$, and using Claim 1 above, we can now safely replace the function $\epsilon$ in the inequality above by $r(A,x)$ to obtain the desired inequality.

Marcos Villagra
  • Is there a short explanation for where the factor of 2 comes from? – Robin Kothari Oct 11 '12 at 02:21
  • in short, it comes from the definition of $\beta(2\lambda)$. The summation in the definition divided by 2 is at most $\lambda$. – Marcos Villagra Oct 11 '12 at 08:25
  • something seems strange to me. by definition, $\max_{A \in \beta(2\lambda)} \left\{\frac{1}{2} \sum_{x \in \mathcal{I}} d(x)\cdot \epsilon(A,x)\right\} \leq \lambda$, so why the min? – Sasho Nikolov Oct 11 '12 at 16:39
  • and i don't understand the last sentence. how did you make an entire argument about $\epsilon$ and then replaced it with $r$? – Sasho Nikolov Oct 11 '12 at 16:40
  • regarding your first question, I added more details. – Marcos Villagra Oct 12 '12 at 04:43
  • regarding your second question, I agree that it's extremely hand-wavy. First, $\max_{x\in \mathcal{I}} \left\{\sum_{A\in\beta(2\lambda)} q(A)\cdot r(A,x)\right\} \geq \max_{x\in \mathcal{I}} \left\{\sum_{A\in\beta(2\lambda)} q(A)\cdot \epsilon(A,x)\right\}$, and the same for the right side. Then, by Claim 1 (i.e., doing computations similar to the two sets of inequalities above) $\max_{x\in \mathcal{I}} \left\{\sum_{A\in\mathcal{A}_0} q(A)\cdot r(A,x)\right\} \geq \frac{1}{2} \min_{A\in\beta(2\lambda)} \left\{\sum_{x\in \mathcal{I}} d(x)\cdot r(A,x)\right\}$ – Marcos Villagra Oct 12 '12 at 04:58
  • Now, I could have argued directly without going through $\epsilon$, like $\max_{x\in \mathcal{I}} \left\{\sum_{A\in\mathcal{A}_0} q(A)\cdot r(A,x)\right\} \geq \min_{A\in\beta(2\lambda)} \left\{\sum_{x\in \mathcal{I}} d(x)\cdot r(A,x)\right\}\geq \frac{1}{2} \min_{A\in\beta(2\lambda)} \left\{\sum_{x\in \mathcal{I}} d(x)\cdot r(A,x)\right\}$. However, I wanted to give a good excuse as to why it is 1/2 and not (say) 2/3. If it were 2/3, it would violate the case when we have $\epsilon$ instead of $r$. – Marcos Villagra Oct 12 '12 at 04:58
  • I still don't understand your answer to my first question. Your derivation is ok, but it is also true that the fact holds quite trivially if you replace the min with a max. – Sasho Nikolov Oct 13 '12 at 03:14
  • I like your thorough notation, but I think you're complicating this a bit too much. All you need to notice is that the $q$-measure of $\beta(2\lambda)$ is at least 1/2 by Markov's inequality. – Sasho Nikolov Oct 13 '12 at 03:18

This is just an extended comment on Marcos's answer, using his notation. I am not quite able to follow the details of his argument, and the one below is pretty short and easy.

By averaging, $$\sum_A{q(A)\sum_x{d(x)\epsilon(A, x)}} = \sum_x{d(x)\sum_A{q(A)\epsilon(A, x)}} \leq \lambda.$$

The fact above and Markov's inequality imply $\sum_{A \in \beta(2\lambda)}{q(A)} \geq 1/2$.
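
Spelled out, this is Markov's inequality applied to the nonnegative random variable $\sum_x d(x)\epsilon(A,x)$ with $A$ drawn from $q$ (for $\lambda>0$; when $\lambda=0$ the bound is immediate, since then every $A$ with $q(A)>0$ already lies in $\beta(0)$):

$$\sum_{A\notin\beta(2\lambda)} q(A) \;=\; \Pr_{A\sim q}\left[\sum_x d(x)\epsilon(A,x) > 2\lambda\right] \;\leq\; \frac{1}{2\lambda}\sum_A q(A)\sum_x d(x)\epsilon(A,x) \;\leq\; \frac{\lambda}{2\lambda} \;=\; \frac12.$$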

So we get:

\begin{align*} \max_x \sum_A{q(A)r(A,x)} &\geq \sum_x{d(x)\sum_A{q(A)r(A, x)}}\\ &= \sum_A{q(A)\sum_x{d(x)r(A, x)}}\\ &\geq \sum_{A \in \beta(2\lambda)}{q(A)\sum_x{d(x)r(A, x)}}\\ &\geq \left(\sum_{A \in \beta(2\lambda)}{q(A)}\right) \min_{A \in \beta(2\lambda)}{\sum_x{d(x)r(A, x)}}\\ &\geq \frac{1}{2}\min_{A \in \beta(2\lambda)}{\sum_x{d(x)r(A, x)}} \end{align*}
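
For what it's worth, here is a toy numerical check of each inequality in this chain. The cost matrix $r$ and error matrix $e$ are made up, and $\lambda$ is taken to be the exact tolerance of $q$; the code only illustrates the argument, it is not part of it.

```python
import random

# Toy check of each inequality in the chain above. The matrices r (query costs) and
# e (errors) are made up; lam is taken to be the exact tolerance of q.
r = [[2, 5, 1, 4],
     [3, 2, 4, 1],
     [1, 3, 2, 5]]
e = [[0, 1, 0, 0],
     [0, 0, 1, 1],
     [1, 0, 0, 0]]
algs, inps = range(len(r)), range(len(r[0]))

def rand_dist(n):
    """Return a random probability vector of length n."""
    w = [random.random() for _ in range(n)]
    s = sum(w)
    return [v / s for v in w]

for _ in range(1000):
    d, q = rand_dist(len(inps)), rand_dist(len(algs))
    lam = max(sum(q[A] * e[A][x] for A in algs) for x in inps)  # q is lam-tolerant
    beta2 = [A for A in algs if sum(d[x] * e[A][x] for x in inps) <= 2 * lam]
    chain = [
        max(sum(q[A] * r[A][x] for A in algs) for x in inps),          # max_x E(R, x)
        sum(d[x] * q[A] * r[A][x] for A in algs for x in inps),        # average over d
        sum(q[A] * sum(d[x] * r[A][x] for x in inps) for A in beta2),  # restrict to beta(2 lam)
        sum(q[A] for A in beta2) * min(sum(d[x] * r[A][x] for x in inps) for A in beta2),
        0.5 * min(sum(d[x] * r[A][x] for x in inps) for A in beta2),   # uses q(beta(2 lam)) >= 1/2
    ]
    assert all(chain[i] >= chain[i + 1] - 1e-9 for i in range(len(chain) - 1))
```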

Sasho Nikolov