
We have known for about 40 years now (thanks to Adleman, Bennett and Gill) that the inclusion BPP $\subseteq$ P/poly holds, and even the stronger inclusion BPP/poly $\subseteq$ P/poly. The "/poly" means that we work non-uniformly (a separate circuit for each input length $n$), while P without this "/poly" means that we have one Turing machine for all possible input lengths $n$, even longer than, say, $n$ = the number of seconds to the next "Big Bang".

Question 1: What new knowledge would a proof (or disproof) of BPP = P contribute after we already know BPP $\subseteq$ P/poly?
By "new" I mean any really surprising consequences, like the collapse/separation of other complexity classes. Compare this with the consequences that a proof/disproof of NP $\subseteq$ P/poly would deliver.

[ADDED 08.10.2017]: One really surprising consequence of BPP $\not\subseteq$ P would be that, as shown by Impagliazzo and Wigderson, all(!) problems in E = DTIME$[2^{O(n)}]$ would have circuits of size $2^{o(n)}$. Thanks to Ryan for recalling this result.

Question 2: Why can't we prove BPP = P along similar lines as the proof of BPP/poly $\subseteq$ P/poly?

One "obvious" obstacle is finite vs. infinite domain issue: boolean circuits work over finite domains, whereas Turing machines work over entire set $\{0,1\}^*$ of $0$-$1$ strings of any length. So, to derandomize probabilistic boolean circuits, it is enough to take the majority of independent copies of a probabilistic circuit, and to apply Chernoff's inequality, together with the union bound. Of course, over infinite domains, this simple majority rule won't work.

But is this (infinite domain) a real "obstacle"? By using results from statistical learning theory (VC dimension), one can already prove that BPP/poly $\subseteq$ P/poly also holds for circuits working over infinite domains, like arithmetic circuits (working over all real numbers); see, e.g., this paper of Cucker et al. With a similar approach, all we would need is to show that the VC dimension of poly-time Turing machines cannot be too large. Has anybody seen any attempts to make this latter step?


NOTE [added 07.10.2017]: In the context of derandomization, the VC dimension of a class $F$ of functions $f:X\to Y$ is defined as the maximum number $v$ for which there are functions $f_1,\ldots,f_v$ in $F$ such that for every $S\subseteq\{1,\ldots,v\}$ there is a point $(x,y)\in X\times Y$ with $f_i(x)=y$ iff $i\in S$. That is, we shatter not sets of points via functions, but rather sets of functions via points. (The two resulting definitions of the VC dimension are related, but exponentially.)
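As a concrete (and entirely toy) illustration of this definition, here is a brute-force computation of the dual VC dimension for a small finite class; the threshold functions and the domain below are made up for the example.

```python
from itertools import combinations

def dual_vc_dimension(functions, points):
    """Largest v such that some v functions are shattered by points:
    for every subset S of the v functions there is a point (x, y) with
    f_i(x) == y exactly for the functions f_i in S."""
    best = 0
    for v in range(1, len(functions) + 1):
        for chosen in combinations(functions, v):
            # record, for each point, which of the chosen functions hit the value y at x
            patterns = {tuple(f(x) == y for f in chosen) for (x, y) in points}
            if len(patterns) == 2 ** v:      # all 2^v subsets are realized
                best = v
                break                        # this size works; try the next v
    return best

# Toy class: threshold functions f_t(x) = [x >= t] on the domain {0,1,2,3}.
F = [lambda x, t=t: int(x >= t) for t in range(4)]
points = [(x, y) for x in range(4) for y in (0, 1)]   # points (x, y) in X x Y
print(dual_vc_dimension(F, points))                   # prints 2 for this toy class
```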

These results (known as uniform convergence in probability) then imply the following: if for each input $x\in X$, a randomly picked function $\mathbf{f}\in F$ (under some probability distribution on $F$) satisfies $\mathrm{Prob}\{\mathbf{f}(x)=f(x)\}\geq 1/2+c$ for a constant $c>0$, then $f(x)$ can be computed on all inputs $x\in X$ as the majority of some $m=O(v)$ (fixed) functions from $F$. See, e.g., Corollary 2 in Haussler's paper. [For this to hold, there are some mild measurability conditions on $F$.]
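Here is a small sanity check of that statement (my own illustration; the class $F$, the distribution on it, and the value of $m$ are made up, and the domain is kept finite only so the check can be exhaustive): a random function agrees with the target on each point with probability $3/4$, and the pointwise majority of a modest number of independently sampled functions already reproduces the target everywhere.

```python
import random

# Toy domain, kept finite only so the final check can be exhaustive.
X = range(100)

def target(x):
    return int(x >= 50)

def random_function():
    """Draw f from a (made-up) class F: f agrees with the target on each
    point independently with probability 3/4, i.e. c = 1/4 in the statement."""
    noise = {x: random.random() < 0.25 for x in X}
    return lambda x: target(x) ^ int(noise[x])

m = 101                            # a convenient demo value, not Haussler's O(v) bound
fs = [random_function() for _ in range(m)]

def majority(x):
    return int(sum(f(x) for f in fs) > m / 2)

# Prints True except with tiny probability: a point is misclassified only if
# more than half of the 101 sampled functions flip it there.
print(all(majority(x) == target(x) for x in X))
```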

For example, if $F$ is the set of all polynomials $f:\mathbb{R}^n\to\mathbb{R}$ computable by arithmetic circuits of size $\leq s$, then all polynomials in $F$ have degree at most $D=2^s$. Using known upper bounds on the number of zero-patterns of polynomials (see, e.g., this paper), one can show that the VC dimension of $F$ is $O(n\log D)=O(ns)$. This implies the inclusion BPP/poly $\subseteq$ P/poly for arithmetic circuits.
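For what it is worth, here is how I reconstruct that bound (a sketch only, using the form of the zero-pattern bound I recall: $v$ polynomials of degree $\leq D$ in $k$ variables realize at most $\binom{vD+k}{k}$ zero-patterns; constants are not optimized). Suppose $f_1,\ldots,f_v\in F$ are shattered in the dual sense of the Note above, and set $g_i(x,y)=f_i(x)-y$, a polynomial of degree $\leq D$ in the $k=n+1$ variables $(x,y)$. Shattering means that the $g_i$ realize all $2^v$ zero-patterns, so
$$2^v\;\leq\;\binom{vD+n+1}{n+1}\;\leq\;\left(\frac{e\,(vD+n+1)}{n+1}\right)^{n+1}.$$
Taking logarithms gives $v=O\bigl(n\log(vD)\bigr)$, and a routine calculation then yields $v=O(n\log D)=O(ns)$.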

  • As far as I am aware most researchers think that BPP=P. That is to say, most people would not be surprised by BPP=P, so I would not expect any really surprising consequences. – Sasho Nikolov Oct 06 '17 at 17:31
  • @Sasho Nikolov: agreed, and yes: I am also among these "believers". The question is only: why can't we turn this "belief" into a "proof", say, by using results of statistical learning theory? As I recently realized (when trying to show that BPP is in P/poly for dynamic programs), these results are amazingly powerful for that job. So, I am somewhat confused by being unable to find any attempts to use them to close the BPP vs. P question. – Stasys Oct 06 '17 at 18:00
  • Well, the proof of BPP/poly $\subseteq$ P/poly relativizes, but BPP $\subseteq$ DTIME$(2^{o(n)})$ can't even algebrize: see footnote 2 and part (3) of Theorem 4.2 from this paper. – Ricky Demer Oct 06 '17 at 18:55
  • @Ricky Demer: could you elaborate on what this ("can't even algebrize") should imply? I am not asking about "inner difficulties" stemming from all these less and less understandable (in my opinion) artificial classes. The BPP vs. P question is actually very simple: can coin flipping really help? In statistics, VC dimension is able to eliminate the effect of coin flipping (cf. "uniform convergence in probability"). Disclaimer: I am no expert in "oracles" and similar stuff. – Stasys Oct 06 '17 at 19:54
  • It should imply that the main technique for non-relativizing results doesn't work here. – Ricky Demer Oct 06 '17 at 20:15
  • Regarding Q1: a disproof would show surprisingly small circuits for every problem solvable in $2^{O(n)}$ time, by Impagliazzo-Wigderson (as you probably know?) – Ryan Williams Oct 06 '17 at 22:28
  • I am confused by Q2. It seems obvious that the VC dimension of poly-time TMs is infinite. I.e., for any finite set $X \subseteq \{0,1\}^*$ and any $S \subseteq X$ there exists a polytime TM that accepts the elements of $S$ and rejects the elements of $X \setminus S$. The key thing is that $X$ is finite, so the polytime restriction is basically irrelevant. – Sasho Nikolov Oct 07 '17 at 06:02
  • @Ryan: yes, this result is exactly what I looked for in Q1, thanks for recalling it. @Sasho: the VC dimension $v$ doesn't need to be finite: it can be a growing function $v=v(n)$ of the input length $n$. Then the majority of $v$ independent copies of a probabilistic TM does the job. So what about proving/disproving that, say, for every constant $c$ there is a constant $d$ such that $\mathbf{BPP}[n^c]\subseteq \mathbf{P}[n^d]$? Here $[t]$ means time $O(t)$. What is the VC dimension of $\mathbf{P}[n^c]$ for a fixed constant $c$, say, for $c=1$? – Stasys Oct 07 '17 at 10:52
  • Re Q1, I think you are essentially asking what uniformity gives us. The answer I think is the distinction between hardware and software. Existence of the right advice (circuit) versus the ability to efficiently find it. Note that in practice it will even matter how efficient it is to find the right circuit. – Kaveh Oct 07 '17 at 15:20
  • Re Q2, the inclusion doesn't really have much to do with complexity classes and computational power, I think; it is about the amount of random bits versus the amount of advice, so I don't think it gives us any information about the nature of efficient computation. – Kaveh Oct 07 '17 at 15:24
  • @Kaveh: the hint "amount of random bits vs. the amount of advice" is worth thinking about! But in my (layman's) mind, even in questions like P vs. NP, we do not actually care about an "explicit" construction of a (uniform) TM. Such questions only ask about the existence of efficient algorithms. Of course, a construction is a "no doubts" proof of existence. But there can be less direct proofs as well. So, the matter reduces to extending "existence for each $n$" to "existence for all $n$". That is, from $\forall\exists$ to $\exists\forall$. – Stasys Oct 07 '17 at 15:53
  • I don't understand why you think that VC dimension bounds could get rid of non-uniformity. It still seems like you'd need to do things separately for each input length, and the VC dimension bounds just reduce the size of the advice string, as Kaveh mentioned. – Sasho Nikolov Oct 07 '17 at 20:21
  • @Sasho: By identifying binary strings with the natural numbers they represent, every TM computes some function $f:\mathbb{N}\to\{0,1\}$. So, having an upper bound $T$ on the time permitted, we have a well-defined class of functions $F$, and can investigate its VC dimension. I think the real issue is not "uniformity", but the fact that the functions computed by TMs may be rather "wild", even for small $T$: programs of TMs need not be acyclic and/or have some "algebraic smell", which makes estimating their VC dimension difficult (unlike, say, for polynomials; see my Note above). – Stasys Oct 08 '17 at 10:23
  • Even if you fix the running time, the VC dimension will be infinite. What you can hope to do is to bound the VC dimension of time-$T$-bounded TMs running on inputs of size $n$. But if you think about the argument, then you would have to take a majority of potentially different TMs for every $n$: non-uniformity. – Sasho Nikolov Oct 08 '17 at 20:42
  • @Sasho: thanks, I've got it. We can (at least potentially) upper-bound the VC dimension in terms of the running time $t(n)$ of TMs, but the derandomization will not be uniform: we have to take majorities of a larger and larger number of TMs. So, uniformity is indeed an issue. – Stasys Oct 09 '17 at 16:24

1 Answer

Not sure how much of an answer this is, I'm just indulging in some rumination.

Question 1 could equally be asked about P $\neq$ NP, and with a similar answer -- the techniques/ideas used to prove the result would be the big breakthrough, more so than the conclusion itself.

For Question 2 I want to share some background and a thought. Pretty much all the techniques and ideas we have for BPP = P, as far as I'm aware, go via "derandomization": given any probabilistic polytime Turing machine, construct a PRG to feed it a bunch of deterministically chosen bits instead of random ones, such that its behavior is very similar to its behavior on truly random bits. So with good enough pseudorandom generators, we get BPP = P. (Goldreich's "In a World of P=BPP" gives evidence that any proof of BPP = P must take this route.)
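As a cartoon of what this would mean operationally (a sketch only: `prg` and `randomized_alg` are hypothetical placeholders, and an unconditional PRG with logarithmic seed length against uniform algorithms is precisely what nobody knows how to build):

```python
def derandomize(randomized_alg, x, prg, seed_length):
    """Run randomized_alg(x, r) on the PRG's output for every seed and take
    the majority vote.  If prg fools randomized_alg, the majority equals the
    BPP answer; with seed_length = O(log n), the loop over all 2^seed_length
    seeds takes polynomial time, giving a deterministic (P) algorithm.

    Placeholders (assumed for the sketch, not real library calls):
      randomized_alg(x, r) -> 0/1, using the bit-string r as its coin flips
      prg(seed, num_bits)  -> a pseudorandom bit-string of length num_bits
    """
    votes = 0
    for seed in range(2 ** seed_length):
        r = prg(seed, num_bits=len(x) ** 2)   # enough bits for the algorithm
        votes += randomized_alg(x, r)
    return int(votes > 2 ** seed_length / 2)
```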

This is pretty much along the lines of BPP $\subseteq$ P/poly, except that there the PRG is the advice string, which is produced by magic. Perhaps the best answer to your Question 2 is that in P we have no magic and must come up with the advice string ourselves. Derandomization is also the idea behind Reingold's 2004 result SL = L, using tools like expander graphs.

Now consider what such a proof would imply for just one particular algorithm, the Miller-Rabin primality test. It would show the existence of some deterministic generator that picks out a sequence of integers to feed to the Miller-Rabin test, such that the original number is prime if and only if all of these integers pass the test.

As I understand it (though I am no expert), the question of whether such a list exists, and how small the numbers in it can be (in particular whether it suffices to check all the numbers below some bound), seems to be quite a deep question in number theory, closely related to proving forms of the generalized Riemann Hypothesis. See this question. I don't think there's a formal implication here, but it doesn't seem like something we expect to get next week as an accidental miniature corollary of a much more general PRG construction.
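For concreteness, this is what a fixed witness list looks like in practice (a sketch, not part of the answer above: if I recall correctly, the first thirteen primes are known to suffice for all $N$ below roughly $3.3\cdot 10^{24}$, and under GRH testing all bases $a\leq 2(\ln N)^2$ suffices for every $N$; for arbitrary $N$ the fixed list below is only a heuristic):

```python
def is_prime_miller_rabin(n, witnesses=(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41)):
    """Miller-Rabin with a fixed witness list instead of random bases.
    The default list is believed sufficient for all n below ~3.3e24 (quoted
    from memory); beyond that it is only a heuristic unless something like
    GRH supplies witnesses up to 2*(ln n)^2."""
    if n < 2:
        return False
    # write n - 1 = d * 2^r with d odd
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for a in witnesses:
        if a % n == 0:            # the base coincides with n (n is a small prime)
            continue
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False          # a witnesses compositeness of n
    return True

print([p for p in range(2, 60) if is_prime_miller_rabin(p)])
```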

– usul
  • Interesting thoughts! Oded's paper suggests that Q2 indeed reduces to "existence vs. construction" of PRGs. In derandomization via VC dimension, algorithmic aspects are entirely ignored. – Stasys Oct 07 '17 at 15:12
  • Thanks to all (Kaveh, Ricky, Ryan, Sasho and "usul"): I've learned a lot from your comments. "Uniformity" was never an issue in my life, hence the naivety of my questions. I am accepting the answer of "usul". Complemented by the very interesting remarks of Kaveh, Ricky, Ryan and Sasho, it answers both my questions. – Stasys Oct 09 '17 at 19:13