20

Let $f \colon \lbrace 0,1 \rbrace ^ n \to (2^{-n},1]$ be a function. We want to estimate the average of $f$; that is: $\mathbb{E}[f(n)]=2^{-n}\sum_{x\in \lbrace 0,1 \rbrace ^ n}f(x)$.

NOTE: In the OP, the range of f was [0,1]. I changed this a bit for technical reasons. (This should simplify the problem; if not, forget it!)

Let $E$ be the (randomized) estimator algorithm. Assume that $E$ has black-box access to $f$. We denote this by $E^f$.

There are two conditions:

1) Estimator's running time: There exists a single polynomial $p(\cdot)$ such that for all $n$ and all $f$, the running time of $E^f(1^n)$ is bounded by $\frac{p(n)}{\mathbb{E}[f(n)]}$.

2) Estimator's precision with confidence $\delta$: There exists a single polynomial $q(\cdot)$, such that for all $n$ and all $f$, we have ${1 \over {q(n)}} < \frac{E^f(1^n)}{\mathbb{E}[f(n)]} < q(n)$ with probability at least $\delta$.

NOTE: The confidence $\delta$ was not in the OP. The parameter $\delta$ lies in $(0,1)$ and may depend on $n$; for instance, it may be $1-2^{-n}$.

Do such estimators exist?

Background and Motivation

I did not mention my motivation at the beginning since it requires a great deal of background knowledge. Anyway, for the enthusiasts, I describe it briefly: the need for such estimators arises in the context of "Proofs of Ability," as defined in the following article:

Mihir Bellare, Oded Goldreich. Proving Computational Ability, 1992. Unpublished Manuscript.

Specifically, at the bottom of page 5, the authors implicitly assume the existence of such estimators. (There is no mention of precision, and the running time is not precisely defined, yet the context makes everything clear.)

My first attempt was to read "A Sample of Samplers---A Computational Perspective on Sampling." It pertains to a very similar problem, yet the error defined there is additive, while ours is multiplicative. (I have not read the paper in full; maybe it mentions what I need somewhere.)

EDIT (as per Tsuyoshi's request): In fact, the definition of "Proofs of Computational Ability" requires the existence of a "knowledge extractor" whose (expected) running time is $\frac{p(n)}{\mathbb{E}[f(n)]}$. Since we don't know $\mathbb{E}[f(n)]$, we want to estimate it; yet this must not increase the running time considerably: it may change it by at most a polynomial factor. The precision condition tries to capture this requirement.

Sadeq Dousti
  • I cannot understand the precision condition. What prevents the algorithm E from always outputting 1? Did you mean 1/q(n) < (True value)/(Estimated value) < q(n)? – Tsuyoshi Ito Oct 02 '10 at 16:32
  • It seems that $p(n)=q(n)=O(1)$ and the trivial algorithm $E^f(1^n)$ that outputs "1" should work. Its running time is $O(1)$, which is bounded by $\frac{p(n)}{\mathbb{E}[f(n)]}$. And its precision is $\le 1$, which is less than $q(n)$. – Robin Kothari Oct 02 '10 at 16:35
  • @Tsuyoshi & Robin: Sorry fellas, I missed one condition in the precision. Check it out now! – Sadeq Dousti Oct 02 '10 at 16:36
  • Also, I guess that the estimator is randomized (just because it looks impossible otherwise). Is this the case? Also, if it is, what do the running-time condition and the precision condition exactly require? – Tsuyoshi Ito Oct 02 '10 at 16:40
  • @Tsuyoshi: Yes, the estimator is randomized. I edited the question to reflect the meaning of such conditions. – Sadeq Dousti Oct 02 '10 at 16:49
  • Just a wild guess: Fix some polynomial p(n). Keep choosing an n-bit string uniformly at random with replacement and summing the value of f at the chosen point until the sum becomes at least p(n). Let T be the number of the tries made, and output p(n)/T. I do not have a provable bound on the expected running time or the precision, but this might work. – Tsuyoshi Ito Oct 02 '10 at 17:04
  • I think I don't clearly understand the question. Why is a naive sampler with a Chernoff bound not a good estimator? – Sylvain Peyronnet Oct 02 '10 at 17:57
  • @Sylvain: Could you please describe more, or provide a link? (P.S: I know what Chernoff bound is.) – Sadeq Dousti Oct 02 '10 at 18:59
  • @Sylvain: How many samples do you take before you stop? If you take a fixed number of samples, you'll get an estimate that will not satisfy the precision condition. – Robin Kothari Oct 02 '10 at 19:10
  • @sadeq: I meant pick uniformly at random X inputs and use a Chernoff bound, but as Robin says, it does not satisfy the constraint on the precision. There is a paper by Karp, Luby and Madras ("Monte-Carlo Approximation Algorithms for Enumeration Problems") with a sampler that is adaptive (by analysis of the variance); maybe it works for the precision, but I guess it violates the constraint on the running time. – Sylvain Peyronnet Oct 02 '10 at 19:50

2 Answers

15

EDIT: This solves the version of the problem where f only outputs 0 or 1. But I think the solution can be adapted to make it work for the more general case.

Maybe I misunderstood the question, but this doesn't look too hard.

Instead of estimating the average, let's think of estimating the number of 1s, and call that number $k$. Let $N = 2^n$. So the average is $k/N$. You want to estimate this within a polynomial multiplicative factor in time $O(N\,\mathrm{polylog}(N)/k)$.

I think this can be done to within any constant multiplicative factor too. For example, let's say you want to estimate this to within a factor of 2. So the output $k'$ of the algorithm will be between k/2 and 2k.

I'll sketch an algorithm which should have the appropriate running time. First check if $k$ is between $N/2$ and $N$. This is easy: just sample a few random values, and if you get more than half 1s, then it's in this interval. So you have a 2-approximation. If not, then check if it is between $N/4$ and $N/2$, and so on. Every time you make the interval smaller, it is more costly to estimate whether $k$ lies in that range, but the cost is inversely proportional to the size of the interval.

For example, if you are checking whether $k$ is between $N/2^q$ and $2N/2^q$, then you need to make about $O(2^q)$ queries. Anyway, after repeating this procedure enough times, you should find the interval in which $k$ lies. Say $k$ lies between $N/2^q$ and $2N/2^q$. Then $k$ is approximately $N/2^q$, so $2^q$ is about $N/k$, and in this step we would spend $O(N/k)$ queries. But getting to this step required $q$ other steps, which is just an extra $\mathrm{polylog}(N)$ factor. So the overall running time is $O(N\,\mathrm{polylog}(N)/k)$, for a 2-approximation.

(One would actually have to do error amplification to get decent precision in each step. But that's only an extra polylog factor.)
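To make the staged search concrete, here is a rough Python sketch of the idea. It is only an illustration: the oracle f is assumed to be given as a Python function on bit tuples, and the per-stage amplification constant c is a placeholder rather than a tuned value.

```python
import random

def estimate_average(f, n, c=100):
    """Staged search sketched above: at stage q, test the guess that the
    average of f is roughly 2**-q, using about c * 2**q samples."""
    for q in range(1, n + 1):
        samples = c * (2 ** q)          # stage q costs about 2**q queries
        total = 0.0
        for _ in range(samples):
            x = tuple(random.randint(0, 1) for _ in range(n))
            total += f(x)               # works for values in (2**-n, 1], not just {0,1}
        avg = total / samples
        if avg >= 2 ** (-q):
            # At this scale the empirical average is (heuristically) already
            # within a constant factor of the true average.
            return avg
    # The range of f is (2**-n, 1], so the average is at least 2**-n.
    return 2 ** (-n)
```

The total number of queries is dominated by the last stage, which costs about $c \cdot 2^q \approx c \cdot N/k$, matching the claimed bound up to the polylog factor coming from error amplification.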


The reason I like to think of it as this several-stage process is that it highlights the process as a guess-and-check procedure. If someone told you $k$ is between $N/2^q$ and $2N/2^q$, then you could estimate it to even better precision knowing this fact, within the promised time bound. So we need to eliminate the step of being given a guess for $k$. This is done by searching over all possible intervals of that type.

To make this work for the case of non-boolean outputs, instead of counting the number of 1s, just sum the values seen. I'll try to find a reference to show that this works rigorously.

Robin Kothari
  • (1) Since the function f may take non-integral values, you probably want to use the sum of the values instead of the number of 1s. (2) Do we have to estimate stage by stage? I am guessing that we can do this in a single stage, just repeating until the sum exceeds a fixed polynomial. See also my comment to the question. – Tsuyoshi Ito Oct 02 '10 at 17:08
  • Oh, I didn't notice the range is [0,1]. I thought it was {0,1}. But I guess the same procedure works. Maybe we can reduce one problem to the other since we can "count" the number of 1s in a particular position of the binary representation of the output with enough precision. About (2), I think your procedure is equivalent. I think of it this way since it feels like a guess-and-check process, i.e., given a lousy estimate of k, get a better one. I'll add this into my answer. – Robin Kothari Oct 02 '10 at 17:18
  • I agree that the two algorithms are essentially the same. Also, as for [0,1] and {0,1}, your algorithm probably works as stated after replacing each evaluation of a non-integral value f(x) by a coin flip (1 wp f(x) and 0 wp 1−f(x)). – Tsuyoshi Ito Oct 02 '10 at 17:24
  • @Robin: Thanks for the answer. Something is also unclear to me. You said: "Just sample a few random values and if you get more than half 1s, then it's in this interval." I believe this must be quantified: how many samples result in what precision? (I changed the OP to consider such confidence. Otherwise it would be impossible to design the required sampler!) – Sadeq Dousti Oct 02 '10 at 21:53
  • @Sadeq: that's a Chernoff bound. If you expect k to be n/2 (a fair coin, for example), then you can quickly write down a tail bound for seeing more than n(1+eps)/2, and similarly for the lower bound. – Suresh Venkat Oct 03 '10 at 05:32
  • @Sadeq: What Suresh said. I might have been a bit informal in the explanation, since if k is very close to N/2 you might not be able to decide whether it is above or below N/2 using only a constant number of evaluations. But such things don't matter since we only want a very coarse approximation. As for how many times you might have to repeat, the Chernoff bound will give the exact numbers, but my rule of thumb is that the error will decrease exponentially fast. – Robin Kothari Oct 03 '10 at 15:21
3

Let $f_1,f_2,\ldots$ denote the values of $f$ applied to an infinite sequence of random samples (with replacement) from $\{0,1\}^n$. Let $k$ be the least positive integer such that $\sum_{i=1}^k f_i \ge M$ for some value $M$, perhaps $M=\operatorname{polylog}(n)$. I would guess that the estimator $M/k$ should accomplish what you want.
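For concreteness, here is a minimal Python sketch of this stopping rule, assuming the oracle is given as a Python function on bit tuples and the threshold M is supplied by the caller:

```python
import random

def estimate_by_stopping(f, n, M):
    """Sample uniformly at random with replacement until the running sum of
    f-values reaches M, then return M divided by the number of samples used."""
    total, k = 0.0, 0
    while total < M:
        x = tuple(random.randint(0, 1) for _ in range(n))
        total += f(x)
        k += 1
    return M / k
```

The expected number of samples is roughly $M/\mathbb{E}[f]$, which is why the running-time condition is plausibly satisfied when $M$ is polylogarithmic (or polynomial) in $n$.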

For the analysis, you cannot apply Chernoff bounds directly to the random variable $k$, but there is a trick that lets you use Chernoff anyway. Let $\mu$ denote the unknown expectation $\mathbb{E}[f]$. Find constants $k_{low}$ and $k_{high}$ (functions of $\mu$) so that with probability at least $1 - \delta$ we have $\sum_{i=1}^{k_{low}} f_i < M$ and $\sum_{i=1}^{k_{high}} f_i > M$. Those sums of $f_i$'s can be bounded using Chernoff. It follows that $k_{low} < k < k_{high}$ with probability at least $1-\delta$, and therefore the estimator $M/k$ is well concentrated.
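For illustration, one possible instantiation of the constants (my own choice, not part of the original argument): take $k_{low} = M/(2\mu)$ and $k_{high} = 2M/\mu$. The expected value of $\sum_{i=1}^{k_{low}} f_i$ is $M/2$ and that of $\sum_{i=1}^{k_{high}} f_i$ is $2M$, so the multiplicative Chernoff bound for sums of independent $[0,1]$-valued variables gives
$$\Pr\Big[\sum_{i=1}^{k_{low}} f_i \ge M\Big] \le e^{-M/6}, \qquad \Pr\Big[\sum_{i=1}^{k_{high}} f_i \le M\Big] \le e^{-M/4}.$$
With $M = \operatorname{polylog}(n)$ both failure probabilities are negligible, and $k_{low} < k < k_{high}$ translates into $\mu/2 < M/k < 2\mu$, i.e., a multiplicative 2-approximation.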

Warren Schudy