Proving set membership in less than $\log{N}$ bandwidth

Question

I have an unchanging set of unique items, and I want my server to show one of those items to a client, then cryptographically prove that the item is part of the set.

How I do this currently:

Generate a Merkle tree containing all of the items.
Give the client the Merkle root, (32 bytes) out of band.
Give the client the item, then the branch leading to the Merkle root.
The client hashes the item, then hashes each Merkle branch, starting at the bottom.
If the hash at the end equals the Merkle root, the item is in the set.

The problem with this is that it takes $2\cdot 32 \cdot \log_2{N}$ bytes of bandwidth (where $N$ is the number of items in the set.) Is it possible to do this is in a smaller space?

The scheme needs one additional property, which is that once you have all of the items, it must be possible to determine that you have all of the items in the set.

Other factors:

The set isn't private - it's okay if the scheme reveals information about other members of the set.

It looks to me like that takes $: 2\hspace{-0.04 in}\cdot \hspace{-0.04 in}k\hspace{-0.04 in}\cdot \hspace{-0.04 in}\log(N\hspace{.02 in}) :$ space (where k is the security parameter), rather than log(N) space. — , Aug 26 '15 at 00:39
@RickyDemer I was eliding constant factors. But it would be $k*log(2,N)$, no? — Nick ODell, Aug 26 '15 at 00:40
No, since the lengths of the hashes should be 2 times the security parameter. $;$ — , Aug 26 '15 at 00:44
@RickyDemer Oh, I misunderstood what you meant by security parameter. — Nick ODell, Aug 26 '15 at 00:45
Also, if you elide the security parameter, then polylog(N) can generally (including in this case) be elided by padding for small N and just showing a break in security for large N. $;$ — , Aug 26 '15 at 00:47
If item size $< 2k$ you can reveal some items instead of intermediary hashes to save space, but it will still be $O(\log N)$ of course. — otus, Aug 26 '15 at 07:10
In order to answer this question, you need to better define how much storage you allow the client in the init phase. Note that the Merkle tree solution requires the client to hold only a single hash output. Are you willing to allow more? Also, are you willing to have some non negligible error probability or not? — Yehuda Lindell, Aug 26 '15 at 13:58
@YehudaLindell It can be larger, but not significantly. Giving the client the the hashes of all of the items in the init phase is right out. (see edit.) Also, are you willing to have some non negligible error probability or not? No, it needs to operate deterministically - one client testing whether an item is a member must come to the same conclusion as another client. — Nick ODell, Aug 26 '15 at 21:49
Note that what you're doing doesn't operate how you said it needs to - a malicious server can simply send an invalid alleged-branch to one client to get that client to come to a different conclusion as a client it acted honestly with. $;$ — , Aug 27 '15 at 02:33
@RickyDemer Well, then the client will know that the server is lying, and that it can't draw any conclusion about whether item X is really in the set. — Nick ODell, Aug 27 '15 at 02:39
There could conceivably be probabilistic schemes which also have that property. $;$ — , Aug 27 '15 at 02:42
@RickyDemer I don't follow. Wouldn't that not be probabilistic? — Nick ODell, Aug 27 '15 at 06:15
No; it could be that for honest servers, the client will accept with certainty, but malicious servers can make the clients have an essentially-arbitrary probability of accepting. $:$ (I don't know of any candidate schemes for which that would be helpful.) $;;;;$ — , Aug 27 '15 at 06:22

Alin Tomescu · Answer 1 · 2018-01-31T15:07:12.733

Depending on your trust assumptions about this server, you might be able to use cryptographic accumulators which provide constant-sized (non)membership proofs.

However, as far as I know, no efficient strong accumulator scheme has been developed yet. Most accumulator constructions rely on the RSA assumption where the server knows the factorization $n = pq$ of the modulus $n$ and, as a result, can compute fake membership proofs for any element $x$ by taking the accumulator $acc$ and computing a fake proof $mem = {acc}^{x^{-1} \pmod {\phi(n)}}$. The proof is verified by checking that ${mem}^x = acc$, which is true because the server faked the proof by inverting $x$.

This happens because the server can easily compute $\phi(n) = (p-1)(q-1)$ and invert $x$.

To fix this problem, a trusted setup phase is necessary that generates a modulus $n$ of unknown factorization and gives it to the server, without giving the server the factors $p$ and $q$.

Also note that pairing-based accumulators need a trusted setup phase. However, I suspect they would be faster in practice than RSA, if you want non-membership proofs!

You might find the following accumulator papers useful:

The following paper proposes using a certain kind of group called a class group to construct a strong accumulator. I am not sure how it constructs (non)membership proofs though.

Secure Accumulators from Euclidean Rings without Trusted Setup

score 3 · Answer 2 · answered Jan 24 '19 at 15:23

This is in continuation to Alin's answer, describing accumulators in the context of class groups.

Class groups provide a setting where computing roots is computationally difficult. That is, given $x \in \mathbb{Z}$ and $g^x \in G$, it is computationally infeasible to compute $g$. This is the same setting provided by RSA modulus, but with one key difference: there is no trusted setup.

Accumulator are built in the following way. Let $S$ be the set of items you have, and assume all the items are prime numbers. Pick a random element $g$ in the group $G$. With high probability this element generates a large cyclic subgroup of $G$, but let's leave that aside for now. Define the accumulator of the set $S$ by $w_S = g^{\prod_{s\in S} s}$, and make this group element known to the client. Now, in order to prove to a client that some $x$ is in $S$, simply compute and send $w_{x,S} = g^{\prod_{s\in S\setminus \{x\}} s}$. The client can verify the computation by checking that $(w_{x,S})^x = w_S$.

The cryptographic assumption (i.e. that computing roots is computationally infeasible) imply that the prover could not construct the witness $w_{x, S}$ unless $x \in S$. Note that we also use the assumption that the elements in $S$ are prime here: if a composite number $c = ab$ is in $S$, the prover can create a (false) proof that $a$ and $b$ are in $S$.

This scheme can also be used in order to give a constant size proof of membership of many items: to prove $x_1, ..., x_n \in S$, simply construct the witness $g^{\prod_{s\in S \setminus \{x_1, ..., x_n\}} s}$.

Non-membership can be proved as well. Since the items in $S$ are prime, if $x$ is not in $S$ then $x$ and $y=\prod_{s \in S} s$ are co-prime, meaning there exists $a, b \in \mathbb{Z}$ such that $ax+by =1$. Hence, the prover may send the client these $a, b$, and the client in turn can verify $(g^x)^a \cdot w_S^b = g$. Non-membership proofs can be batched as well, using a similar trick.

A good review on class group cryptography is "Buchmann, J. and Hamdy, S., 2001. A survey on IQ cryptography. In Public-Key Cryptography and Computational Number Theory (pp. 1-15)".

Proving set membership in less than $\log{N}$ bandwidth

2 Answers2