
Consider a formulation of LWE where we are given either $(x, Sx+e)$ or $(x,u)$, where $S$ is an $m \times n$ secret/hidden matrix, $x$ is a randomly sampled $n \times 1$ vector, $e$ is an $m \times 1$ Gaussian error vector, and $u$ is a uniformly random $m \times 1$ vector, and we are asked to distinguish between the two cases. This should be hard for classical algorithms, according to the post here. Call this problem "reverse-LWE."

I had a few questions about the setting.


Is the distinguishing problem hard without $e$?

Note that in standard LWE, where we are given $(A, As+e)$ or $(A,u)$ and asked to distinguish between the two cases, the problem is easy without the error: we just solve a system of linear equations to recover the $n$ entries of $s$.

However, here we would need to recover the $mn$ entries of the secret matrix $S$, and I do not see how to do that with just $m$ equations; the system is badly underdetermined.


Consider a variant of the problem, where we are given either $$ \{ (x_1, Sx_1 + e_1), (x_2, Sx_2 + e_2), \ldots, (x_k, Sx_k + e_k)\} ~~\text{or}$$

$$\{ (x_1, u_1), (x_2, u_2), \ldots, (x_k, u_k)\},$$

and told to distinguish between the two cases, where $k$ is polynomially large in $n$. Call this problem "reverse LWE with a repeated secret." Is this problem hard?

Note that a hybrid argument (like the one used in one of the answers here) indicates that the problem remains hard. Here is the hybrid $H_i$:

$$H_i = \{ (x_1, Sx_1 + e_1), (x_2, Sx_2 + e_2), \ldots, (x_i, Sx_i + e_i), (x_{i+1}, u_{i+1}) \ldots, (x_{k}, u_{k}) \} .$$

Then there is a direct way to conclude that if we can solve "reverse LWE with a repeated secret," we can solve reverse LWE. Since reverse LWE is hard, our problem should also be hard.
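For reference, the telescoping step that such hybrid arguments rely on (stated informally here, with $H_0$ the all-uniform case and $H_k$ the all-LWE case) is

$$\left| \Pr[D(H_k) = 1] - \Pr[D(H_0) = 1] \right| \leq \sum_{i=1}^{k} \left| \Pr[D(H_i) = 1] - \Pr[D(H_{i-1}) = 1] \right|,$$

so a distinguisher $D$ with advantage $\epsilon$ between the two cases must achieve advantage at least $\epsilon/k$ between some adjacent pair $H_{i-1}, H_i$, which differ in a single sample.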

However, I am having difficulty wrapping my head around this fact.

Note that if we do not have the error term, there is a very easy way of distinguishing between the two cases for $k \geq n+1$. There can be at most $n$ linearly independent $x_i$-s, so the distinguisher finds $n$ linearly independent $x_i$-s among the samples, notes where the matrix $S$ takes these vectors, and, for an additional sample (which is necessarily a linear combination of the first $n$), uses linearity to compute where $S$ must take it and checks whether that is consistent with what was given. A sketch of this distinguisher appears below.
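Here is a minimal sketch of this distinguisher (Python/NumPy over a toy prime modulus $q = 97$; the parameters and the `kernel_vector` helper are illustrative assumptions, not from the post). With $k = n+1$ samples the $x_i$ are linearly dependent, so we find a nonzero $c$ with $Xc \equiv 0 \pmod q$ and check whether $Yc \equiv 0 \pmod q$: this holds in the noiseless LWE case, since $Yc = SXc = 0$, and fails with high probability in the uniform case.

```python
import numpy as np

rng = np.random.default_rng(0)
q, n, m = 97, 8, 12     # toy parameters: prime modulus, secret S is m x n
k = n + 1               # one more sample than the dimension

def kernel_vector(X, q):
    """Return a nonzero c with (X @ c) % q == 0, assuming q is prime."""
    X = X.copy() % q
    rows, cols = X.shape
    pivots, r = [], 0
    for j in range(cols):
        if r == rows:
            break
        pivot = next((i for i in range(r, rows) if X[i, j]), None)
        if pivot is None:
            continue                      # no pivot: column j is free
        X[[r, pivot]] = X[[pivot, r]]     # move pivot row into place
        X[r] = (X[r] * pow(int(X[r, j]), -1, q)) % q     # scale pivot to 1
        for i in range(rows):
            if i != r and X[i, j]:
                X[i] = (X[i] - int(X[i, j]) * X[r]) % q  # clear column j
        pivots.append(j)
        r += 1
    free = next(j for j in range(cols) if j not in pivots)  # exists: cols > rank
    c = np.zeros(cols, dtype=np.int64)
    c[free] = 1
    for row, j in enumerate(pivots):
        c[j] = (-int(X[row, free])) % q   # back-substitute against the free column
    return c

S = rng.integers(0, q, (m, n))           # secret matrix
X = rng.integers(0, q, (n, k))           # the x_i, as columns
Y_real = (S @ X) % q                     # noiseless LWE case: y_i = S x_i
Y_rand = rng.integers(0, q, (m, k))      # uniform case: y_i = u_i

c = kernel_vector(X, q)                  # a linear dependency among the x_i
print("LWE case:     Y c mod q =", (Y_real @ c) % q)  # all zeros
print("uniform case: Y c mod q =", (Y_rand @ c) % q)  # nonzero w.h.p.
```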

Why is it that the error terms make this distinguisher fail? Even with Gaussian error, because of the linear dependence, shouldn't the image of the extra sample be sufficiently concentrated around the predicted value for my distinguisher to succeed?

BlackHat18

1 Answer


The distinguishing problem with a single sample $x$ is impossible.

This is because for any non-zero $x$ and any $u$ there exists an $S$ such that $Sx=u$; indeed, for a uniformly random $S$ and non-zero $x$, the product $Sx$ is itself uniformly distributed, so the two cases induce identical distributions.
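Concretely, if $x_j \neq 0$ one can take $S$ to be zero everywhere except its $j$th column, which is set to $x_j^{-1} u \bmod q$. A minimal sketch of this construction (Python/NumPy over an assumed prime modulus $q = 97$; all parameters are illustrative):

```python
import numpy as np

q, n, m = 97, 8, 12                        # toy parameters (q an assumed prime)
rng = np.random.default_rng(1)
x = rng.integers(0, q, n)                  # any x with a non-zero entry
u = rng.integers(0, q, m)                  # arbitrary target vector

j = int(np.flatnonzero(x)[0])              # index of a non-zero entry of x
S = np.zeros((m, n), dtype=np.int64)
S[:, j] = (u * pow(int(x[j]), -1, q)) % q  # j-th column of S is u / x_j (mod q)

assert np.array_equal((S @ x) % q, u % q)  # indeed S x = u
```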

ETA 2022-04-05:

For the broader question of distinguishing $(\mathbf x_i,S\mathbf x_i+\mathbf e_i)$ from $(\mathbf x_i,u_i)$ with unknown $S$, write $X_{i,j}$ for the $m\times m$ scalar matrix $x_{i,j} I_m$, where $x_{i,j}$ is the $j$th entry of $\mathbf x_i$. Then the rows of the $mn\times km$ matrix $$\left[\begin{matrix} X_{1,1} & X_{2,1} & X_{3,1} &\ldots & X_{k,1}\\ X_{1,2} & X_{2,2} & X_{3,2} &\ldots & X_{k,2}\\ \vdots & \vdots & \vdots & & \vdots\\X_{1,n} & X_{2,n} & X_{3,n} &\ldots & X_{k,n}\\\end{matrix}\right]$$ generate a lattice to which the vector $((S\mathbf x_1+\mathbf e_1)^T,(S\mathbf x_2+\mathbf e_2)^T,(S\mathbf x_3+\mathbf e_3)^T,\ldots,(S\mathbf x_k+\mathbf e_k)^T)$ is close: the difference vector has the entries of the $\mathbf e_i$ as its components. For large $k$, a vector this close to the lattice is highly unlikely to arise from the uniform distribution. This only tells us that the information needed by a distinguisher exists; actually finding such a close vector becomes computationally very hard as $n$ and the variance of the Gaussian distribution grow. A toy verification of the construction is sketched below.
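Here is a toy numerical check of this construction (a sketch in Python/NumPy; the concrete parameters $q=97$, $n=4$, $m=3$, $k=6$ and the small integer errors standing in for Gaussian noise are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
q, n, m, k = 97, 4, 3, 6                  # toy parameters
S = rng.integers(0, q, (m, n))            # secret matrix
xs = rng.integers(0, q, (k, n))           # the x_i
es = rng.integers(-2, 3, (k, m))          # small errors standing in for Gaussians

# M has n block-rows and k block-columns of X_{i,j} = x_{i,j} * I_m.
M = np.block([[int(xs[i, j]) * np.eye(m, dtype=np.int64) for i in range(k)]
              for j in range(n)])         # shape (m*n, k*m)

s = S.T.reshape(-1)                       # vec(S): block-row j holds column j of S
lattice_pt = (s @ M) % q                  # = (S x_1, ..., S x_k) concatenated
target = np.concatenate([(S @ xs[i] + es[i]) % q for i in range(k)])

diff = (target - lattice_pt) % q
diff[diff > q // 2] -= q                  # centered representatives mod q
assert np.array_equal(diff, es.reshape(-1))  # the offset is exactly the errors
```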

Daniel S
  • This makes me very confused. Consider the problem of distinguishing $$\{ (x_1, Sx_1), (x_2, Sx_2), \ldots, (x_k, Sx_k)\} ~~\text{or}$$ $$\{ (x_1, u_1), (x_2, u_2), \ldots, (x_k, u_k)\}.$$ This should also be hard then, by a hybrid argument. However, we know this problem is easy for $k \geq n+1$ by the distinguisher I outlined (checking for linear dependence, and noting that there can be only $n$ linearly independent $x_i$-s). How can both be true? – BlackHat18 Apr 04 '22 at 00:23
  • It's because linear systems have a sharp transition between underdetermined $k<n$ (a large family of solutions), determined $k=n$ (exactly one solution), and overdetermined $k>n$ (zero or one solution). An overdetermined system is highly likely to have no solutions, so the existence of a single solution is a strong distinguisher. – Daniel S Apr 04 '22 at 08:32
  • I didn't follow. Does it mean that the two cases $$\{ (x_1, Sx_1), (x_2, Sx_2), \ldots, (x_k, Sx_k)\} ~~\text{or}$$ $$\{ (x_1, u_1), (x_2, u_2), \ldots, (x_k, u_k)\}$$ are actually indistinguishable? Doesn't the hybrid argument say that they are? – BlackHat18 Apr 04 '22 at 15:46
  • They're indistinguishable for $k$ linearly independent $x_i$ with $k<n$. Note that, unlike my answer to your previous question, I cannot produce additional samples, as $S$ is secret. – Daniel S Apr 04 '22 at 17:45
  • Thanks! It's clear now. One last question: for the noisy case, what can we say when $k > n$? That is, are $$\{ (x_1, Sx_1 + e_1), (x_2, Sx_2 + e_2), \ldots, (x_k, Sx_k + e_k)\} ~~\text{or}$$ $$\{ (x_1, u_1), (x_2, u_2), \ldots, (x_k, u_k)\}$$ distinguishable for $k > n$? I couldn't prove anything since, as you said, reductions that involve producing additional samples do not work. – BlackHat18 Apr 04 '22 at 18:31
  • There's a Close Vector Problem that can be set up for large $k$, but judicious choices of $e$ and $n$ should render it thoroughly intractable. – Daniel S Apr 04 '22 at 19:03
  • Might you elaborate about this case (ie, the noisy case for $k > n$) in your answer? – BlackHat18 Apr 04 '22 at 19:19