3

Does Laplace mechanism work only on non-multiplicative query? For example, suppose a database (an array here) $\mathbf{x} = (x_1, \ldots, x_n)$. Is it possible to do design laplacian mechanism for geometric mean query? Geometric mean query is defined as $q(\mathbf{x}) = (x_1x_2\cdots x_n)^{1/n}$.

My goal is to add a noise sampled from laplacian noise with scale $\Delta/\epsilon$ with $\Delta$ being a sensitivity of $q$ and $\epsilon$ being a parameter chosen by users.

Let $\mathbf{x}'$ be neighboring database where only the last element differs from $\mathbf{x}$. In this case, I have no idea how to calculate the sensitivity, i.e., the maximum difference between $q(\mathbf{x})$ and $q(\mathbf{x}')$.

I'm simply assuming each $x_i \in [a, b]$ for some integer $a$ and $b$.

mallea
  • 213
  • "Laplace mechanism" may be confusing for some users, so it'd help if you clarified what you mean to avoid confusion. – Tim Jun 25 '20 at 07:25
  • If you are talking about privacy, Laplace mechanism would also work for the the geometric mean, yes. – Pohoua Jun 25 '20 at 11:38
  • How can we bound the sensitivity? – mallea Jun 25 '20 at 11:46
  • @Pohoua Can you elaborate on how we can use laplacian mechanism on it? – mallea Jun 26 '20 at 05:49
  • Good point, as for arithmetic mean, you need to assume your data to be bounded to have finite sensitivity of the geometric mean... I detailed a bit in an answer. – Pohoua Jun 27 '20 at 01:43
  • I just saw you edited your question, so your data are bounded. Then $|q(x) - q(x')| \leq |q(x)| + |q(x')| \leq 2 b$ – Pohoua Jun 27 '20 at 01:45
  • @Pohoua The maximum bound $b$ of individual data is applicable to the query $q$. This is because $q$ is a mean, right? – mallea Jun 30 '20 at 05:59
  • Yes. You can easily show that if $ x \leq b$ for all $x$ , then the geometric mean is also less than $b$. Now, I also think that you could refine the sensitivity of $q$ to $b - a$. – Pohoua Jul 01 '20 at 18:20

1 Answers1

2

You can use Laplce mecanism in the same way you would on the arithmetic mean. And, as for the arithmetic mean, for it to be $\varepsilon$-differentially private, you need to suppose that your data is bounded in absolute value by a positive number $M$.

Let $m$ be a laplacian mechanism on the geometric mean query. The pdf of the Laplace distribution is $f(x|\lambda) = \frac{1}{2\lambda}\exp^{\frac{-|x|}{\lambda}}$.

Then $(x_1, \dots, x_n) \mapsto m(x_1, \dots, x_n) =\prod{x_i}^{\frac{1}{n}} + \eta $ where $\eta \sim Laplace(\lambda = \frac{\varepsilon}{2M})$ is an $\varepsilon$-differentially private mechanism.

You can easily show it, by considering two databases $x_1, ..., x_n$ and $x_1, ..., x'_n$ with only one different datapoint (chosen to be the $n$-th one without loss of generality) :
$$ \begin{array}[ccc] \;\frac{P(m(x_1, \dots, x_n) = t)}{P(m(x_1, \dots, x'_n) = t)} &= &\frac{\exp(\lambda\mid t - ( x_1\times\dots \times x_n)^{\frac{1}{n}}\mid)}{\exp(\lambda\mid t - (x_1\times\dots \times x'_n)^{\frac{1}{n}}\mid)}\\ & =& \exp(\lambda \mid t - ( x_1\times\dots \times x_n)^{\frac{1}{n}}\mid - \lambda\mid t - ( x_1\times\dots \times x'_n)^{\frac{1}{n}}\mid)\\ & \leq&\exp(\lambda \mid (x_1\times\dots \times x_n)^{\frac{1}{n}} - (x_1\times\dots \times x'_n)^{\frac{1}{n}} \mid)\\ & \leq & \exp(\lambda \times 2M)\\ & = &\exp(\varepsilon) \end{array} $$

But maybe this isn't the best way to have an $\varepsilon$-differentially private geometric mean. Indeed the mechanism above could output a negative value (!). A better way could be to compute the arithmetic mean of the logarithm of the data, make it $\varepsilon$-differentially private with Laplace mechanism, and then take the exponential of the result. This would make it impossible to have a negative output of the mechanism, and this would still be $\varepsilon$-differentially private (since this is preserved by deterministic transformation).

You cannot excape from the requirement of boundedness of data. In practice extreme data are trimmed to a given value (chosen before looking at the data!).

I hope this helped.

Pohoua
  • 2,548
  • Thank you for your answer! How can we trim data before see them? – mallea Jun 30 '20 at 05:52
  • 1
    Well depending on the nature of your data, you fix an arbitrary threshold above which any value will be trimmed. For example, if your data is height or monthly salary, you say that maximum height is 2m10 or maximum salary 30k, something like that. The bigger the bound you choose, the bigger is the noise you have to add needs to guarantee $\varepsilon$ privacy. – Pohoua Jul 01 '20 at 18:16
  • but is you already have that $ a < x < b$, as said in the question, then no need to trimm. – Pohoua Jul 01 '20 at 18:21