
I'm in a work group that analyzes medical data. Unfortunately, there's a lot of distrust: people worry that measured data could reach a competitor or be manipulated.

So I was wondering whether there is a way to "watermark" the measured data before it leaves the house, in a way that would not affect some chosen statistics. Searching turned up mostly commercial solutions for marking audio or video, which aren't applicable to us.

whuber
bdecaf
  • Can you elaborate a little more on what your concerns are? For example, why are you looking at watermarking versus, e.g., calculating and storing robust checksums of the data? – cardinal Aug 10 '12 at 12:22
  • Basically, our data regularly goes to out-of-house statisticians (at a university, so students and many other people get their hands on it). Some of them also work with our competitors. At the moment we have no way to tell whether they hand our data to other people. It would ease the distrust a lot if we could confirm that they don't give the data away. But obviously the data mustn't be distorted, so that they can still do their work. – bdecaf Aug 12 '12 at 11:27

1 Answer


The standard method is to put the mark in the least significant bits or digits. For instance, you may calculate the sum of the digits, take it modulo 10, and append the result to the end of the number, decreasing the last original digit by one if the appended digit is larger than 5 so that all statistics stay almost intact, like this:

294.090842 -> sum of digits is 38, thus mark is 8; since 8 > 5 the last digit drops from 2 to 1 and we add the mark like this: 294.0908418
294.121120 -> sum of digits is 22, thus mark is 2 and we add it like this: 294.1211202
 ...
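
A minimal Python sketch of this scheme, for illustration only. It assumes the values are handled as non-negative decimal strings that contain a decimal point (so no digits are lost to floating-point formatting); the names `digit_sum`, `add_mark` and `has_mark` are made up for this example.

```python
from decimal import Decimal


def digit_sum(text: str) -> int:
    """Sum all decimal digits in the string, ignoring the decimal point."""
    return sum(int(ch) for ch in text if ch.isdigit())


def one_unit(text: str) -> Decimal:
    """One unit in the last written decimal place, e.g. 0.000001 for '294.090842'."""
    return Decimal(1).scaleb(Decimal(text).as_tuple().exponent)


def add_mark(value: str) -> str:
    """Append the check digit (digit sum modulo 10) to a decimal string."""
    mark = digit_sum(value) % 10
    if mark > 5:
        # Lower the last original digit by one so the appended digit
        # acts like rounding and the value barely changes.
        value = format(Decimal(value) - one_unit(value), "f")
    return value + str(mark)


def has_mark(marked: str) -> bool:
    """Check whether the final digit is consistent with the scheme above."""
    body, mark = marked[:-1], int(marked[-1])
    if mark > 5:
        # Undo the decrement to recover the original digits.
        body = format(Decimal(body) + one_unit(body), "f")
    return digit_sum(body) % 10 == mark


for raw in ["294.090842", "294.121120"]:
    marked = add_mark(raw)
    print(raw, "->", marked, has_mark(marked))
```

Run on the two values above, this reproduces 294.0908418 and 294.1211202. Working on strings rather than floats is a design choice of the sketch: the mark lives in the last written digit, which binary floating point does not preserve reliably.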

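This trace is hard to notice (unless you store data in a proper way, i.e. with accuracy encoded as the number of significant digits), visible even in a subset of the data, and almost impossible to appear at random: a single unmarked number carries a matching final digit with probability of roughly 1/10, so a run of n matching numbers has probability of roughly 10^-n.
A personalized mark can be made by using a user-specific salt and a better checksum algorithm.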
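
One way a personalized mark could look, as a sketch rather than a recommendation. It swaps in HMAC-SHA256 (from Python's standard library) as the "better checksum" and uses a secret per-recipient salt as the key; both choices, and the names `personal_mark`, `add_personal_mark` and `match_rate`, are assumptions made for this example. For brevity it omits the "decrease the last digit" adjustment.

```python
import hashlib
import hmac


def personal_mark(value: str, salt: bytes) -> int:
    """Derive one check digit from the value and a recipient-specific salt."""
    digest = hmac.new(salt, value.encode("ascii"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % 10


def add_personal_mark(value: str, salt: bytes) -> str:
    """Append the recipient-specific check digit to a decimal string."""
    return value + str(personal_mark(value, salt))


def match_rate(marked_values: list[str], salt: bytes) -> float:
    """Fraction of values whose final digit matches the given salt."""
    hits = sum(
        personal_mark(m[:-1], salt) == int(m[-1]) for m in marked_values
    )
    return hits / len(marked_values)


# Hypothetical salts and data, for illustration only.
salt_a, salt_b = b"recipient-A-secret", b"recipient-B-secret"
values = [f"294.{i:06d}" for i in range(50)]
copy_for_a = [add_personal_mark(v, salt_a) for v in values]

print(match_rate(copy_for_a, salt_a))  # every value matches recipient A
print(match_rate(copy_for_a, salt_b))  # expected to be near chance level
```

Given a suspicious data set, the salt with a match rate far above the roughly 10% chance level points to the copy it came from, as long as the salts stay secret.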

However, note that this mark is visible only in the raw data, and your competitors can just as easily remove it by adding a small amount of noise or by rounding the numbers.