
I've been asked a question: if I calculate the sd of some data, can I change one value and still keep the same sd?

The answer is simply yes. For example:

sd(c(2,3,4))  # 1
sd(c(3,4,5))  # 1 -- only one value changed: the 2 became a 5

But what I wondered then is: assuming you change k values, what rules must the changes follow so that the sd always stays the same? (Is it right to ask what the degrees of freedom are here?!)

I am not sure how this applies to anything practical, but I imagine there has been a good deal of theoretical work on such questions - I just don't know where to look for it.

Thanks.

  • Actually, you cannot change one variable and still keep the same standard deviation. In your example, after all, you are not changing one variable but three. – Stephan Kolassa Oct 20 '10 at 20:08
  • @Stephan I understand the question as referring to an univariate distribution, hence one variable with multiple observations. Am I missing something? – chl Oct 20 '10 at 20:34
  • @chl - I'd say it's more probable that I am missing something. Tal asks about "changing one value", and I should have written about "values", not "variables" in my comment, where "value" to me sounds much like "observation". – Stephan Kolassa Oct 20 '10 at 20:38
  • @Stephen In the example he changed 2 to 5: only one value was altered. – whuber Oct 20 '10 at 20:47
  • @Tal I changed "roles" to "rules" in your question and hope that was what you intended. – whuber Oct 20 '10 at 20:57
  • @chl It's univariate all right but I take it that he is asking about the observations, not the variable. – whuber Oct 20 '10 at 20:58
  • @whuber I understood it in the same way (or I think so): we have a series of observations (= observed numerical values for a given variable), is there a way to keep the same SD by altering one or more of these values? – chl Oct 20 '10 at 21:08
  • @chl We take it in the same way. Although it is based on a different interpretation, Srikant's answer is nevertheless an interesting response. As the comments afterwards show, though, it doesn't lead to very interesting solutions: there are too many ways one can change a random variable while preserving its variance. – whuber Oct 20 '10 at 21:13
  • @whuber: 'Interesting' is subjective. I believe that both interpretations are equally valid, interesting in their own right. –  Oct 20 '10 at 21:44
  • Thanks everyone for the replies. I meant "observations" indeed. I'm honored to be able to converse with all of you through here. – Tal Galili Oct 21 '10 at 07:13

2 Answers


The question is about the data, not random variables.

Let $X = (x_1, x_2, \ldots, x_n)$ be the data and $Y = (y_1, y_2, \ldots, y_n)$ be additive changes to the data so that the new values are $(x_1+y_1, \ldots, x_n+y_n)$. From

$$\text{Var}(X) = \text{Var}(X+Y) = \text{Var}(X) + 2 \text{Cov}(X,Y) + \text{Var}(Y)$$

we deduce that

$$(*) \quad \text{Var}(Y) + 2 \text{Cov}(X,Y) = 0$$

is necessary for the variance to be unchanged. Add in $n-k$ additional constraints to zero out all but $k$ of the $y_i$ (there are ${n \choose k}$ ways to do this) and note that all $n-k+1$ constraints almost everywhere have linearly independent derivatives. By the Implicit Function Theorem, this defines a manifold of $n - (n-k+1)$ = $k-1$ dimensions (plus perhaps a few singular points): those are your degrees of freedom.

For example, with $X = (2, 3, 4)$ we compute

$$3 \text{Var}(y) = y_1^2 + y_2^2 + y_3^2 - (y_1+y_2+y_3)^2/3$$

$$3 \text{Cov}(x,y) = (2 y_1 + 3 y_2 + 4 y_3) - 3(y_1 + y_2 + y_3)$$

If we set (arbitrarily) $y_2 = y_3 = 0$ the solutions to $(*)$ are $y_1 = 0$ (giving the original data) and $y_1 = 3$ (the posted solution). If instead we require $y_1 = y_3 = 0$ the only solution is $y_2 = 0$: you can't keep the SD constant by changing $y_2$ alone. Similarly we can set $y_3 = -3$ while zeroing the other two values. That exhausts the possibilities for $k=1$. If we set only $y_3 = 0$ (one of the cases where $k = 2$) then we get a set of solutions
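These $k = 1$ solutions are easy to confirm in R:

```r
x <- c(2, 3, 4)
sd(x)           # 1
sd(c(5, 3, 4))  # 1: the y_1 = 3 solution
sd(c(2, 3, 1))  # 1: the y_3 = -3 solution
```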

$$y_2^2 - y_1 y_2 + y_1^2 - 3y_1 = 0$$

which consists of an ellipse in the $(y_1, y_2)$ plane. Similar sets of solutions arise in the choices $y_2 = 0$ and $y_1 = 0$.
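One way to see this ellipse concretely (a sketch: solve the quadratic in $y_2$, whose discriminant $3 y_1 (4 - y_1)$ is non-negative on $[0, 4]$, and check the sd along the upper branch):

```r
# Trace part of the ellipse y2^2 - y1*y2 + y1^2 - 3*y1 = 0 by solving
# the quadratic in y2, then confirm every point preserves sd(x).
x  <- c(2, 3, 4)
y1 <- seq(0, 4, by = 0.5)
y2 <- (y1 + sqrt(3 * y1 * (4 - y1))) / 2  # upper branch of the ellipse
sds <- sapply(seq_along(y1), function(i) sd(x + c(y1[i], y2[i], 0)))
all(abs(sds - sd(x)) < 1e-8)  # TRUE: every point preserves the sd
```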

whuber
  • Amazingly detailed answer Whuber - thank you very much! This obviously leaves open the questions of non additive transformations - but for my curiosity needs - I am satisfied. Thanks again! – Tal Galili Oct 21 '10 at 07:16
  • @Tal I hope you're satisfied, because I can't do better than this: every transformation is additive! Consider replacing $x_i$ by arbitrary $z_i$; just define $y_i = z_i - x_i$. – whuber Oct 21 '10 at 14:53
  • @whuber are some of the subscripts mixed up? Eg in " the solutions are $y_1=0$ (giving the original data) and $y_2=3$ (the posted solution)", I think the latter should be $y_1=0$? Similarly, "If instead we require $y_2=0$ the only solution is $y_2=0$" - should the restriction be to require $y_1=y_3=0$? – Silverfish Mar 11 '14 at 03:47
  • @Silverfish Thank you for pointing that out. I have double-checked the analysis, fixed the exposition (you were correct on all counts), and cleaned up the formatting a little. – whuber Mar 11 '14 at 17:30

Suppose that you have a random variable $X$ and you wish to find the set of transformations $Y=f(X)$ such that the standard deviation of $Y$ is identical to the standard deviation of $X$.

Consider first the set of linear transformations:

$Y = a X + b$ where $a, b$ are constants.

It is clear that:

$Var(Y) = a^2 Var(X)$.

Thus, the only linear transformations that preserve the standard deviation are those with $a = \pm 1$: translations, possibly combined with a reflection (the $a = -1$ case; see the comment by mbq to this answer). I suspect that non-linear transformations do not preserve standard deviations.
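In R, for instance:

```r
x <- c(2, 3, 4)
sd(-x + 10)  # 1: a = -1 (with any shift b) preserves the sd
sd(x + 7)    # 1: a = 1, a pure translation
sd(2 * x)    # 2: any |a| != 1 rescales the sd
```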

  • s.d. is shift invariant. – user603 Oct 20 '10 at 19:53
  • Thanks Srikant. Do you have an idea on how to define the transformation I gave in my example? – Tal Galili Oct 20 '10 at 19:54
  • @Tal Y = X + 1? –  Oct 20 '10 at 19:55
  • You missed a=-1 ;-) –  Oct 20 '10 at 20:07
  • The standard deviation will not change if you simply reorder your data, but this does not seem to be too helpful here... – Stephan Kolassa Oct 20 '10 at 20:10
  • @mbq Good point! –  Oct 20 '10 at 20:13
  • @Srikant: if we expand our search from linear transformations to analytic transformations (i.e., transformations that can be represented by their Taylor series), I think one could prove that all higher order terms need to vanish for the standard deviation to be unchanged. That is, the only analytic transformation that preserves sd would be adding or subtracting a constant to all values. Might be a nice exercise for statistics undergrads ;-) – Stephan Kolassa Oct 20 '10 at 20:14
  • @stephan I was actually thinking of the same idea but it seemed too much tex typing so I abandoned the idea. –  Oct 20 '10 at 20:16
  • @Stephen Not true for analytic transformations or even the most general transformations. Apply any transformation, such as a Box-Cox transformation. Determine the new variance. If it exists and is nonzero, rescale the result to match the original variance. – whuber Oct 20 '10 at 20:50
  • To Whuber - "rescale the result to match the original variance" - here's a thought, thanks! – Tal Galili Oct 21 '10 at 07:11
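whuber's rescaling trick from the comment above can be sketched in R (any transformation with a finite, nonzero sd works; `log` is just an arbitrary choice here):

```r
x <- c(2, 3, 4)
z <- log(x)             # any nonlinear transformation with nonzero sd
z <- z * sd(x) / sd(z)  # rescale the result to restore the original sd
all.equal(sd(z), sd(x)) # TRUE
```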