
Suppose we have two independent normal distributions $A \sim \mathcal{N}(\mu_a, {\sigma_a}^2)$ and $B \sim \mathcal{N}(\mu_b, {\sigma_b}^2)$.

Suppose that we have a sample $X$, which is drawn from $A$.

Now, suppose that we replace $p\%$ of the values in $X$ with data from $B$, and call the result $X'$.
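
The replacement procedure can be checked by simulation. Below is a minimal Monte Carlo sketch under stated assumptions: $p$ is a percentage, exactly round$(n p / 100)$ values are replaced, and (following the asker's later comment) it is the last values of the sample that are swapped out. The function names `replace_fraction` and `simulate` and all parameter values are hypothetical, chosen only for illustration.

```python
import random
import statistics


def replace_fraction(x, p, mu_b, sigma_b, rng):
    """Hypothetical helper: return a copy of x whose last round(len(x) * p / 100)
    values are replaced by fresh draws from N(mu_b, sigma_b**2)."""
    k = round(len(x) * p / 100)
    kept = x[: len(x) - k]  # values of X that survive into X'
    added = [rng.gauss(mu_b, sigma_b) for _ in range(k)]  # new values from B
    return kept + added


def simulate(n, p, mu_a, sigma_a, mu_b, sigma_b, trials=2000, seed=0):
    """Hypothetical helper: repeatedly draw X from A, build X', and record
    the sample mean and sample SD (n - 1 denominator) of each X'."""
    rng = random.Random(seed)
    means, sds = [], []
    for _ in range(trials):
        x = [rng.gauss(mu_a, sigma_a) for _ in range(n)]
        x_prime = replace_fraction(x, p, mu_b, sigma_b, rng)
        means.append(statistics.mean(x_prime))
        sds.append(statistics.stdev(x_prime))
    return means, sds
```

For example, `means, sds = simulate(100, 20, 0.0, 1.0, 3.0, 2.0)` gives 2000 simulated means and standard deviations of $X'$; `statistics.quantiles(means, n=40)` returns cut points whose first and last entries are the empirical 2.5% and 97.5% quantiles, a rough 95% interval of the kind asked about in the comments.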

What will be the change in mean and standard deviation going from $X$ to $X'$?
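
As a sketch of the expected values (not the change in one particular dataset), assume exactly $k$ of the $n$ values are replaced, a fixed count as discussed in the comments, so $m = n - k$ values still come from $A$. Write $q = k/n$ (so $q = p/100$ when $np/100$ is an integer) and let $s^2$ be the sample variance with the $n-1$ denominator. Expanding $\sum_i (x_i - \bar{x})^2 = \sum_i x_i^2 - n\bar{x}^2$ and taking expectations over repeated sampling gives

$$
\begin{aligned}
\mathbb{E}[\bar{X}'] &= (1-q)\,\mu_a + q\,\mu_b, \\
\operatorname{Var}(\bar{X}') &= \frac{(1-q)\,\sigma_a^2 + q\,\sigma_b^2}{n}, \\
\mathbb{E}[s'^2] &= (1-q)\,\sigma_a^2 + q\,\sigma_b^2 + \frac{n}{n-1}\,q(1-q)\,(\mu_a - \mu_b)^2,
\end{aligned}
$$

compared with $\mathbb{E}[\bar{X}] = \mu_a$ and $\mathbb{E}[s^2] = \sigma_a^2$ for the original sample, so the expected mean shifts by $q(\mu_b - \mu_a)$. For an iid sample from the mixture (where the number of values from $B$ is Binomial), the factor $n/(n-1)$ disappears and $\mathbb{E}[s^2]$ equals the mixture variance. Note that $\mathbb{E}[s']$ is slightly smaller than $\sqrt{\mathbb{E}[s'^2]}$.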

Tom Hosker
  • How do you select the $p\%$ of values? Do they lie in some predetermined region, or satisfy some equation, or perhaps are they random? And are you asking about expected changes in mean and SD or actual changes in a dataset? – whuber Jun 05 '19 at 15:21
  • If you select the $p$% at random, then you have a mixture distribution. – BruceET Jun 05 '19 at 16:10
  • @BruceET Almost, but not quite: in a sample of $n$ iid observations from a mixture distribution, the number of values from $B$ will be Binomial$(n,p)$ rather than exactly $np.$ This will affect the variance. – whuber Jun 05 '19 at 16:21
  • @whuber "How do you select the $p\%$ values?" Suppose the data in $X$ comes in the form: [$d_1$, $d_2$, ...]. We chop off the last so many data points and replace them with data drawn (at random) from $B$. The order of the data in $X$ ought not to matter. So I suppose, in effect, we're selecting the data to be replaced at random? – Tom Hosker Jun 06 '19 at 17:03
  • Yes: if the order does not matter, that's tantamount to randomly replacing a fixed number of the values. But are you asking about the expected changes in these random variables or are you asking about the actual changes in mean and SD resulting from the actual changes made to the data? – whuber Jun 06 '19 at 17:17
  • @whuber "And are you asking about expected changes in mean and SD or actual changes in a dataset?" I'd like a confidence interval for the new mean and standard deviation, if that's possible. – Tom Hosker Jun 06 '19 at 17:22
  • @whuber Sorry. I ought to clarify a bit more. I want to know what the mean and SD of $X'$ are, where $X'$ is the modified sample. – Tom Hosker Jun 06 '19 at 17:24
  • Two applications of the general technique described at https://stats.stackexchange.com/a/51927/919 will do it: view the process in terms of three disjoint sets of data: the common set; the set removed; and the set added. – whuber Jun 06 '19 at 18:11
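
The comment above suggests treating $X$ as the union of a common set and a removed set, and $X'$ as the union of the same common set and an added set. A minimal sketch of that bookkeeping follows, assuming each set is summarized by its count, mean, and sum of squared deviations $M_2$, and using the standard pairwise formula for combining $M_2$ (run in reverse to remove a subset). The names `Summary`, `summarize`, `combine`, `remove`, and `replace` are hypothetical, and this is not necessarily the exact formulation in the linked answer.

```python
import math
from typing import NamedTuple, Sequence


class Summary(NamedTuple):
    """Hypothetical summary of a set of values: count, mean, and M2,
    the sum of squared deviations from the mean."""
    n: int
    mean: float
    m2: float

    @property
    def sd(self) -> float:
        # Sample standard deviation (n - 1 denominator), as in statistics.stdev.
        return math.sqrt(self.m2 / (self.n - 1))


def summarize(values: Sequence[float]) -> Summary:
    n = len(values)
    mean = sum(values) / n
    return Summary(n, mean, sum((v - mean) ** 2 for v in values))


def combine(a: Summary, b: Summary) -> Summary:
    """Summary of the union of two disjoint sets."""
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta ** 2 * a.n * b.n / n
    return Summary(n, mean, m2)


def remove(whole: Summary, part: Summary) -> Summary:
    """Summary of `whole` with the disjoint subset `part` taken out
    (the inverse of combine)."""
    n = whole.n - part.n
    mean = (whole.n * whole.mean - part.n * part.mean) / n
    delta = part.mean - mean
    m2 = whole.m2 - part.m2 - delta ** 2 * n * part.n / whole.n
    return Summary(n, mean, m2)


def replace(x: Summary, removed: Summary, added: Summary) -> Summary:
    """Summary of X' = (X without the removed values) plus the added values."""
    return combine(remove(x, removed), added)
```

With this, the exact mean and SD of $X'$ follow from the summaries of $X$, of the values removed, and of the values added, without revisiting the common values.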
