I am trying to use the boot function (in the R package boot), and I want to change how many observations are resampled in each iteration of the bootstrap.

If it's not possible to change the number of observations in the pseudo-sample, how many observations are used when resampling?

Glen_b
Young
  • Thanks, @Glen_b It seems that I misunderstand a concept of bootstrapping.... I thought "R" is the number of bootstrapping (i.e. the number of times the samples are taken, not the number of samples that are taken.). If R is the number of samples that are taken, how do you set up the number of times the samples are taken? – Young Aug 13 '13 at 23:34
  • I have no idea what you're asking now. Perhaps you should say what you think the bootstrap is. I highly recommend reading the book that the boot package goes with. Are you confusing the terms 'observation' (a data point for a single observational unit, like a person - possibly with observations on several variables) with 'sample' (a collection of such observations)? Please note that when you resample with the bootstrap you take your pseudo-samples of size $n$; in the notation of the function, you take R such samples. You don't need to specify $n$ because it can see how big the original is. – Glen_b Aug 13 '13 at 23:36
  • Is your real question actually just 'How does bootstrapping work?' rather than 'How do I use the arguments of this function in R?' --- To hopefully speed this up I'm going to make a guess what you mean and edit your question to one that makes sense, and if I am wrong, please revert it (it takes two mouse clicks to revert). – Glen_b Aug 13 '13 at 23:41
  • You might find the explanation in the fourth paragraph here of some value. – Glen_b Aug 13 '13 at 23:51
  • @Glen_b Yes, I need to know both "how does it work" and "how do you use the arguments?". I wanted to know how robust my data set is in terms of correlations. I randomly took 65% of the data and bootstrapped it to see if I still get the same or similar correlation coefficients. I think now I know a little better about bootstrapping thanks to you. So this argument "R" is the number of samples to take. If the number of data points in the actual data set is only 200 and R is 1000, "boot" takes data points randomly 1000 times from 200 data points. I hope this is correct. – Young Aug 13 '13 at 23:59
  • Yes, that's what bootstrapping does. What you described at the start of that most recent comment sounds more like an attempt at cross validation – Glen_b Aug 14 '13 at 00:02
  • Young, I suggest you explicitly modify your question to relate directly to what you're doing ... such as "How would I bootstrap a correlation?" or even asking about the statistical problem you're trying to solve (which appears to be one about the sampling properties of correlations) and then at the end of your general question, add some specific parts asking about doing it using boot. – Glen_b Aug 14 '13 at 00:04
  • Thanks Glen, I guess I should make a new question rather than modifying it as you made it very clear what bootstrapping does above and below. Thanks for editing. – Young Aug 14 '13 at 00:42
  • However, it would also serve as a basic answer to the suggested question. If you're satisfied with this one as an answer to your original one, asking a new question is reasonable. – Glen_b Aug 14 '13 at 00:49

1 Answer

Imagine I have a set of ten observations* (the original data has more digits):

      x    y
1  1.66 3.64
2  5.30 4.91
3  4.75 5.32
4  2.07 1.58
5  2.88 4.25
6  3.53 4.59
7  1.75 2.37
8  1.42 2.10
9  2.82 4.35
10 1.81 3.90

and I want to bootstrap the correlation to try to assess how stable (or how uncertain or how 'variable') the sample correlation is.

*(ten is much too few for the bootstrap to be much use, but this is just for illustration)

The idea is to resample the data - by sampling rows (with replacement) from the original data to obtain new pseudo-samples of size 10.
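To make the resampling step concrete, here is a minimal base-R sketch of the same idea done by hand, without the boot package. It is only an illustration: the data frame is retyped from the rounded values shown above (so results will differ slightly from the full-precision data), and the seed, the name `manual_cors`, and the choice of 999 replicates are assumptions made for the example.

```r
# The ten observations from above, rounded to two decimals.
mydata <- data.frame(
  x = c(1.66, 5.30, 4.75, 2.07, 2.88, 3.53, 1.75, 1.42, 2.82, 1.81),
  y = c(3.64, 4.91, 5.32, 1.58, 4.25, 4.59, 2.37, 2.10, 4.35, 3.90)
)

set.seed(1)          # arbitrary seed, only so the sketch is reproducible
n <- nrow(mydata)    # size of each pseudo-sample = size of the original data
R <- 999             # number of pseudo-samples (illustrative choice)

# For each replicate: draw n row indices with replacement, then compute
# the correlation on the resampled rows.
manual_cors <- replicate(R, {
  ind <- sample(n, size = n, replace = TRUE)
  cor(mydata$x[ind], mydata$y[ind])
})

length(manual_cors)  # one correlation per pseudo-sample
```

Here `size = n` is what fixes the number of observations in each pseudo-sample, while `R` only controls how many pseudo-samples are drawn.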

This is very easy with the boot package - most of the work is in writing a function for 'boot' to call.

With the above data in mydata:

> print(head(mydata,3),digits=3)
     x    y
1 1.66 3.64
2 5.30 4.91
3 4.75 5.32
...

you can do this in R as follows (following the help on boot):

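library(boot)  # provides boot()

# 'boot' calls the statistic with the data and a vector of resampled row indices
mycor <- function(x, ind) cor(x[ind, ])[1, 2]
bcor <- boot(mydata, mycor, R = 999)  # R pseudo-samples, each of size nrow(mydata)
str(bcor)  # to look at what you get back
hist(bcor$t, breaks = 100)  # bootstrap distribution of the correlation
abline(v = bcor$t0, col = 6)  # original sample correlation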

producing:

[Figure: histogram of bootstrapped correlation]

The magenta line is the original sample correlation.
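Beyond looking at a histogram, the bootstrap replicates can be summarised numerically. The following is a rough sketch rather than a recommended analysis: it retypes the rounded data from above, uses an arbitrary seed, and picks the percentile interval from `boot.ci` purely as one example of the interval types that function offers. With only ten observations the numbers should not be taken seriously.

```r
library(boot)

# The ten observations from above, rounded to two decimals.
mydata <- data.frame(
  x = c(1.66, 5.30, 4.75, 2.07, 2.88, 3.53, 1.75, 1.42, 2.82, 1.81),
  y = c(3.64, 4.91, 5.32, 1.58, 4.25, 4.59, 2.37, 2.10, 4.35, 3.90)
)

# Statistic in the form 'boot' expects: data first, resampled row indices second.
mycor <- function(x, ind) cor(x[ind, ])[1, 2]

set.seed(1)  # arbitrary seed, only for reproducibility
bcor <- boot(mydata, mycor, R = 999)

bcor$t0                              # correlation in the original sample
boot_se <- sd(bcor$t[, 1])           # bootstrap estimate of its standard error
ci <- boot.ci(bcor, type = "perc")   # percentile bootstrap interval (95% by default)
ci
```

The spread of `bcor$t` (here summarised by `boot_se` and the interval) is what indicates how variable the sample correlation is.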

Glen_b