I want to split my data into three parts in a 6:2:2 ratio. Is there an R command that can do that? Thanks.

I used createDataPartition from the caret package, which can split data into two parts. But how can I do it with three splits? Is that possible, or do I need two steps?

Mark Miller
user697911

1 Answer

You can randomly split the data in (roughly) this ratio using sample:

set.seed(144)
# Labels 1, 2, 3 are drawn with probabilities 3/5, 1/5, 1/5, i.e. 6:2:2
spl <- split(iris, sample(c(1, 1, 1, 2, 3), nrow(iris), replace = TRUE))

This splits your initial data frame into a list of three data frames. You can check that you got the split ratio you were looking for by calling nrow on each element of the list with lapply:

unlist(lapply(spl, nrow))
#  1  2  3 
# 98 26 26
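
As an alternative sketch of the same approximate split (not part of the original answer), the target proportions can be passed directly through the prob argument of sample instead of repeating labels. The name spl_prob is only illustrative, and the group sizes will again be close to, but usually not exactly, 90/30/30:

```r
set.seed(144)
# Draw one group label per row; prob encodes the 6:2:2 target proportions.
groups <- sample(1:3, nrow(iris), replace = TRUE, prob = c(0.6, 0.2, 0.2))
spl_prob <- split(iris, groups)
# Sizes are random and only approximate the requested ratio.
sapply(spl_prob, nrow)
```

This form is easier to adapt when the ratio is not made of small whole numbers, since only the prob vector changes.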

If you want a random shuffle that yields exactly your ratio in each group, you can shuffle the row indices and then assign the correct number of each group label to the shuffled indices. For iris, we want 90 rows in group 1, 30 in group 2, and 30 in group 3:

set.seed(144)
nums <- c(90, 30, 30)
assignments <- rep(NA, nrow(iris))
assignments[sample(nrow(iris))] <- rep(c(1, 2, 3), nums)
spl2 <- split(iris, assignments)
unlist(lapply(spl2, nrow))
#  1  2  3 
# 90 30 30 
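
The exact-ratio approach can be wrapped in a small function so the group sizes do not have to be worked out by hand. This is a minimal sketch: split_by_ratio is a hypothetical helper name, not a function from base R or caret, and it assumes that rounding the cumulative cut points is an acceptable way to handle row counts that the ratio does not divide evenly:

```r
# split_by_ratio is a hypothetical helper, not part of base R or caret.
split_by_ratio <- function(df, ratio = c(6, 2, 2)) {
  n <- nrow(df)
  # Cumulative cut points rounded to whole rows; the last one always equals n.
  cuts <- round(cumsum(ratio) / sum(ratio) * n)
  sizes <- diff(c(0, cuts))
  # Build the labels in order, then permute them so assignment is random.
  labels <- rep(seq_along(ratio), sizes)
  assignments <- labels[sample.int(n)]
  split(df, assignments)
}

set.seed(144)
parts <- split_by_ratio(iris, c(6, 2, 2))
# For the 150 rows of iris the groups have exactly 90, 30 and 30 rows.
sapply(parts, nrow)
```

Working from cumulative cut points rather than rounding each group size separately guarantees that the sizes always sum to nrow(df).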
josliber