Randomizing rows in a dataset using R

Question

I searched for this extensively, but all the examples I was able to find randomize the row order, not the data within each row. I am trying to create a dataset where the values themselves are shuffled, so that each column is randomized independently.

I'm trying to turn df into df2:

df:

df <- data.frame(a = c(1:5),
                 b = c(LETTERS[1:5]),
                 c = c(letters[1:5]))

  a b c
1 1 A a
2 2 B b
3 3 C c
4 4 D d
5 5 E e

df2:

  a b c
1 2 D b
2 1 B d
3 4 E c
4 3 A a
5 5 C e

I think the reason there are not a lot of solutions for this is that people usually need to keep their data intact, but in this case I'm deliberately trying to break the dataset itself, so that the entries in each row no longer belong together.

Currently all I can achieve is:

df2 <- df[sample(1:nrow(df)), ]

  a b c
3 3 C c
4 4 D d
2 2 B b
1 1 A a
5 5 E e

which randomizes the order of the rows but keeps the data intact.

Thank you! It works! Trying to find a way to mark it as the correct answer. Could you explain `df; df2[]` briefly? – puredata Nov 07 '20 at 05:19
Thanks a lot! I didn't know semicolons could be used like a new line. To refine this: what should I try if I only want to randomize some columns? I tried subsetting df2 inside the lapply, but it didn't work as expected. – puredata Nov 07 '20 at 07:59
It's list subsetting, so subset the dataframe as you would a list, with a single set of indices for columns, e.g. `iris[1:4]`. – alistaire Nov 07 '20 at 09:04
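
The answer these comments refer to is not reproduced on this page. Judging from the mentions of `df; df2[]` and `lapply`, it was probably along the lines of the base R sketch below. This is an assumption, not a quote. The seed, the `cols` vector and the name `df3` are illustrative choices that do not come from the thread.

```r
# Example data frame from the question
df <- data.frame(a = 1:5, b = LETTERS[1:5], c = letters[1:5])

set.seed(42)  # assumption: fixed seed, only so the shuffle is repeatable

# Copy first, then overwrite the columns in place.
# `df2[] <-` keeps the data frame's structure (column names, row names)
# and replaces only the column contents with the list from lapply().
df2 <- df
df2[] <- lapply(df2, sample)

# Shuffle only some columns: index the data frame like a list,
# with a single set of column indices on both sides of the assignment.
cols <- c("b", "c")  # hypothetical choice of columns
df3 <- df
df3[cols] <- lapply(df3[cols], sample)
```

Written on one line, `df2 <- df; df2[] <- lapply(df2, sample)` does the same thing; the semicolon only separates two statements. In `df3`, column `a` stays in its original order because it is not among the selected columns.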

score 1 · Answer 1 · answered Nov 07 '20 at 05:16

You can apply sample to each column of the dataframe.

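library(dplyr)
# Shuffle every column independently of the others
df2 <- df %>% mutate(across(everything(), sample))
# In older versions of `dplyr` (before 1.0.0), use `mutate_all`
# df2 <- df %>% mutate_all(sample)

# Example output (differs on each run):
#  a b c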
#1 5 C c
#2 3 B e
#3 2 E d
#4 4 D b
#5 1 A a

answered Nov 07 '20 at 05:16

Ronak Shah

what is the .fns here about? – hachiko Nov 07 '20 at 06:02
It is to specify the function that we want to apply to each column. – Ronak Shah Nov 07 '20 at 06:07
Thank you, for my use case this works as well. But when I try using this on mtcars dataset, it flattens row names which are car names in this case. Any idea why and how to avoid? Also how can I find more info about .fns usage? – puredata Nov 07 '20 at 08:27
Tibbles don't support rownames so if you want to store rowname information you need to add them as separate column and do `mtcars %>% rownames_to_column() %>% mutate(across(.fns = sample))`. For more information about `.fns` see `?across`. – Ronak Shah Nov 07 '20 at 08:32
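
If the car names should simply stay where they are as row names, one option that sidesteps the question is to shuffle the columns in base R. This is only a sketch: the helper name `shuffle` and the variable `shuffled` are made up for illustration and do not come from the thread.

```r
# Hypothetical helper: permute a vector by shuffling its indices.
# Indexing with sample.int() avoids sample()'s special case where a
# single number n is interpreted as "sample from 1:n".
shuffle <- function(x) x[sample.int(length(x))]

# Overwrite the column contents in place; the data frame's attributes,
# including the row names, are left untouched.
shuffled <- mtcars
shuffled[] <- lapply(shuffled, shuffle)

# Check that the row names (car names) survived the shuffle
identical(rownames(shuffled), rownames(mtcars))
```

Note that after this the values in a row no longer belong to the car named in its row name, which is the intended effect in the question.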
