
I searched for this extensively, and all the examples I could find randomize the row order but not the data within each row. I am trying to create a dataset where the values themselves need to be randomized.

I'm trying to turn `df` into `df2`:

df:

df <- data.frame(a = c(1:5),
                 b = c(LETTERS[1:5]),
                 c = c(letters[1:5]))

  a b c
1 1 A a
2 2 B b
3 3 C c
4 4 D d
5 5 E e

df2:

  a b c
1 2 D b
2 1 B d
3 4 E c
4 3 A a
5 5 C e

I think the reason there aren't many solutions for this is that people usually need to keep their data intact, but in this case I'm deliberately trying to break the dataset itself, so that the entries are no longer correct.

Currently all I can achieve is:

df2 <- df[sample(1:nrow(df)), ]

  a b c
3 3 C c
4 4 D d
2 2 B b
1 1 A a
5 5 E e

which randomizes the order of the rows but keeps the data intact.

puredata

1 Answer


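You can apply `sample` to each column of the data frame separately. Each column is shuffled on its own, so the values no longer line up with their original rows.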

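library(dplyr)
# Shuffle every column independently: everything() selects all columns and
# .fns is the function applied to each of them
df2 <- df %>% mutate(across(everything(), .fns = sample))
# In older versions of `dplyr` (before 1.0.0) use `mutate_all`:
#df2 <- df %>% mutate_all(sample)

# One possible result (the output changes on every run):
#  a b c
#1 5 C c
#2 3 B e
#3 2 E d
#4 4 D b
#5 1 A a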
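For comparison, here is a minimal base R sketch of the same idea, assuming you would rather avoid the `dplyr` dependency. `lapply()` applies `sample()` to every column, and assigning the result into `df2[]` keeps the data frame's class, column names and row names. The `set.seed()` call is optional and only makes the shuffle reproducible.

```r
df <- data.frame(a = c(1:5),
                 b = c(LETTERS[1:5]),
                 c = c(letters[1:5]))

set.seed(1)  # optional: makes the shuffle reproducible

df2 <- df
# Shuffle each column independently; assigning into df2[] keeps the
# data.frame structure instead of returning a plain list
df2[] <- lapply(df, sample)

df2
```

One edge case applies to both versions: for a numeric column of length one, `sample(x)` draws from `1:x` instead of returning `x`, so on single-row data `function(x) x[sample.int(length(x))]` is a safer shuffle.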
Ronak Shah
  • What is the `.fns` here about? – hachiko Nov 07 '20 at 06:02
  • It is to specify the function that we want to apply to each column. – Ronak Shah Nov 07 '20 at 06:07
  • Thank you, this works for my use case as well. But when I try it on the mtcars dataset, it loses the row names, which are the car names in this case. Any idea why, and how to avoid it? Also, where can I find more info about `.fns` usage? – puredata Nov 07 '20 at 08:27
  • Tibbles don't support row names, so if you want to keep the row-name information you need to add it as a separate column: `mtcars %>% tibble::rownames_to_column() %>% mutate(across(everything(), .fns = sample))`. For more information about `.fns`, see `?across`. – Ronak Shah Nov 07 '20 at 08:32
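If you would rather keep the car names as row names instead of moving them into a column, a base R sketch (assuming a plain data.frame result is acceptable) is to assign the shuffled columns back into a copy with `[]`. The row names stay in their original order while every column is permuted independently, so each car name no longer matches its data. The `shuffled` name and the seed are arbitrary choices for illustration.

```r
set.seed(42)  # optional: reproducible shuffle

shuffled <- mtcars
# Replace each column with a random permutation of itself; assigning into
# shuffled[] leaves the row names (the car names) untouched
shuffled[] <- lapply(mtcars, sample)

head(shuffled)
```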