Selecting at random specific values from data frame

Question

I have a data frame consisting of an ID column, a clones column, and an 'Isolate' column.

Each ID appears multiple times in the ID column and is associated with different clones in the clones column (named clone1, clone2, clone3, etc.), which come from distinct isolates. An ID may also have the same clone multiple times.

e.g.

ID  clones  Isolate
ID1 clone1    1
ID1 clone1    2 
ID1 clone1    3 
ID2 clone1    4
ID2 clone1    5
ID2 clone2    6
ID2 clone2    7
ID3 clone1    8
ID3 clone1    9
ID3 clone2    10
ID3 clone3    11
ID3 clone3    12

For each unique ID, I want to select one representative of each clone at random.

I expect to get an output like this:

ID  clones   Isolate
ID1 clone1      2
ID2 clone1      5
ID2 clone2      6
ID3 clone1      8
ID3 clone2     10
ID3 clone3     12

That is, one representative row per clone for each ID, chosen at random, so the value in the Isolate column is random.

I'm not sure if I understand you correctly. Is `dplyr::distinct()` what you want? – yusuzech, Jul 24 '19 at 19:48
So in my example, ID3 in the ID column has two clone1 rows, one clone2 row, and two clone3 rows. I want one representative of each clone selected at random for each ID, so for ID3 I want to select one clone1, one clone2 and one clone3 at random. I want to do this for every distinct ID. – Siobhan Phoebe Natalia Ingram, Jul 24 '19 at 19:51
Sorry I wasn't clear: my data frame has other columns as well, which all differ in each row, so using unique doesn't work. I will edit the question to show this. – Siobhan Phoebe Natalia Ingram, Jul 24 '19 at 19:55

score 0 · Accepted Answer · answered Jul 24 '19 at 20:01

It seems you can use the answer to a similar question asked just now: "How to use R to identify twins, and then randomly select and remove one?"

If you group by ID and clones with dplyr's group_by() and then call sample_n(1), you should get one randomly chosen row for each ID and clone pair. Borrowing from @Andrew Gustar's answer:

library(dplyr)

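# df is the data frame from the question
df %>% 
  group_by(ID, clones) %>%   # one group per ID/clone pair
  sample_n(1)                # draw one random row from each group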
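
To try this on the example from the question, here is a minimal, self-contained sketch. It rebuilds the example data frame by hand and uses slice_sample(n = 1), which is the newer replacement for sample_n() and assumes dplyr 1.0.0 or later. The seed value and the name `picked` are my own choices for illustration and are not part of the original question.

```r
library(dplyr)

# Example data copied from the question
df <- data.frame(
  ID = c(rep("ID1", 3), rep("ID2", 4), rep("ID3", 5)),
  clones = c("clone1", "clone1", "clone1",
             "clone1", "clone1", "clone2", "clone2",
             "clone1", "clone1", "clone2", "clone3", "clone3"),
  Isolate = 1:12
)

# Arbitrary seed (an assumption), only so the random draw is reproducible
set.seed(42)

picked <- df %>%
  group_by(ID, clones) %>%   # one group per ID/clone pair
  slice_sample(n = 1) %>%    # keep one randomly chosen row per group
  ungroup()                  # drop the grouping from the result

picked
```

The result should have one row per ID/clone pair (six rows for this data), and any other columns in the data frame are carried along with the sampled row. Drop the set.seed() call if you want a different draw on each run.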

Mike

If you think this is a duplicate of an existing question, it's better to flag it as such instead of adding a duplicate answer – camille Jul 24 '19 at 21:10