
I am looking for a smart way to index subcategories within a data frame, numbering the rows 1, 2, 3, ... within each label.
I've created a very simple reproducible example below. How would you code the step that goes from input to output (i.e. how can we create the color_id variable)?

Thank you very much in advance for your view on this!

input <- data.frame(label = c("red", "red", "blue", "green", "green", "green", "orange"), count = c(2, 2, 1, 3, 3 ,3, 1))


output <- data.frame(label = c("red", "red", "blue", "green", "green", "green", "orange"), count = c(2, 2, 1, 3, 3 ,3, 1), color_id = c(1, 2, 1, 1, 2, 3, 1))


Best regards

cho7tom
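One base R way to build `color_id`, sketched below, splits the labels by group, numbers each group with `seq_along()`, and puts the numbers back in the original row order with `unsplit()`. It needs nothing beyond base R and does not require rows with the same label to be adjacent.

```r
input <- data.frame(
  label = c("red", "red", "blue", "green", "green", "green", "orange"),
  count = c(2, 2, 1, 3, 3, 3, 1)
)

# split() groups the labels; seq_along() numbers each group 1, 2, ...
# unsplit() writes those numbers back to the original row positions
input$color_id <- unsplit(lapply(split(input$label, input$label), seq_along),
                          input$label)

print(input)
```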
    I can't currently find a good dupe for this. In base R you can use `?ave`, for example: `within(input, color_id % group_by(label) %>% mutate(color_id = row_number())` – talat Jun 19 '15 at 09:19
  • @DavidArenburg This is a special case of the one I used, but the answer you linked does answer the question directly. How can I switch the dupe? – James Jun 19 '15 at 09:35
  • I think `splitstackshape` has a `getanID` function for this. – Pierre L Jun 19 '15 at 09:40
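The base R `ave()` idea from the first comment can be sketched as follows; this is a hedged illustration, not the commenter's exact code. `ave()` applies `FUN` within each group and returns a vector aligned with the original rows, so passing `seq_along` yields a running count per `label`.

```r
input <- data.frame(
  label = c("red", "red", "blue", "green", "green", "green", "orange"),
  count = c(2, 2, 1, 3, 3, 3, 1)
)

# ave() splits its first argument by label, applies FUN to each piece,
# and returns the results in the original row positions.
# seq_along(input$label) is used as the first argument so the result is an integer index.
input$color_id <- ave(seq_along(input$label), input$label, FUN = seq_along)

print(input)
```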

2 Answers


Using data.table:

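library(data.table)
# convert input to a data.table by reference, then number the rows within each label
setDT(input)[ , color_id := seq_len(.N), by = label]
input  # := modifies by reference and does not print, so print the result explicitly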
    label count color_id
1:    red     2        1
2:    red     2        2
3:   blue     1        1
4:  green     3        1
5:  green     3        2
6:  green     3        3
7: orange     1        1
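As an alternative sketch, assuming a reasonably recent data.table is installed, the `rowid()` helper computes the same within-group counter without an explicit `by`:

```r
library(data.table)

input <- data.frame(
  label = c("red", "red", "blue", "green", "green", "green", "orange"),
  count = c(2, 2, 1, 3, 3, 3, 1)
)

# convert to a data.table by reference
setDT(input)

# rowid(label) returns 1, 2, ... for successive rows sharing the same label
input[, color_id := rowid(label)]

print(input)
```

Both forms give the same column; `rowid()` is simply shorter when the counter is the only thing computed per group.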
grrgrrbla
library(splitstackshape)
# adds a column named .id that numbers the rows within each label
getanID(input, 'label')
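`getanID()` names its counter `.id` rather than `color_id`. A minimal sketch, assuming splitstackshape and data.table are installed, renames it to match the output asked for in the question:

```r
library(splitstackshape)
library(data.table)

input <- data.frame(
  label = c("red", "red", "blue", "green", "green", "green", "orange"),
  count = c(2, 2, 1, 3, 3, 3, 1)
)

# getanID() returns a data.table with an added .id column
# that numbers the rows within each value of the id variable(s)
res <- getanID(input, "label")

# rename .id by reference so the column matches the question's color_id
setnames(res, ".id", "color_id")

print(res)
```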
Pierre L