2

I have a data frame that looks like the table below that keeps track of a person visiting a store in a certain month. I want to create a new column, Total_Visits, that is a count of the number of times a certain ID visited the store during a certain month. In the below example, for date 6-13 and ID 23, the Total_Visits would have 3 in any row where date == 6-13, and ID == 23.

Date    ID   
6-13    23   
6-13    34   
6-13    23   
6-13    23   
7-13    23   

Data frame I'm looking for would be

Date    ID    Total_Visits   
6-13    23    3
6-13    34    1
6-13    23    3
6-13    23    3
7-13    23    1

While I assume there is some sort of acast function to ensure that I don't have to loop through this (30,000 rows), I would be OK with a loop if vectorization did not work.

David Arenburg
  • 89,637
  • 17
  • 130
  • 188
dward4
  • 1,317
  • 3
  • 11
  • 29

2 Answers2

3

You can use dplyr package:

library(dplyr)
df %>%  group_by(Date, ID) %>% mutate(Total_Visits = n())

# # A tibble: 5 x 3 
# # Groups:   Date, ID [3] 
#     Date    ID Total_Visits 
#   <fctr> <int>        <int> 
# 1   6-13    23            3 
# 2   6-13    34            1 
# 3   6-13    23            3 
# 4   6-13    23            3 
# 5   7-13    23            1

Use data.frame on the output to make it a dataframe.

Update:

Or using data.table package:

library(data.table)
setDT(df)[, Total_Visits:=.N, by=c("Date","ID")]

df

#    Date ID Total_Visits 
# 1: 6-13 23            3 
# 2: 6-13 34            1 
# 3: 6-13 23            3 
# 4: 6-13 23            3 
# 5: 7-13 23            1

Data:

df <- structure(list(Date = structure(c(1L, 1L, 1L, 1L, 2L), .Label = c("6-13", 
      "7-13"), class = "factor"), ID = c(23L, 34L, 23L, 23L, 23L)), .Names = c("Date", 
      "ID"), class = "data.frame", row.names = c(NA, -5L))
Community
  • 1
  • 1
M--
  • 20,766
  • 7
  • 52
  • 87
2

I like using data.table for these operations. It's also the fastest solution.

library(data.table)

dat = read.table("clipboard", header = TRUE)
setDT(dat)
> dat[ , .(visits = .N), by = .(Date, ID)]
   Date ID visits
1: 6-13 23      3
2: 6-13 34      1
3: 7-13 23      1
Kristofersen
  • 2,676
  • 1
  • 12
  • 30