2

I have a dataset with three column

  Date1      StudentId  Status
  08/04/2014 155261     Yes
  08/04/2014 155261     No
  08/25/2014 236991     Yes
  08/27/2014 236991     Yes
  08/29/2014 236991     Yes

I am trying to aggregate the Status by Id and Date1, such that the final dataset will be like this

  Date1      StudentId  Response
  08/04/2014 155261     Yes, No
  08/25/2014 236991     Yes
  08/27/2014 236991     Yes
  08/29/2014 236991     Yes

I tried using the gsub function but it didnt work, it only aggregates based on StudentId and skips the Date, any help on this issue is much appreciated.

 dataset1[,Response:=gsub("(, )+$","",c(paste(Status,collapse=", "),rep("",.N-1))),by=c("StudentId ","Date1")]
Ezra Polson
  • 235
  • 3
  • 13
  • Are you using a `data.table` (required for use of `:=`)? What does it mean that it didn't work here? – Frank May 21 '15 at 20:16
  • @Frank, it was only aggregating based on StudentId and unable to combine both StudentId and Date, bgoldst solution fixed the problem – Ezra Polson May 21 '15 at 20:30
  • 1
    @Ezra if you already have a `data.table` object, using base R aggregating operations is like riding a bike when you own a Jaguar (unless you are doing it for the sports) – David Arenburg May 21 '15 at 20:39

2 Answers2

3
df <- data.frame(Date1=c('08/04/2014','08/04/2014','08/25/2014','08/27/2014','08/29/2014'), StudentId=c(155261,155261,236991,236991,236991), Status=c('Yes','No','Yes','Yes','Yes') );
aggregate(Status~Date1+StudentId,df,paste,collapse=', ');
##        Date1 StudentId  Status
## 1 08/04/2014    155261 Yes, No
## 2 08/25/2014    236991     Yes
## 3 08/27/2014    236991     Yes
## 4 08/29/2014    236991     Yes

You can rename the column from Status to Response separately:

names(df)[names(df)=='Status'] <- 'Response';
bgoldst
  • 32,336
  • 5
  • 36
  • 61
3

No gsub is needed to concatenate strings (thanks to @DavidArenburg for a simplifcation):

DT1 <- DT[,list(Response=toString(Status)),by=list(Date1,StudentId)]

If students might show up multiple times with the same status, you'll want to use unique on Status.


Instead of using a string, you could store the values in a list with list(String).

  • The good: Set operations like %in% can be used and may be more intuitive than string parsing.
  • The bad: List columns cannot be used in by operations and are generally cumbersome.
Frank
  • 65,012
  • 8
  • 95
  • 173