0

How can I collapse data in a wide format (see example below), into a concatenated column showing only the TRUE values? I want to end up with a data table in the format Employee Name | "string of applicable column headers" as illustrated in demoOUT.

library(data.table)
demoIN <- data.table(
  Name=c("Mike Jones","Bobby Fisher"),
  A=c(1,0),
  B=c(1,1),
  C=c(0,0),
  D=c(1,1))

           Name A B C D
1:   Mike Jones 1 1 0 1
2: Bobby Fisher 0 1 0 1

demoOUT <- data.table(
  Name=c("Mike Jones","Bobby Fisher"),
  Cases =c("A,B,D","B,D"))

           Name Cases
1:   Mike Jones A,B,D
2: Bobby Fisher   B,D
Mako212
  • 6,318
  • 1
  • 16
  • 32

3 Answers3

2

A solution uses functions from dplyr and tidyr. demoIN2 is the final output.

library(dplyr)
library(tidyr)

demoIN2 <- demoIN %>%
  gather(Cases, Value, -Name) %>%
  filter(Value == 1) %>%
  group_by(Name) %>%
  summarise(Cases = paste(Cases, collapse = ","))
www
  • 37,164
  • 12
  • 37
  • 72
2

Here is a base R solution if you were interested.

demoIN$Cases <- apply(demoIN[, -c("Name")], 1, function(x) paste(na.omit(ifelse(x == 1, names(x), NA)), collapse = ","))

demoIN <- demoIN[,c("Name","Cases")]
Matt Jewett
  • 3,024
  • 1
  • 12
  • 20
  • Nice. Interestly much slower than `dplyr/tidyr`, just tried with 1.2M rows and took 43.68 seconds, compared to 2.61 seconds with `dplyr/tidyr` – Mako212 Jul 14 '17 at 18:39
1

Here is an option using data.table (as the initial object is data.table

library(data.table)
melt(demoIN, id.var = 'Name')[value==1, .(Cases = paste(variable, collapse=',')), Name]
#           Name Cases
#1:   Mike Jones A,B,D
#2: Bobby Fisher   B,D
akrun
  • 789,025
  • 32
  • 460
  • 575