0

I want to Merge certain raw values of data frame into one data frame.

If raw data as shown below,

> set.seed(1)
> Data1 <- data.frame(
+     Value = sample(1:5),
+     Report = sample(c("yes", "no"), 5, replace = TRUE)
+ )
> Data2 <- data.frame(
+     Value = sample(1:5),
+     Report = sample(c("yes", "no"), 5, replace = TRUE)
+ )
> Data3 <- data.frame(
+     Value = sample(1:5),
+     Report = sample(c("yes", "no"), 5, replace = TRUE)
+ )
> 
> subset(Data1)
  Value Report
1     2     no
2     5     no
3     4     no
4     3     no
5     1    yes
> subset(Data2)
  Value Report
1     2    yes
2     1     no
3     3     no
4     4    yes
5     5     no
> subset(Data3)
  Value Report
1     5    yes
2     1    yes
3     2    yes
4     4     no
5     3    yes

I want to make the data frame as follows.

collect rows with 'Yes' in the 'Report' column and save them to each data frame.

dfResult1
Value  Report
1      yes
2      yes

dfResult2
value  report
4      Yes 

dfResult3
value  report
5      Yes
1      Yes
2      Yes

dfResult4
value  report
3      Yes

1 Answers1

0

Here is one way to do it in base R, rbind all the dataframes together, create a row number column, filter the rows with only "yes" values and create a grouping column to split.

df <- rbind(Data1, Data2, Data3)
df$row <- seq_len(nrow(df))
df <- df[df$Report == "yes",]
list_df <- split(df[-3], cumsum(c(TRUE, diff(df$row) > 1)))
list_df

#$`1`
#  Value Report
#5     1    yes
#6     2    yes

#$`2`
#  Value Report
#9     4    yes

#$`3`
#   Value Report
#11     5    yes
#12     1    yes
#13     2    yes

#$`4`
#   Value Report
#15     3    yes

It will return a list of dataframes which can be accessed by using list_df[[1]], list_df[[2]] and so on.

Using same logic, we can use dplyr

library(dplyr)
df %>%
  mutate(row = row_number()) %>%
  filter(Report == "yes") %>%
  group_split(cumsum(c(TRUE, diff(row) > 1)), keep = FALSE)

We can also use rle

df$group <- with(rle(df$Report), rep(seq_along(values), lengths))
df1 <- subset(df, Report == "yes")
split(df1, df1$group)

and using data.table::rleid

df %>%
  mutate(group = data.table::rleid(Report)) %>%
  filter(Report == "yes") %>%
  group_split(group, keep = FALSE)
Ronak Shah
  • 355,584
  • 18
  • 123
  • 178