
I'm still new to R and could use some help. I have a dataset that looks something like this:

a <- c("a", "b", "c", "d", "a", "d") 
E <- c(NA, "E", NA, "E", NA, "E")
F <- c(NA, "F", "F", "F", NA, NA)
G <- c("G", NA, "G", "G", "G", NA)

df <- data.frame(a, E, F, G)

I'm trying to find out which of E, F, or G occurs most often within each group when I group by a. My biggest issue seems to be that the values are characters spread across three separate columns. I tried combining them into one column, but it didn't work. I've been searching for hours without finding an answer, and I'm now confused by what I'd think should be an easy question. Any help would be amazing. Thanks!

Edit: Sorry, I'm very new to the site and am still getting the formatting down. The correct output would ideally be something like:

  a   Mostcommon
  -   ----------
1  a     "G"
2  b    "E""F"
3  c    "F""G"
4  d     "E"

That is for the example I gave; with my actual data there should only be one most common value per group.

Clara W

2 Answers


Is this what you'd like to do?

library(tidyverse)

tibble(
  a = c("a", "b", "c", "d", "a", "d"),
  E = c(NA, "E", NA, "E", NA, "E"),
  F = c(NA, "F", "F", "F", NA, NA),
  G = c("G", NA, "G", "G", "G", NA)
) |> 
  mutate(across(E:G, ~if_else(is.na(.), 0, 1))) |> 
  group_by(a) |> 
  summarise(across(E:G, sum))
#> # A tibble: 4 × 4
#>   a         E     F     G
#>   <chr> <dbl> <dbl> <dbl>
#> 1 a         0     0     2
#> 2 b         1     1     0
#> 3 c         0     1     1
#> 4 d         2     1     1

Created on 2022-05-03 by the reprex package (v2.0.1)
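The counts above still need one more step to name the most common value per group. A minimal sketch of that step, assuming the summarised counts are available as a plain data frame (here called `counts`, a hypothetical name, with the numbers copied from the output above) and using base R rather than the tidyverse:

```r
# Per-group counts, copied from the summarised output above.
# `counts`, `value_cols` and `result` are illustrative names, not from the answer.
counts <- data.frame(
  a = c("a", "b", "c", "d"),
  E = c(0, 1, 0, 2),
  F = c(0, 1, 1, 1),
  G = c(2, 0, 1, 1)
)

value_cols <- c("E", "F", "G")

# For each row, keep the column name(s) whose count equals the row maximum;
# ties are joined into a single comma-separated string.
most_common <- apply(counts[value_cols], 1, function(row) {
  toString(value_cols[row == max(row)])
})

result <- data.frame(a = counts$a, most_common = unname(most_common))
print(result)
```

For the example data this gives G for group a, "E, F" for b, "F, G" for c, and E for d. If the real data has exactly one most common value per group, each string would contain a single name.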

Gregor Thomas
Carl
  • I got this to work! I had to replace my NA values with 0 character values but then it worked great. I can totally make this work for what I need. Thank you! – Clara W May 03 '22 at 15:38
  • Changed `== "NA"` to `is.na()` now that the question has been updated. – Gregor Thomas May 03 '22 at 15:51

You could use the Modes function defined here. I've copy-pasted it below:

Modes <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}

With the Modes function defined, do the following:

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(-a, values_drop_na = TRUE) %>%
  group_by(a) %>%
  summarize(most_common = toString(Modes(value)))

# A tibble: 4 x 2
  a     most_common
  <chr> <chr>      
1 a     G          
2 b     E, F       
3 c     F, G       
4 d     E         
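To see why the pipeline drops NAs and wraps the result in `toString()`, here is a small sketch of how `Modes` behaves on plain vectors. `Modes` is copied from above; the input vectors are made up for illustration:

```r
# Copied from the answer above
Modes <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}

# A single most frequent value comes back as a length-1 vector
single <- Modes(c("E", "F", "E"))

# Ties come back as a vector with one element per tied value,
# which is why the pipeline collapses them with toString()
tied <- Modes(c("E", "F"))

# NA is counted like any other value, so it can "win" unless it is removed
# first (the pipeline does this with values_drop_na = TRUE)
with_na <- Modes(c(NA, "E", NA))

print(single)
print(toString(tied))
print(is.na(with_na))
```

When the data is guaranteed to have one most common value per group, as in the asker's real data, `toString()` simply returns that single value unchanged.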
onyambu