
I want to summarize a dataframe with dplyr, like so:

> test <-data.frame(ID = c("A", "A", "B", "B"), val = c(1:4))
> test %>% group_by(ID) %>% summarize(av = mean(val))
# A tibble: 2 x 2
      ID    av
  <fctr> <dbl>
1      A   1.5
2      B   3.5

But suppose that instead of grouping by the column called "ID" I wish to group by the first column, regardless of its name. Is there a simple way to do that?

I've tried a few naive approaches (group_by(1), group_by(.[1]), group_by(., .[1]), group_by(names(.)[1])), all to no avail. I'm only just beginning to use tidyverse packages, so I may be missing something obvious.
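For context, here is a minimal sketch of why group_by(1) does not select the first column, assuming dplyr >= 1.0.0. group_by() evaluates its arguments as expressions, so 1 becomes a new constant grouping column instead of a reference to column 1, and the whole table ends up in a single group.

```r
library(dplyr)

test <- data.frame(ID = c("A", "A", "B", "B"), val = 1:4)

# `1` is evaluated as an expression: dplyr adds a constant column
# and groups by it, so every row lands in the same group.
wrong <- test %>%
  group_by(1) %>%
  summarise(av = mean(val))

nrow(wrong)  # a single row, holding the mean of all four values
```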

This question is very similar, but it's about mutate and I wasn't able to generalize it to my problem. This question is also similar, but the accepted answer is to use a different package, and I'm trying to stick with dplyr.

Joe

4 Answers


You can use one of the scoped variants (group_by_at) for this:

test %>% group_by_at(1) %>% summarise(av = mean(val))

# A tibble: 2 x 2
#      ID    av
#  <fctr> <dbl>
#1      A   1.5
#2      B   3.5
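To check that group_by_at(1) really ignores the column name, here is a minimal sketch that renames the first column to a hypothetical key before grouping. It assumes dplyr >= 0.7.0, where the scoped verbs were introduced; in dplyr >= 1.0.0 they still work but are superseded by across().

```r
library(dplyr)

test <- data.frame(ID = c("A", "A", "B", "B"), val = 1:4)

# Same data, but the first column has a different (hypothetical) name.
renamed <- rename(test, key = ID)

# group_by_at(1) selects the first column by position, whatever it is called.
res <- renamed %>%
  group_by_at(1) %>%
  summarise(av = mean(val))

res
```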
Psidom

As of dplyr 1.0.0, you can use across() inside group_by():

library(dplyr)
test %>% 
  group_by(across(1)) %>% 
  summarise(av = mean(val))
## A tibble: 2 x 2
#  ID       av
#  <fct> <dbl>
#1 A       1.5
#2 B       3.5
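Because across() takes a tidyselect specification, the positional grouping can be wrapped in a small reusable helper. This is a sketch assuming dplyr >= 1.0.0; mean_by_first_col() and its value_col argument are hypothetical names, and the value column is passed as a string through the .data pronoun.

```r
library(dplyr)

# Hypothetical helper: group by the first column (by position, whatever
# its name) and average the column whose name is given as a string.
mean_by_first_col <- function(df, value_col) {
  df %>%
    group_by(across(1)) %>%
    summarise(av = mean(.data[[value_col]]), .groups = "drop")
}

test <- data.frame(ID = c("A", "A", "B", "B"), val = 1:4)
res <- mean_by_first_col(test, "val")
res
```

The .groups = "drop" argument simply returns an ungrouped tibble; it is a design choice for the helper, not something the original answer requires.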
Ian Campbell

In older versions of dplyr, you could use standard evaluation with dplyr::group_by_:

test %>% 
 group_by_(names(.)[1]) %>% 
 summarize(av = mean(val))
## A tibble: 2 x 2
#      ID    av
#  <fctr> <dbl>
#1      A   1.5
#2      B   3.5
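The underscored verbs such as group_by_() were deprecated in dplyr 0.7.0. A minimal sketch of a string-based replacement, assuming dplyr >= 1.0.0, selects the column named by a string with all_of() inside across():

```r
library(dplyr)

test <- data.frame(ID = c("A", "A", "B", "B"), val = 1:4)

# The grouping column is computed as a string, just as with group_by_().
first_col <- names(test)[1]

res <- test %>%
  group_by(across(all_of(first_col))) %>%
  summarise(av = mean(val))

res
```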
Gregor Thomas
LyzandeR

If we need to use NSE, we can convert the column name to a symbol with rlang::sym and unquote it with !!:

test %>%
     group_by(!! rlang::sym(names(.)[1])) %>%
     summarise(av = mean(val))
# A tibble: 2 x 2
#      ID    av
#  <fctr> <dbl>
#1      A   1.5
#2      B   3.5

We can also wrap this in a function. If the column names are passed as quoted strings, we use sym with !!; if they are passed as bare (unquoted) names, we go the enquo/!! route instead.

f1 <- function(dat, grp, valueCol) {
     dat %>%
        group_by(!! rlang::sym(grp)) %>%
        summarise(av = mean(!! rlang::sym(valueCol)))
}

f1(test, "ID", "val")
# A tibble: 2 x 2
#      ID    av
#  <fctr> <dbl>
#1      A   1.5
#2      B   3.5
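The enquo/!! route mentioned above, for column names passed unquoted, might look like the following. This is a sketch assuming dplyr >= 0.7.0 with rlang; f2 is a hypothetical name chosen to mirror f1.

```r
library(dplyr)

# Hypothetical counterpart to f1 that takes bare (unquoted) column names.
f2 <- function(dat, grp, valueCol) {
  grp <- rlang::enquo(grp)            # capture the grouping argument as a quosure
  valueCol <- rlang::enquo(valueCol)  # capture the value column the same way
  dat %>%
    group_by(!! grp) %>%
    summarise(av = mean(!! valueCol))
}

test <- data.frame(ID = c("A", "A", "B", "B"), val = 1:4)
f2(test, ID, val)
```

With rlang >= 0.4.0, the enquo/!! pair can also be written more compactly with the {{ }} (embrace) operator.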
akrun
    This is more complicated than I actually need for my problem, but it started me down a fruitful learning path about evaluation and [programming with dplyr](http://dplyr.tidyverse.org/articles/programming.html) – Joe Sep 27 '17 at 15:48