I can't figure out how to calculate the mean for a subset of a column in R. Specifically, I want the mean of "expenditures" for rows where "age" is 40 or over, and separately for rows where "age" is under 40. I've tried

mean(expenditures[["age">=40]]) 

and it ran successfully, but

mean(expenditures[["age"<40]]) 

was not successful.

I'm stuck and would greatly appreciate any help with this seemingly simple question.

Ronak Shah

2 Answers

You could do it in one step: create a grouping column inside group_by(), then use summarise() to calculate the mean of each group:

library(dplyr)

data("mtcars")

mtcars %>%
  group_by(group = ifelse(hp > 100, "> 100", "<= 100")) %>%
  summarise(mean = mean(hp))

gives:

# A tibble: 2 x 2
  group   mean
  <chr>  <dbl>
1 <= 100  76.3
2 > 100   174.

Note: Thanks Tino for the tips!

Paul

If you don't want to use additional packages:

# some sample data:
set.seed(123)
df <- data.frame(age = sample(x = 20:50, size = 100, replace = TRUE),
                 expenditures = runif(n = 100, min = 100, max = 1000))

# Pass the formula positionally: its argument was renamed from `formula` to `x` in R 4.2.0.
aggregate(
  expenditures ~ age >= 40,
  data = df,
  FUN = mean
)
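
For the plain subsetting the question attempted, here is a minimal base R sketch. It assumes a data frame called `df` with columns `age` and `expenditures`, like the sample data (the question's real object names aren't shown, so those names are an assumption). The likely problem with the original attempt is that `"age" >= 40` compares the literal string "age" with 40 and gives a single TRUE or FALSE, so it never looks at the age column. Indexing with a logical vector built from the column itself should give the two means:

```r
# Sample data; `df`, `age` and `expenditures` are assumed names, not the asker's real data.
set.seed(123)
df <- data.frame(age = sample(x = 20:50, size = 100, replace = TRUE),
                 expenditures = runif(n = 100, min = 100, max = 1000))

# df$age >= 40 is a logical vector with one entry per row;
# using it inside [ ] keeps only the matching expenditures.
mean_40_plus  <- mean(df$expenditures[df$age >= 40])
mean_under_40 <- mean(df$expenditures[df$age < 40])

# tapply() computes both group means in a single call;
# the result is named "FALSE" (under 40) and "TRUE" (40 and over).
group_means <- tapply(df$expenditures, df$age >= 40, mean)

print(mean_40_plus)
print(mean_under_40)
print(group_means)
```

Note the single brackets: `[` accepts a logical vector of any length, whereas `[[` expects a single index.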

And to add to Paul's solution, you could also create the group within group_by:

library(dplyr)
# using dplyr:
df %>% 
  group_by(age >= 40) %>% 
  summarise(expenditures = mean(expenditures))
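
Grouping on the bare expression `age >= 40` gives a grouping column of TRUE/FALSE values. If you'd rather have readable labels, one option is to name the grouping expression, roughly like this (the column names `age_group`, `mean_expenditures` and `n` are just illustrative choices, and `df` is the same assumed sample data):

```r
library(dplyr)

# Sample data; `df`, `age` and `expenditures` are assumed names.
set.seed(123)
df <- data.frame(age = sample(x = 20:50, size = 100, replace = TRUE),
                 expenditures = runif(n = 100, min = 100, max = 1000))

# Name the grouping expression so the result has a labelled group column,
# and add n() to see how many rows fall into each group.
age_means <- df %>%
  group_by(age_group = ifelse(age >= 40, "40+", "<40")) %>%
  summarise(mean_expenditures = mean(expenditures), n = n())

print(age_means)
```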
Tino