
I'm trying to tighten up a %>% piped workflow where I need to apply the same function to several columns but with one argument changed each time. I feel like purrr's map or invoke functions should help, but I can't wrap my head around it.

My data frame has columns for life expectancy, poverty rate, and median household income. I can pass all these column names to vars in mutate_at, use round as the function to apply to each, and optionally supply a digits argument. But I can't figure out a way, if one exists, to pass different values for digits associated with each column. I'd like life expectancy rounded to 1 digit, poverty rounded to 2, and income rounded to 0.

I can call mutate on each column, but given that I might have more columns all receiving the same function with only an additional argument changed, I'd like something more concise.

library(tidyverse)

df <- tibble::tribble(
        ~name, ~life_expectancy,          ~poverty, ~household_income,
  "New Haven", 78.0580437642378, 0.264221051111753,  42588.7592521085
  )

In my imagination, I could do something like this:

df %>%
  mutate_at(vars(life_expectancy, poverty, household_income), 
            round, digits = c(1, 2, 0))

But get the error

Error in mutate_impl(.data, dots) : Column life_expectancy must be length 1 (the number of rows), not 3
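A quick base R check of why the error appears: round() is vectorised over digits as well as x, so one number rounded with digits = c(1, 2, 0) comes back as three numbers, and a one-row data frame has no room for them. The value below is the life expectancy from the example data:

```r
# round() recycles x against digits, so a single value combined with
# three digits settings yields three results instead of one
x <- 78.0580437642378
r <- round(x, digits = c(1, 2, 0))
print(r)      # three values: 78.1, 78.06 and 78
length(r)     # 3, but the data frame only has 1 row
```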

Calling mutate_at once per column does give the result I want. I use mutate_at rather than mutate here only to keep the same syntax as in my ideal case:

df %>%
  mutate_at(vars(life_expectancy), round, digits = 1) %>%
  mutate_at(vars(poverty), round, digits = 2) %>%
  mutate_at(vars(household_income), round, digits = 0)
#> # A tibble: 1 x 4
#>   name      life_expectancy poverty household_income
#>   <chr>               <dbl>   <dbl>            <dbl>
#> 1 New Haven            78.1    0.26            42589

Mapping over the digits applies every digits value to every column instead of pairing them by position, so I get 3 rows, each rounded to a different number of digits.

df %>%
  mutate_at(vars(life_expectancy, poverty, household_income), 
            function(x) map(x, round, digits = c(1, 2, 0))) %>%
  unnest()
#> # A tibble: 3 x 4
#>   name      life_expectancy poverty household_income
#>   <chr>               <dbl>   <dbl>            <dbl>
#> 1 New Haven            78.1    0.3            42589.
#> 2 New Haven            78.1    0.26           42589.
#> 3 New Haven            78      0              42589

Created on 2018-11-13 by the reprex package (v0.2.1)
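The difference between the two behaviours can be sketched in base R (swapped in here only because Map() makes the pairing explicit; it is not what the tidyverse does internally). The sapply() line reproduces the "every digits value for one column" result seen for poverty above, while Map() pairs column i with digits[i]:

```r
df <- data.frame(
  name = "New Haven",
  life_expectancy = 78.0580437642378,
  poverty = 0.264221051111753,
  household_income = 42588.7592521085
)
digits <- c(life_expectancy = 1, poverty = 2, household_income = 0)

# What the map() attempt does: all three digits values applied to one column
all_combos <- sapply(c(1, 2, 0), function(d) round(df$poverty, d))
all_combos  # 0.3, 0.26, 0 (one result per digits value)

# What is wanted: Map() walks the columns and the digits in parallel,
# so each column is rounded only with the digits value at its own position
df[names(digits)] <- Map(round, df[names(digits)], digits)
df
```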

camille
  • In the past when faced with this problem I ended up gathering my columns, grouping them, mutating them, and spreading them back out. See also [How do I sweep specific columns with dplyr?](https://stackoverflow.com/q/28298688/1968) – Konrad Rudolph Nov 13 '18 at 19:38
  • @KonradRudolph thanks, I was thinking about that too, and that's an approach I've used before, but I'm trying to figure out whether a super simple, one-line version is possible – camille Nov 13 '18 at 19:50
  • @Henrik you might be on to something. Using `map2_dfc` could work, but that requires dropping the `name` column and then maybe joining it back on. I'm trying to imagine a `map2_dfc` / `map_at` hybrid – camille Nov 13 '18 at 19:56
  • Seems like it might be easier when you will be able to pass a list of functions to summarize_at/mutate_at: https://github.com/tidyverse/dplyr/issues/3433. That doesn't seem to work yet. – MrFlick Nov 13 '18 at 20:06
  • `mutate` supports `!!!` so the easiest in my opinion is to recreate the verbose `mutate` call (not `mutate_at`) programmatically through `map2` or (cleaner to me) `imap` – moodymudskipper Nov 14 '18 at 14:21
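For readers on dplyr 1.0.0 or later, where mutate_at() is superseded by across(), here is a sketch of two routes discussed in these comments, assuming current dplyr and tidyr are installed. across() with cur_column() lets each column look up its own digits by name; the long-format route uses pivot_longer()/pivot_wider(), the current replacements for gather()/spread(). No grouping step is needed in the long version because round() is vectorised over digits, so each row gets its own value.

```r
library(dplyr)
library(tidyr)

df <- tibble::tibble(
  name = "New Haven",
  life_expectancy = 78.0580437642378,
  poverty = 0.264221051111753,
  household_income = 42588.7592521085
)
digits <- c(life_expectancy = 1, poverty = 2, household_income = 0)

# across() + cur_column(): each selected column looks up its digits by name
out_across <- df %>%
  mutate(across(all_of(names(digits)), ~ round(.x, digits[[cur_column()]])))

# Long-format route: one row per (name, column), round with a per-row
# digits value, then reshape back to one column per variable
out_long <- df %>%
  pivot_longer(-name, names_to = "column", values_to = "value") %>%
  mutate(value = round(value, unname(digits[column]))) %>%
  pivot_wider(names_from = column, values_from = value)
```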

3 Answers


2 solutions


mutate with !!!

invoke was a good idea, but you need it less now that most tidyverse functions support the !!! operator. Here's what you can do:

digits <- c(life_expectancy = 1, poverty = 2, household_income = 0)  
df %>% mutate(!!!imap(digits, ~round(..3[[.y]], .x), .))
# # A tibble: 1 x 4
#          name life_expectancy poverty household_income
#         <chr>           <dbl>   <dbl>            <dbl>
#   1 New Haven            78.1    0.26            42589

..3 is the original data frame: the dot at the end of the imap call is passed through imap's ... to the lambda as its third argument.

Written more explicitly:

df %>% mutate(!!!imap(
  digits, 
  function(digit, name, data) round(data[[name]], digit),
  data = .))
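imap(digits, f, data = .) calls f once per element of digits with the element, its name, and the extra data argument, and returns a named list; !!! then splices that list into mutate() as name = value arguments. As a rough base R analogy (not how rlang implements !!!), Map() with MoreArgs plays the role of imap(), and do.call() plays the role of the splice:

```r
df <- data.frame(
  name = "New Haven",
  life_expectancy = 78.0580437642378,
  poverty = 0.264221051111753,
  household_income = 42588.7592521085
)
digits <- c(life_expectancy = 1, poverty = 2, household_income = 0)

# Like imap(): each call gets the value, its name, and a fixed extra argument;
# the result is a list named after digits
cols <- Map(function(digit, name, data) round(data[[name]], digit),
            digits, names(digits), MoreArgs = list(data = df))

# Like mutate(!!!cols): each list element becomes a name = value argument,
# and transform() overwrites the existing columns with those names
rounded <- do.call(transform, c(list(df), cols))
rounded
```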

If you need to start from your old interface (though the one I propose will be more flexible), first do:

digits <- setNames(c(1, 2, 0), c("life_expectancy", "poverty", "household_income"))

mutate_at and <<-

Here we bend the good practice of avoiding <<- whenever possible a little, but readability matters and this version is really easy to read.

digits <- c(1, 2, 0)
i <- 0
df %>%
  mutate_at(vars(life_expectancy, poverty, household_income), ~round(., digits[i<<- i+1]))
# A tibble: 1 x 4
#     name      life_expectancy poverty household_income
#     <chr>               <dbl>   <dbl>            <dbl>
#   1 New Haven            78.1    0.26            42589

(or just df %>% mutate_at(names(digits), ~round(., digits[i<<- i+1])) if you use a named vector as in my first solution)
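One caveat with the global counter: i keeps its value after the pipeline runs, so it has to be reset to 0 before every run, or digits gets indexed past its end. A sketch that keeps the counter inside a closure instead, in base R; make_stepping_round() is a hypothetical helper name, and the approach still relies on the columns being visited in order:

```r
df <- data.frame(
  name = "New Haven",
  life_expectancy = 78.0580437642378,
  poverty = 0.264221051111753,
  household_income = 42588.7592521085
)
cols <- c("life_expectancy", "poverty", "household_income")

# Hypothetical helper: returns a function whose call counter lives in its own
# enclosing environment, so nothing global needs resetting between runs
make_stepping_round <- function(digits) {
  i <- 0
  function(x) {
    i <<- i + 1  # updates i in make_stepping_round()'s environment only
    round(x, digits[i])
  }
}

# lapply() visits the columns in order, so the i-th column gets digits[i]
df[cols] <- lapply(df[cols], make_stepping_round(c(1, 2, 0)))
df
```

Because each call to make_stepping_round() creates a fresh counter, the same helper should also slot into mutate_at(vars(...), make_stepping_round(c(1, 2, 0))) without any manual reset.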

moodymudskipper
  • This is the correct way to do it. I've deleted my answer because while the output in the console matched OPs result, running `apply(df, 1, print)` showed that the values were each rounded to two decimals. – Mako212 Nov 14 '18 at 17:27
  • This is wild! So `imap` is mapping over `digits` and its names, then applying the `round` function, but also taking the original data frame in `...`? Am I getting that right? – camille Nov 14 '18 at 20:04
  • Also, I appreciate the GH comment! – camille Nov 14 '18 at 20:05
  • Yes you got it perfectly, passing the `lhs` to the `...` is a trick I like a lot, I added a more explicit version for clarity. – moodymudskipper Nov 14 '18 at 20:11

Here's a map2 solution along the lines of Henrik's comment. You can then wrap this inside a custom function. I provided a rough first attempt, but I have done only minimal tests, so it probably breaks in all sorts of situations where evaluation gets unusual. It also doesn't use tidyselect for .at, but neither does modify_at...

library(tidyverse)

df <- tibble::tribble(
  ~name, ~life_expectancy,          ~poverty, ~household_income,
  "New Haven", 78.0580437642378, 0.264221051111753,  42588.7592521085,
  "New York", 12.349685329, 0.324067934, 32156.230974623
)

rounded <- df %>%
  select(life_expectancy, poverty, household_income) %>%
  map2_dfc(
    .y = c(1, 2, 0),
    .f = ~ round(.x, digits = .y)
  )
df %>%
  select(-life_expectancy, -poverty, -household_income) %>%
  bind_cols(rounded)
#> # A tibble: 2 x 4
#>   name      life_expectancy poverty household_income
#>   <chr>               <dbl>   <dbl>            <dbl>
#> 1 New Haven            78.1    0.26            42589
#> 2 New York             12.3    0.32            32156


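modify2_at <- function(.x, .y, .at, .f) {
  # apply .f to each selected column, paired by position with the matching .y
  modified <- .x[.at] %>%
    map2(.y, .f)
  # write the results back into the same columns, keeping the column order
  .x[.at] <- modified
  return(.x)
}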

df %>%
  modify2_at(
    .y = c(1, 2, 0),
    .at = c("life_expectancy", "poverty", "household_income"),
    .f = ~ round(.x, digits = .y)
  )
#> # A tibble: 2 x 4
#>   name      life_expectancy poverty household_income
#>   <chr>               <dbl>   <dbl>            <dbl>
#> 1 New Haven            78.1    0.26            42589
#> 2 New York             12.3    0.32            32156

Created on 2018-11-13 by the reprex package (v0.2.1)

Calum You

Fun with tidyeval:

prepared_pairs <- 
  map2(
    set_names(syms(list("life_expectancy", "poverty", "household_income"))),
    c(1, 2, 0), 
    ~expr(round(!!.x, digits = !!.y))
  )

mutate(df, !!! prepared_pairs)

# # A tibble: 1 x 4
#   name      life_expectancy poverty household_income
#   <chr>               <dbl>   <dbl>            <dbl>
# 1 New Haven            78.1    0.26            42589
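The same build-then-evaluate idea can be sketched without rlang, using base R's bquote(), whose .() marker fills the role that !! plays inside expr(). This is a swapped-in base R analogy, not the tidyeval machinery itself:

```r
df <- data.frame(
  name = "New Haven",
  life_expectancy = 78.0580437642378,
  poverty = 0.264221051111753,
  household_income = 42588.7592521085
)
digits <- c(life_expectancy = 1, poverty = 2, household_income = 0)

# Build one unevaluated call per column, e.g. round(poverty, digits = 2);
# Map() names the resulting list after the column names
calls <- Map(function(col, d) bquote(round(.(as.name(col)), digits = .(d))),
             names(digits), digits)
print(calls$poverty)

# Evaluate each call with the data frame's columns in scope
df[names(calls)] <- lapply(calls, eval, envir = df)
df
```

Printing calls$poverty shows the unevaluated call round(poverty, digits = 2), the same shape of expression that prepared_pairs holds before mutate() evaluates it.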

Aurèle
  • Interesting. Using `expr` in this way for the entire expression is comparable to using `enquo` on individual variables? I'm still getting the hang of the different tidyeval verbs – camille Jan 31 '19 at 16:04
  • (Prefixing everything I say with "As far as I understand"): `expr` is a little more "bare" in the sense that it doesn't carry an environment with it. `expr` is like the lighter `quo` (not `enquo`) without an environment – Aurèle Jan 31 '19 at 16:09
  • I think `expr` is just `quote` except that it understands `!!` – moodymudskipper Feb 01 '19 at 10:25
  • It's a cool solution; if you use the definition of `digits` that I use, it's a bit simpler to read, as you can do: `prepared_pairs – moodymudskipper Feb 01 '19 at 10:38
  • Thanks! An idea to make yours robust to grouped data frames is to wrap it in `do` like `df %>% do(mutate(., !!!imap(digits, ~round(..3[[.y]], .x),.)))` – Aurèle Feb 01 '19 at 15:08