I have two data frames like this:

dat1
  col   n
1  A    1
2  B    1
3  C    2


dat2
  col   n
1  A    2
2  B    1
3  C    1
4  D    1

and I want to combine dat1 and dat2 into a data frame like this, where n is summed for each col:

dat3
  col   n
1  A    3
2  B    2
3  C    3
4  D    1

I'm trying to build this data frame (dat3) with dplyr's bind_rows, group_by and count, but I can't get the result I want.

bind_rows(dat1, dat2) %>%
  group_by(col)

result:
  col   n 
1  A    1
2  B    1
3  C    2
4  A    2
5  B    1
6  C    1
7  D    1

bind_rows(dat1, dat2) %>%
  group_by(col) %>%
  count(n)

result:
  col   n   nn
1  A    1    1
2  A    2    1
3  B    1    2
4  C    1    1
5  C    2    1
6  D    1    1

How can I make dat3?

KNOCK
  • 황낙주, if one of the answers addresses your question, please [accept it](https://stackoverflow.com/help/someone-answers); doing so not only provides a little perk to the answerer with some points, but also provides some closure for readers with similar questions. Though you can only accept one answer, you have the option to up-vote as many as you think are helpful. (If there are still issues, you will likely need to edit your question with further details.) – r2evans Dec 19 '19 at 19:47

4 Answers

You should summarise instead of counting:

library(dplyr)
bind_rows(dat1, dat2) %>%
  group_by(col) %>%
  summarise(Sum = sum(n))

# A tibble: 4 x 2
  col     Sum
  <chr> <dbl>
1 A         3
2 B         2
3 C         3
4 D         1
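
For readers who want to run this end to end, here is a minimal sketch. The data frames are rebuilt from the tables in the question (with `n` stored as doubles, which is an assumption), and the `count(col, wt = n, name = "n")` variant is an alternative not mentioned in the answer; it assumes a dplyr version whose `count()` accepts the `name` argument.

```r
library(dplyr)

# Example data reconstructed from the question (column types are an assumption)
dat1 <- data.frame(col = c("A", "B", "C"), n = c(1, 1, 2),
                   stringsAsFactors = FALSE)
dat2 <- data.frame(col = c("A", "B", "C", "D"), n = c(2, 1, 1, 1),
                   stringsAsFactors = FALSE)

# Stack the rows, then add up n within each value of col
dat3 <- bind_rows(dat1, dat2) %>%
  group_by(col) %>%
  summarise(n = sum(n))

# Alternative: a weighted count sums n per col instead of counting rows
dat3_alt <- bind_rows(dat1, dat2) %>%
  count(col, wt = n, name = "n")
```

Both `dat3` and `dat3_alt` should hold one row per `col` with the summed `n`. The plain `count(n)` from the question counts rows per `(col, n)` pair, which is why it produced the extra `nn` column.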
dc37

Third option, just in case:

# Parallel (row-wise) sum across vectors; NAs are ignored by default
psum <- function(..., na.rm = TRUE) {
  m <- cbind(...)
  rowSums(m, na.rm = na.rm)
}

full_join(dat1, dat2, by = "col") %>%
  mutate(n = psum(n.x, n.y))
#   col n.x n.y n
# 1   A   1   2 3
# 2   B   1   1 2
# 3   C   2   1 3
# 4   D  NA   1 1

(The n.x and n.y columns are generated by the join because both inputs have a column named n; they are retained here solely for demonstration. Yes, psum is a hack here, and there is likely something better out there ...)
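
One possible way to avoid a helper function, sketched here as an assumption rather than as part of the original answer, is to replace the `NA`s left by the join with zero using dplyr's `coalesce()` before adding. The example data is rebuilt from the question with `n` stored as doubles, so that it matches the type of the replacement value `0`.

```r
library(dplyr)

# Example data reconstructed from the question; n is double on purpose so it
# matches the type of the replacement value 0 passed to coalesce()
dat1 <- data.frame(col = c("A", "B", "C"), n = c(1, 1, 2),
                   stringsAsFactors = FALSE)
dat2 <- data.frame(col = c("A", "B", "C", "D"), n = c(2, 1, 1, 1),
                   stringsAsFactors = FALSE)

dat3 <- full_join(dat1, dat2, by = "col") %>%
  # Unmatched rows have NA in n.x or n.y; treat those as 0 before adding
  mutate(n = coalesce(n.x, 0) + coalesce(n.y, 0)) %>%
  # Drop the intermediate join columns
  select(col, n)
```

This only scales to a fixed, small number of data frames; for many inputs, stacking the rows and summarising per group is usually simpler.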

r2evans
  • i tried full_join and gather, too. it easily solved with unusing 'n'. thanks for your answer! – KNOCK Dec 08 '19 at 16:44

Or in base R,

aggregate(cbind(Sum = n) ~ col, rbind(df1, df2), FUN = sum)
#   col Sum
#1   A   3
#2   B   2
#3   C   3
#4   D   1

data

df1 <- structure(list(col = c("A", "B", "C"), n = c(1L, 1L, 2L)), 
    class = "data.frame", row.names = c("1", 
"2", "3"))

df2 <- structure(list(col = c("A", "B", "C", "D"), n = c(2L, 1L, 1L, 
1L)), class = "data.frame", row.names = c("1", "2", "3", "4"))
akrun

data.table is a superior package to dplyr. I suggest you try it:

library(data.table)
dat1 <- setDT(dat1); dat2 <- setDT(dat2)

dat3 <- rbindlist(list(dat1, dat2))[, .(n = sum(n)), by = .(col)]
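
A self-contained version of the same idea, as a rough sketch: the data frames are rebuilt from the question (column types are an assumption). Since `setDT()` converts its argument by reference, the sketch calls it without reassigning the result.

```r
library(data.table)

# Example data reconstructed from the question (column types are an assumption)
dat1 <- data.frame(col = c("A", "B", "C"), n = c(1, 1, 2),
                   stringsAsFactors = FALSE)
dat2 <- data.frame(col = c("A", "B", "C", "D"), n = c(2, 1, 1, 1),
                   stringsAsFactors = FALSE)

# Convert both data frames to data.tables in place
setDT(dat1)
setDT(dat2)

# Stack the tables, then sum n within each value of col
dat3 <- rbindlist(list(dat1, dat2))[, .(n = sum(n)), by = col]
```

Groups come back in order of first appearance, so `dat3` should list the `col` values as A, B, C, D.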
Alex W
  • Just a naive question: why is `data.table` superior to `dplyr`? – dc37 Dec 08 '19 at 15:10
  • Your "superior" reference is completely contextual, and subject to a slew of opinion, experience, needs, etc. Not all comparison factors are based on time-to-compute. One step further: while I am getting more proficient with `data.table`, its readability -- especially for new R users -- can be daunting. Considering that this user seems to be *just starting* with `dplyr`, let's just stick with what they are "familiar" with. – r2evans Dec 08 '19 at 15:10
  • @r2evans I agree with your second point. But the superiority of data.table to dplyr is so well documented at this point that it is, IMO, reasonable to consider it a fact and not an opinion – Alex W Dec 08 '19 at 15:16
  • @dc37 https://stackoverflow.com/questions/21435339/data-table-vs-dplyr-can-one-do-something-well-the-other-cant-or-does-poorly https://github.com/Rdatatable/data.table/wiki/Benchmarks-:-Grouping http://dirk.eddelbuettel.com/blog/2018/01/21/ https://github.com/matloff/TidyverseSkeptic – Alex W Dec 08 '19 at 15:18
  • Again, superiority is relative. If you mean *faster*, yes. If you mean *memory-efficient*, certainly. (And I agree whole-heartedly on both counts.) But it has also been argued many times that the conciseness of it is both a strength and a weakness, and please acknowledge that its syntax is enough at odds with base R (and other packages) to be confusing *to new R users*. It is the right tool for a lot of problems, but it is not the perfect tool for all problems. (Nothing fits that bill.) Nice set of links, btw, I only had two in my recent history :-) – r2evans Dec 08 '19 at 15:23
  • @AlexW You don't need `dat1 – markus Dec 08 '19 at 15:24
  • @r2evans Happy for us to take this chat elsewhere so that we don't distract from answering the OP's question. Yes, many people think the readability of `dplyr` is a strong value-add. Others disagree... On syntax, both `data.table` and `dplyr` have different syntax than base R. For instance, the use of piping... This discussion is getting far off topic. Let's either continue in chat or drop it. Cheers – Alex W Dec 08 '19 at 15:27
  • @AlexW can u check this please https://stackoverflow.com/questions/59233401/in-r-how-to-create-multilevel-radiogroupbuttons-as-each-level-depends-choicena?noredirect=1#comment104681894_59233401 – John Smith Dec 08 '19 at 18:40