0

I do have a little twist in my head. I think this should be somewhat easy, but I just can't figure it out. I have this data:

                                     tipologia date_info    n
1  Aree soggette a crolli/ribaltamenti diffusi       day  113
2  Aree soggette a crolli/ribaltamenti diffusi     month   59
3  Aree soggette a crolli/ribaltamenti diffusi   no date  506
4  Aree soggette a crolli/ribaltamenti diffusi      year 1880
5   Aree soggette a frane superficiali diffuse       day   24
6   Aree soggette a frane superficiali diffuse     month    7
7   Aree soggette a frane superficiali diffuse   no date  148
8   Aree soggette a frane superficiali diffuse      year  142
9       Aree soggette a sprofondamenti diffusi       day    1
10      Aree soggette a sprofondamenti diffusi   no date    1
11      Aree soggette a sprofondamenti diffusi      year    2
12                             Colamento lento       day   25
13                             Colamento lento     month   12
14                             Colamento lento   no date   27
15                             Colamento lento      year  177
16                            Colamento rapido       day   64
17                            Colamento rapido     month    3
18                            Colamento rapido   no date   12
19                            Colamento rapido      year   92
20                                   Complesso       day  107
21                                   Complesso     month   23
22                                   Complesso   no date  150
23                                   Complesso      year  138

What I want to do now is to sum up all values in the column "n" for each group in tipologia. But I dont want to lose the information in "date_info". So I basically just want to append a column that for the first group "Aree soggette a crolli/ribaltamenti diffusi" would have the value (113+59+506+1880 =2556) in the first four rows.

So I tried something like

df  %>% count(tipologia, date_info) %>% 
  group_by(tipologia) %>% 
  summarise(total = sum(n)) 
   

but then I obviously "loose" my "date_info" column.

   tipologia                                   total
   <chr>                                       <int>
 1 Aree soggette a crolli/ribaltamenti diffusi  2558
 2 Aree soggette a frane superficiali diffuse    321
 3 Aree soggette a sprofondamenti diffusi          4
 4 Colamento lento                               241
 5 Colamento rapido                              171
 6 Complesso                                     418
 7 Crollo/Ribaltamento                          2932
 8 DGPV                                           50

When I group by tipologia and date_info and then sum up n, it does not build the sum for some reason

df %>% count(tipologia, date_info) %>% 
  group_by(tipologia, date_info) %>% 
  summarise(total = sum(n)) 

And the result looks like

   tipologia                                   date_info total
   <chr>                                       <chr>     <int>
 1 Aree soggette a crolli/ribaltamenti diffusi day         113
 2 Aree soggette a crolli/ribaltamenti diffusi month        59
 3 Aree soggette a crolli/ribaltamenti diffusi no date     506
 4 Aree soggette a crolli/ribaltamenti diffusi year       1880
 5 Aree soggette a frane superficiali diffuse  day          24

I think the answer might be somewhere in here too How to sum a variable by group, but I just can't figure it out...:/

Robin Kohrs
  • 575
  • 4
  • 13
  • 3
    Use `mutate` not `summarise` to add a column to the original data. (In your first attempt). – Gregor Thomas Dec 22 '20 at 17:56
  • I guess you can still use `summarise` with `date_info` as another column beacuse in the newer version, `summarise` is not limited to return only single row per grou – akrun Dec 22 '20 at 17:57
  • Your second does sum up, but it sums by your groups, which include the `date_info`. If you had multiple rows with the same tipologia and date_info, they would be added together and collapsed in your second attempt. – Gregor Thomas Dec 22 '20 at 17:57
  • Or another option would be to use `mutate` as GregorThomas mentioned and wrap with `distinct` at the end – akrun Dec 22 '20 at 17:59
  • Thank you very much!! That did the trick:) Really appreciate it – Robin Kohrs Dec 22 '20 at 21:06

0 Answers0