
I have a data frame `d` that looks like this:

head(d)
                  SW_ID test_date Public_Positive Public_Total Private_Positive Private_Total Tested_Positive Tested_Total   casedate
7344 36067NY00270811700   8/31/20               1           65                0             0               1           65 2020-08-31
7345 36067NY00270811700    5/8/20               1           24                0             0               1           24 2020-05-08
7346 36067NY00270811700    7/5/20               1           11                0             0               1           11 2020-07-05
7347 36067NY00270811700   8/19/20               0          108                0             0               0          108 2020-08-19
7348 36067NY00270811700   4/11/20               0            4                0             0               0            4 2020-04-11
7349 36067NY00270811700   4/29/20               1           11                0             0               1           11 2020-04-29
       County POP2020
7344 Onondaga   16260
7345 Onondaga   16260
7346 Onondaga   16260
7347 Onondaga   16260
7348 Onondaga   16260
7349 Onondaga   16260

I want to sum `Tested_Positive` across all dates for each `SW_ID` and store the result in a new variable called `total_positive`. I then want to divide that variable by `POP2020` and multiply by 100,000 to get the incidence rate. I believe I can compute the rate with `d$incidence <- d$total_positive / d$POP2020 * 100000`, but I am unsure how to sum across all the dates for each `SW_ID`. Please advise.
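One possible way to do the per-`SW_ID` sum is sketched below in base R, using `ave()` to compute the group total and repeat it on every row of the group. The toy data frame is made up for illustration and only mirrors the relevant columns. The sketch assumes `POP2020` is constant within each `SW_ID`, as it appears to be in the sample shown. The name `rates` is hypothetical.

```r
# Toy data mirroring the relevant columns (values are invented for illustration)
d <- data.frame(
  SW_ID           = c("A", "A", "A", "B", "B"),
  Tested_Positive = c(1, 0, 1, 2, 3),
  POP2020         = c(16260, 16260, 16260, 50000, 50000)
)

# Sum Tested_Positive within each SW_ID; ave() returns a vector as long as
# the input, so every row of a group carries that group's total
d$total_positive <- ave(d$Tested_Positive, d$SW_ID, FUN = sum)

# Incidence per 100,000 people; POP2020 has to be referenced as d$POP2020
# (assumes POP2020 is the same on every row of a given SW_ID)
d$incidence <- d$total_positive / d$POP2020 * 100000

# Optional: collapse to one row per SW_ID
rates <- unique(d[, c("SW_ID", "total_positive", "POP2020", "incidence")])
```

If only the summary table is needed, `aggregate(Tested_Positive ~ SW_ID, data = d, FUN = sum)` gives one row per `SW_ID` directly, though the population column would then have to be merged back in before computing the rate.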

  • It looks like you're pretty new to SO; welcome to the community! If you want great answers quickly, it's best to make your question reproducible. This includes sample data, like the output from `dput(head(dataObject))`, and any libraries you've used. Check it out: [making R reproducible questions](https://stackoverflow.com/q/5963269). – Kat Feb 07 '22 at 15:04
