I have detection/non-detection data in the following format:

> yr2007<-Visit.data_allyears[[1]];head(yr2007)
  SiteName year PAdata Longitude Latitude   totalspp  totalhours     lhours temperature   rainfall       NDVI   SA_elevati      NDVI2
1   2229AB 2007      0    29.375  -22.125  0.2738900  0.04145321  0.3590574   0.7571729  0.3486277  0.2562413 -0.401474904  0.2562406
2   2230CA 2007      0    30.125  -22.625 -0.4672811 -0.43741429 -0.4601641   0.8803066 -0.7668375 -0.1580487 -0.243873619 -0.1580504
3   2230DA 2007      0    30.625  -22.625 -0.7966905  0.28088696  0.6705110   1.0815264 -0.8644850 -0.6821884 -0.009113371 -0.6821913
4   2230DB 2007      0    30.875  -22.625 -1.9907996 -0.43741429 -0.4601641   1.3638363 -0.9247028 -0.8610864 -0.475350506 -0.8610896
5   2231AC 2007      0    31.125  -22.375  2.8268128 -0.43741429 -0.4601641   0.8652892  1.3981484  1.6497624 -0.867712039  1.6497649
6   2231AC 2007      0    31.125  -22.375  1.3032943  0.04145321  0.3590574   0.8652892  1.3981484  1.6497624 -0.867712039  1.6497649

So the sites (in SiteName) can have multiple entries, each corresponding to a survey. I would like to convert to wide format, so that each site has only one row and the PAdata values (0/1) are spread across columns. The problem is that SiteName is not a factor with a fixed set of repeated categories; each site is repeated a different number of times. The maximum number of entries for one site is 27, so the end result should have 27 columns corresponding to 27 surveys, and sites that don't have PAdata for all of the columns should have NA values where applicable. The remaining columns should just collapse accordingly.

Here is a simulated small dataset:

SiteName<-c("site1","site3","site4")
SiteName<-rep(SiteName,times=c(3,6,10) );length(SiteName)
PAdata<-c(0,0,1,0,0,0,1,0,1,1,1,1,0,1,0,0,1,1,0);length(PAdata)
temp<-c(2,2,2,5,5,5,5,5,5,7,7,7,7,7,7,7,7,7,7);length(temp)

data<-data.frame(SiteName,PAdata,temp)

The end result should look like:

SiteName Surv1 Surv2 Surv3 Surv4 Surv5 Surv6 Surv7 Surv8 Surv9 Surv10 temp
site1    0     0     1     NA    NA    NA    NA    NA    NA    NA     2
site3    0     0     0     1     0     1     NA    NA    NA    NA     5
site4    1     1     1     0     1     0     0     1     1     0      7

Any help please!

username97
  • `data %>% group_by(SiteName) %>% mutate(row = row_number()) %>% pivot_wider(names_from = row, values_from = PAdata, names_prefix = 'survey')` using `tidyverse`. – Ronak Shah Oct 02 '21 at 08:17
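
The one-liner in the comment above can be expanded into a runnable sketch on the simulated data from the question. It assumes the `dplyr` (1.0.0 or later) and `tidyr` packages, and it assumes that every column kept alongside `SiteName` (here only `temp`) is constant within a site. The names `survey` and `wide` are made up for illustration. In the real data some covariates (e.g. `totalspp`) differ between surveys of the same site, so they would have to be summarised or left out of `id_cols` first; otherwise a site would not collapse to a single row.

```r
library(dplyr)
library(tidyr)

# Simulated data from the question
SiteName <- rep(c("site1", "site3", "site4"), times = c(3, 6, 10))
PAdata   <- c(0,0,1,0,0,0,1,0,1,1,1,1,0,1,0,0,1,1,0)
temp     <- c(2,2,2,5,5,5,5,5,5,7,7,7,7,7,7,7,7,7,7)
data     <- data.frame(SiteName, PAdata, temp)

wide <- data %>%
  group_by(SiteName) %>%
  mutate(survey = row_number()) %>%  # survey index within each site (hypothetical name)
  ungroup() %>%
  pivot_wider(
    id_cols      = c(SiteName, temp), # assumption: temp is constant within a site
    names_from   = survey,
    values_from  = PAdata,
    names_prefix = "Surv"
  ) %>%
  relocate(temp, .after = last_col()) # move temp to the end, as in the desired output

wide
```

Sites with fewer surveys than the maximum get `NA` in the unused `Surv` columns, which is the default behaviour of `pivot_wider` for missing combinations.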
