
I am an R beginner and I'm facing a problem I can't work out how to approach. I have looked at several related posts but have not found a specific answer, except here:
Aggregating rows with same Ids and retaining only unique entries in R

but my problem is a bit different.

Here's the structure of the initial data frame I want to use:

sta_RHP_metho (3528 rows, 4 columns). The variables are:
- "code.sandre", which is the ID I'll use
- "CodeOpera", a unique ID which is related to "code.sandre"
- "Methode.de.peche", a character vector
- "year"

In that data frame there are as many rows as unique "CodeOpera" values (3528). There are several "CodeOpera" values per "code.sandre", and there are 180 distinct "code.sandre" values.

What I want is a data frame with a single row per "code.sandre" and the "Methode.de.peche" value for each year, with one column per year.

I almost got that with the following code:

x2<-melt(sta_RHP_metho,c("code.sandre","CodeOpera","year"),"Methode.de.peche")
x3<-as.data.frame(dcast(x2,code.sandre + CodeOpera ~ year))

But I still have as many rows as unique "CodeOpera" values (3528), and as I said, I don't know how to get a single row per ID.
Note that there can be several "Methode.de.peche" values per year, so I may need to concatenate the returned values in some cases.

Hope my explanations are clear.

Comments will be greatly appreciated ;)

Cheers.

Tristan


Thank you @ANG. Here's a minimal reproducible example:

1/ The data frame I got after my melt/dcast operation:

code_sandre<-c("A","A","A","B","B","C","D")
year1<-c("a",NA,"a","b",NA,"c","b")
year2<-c("a","b",NA,"b","b","c","b")
year3<-c("a","b",NA,NA,NA,"c","b")
x<-data.frame(code_sandre,year1,year2,year3)

2/ The data frame I want to get:

code_sandre<-c("A","B","C","D")
year1<-c("a","b",NA,"b")
year2<-c("a,b","b","c","b")
year3<-c("a,b",NA,"c","b")
result<-data.frame(code_sandre,year1,year2,year3)
ChrisF
    Hello Tristan and welcome to StackOverflow (SO). Could you provide a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? – nghauran Oct 21 '17 at 18:46

1 Answer


I'm not sure I understood you correctly, but it looks like you just want unique code.sandre rows, regardless of the value of CodeOpera. Do you get the expected result if you try this (check the result before using melt())?

library(data.table)
setDT(sta_RHP_metho)
# delete column "CodeOpera"
sta_RHP_metho <- sta_RHP_metho[, CodeOpera := NULL]
# take unique rows
library(dplyr)
sta_RHP_metho2 <- distinct(sta_RHP_metho)
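To illustrate, here is a minimal base-R sketch of that idea on made-up data. The toy values and the helper name collapse_methods are my assumptions, not taken from your data. It drops CodeOpera, keeps the distinct rows, then builds one column per year, concatenating the methods when a station has several in the same year:

```r
# Toy long-format data with the same columns as sta_RHP_metho (values are made up)
toy <- data.frame(
  code.sandre      = c("A", "A", "A", "B", "B", "C"),
  CodeOpera        = 1:6,
  Methode.de.peche = c("a", "b", "a", "b", "b", "c"),
  year             = c(2010, 2010, 2011, 2010, 2011, 2011),
  stringsAsFactors = FALSE
)

# Drop the operation id and keep the distinct (station, method, year) rows
toy2 <- unique(toy[, c("code.sandre", "Methode.de.peche", "year")])

# Hypothetical helper: concatenate the methods found in one station/year cell
collapse_methods <- function(v) paste(sort(unique(v)), collapse = ",")

# One row per station, one column per year; station/year pairs with no data stay NA
wide <- tapply(toy2$Methode.de.peche,
               list(toy2$code.sandre, toy2$year),
               collapse_methods)

# Turn the character matrix into a data frame with the station id as a column
result <- data.frame(code.sandre = rownames(wide), wide,
                     check.names = FALSE, row.names = NULL,
                     stringsAsFactors = FALSE)
result
```

If this shape is what you are after, the same aggregation could probably be done inside your dcast() call through its fun.aggregate argument, but I have not tested that on your data.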

Alternatively, here is what I was able to achieve with your example:

code_sandre<-c("A","A","A","B","B","C","D")
year1<-c("a",NA,"a","b",NA,"c","b") 
year2<-c("a","b",NA,"b","b","c","b") 
year3<-c("a","b",NA,NA,NA,"c","b")
x<-data.frame(code_sandre =code_sandre,
              year1 = year1,
              year2 = year2,
              year3 = year3)
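library(dplyr)
# collapse the distinct non-NA values of each year column, per code_sandre
x2 <- x %>%
        group_by(code_sandre) %>%
        summarise_at(.vars = vars(year1, year2, year3),
                     .funs = function(v) toString(unique(v[!is.na(v)])))
x2
x3 <- as.data.frame(x2)
# groups with only NA values come back as "", so turn those into NA
x3[x3 == ""] <- NA
x3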

I think it should be very close to your expected output.
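If you would rather avoid the extra step that turns "" back into NA, a base-R variant with aggregate() might also work. This is only a sketch on the same example data. The helper name collapse_unique is my own, and it joins values with "," (no space) to match the "a,b" style of your expected output:

```r
# Same example data as in the question
x <- data.frame(
  code_sandre = c("A", "A", "A", "B", "B", "C", "D"),
  year1 = c("a", NA, "a", "b", NA, "c", "b"),
  year2 = c("a", "b", NA, "b", "b", "c", "b"),
  year3 = c("a", "b", NA, NA, NA, "c", "b"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: join the distinct non-NA values, or return NA if none remain
collapse_unique <- function(v) {
  v <- unique(v[!is.na(v)])
  if (length(v) == 0) NA_character_ else paste(v, collapse = ",")
}

# The data-frame method of aggregate() passes NA values on to FUN,
# so the helper decides what to do with them
res <- aggregate(x[c("year1", "year2", "year3")],
                 by = list(code_sandre = x$code_sandre),
                 FUN = collapse_unique)
res
```

I used the data-frame method rather than the formula interface because the formula interface drops rows containing NA by default, which would lose some of your values.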

nghauran