-1

I am trying to replace null values based on two columns. Basically, I have company codes in one column and its respective values in the second. I need to replace mean of the values for each of the company code rather than mean of the complete column. How do I do it in R? (Look at the image below)

enter image description here

joran
  • 163,977
  • 32
  • 423
  • 453
Varsha
  • 59
  • 4
  • 2
    What did you try? This is not the way a question is communicated. Check on how to ask a question example. https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example?rq=1 – user5249203 Jan 29 '18 at 21:01

3 Answers3

0

Do you need something like this:

df <- data.frame(Symbol = c("NXCDX", "ALX", "ALX", "BESOQ", "BESOQ", "BESOQ"), 
                Values = c(2345, 8654, NA, 6394, 8549, NA))

df %>% dplyr::group_by(Symbol) %>% dplyr::summarise(mean_values = mean(Values, na.rm = TRUE))
Mislav
  • 1,443
  • 13
  • 31
0

using data.table

library(data.table)
setDT(df)[,replace(Values,is.na(Values),mean(Values,na.rm = T)),by=Symbol]
onyambu
  • 49,350
  • 3
  • 19
  • 45
0

Assuming your data is in a data frame called 'myData' you can go ahead and use the ddply function from the plyr package to generate the mean per company code. The ddply function applies a function to a column(s) grouped by another column(s).

library(plyr)

#Find the entries where the values are NULL, using "" (empty string) as NULL
#Can replace "" with whatever NULL is for you
nullMatches <- myData$Values == ""

#Generate the mean for each company
#This will return a 2 column data frame, first column will be "Symbol".
#Second column will the value of means for each 'Symbol'.
meanPerCompany <- ddply(myData[!nullMatches,], "Symbol", numcolwise(mean))

#Match the company symbol and store the mean
myData$Values[nullMatches] <- meanPerCompany[match(myData$Symbol[nullMatches], meanPerCompany[,1]),2]
LachlanO
  • 1,124
  • 6
  • 14