I have 3 CSV files. Each file has the same three columns (Maths, Physics and Chemistry) holding the marks of all the students. I wrote a loop to read each file into a data frame, as follows. In every file, lines 1, 2, 4 and 5 need to be skipped.

files <- list.files(pattern = ".csv") 

for(i in 1:length(files)){
  data <- read.csv(files[i], header=F, skip=2) # by writing skip=2 I could only skip first two lines. 
  View(data)
  mathavg[i] <- sum(as.numeric(data$math), na.rm=T)/nrow(data)
}

result <- cbind(files,mathavg)
write.csv(result,"result_mathavg.csv")

I was not able to calculate the average of the math column in the three files.

I need to calculate the average in the same way for all three subjects across the three files. Any help?

2 Answers

This should work,

files  <- c("testa.csv","testb.csv","testc.csv")
list_files  <- lapply(files,read.csv,header=F,stringsAsFactors=F)

list_files  <- lapply(list_files, function(x) x[-c(1,2,4,5),])

mathav  <- sapply(list_files,function(x) mean(as.numeric(x[,2]),na.rm=T))
result  <- cbind(files,mathav)
write.csv(result,"result_mathavg.csv",row.names=F)

I didn't have access to your files, so I made up three and listed their names in 'files'. I used lapply to load the files, and then again to remove the lines that you didn't want. I got the average of each file with sapply, then went back to your code to build result and write it out.
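The same approach can be extended to all three subjects at once. The following is only a minimal sketch under assumptions that are not confirmed by the question: it assumes line 3 of each file holds the header (Maths, Physics, Chemistry) and that lines 1, 2, 4 and 5 are junk. The helper `read_marks`, the temporary files and the marks in them are made up for illustration.

```r
# Hypothetical helper: drop lines 1, 2, 4 and 5, then parse what is left
# (assumes line 3 is the header row).
read_marks <- function(path) {
  lines <- readLines(path)
  read.csv(text = paste(lines[-c(1, 2, 4, 5)], collapse = "\n"), header = TRUE)
}

# Made-up sample files in a temporary directory, standing in for the real data.
dir <- tempfile("marks")
dir.create(dir)
writeLines(c("junk 1", "junk 2", "Maths,Physics,Chemistry", "junk 4", "junk 5",
             "80,70,60", "90,50,70"), file.path(dir, "a.csv"))
writeLines(c("junk 1", "junk 2", "Maths,Physics,Chemistry", "junk 4", "junk 5",
             "60,60,60", "100,80,40"), file.path(dir, "b.csv"))

files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

# colMeans() averages every column, so each file yields one value per subject.
averages <- t(sapply(files, function(f) colMeans(read_marks(f), na.rm = TRUE),
                     USE.NAMES = FALSE))

# One row per file, one column per subject.
result <- data.frame(file = basename(files), averages)
print(result)
```

Using colMeans on the whole data frame avoids repeating the calculation once per subject, but it only works if every remaining column is numeric.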

DarrenRhodes

`mathavg` needs to be initialized before it can be assigned to with `[]`. To remove lines 4 and 5, you just need to subset the data after reading it. Lines 4 and 5 become rows 2 and 3 if you skip the first 2 lines when reading the data.

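files <- list.files(pattern = "\\.csv$")  # escape the dot and anchor the pattern so only .csv files match
mathavg <- numeric(length(files))  # initialize as numeric before assigning with []
for(i in seq_along(files)){
  data <- read.csv(files[i], header=F, skip=2, stringsAsFactors=F) # skip=2 drops only the first two lines
  data <- data[-c(2,3),]  # original lines 4 and 5 are now rows 2 and 3
  # best to use R's built-in mean(); the argument is na.rm, and the column name must match the data exactly
  mathavg[i] <- mean(as.numeric(data$math), na.rm=T)
}

result <- cbind(files,mathavg)
write.csv(result,"result_mathavg.csv")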
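To illustrate the row shift and why a mismatched column name gives NaN, here is a small sketch on a made-up file. The layout is an assumption (header on line 3, junk on lines 1, 2, 4 and 5). Unlike the loop above it reads with header = TRUE, so line 3 becomes the column names and the original lines 4 and 5 land in rows 1 and 2 rather than 2 and 3.

```r
# Made-up sample file; the real layout may differ.
path <- tempfile(fileext = ".csv")
writeLines(c("junk 1", "junk 2", "Maths,Physics,Chemistry", "junk 4", "junk 5",
             "80,70,60", "90,50,70"), path)

# skip = 2 drops lines 1 and 2; line 3 is then used as the header.
data <- read.csv(path, header = TRUE, skip = 2, stringsAsFactors = FALSE)

# Original lines 4 and 5 are now data rows 1 and 2, so drop those.
data <- data[-c(1, 2), ]

# Column names are case-sensitive: data$math does not exist, data$Maths does.
wrong_mean <- mean(as.numeric(data$math), na.rm = TRUE)   # mean of an empty vector
math_mean  <- mean(as.numeric(data$Maths), na.rm = TRUE)

print(names(data))
print(math_mean)
```

Checking names(data) right after reading is a quick way to confirm which column names are actually available.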
emilliman5
  • Bear in mind that R users encourage each other to use apply functions rather than for loops, http://stackoverflow.com/questions/2275896/is-rs-apply-family-more-than-syntactic-sugar – DarrenRhodes Jan 04 '16 at 15:48
  • @emilliman5: After running the above code, all the values show as NaN – Kalyan Ramanuja Jan 04 '16 at 20:40
  • According to the snippet of data you posted above, `data$math` needs to be `data$Math`. But without seeing the actual data I cannot troubleshoot any further. – emilliman5 Jan 04 '16 at 20:51