I have been stuck for several hours on some R code. I initially thought it would be simple, but nothing I have tried has worked. The script imports a number of datasets (CSV files) and, for each one, calculates the average ratio of NA values, zero values, and empty values. For every dataset it builds an auxiliary data frame that holds the dataset's variable names and the ratio of missing values for each variable. The problem is storing that auxiliary data frame. I want it saved under a name derived from the original dataset, that is, a name that depends on the loop index k, which iterates over all the datasets. I have not managed to do this: every attempt at something like base_[k], where the suffix changes with the name of the dataset, fails.
Has anyone run into something like this? I don't know what else to try. Thanks a lot. The code is below so you can see what I mean.
rm(list = ls())
setwd("C:/Users/Kevin/Escritorio/UK 2022.05.24")
listcsv <- dir(pattern = "\\.csv$") # lists all the csv files in the directory (pattern is a regular expression)
results <- as.data.frame(listcsv)
results$mean_na_ratio <- -777 # -777 is a placeholder, overwritten inside the loop
results$mean_zero_ratio <- -777
results$mean_no_value_ratio <- -777
for (k in seq_along(listcsv)) {
  df <- read.csv(listcsv[k], stringsAsFactors = FALSE)
  c1 <- colMeans(is.na(df)) # share of NA values per column
  results[k, "mean_na_ratio"] <- mean(c1)
  vars_vector <- colnames(df)
  vars_dataframe <- as.data.frame(vars_vector)
  rownames(vars_dataframe) <- vars_dataframe$vars_vector
  for (i in vars_vector) {
    df[, i] <- as.character(df[, i])
    df$temp <- df[, i]
    # share of rows equal to "0" and share of empty strings for variable i
    vars_dataframe[i, "mean_zero_ratio"] <- nrow(subset(df, temp == "0")) / nrow(df)
    vars_dataframe[i, "mean_no_value_ratio"] <- nrow(subset(df, temp == "")) / nrow(df)
  }
  vars_dataframe[is.na(vars_dataframe)] <- 0
  results[k, "mean_zero_ratio"] <- mean(vars_dataframe$mean_zero_ratio)
  results[k, "mean_no_value_ratio"] <- mean(vars_dataframe$mean_no_value_ratio)
  **data_k <- vars_dataframe**
}
The line causing the problem is marked in bold.
Thank you so much.
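For reference, here is a minimal sketch of two common ways to get one stored object per file. It is not a definitive fix. It assumes the goal is simply to retrieve each auxiliary data frame by the name of its source file. The folder `csv_demo`, the two demo files, the list `per_file`, and the `data_` prefix are all hypothetical and exist only to make the example run on its own. The per-variable ratios are computed with `sapply()` rather than the inner loop above, which is a substitution made for brevity.

```r
# Hypothetical demo data: two small CSV files in a temporary folder
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(a = c(0, 1, NA), b = c("", "x", "0")),
          file.path(demo_dir, "first.csv"), row.names = FALSE)
write.csv(data.frame(c = c(0, 0, 5, 7)),
          file.path(demo_dir, "second.csv"), row.names = FALSE)

listcsv <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Option 1: a named list with one element per file
per_file <- list()

for (k in seq_along(listcsv)) {
  df <- read.csv(listcsv[k], stringsAsFactors = FALSE)

  # Per-variable share of "0" values and of empty strings (NA is not counted)
  zero_ratio  <- sapply(df, function(x) sum(as.character(x) == "0", na.rm = TRUE) / nrow(df))
  empty_ratio <- sapply(df, function(x) sum(as.character(x) == "",  na.rm = TRUE) / nrow(df))

  vars_dataframe <- data.frame(
    mean_zero_ratio     = zero_ratio,
    mean_no_value_ratio = empty_ratio,
    row.names           = names(df)
  )

  # File name without folder or extension, e.g. "first"
  base_name <- tools::file_path_sans_ext(basename(listcsv[k]))

  per_file[[base_name]] <- vars_dataframe

  # Option 2: a separate variable per file, e.g. data_first, data_second
  assign(paste0("data_", base_name), vars_dataframe)
}

names(per_file)      # names taken from the CSV files
per_file[["second"]] # auxiliary data frame for second.csv
```

The line `data_k <- vars_dataframe` always writes to a single variable literally called `data_k`, because R does not substitute the value of `k` into a variable name. The named list is usually easier to work with afterwards (for example with `lapply()`), while `assign()` is closer to the `base_[k]` idea of one variable per file.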