
I have a directory of directories:

models <- dir("shopperml_pr_points")
> models
 [1] "add_email_subscribers" "custom_domain"         "email_campaign"        "fb_connect"            "gmb"                  
 [6] "holdout"               "ola"                   "ols"                   "post_to_fb"            "sev" 

Within each directory there is a set of CSV files, e.g.:

> list.files(paste0("shopperml_pr_points", "/", models[1]))
[1] "add_email_subscribers_task_completed_pr_auc_1547157396.csv" "add_email_subscribers_task_completed_pr_auc_1547157473.csv"
[3] "add_email_subscribers_task_completed_pr_auc_1547157551.csv" "add_email_subscribers_task_completed_pr_auc_1547157631.csv"
[5] "add_email_subscribers_task_completed_pr_auc_1547157712.csv"

I would like to create a list of dataframes, one for each directory within models. So, the first df will be based on directory "add_email_subscribers" and will be the combination of the 5 csv files above.

I wanted to use do.call() with rbind and read.table per this post, but since my working directory is not the directory the files live in, list.files() only gives me bare file names that read.table() cannot find, and I'm finding this challenging. I went down the path of pasting together a long string for each individual CSV file, but I wondered whether there's a more elegant R solution that returns the full path of each file, such as those in list.files(paste0("shopperml_pr_points", "/", models[1])).

How can I create a list of 9 dataframes based on the directories within models where each directory contains ~5 csv files and those 5 csv files should be collapsed into one dataframe?

Doug Fir

2 Answers


This should do it. First get the subdirectories (subdirs) as full paths; then, for each one, read its CSV files and bind them together. The result is a list of data frames, one per subdirectory.

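parent_dir <- "shopperml_pr_points"

# full.names=TRUE returns "shopperml_pr_points/<model>" instead of just "<model>"
subdirs <- dir(parent_dir, full.names=TRUE)

df_list <- lapply(subdirs, function(path){
  # full paths to every .csv file in this subdirectory
  files <- dir(path, full.names=TRUE, pattern="\\.csv$")
  # read each file, then stack the results into one data frame
  return(do.call(rbind, lapply(files, read.csv)))
})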

If you want to keep track of which file each row came from, you can add a from_file column to each data frame before binding, for example:

df_list2 <- lapply(subdirs, function(path){
  files <- dir(path, full.names=TRUE, pattern="\\.csv$")
  inner_df_list <- lapply(files, function(fname){
    dat <- read.csv(fname)
    dat$from_file <- fname
    return(dat)
  })
  return(do.call(rbind, inner_df_list))
})
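One caveat with dir(parent_dir, full.names=TRUE) is that it returns files as well as directories, so a stray file at the top level of shopperml_pr_points would be passed to the inner dir() call and end up as an empty (NULL) element of df_list. A minimal sketch of a variant, assuming only the immediate subdirectories matter, uses list.dirs(recursive = FALSE) and names each data frame after its directory. The demo tree under tempdir() and its names (pr_points_demo, model_a, model_b, recall, precision) are hypothetical stand-ins so the example runs on its own:

```r
# Hypothetical demo tree under tempdir() standing in for "shopperml_pr_points"
parent_dir <- file.path(tempdir(), "pr_points_demo")
for (m in c("model_a", "model_b")) {
  dir.create(file.path(parent_dir, m), recursive = TRUE, showWarnings = FALSE)
  for (i in 1:2) {
    write.csv(data.frame(recall = c(0.1, 0.2), precision = c(0.9, 0.8)),
              file.path(parent_dir, m, paste0(m, "_", i, ".csv")),
              row.names = FALSE)
  }
}
# A stray top-level file: dir() would return it, list.dirs() skips it
writeLines("notes", file.path(parent_dir, "README.txt"))

# Only the immediate subdirectories, as full paths
subdirs <- list.dirs(parent_dir, full.names = TRUE, recursive = FALSE)

df_list <- lapply(subdirs, function(path) {
  files <- list.files(path, pattern = "\\.csv$", full.names = TRUE)
  do.call(rbind, lapply(files, read.csv))
})
# Name each element after its directory, e.g. df_list$model_a
names(df_list) <- basename(subdirs)
```

With the real data, names(df_list) would then be the model names from the question, so a single model's data can be pulled out with df_list[["email_campaign"]].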
lefft
  • Thanks for the answer here. I was trying to add an iterator to the data frames to know which file number each row was read from. I tried rbindlist per this post: https://stackoverflow.com/questions/54155700/reading-multiple-csv-files-into-a-single-df-and-adding-a-number-iterator-as-a-co. So the final block looks like ```models_dir – Doug Fir Jan 12 '19 at 01:40
  • If you know how to integrate rbindlist, do let me know. – Doug Fir Jan 12 '19 at 01:40
  • The updated answer should give what you need, as I understand it. If you're talking about `data.table::rbindlist()`, I'm not sure; I don't really work with `data.table`. – lefft Jan 12 '19 at 02:08
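The comments ask about numbering the files as they are read. A minimal base-R sketch, assuming "file number" means the position of each file in the list.files() output, uses Map() to pass each path together with its index. The demo directory (iterator_demo), file names, and the file_num and score columns are hypothetical. If data.table is installed, data.table::rbindlist(lapply(files, read.csv), idcol = "file_num") should give a similar integer column, since idcol numbers the list elements 1, 2, ... when the list is unnamed.

```r
# Hypothetical demo directory with three small CSV files
demo_dir <- file.path(tempdir(), "iterator_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(score = c(i, i)),
            file.path(demo_dir, paste0("run_", i, ".csv")), row.names = FALSE)
}

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Map() passes each file path together with its position in `files`
inner_df_list <- Map(function(fname, file_num) {
  dat <- read.csv(fname)
  dat$file_num <- file_num          # 1, 2, 3, ... in the order files were listed
  dat$from_file <- basename(fname)  # file name without the directory part
  dat
}, files, seq_along(files))

combined <- do.call(rbind, inner_df_list)
# Map() names the list by file path, which leaks into the row names; reset them
rownames(combined) <- NULL
```

The same inner function can replace the one in the from_file example so that every data frame in the outer list gets both columns.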
# full.names = TRUE returns complete paths; all.files = TRUE also includes hidden (dot) files
list.files(path = 'C:/Users/Documents/', all.files = TRUE, full.names = TRUE)
aashish
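Building on list.files() with full.names = TRUE, another option is to list every CSV under the parent directory in one recursive call and then group the paths by their containing directory with split(). This is a sketch, not either answerer's code; the demo tree (recursive_demo with subdirectories ola and ols, column x) is hypothetical and stands in for shopperml_pr_points:

```r
# Hypothetical demo tree standing in for "shopperml_pr_points"
parent_dir <- file.path(tempdir(), "recursive_demo")
for (m in c("ola", "ols")) {
  dir.create(file.path(parent_dir, m), recursive = TRUE, showWarnings = FALSE)
  for (i in 1:2) {
    write.csv(data.frame(x = i),
              file.path(parent_dir, m, paste0(m, "_", i, ".csv")),
              row.names = FALSE)
  }
}

# One call finds every CSV in every subdirectory, as full paths
all_files <- list.files(parent_dir, pattern = "\\.csv$",
                        recursive = TRUE, full.names = TRUE)

# Group the paths by the name of the directory that contains them
files_by_model <- split(all_files, basename(dirname(all_files)))

# Read and stack each group; the result is a named list of data frames
df_list <- lapply(files_by_model, function(files) {
  do.call(rbind, lapply(files, read.csv))
})
```

Because split() names its groups after the directory names, df_list comes out already named by model. Note that recursive = TRUE also picks up CSVs in deeper nested folders, which may or may not be wanted.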