1

I am working on a project where I have to download more than 10 million records on a relatively small server. So instead of just downloading the entire dataset, I have to download it in smaller sections. I am trying to create a loop that will call batches of the data based on date. I'm used to coding in Stata where you can call a local by using `x' or some variant within a string. However, I can't find a way to do this in R. Below is a small piece of the code I'm using. Basically, whenever I try to run this 'val' and 'val2' aren't updating with the dates in the defined lists so the output literally just reads as if the server is trying to search between 'val' and 'val2' instead of between '20190101' and '20190301'. Any suggestions for how to fix this are greatly appreciated!

x<-c(20190101, 20190301)
y<-c(20190301, 20190501)
foreach (val=x, val2=y) %do% {
     data<-DBI::dbGetQuery(myconn, "SELECT * FROM .... WHERE (DATE BETWEEN 'val' AND 'val2')")
}
Ksadow
  • 13
  • 2
  • In order for us to help you, please edit your question to include a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). For example, to produce a minimal data set, you can use `head()`, `subset()`, or the indices. Then use `dput()` to give us something that can be put in R immediately. Also, please make sure you know what to do [when someone answers your question](https://stackoverflow.com/help/someone-answers). More info can be found at Stack Overflow's [help center](https://stackoverflow.com/help). Thank you! – iamericfletcher Mar 13 '21 at 16:17

2 Answers2

3

With a basic loop

x<-c(20190101, 20190301)
y<-c(20190301, 20190501)

data_all = c()

for(i in 1:length(x)){
  
  query = paste0("SELECT * FROM .... WHERE (DATE BETWEEN '",
  x[i],  "' AND '",  y[i], "')")
  
  data <- DBI::dbGetQuery(myconn, query)
  data_all = rbind(data_all, data)

}
Arkadi w
  • 127
  • 17
1

With sprintf you can construct the query and use lapply + do.call to combine the results into one dataframe.

x<-c(20190101, 20190301)
y<-c(20190301, 20190501)

input <- sprintf("SELECT * FROM .... WHERE (DATE BETWEEN '%s' AND '%s')", x, y)
result <- do.call(rbind, lapply(input, function(x) DBI::dbGetQuery(myconn, x)))

Using purrr::map_df is a bit shorter.

result <- purrr::map_df(input, ~DBI::dbGetQuery(myconn, .x))
Ronak Shah
  • 355,584
  • 18
  • 123
  • 178