Let's use the following example:
df <- as.data.frame(seq(as.POSIXct("2017-01-01 00:00:00"),as.POSIXct("2017-01-10 23:00:00"),by="hours"))
colnames(df) <- "Time"
My actual data frame has several other variables, but they are not needed to reproduce the problem. I need to select the rows between two dates, say 2017-01-02 and 2017-01-03, and I face two issues:
- My selection is shifted by one hour: in my real dataset it runs from 2017-01-01 23:00:00 to 2017-01-03 22:00:00. I cannot reproduce the problem on my toy example using the exact same approach:
df2 <- df[c(which(df$Time=="2017-01-02 00:00:00"):which(df$Time=="2017-01-03 23:00:00")),]
Here it works fine, but not on my actual data, which makes me wonder whether this could be a time zone issue, or a difference between POSIXct- and POSIXlt-encoded dates. How can I solve it?
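One way to check the time zone hypothesis is sketched below. It assumes that the real timestamps are stored in UTC while the bounds are interpreted one hour ahead (Europe/Paris is used purely as an illustration); neither assumption is confirmed by the data above. When a POSIXct column is compared with a character string, R converts the string with as.POSIXct(), which uses the session time zone rather than the column's, so the bounds can land one hour early.

```r
# Toy data with an explicit time zone (assumption: the real data is in UTC)
df <- data.frame(Time = seq(as.POSIXct("2017-01-01 00:00:00", tz = "UTC"),
                            as.POSIXct("2017-01-10 23:00:00", tz = "UTC"),
                            by = "hour"))

# Step 1: inspect the time zone attached to the column
attr(df$Time, "tzone")

# Step 2: bounds parsed in another zone (illustrative: Europe/Paris, UTC+1 in January)
start_shifted <- as.POSIXct("2017-01-02 00:00:00", tz = "Europe/Paris")
end_shifted   <- as.POSIXct("2017-01-03 23:00:00", tz = "Europe/Paris")

# R may warn about inconsistent 'tzone' attributes here, hence suppressWarnings()
shifted <- suppressWarnings(
  df[df$Time >= start_shifted & df$Time <= end_shifted, , drop = FALSE]
)
range(shifted$Time)   # starts at 2017-01-01 23:00:00 UTC, one hour early

# Step 3: bounds parsed in the same zone as the column
start_ok <- as.POSIXct("2017-01-02 00:00:00", tz = "UTC")
end_ok   <- as.POSIXct("2017-01-03 23:00:00", tz = "UTC")
selected <- df[df$Time >= start_ok & df$Time <= end_ok, , drop = FALSE]
range(selected$Time)
```

If the shift disappears once the bounds are built with the column's own time zone, the mismatch between the session zone and the data zone is the likely cause.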
- Actually, I select rows in one data frame based on dates in another data frame. In one data frame, timing is stored as dates only; in the other, as date + time. The current code drops the last 23 hours, which I want to keep:
df3 <- df[c(which(df$Time=="2017-01-02"):which(df$Time=="2017-01-03")),]
On my actual dataset, I get a warning: "numerical expression has 24 elements: only the first used". How can I use all the elements whose date is 2017-01-03?
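A possible way to keep every hour of the last day is to replace the which(...):which(...) range with a logical mask over a half-open interval. This is only a sketch: it assumes the timestamps are in UTC, and the names from and to are introduced here for illustration.

```r
# Toy data as in the example above, with an explicit time zone (assumption: UTC)
df <- data.frame(Time = seq(as.POSIXct("2017-01-01 00:00:00", tz = "UTC"),
                            as.POSIXct("2017-01-10 23:00:00", tz = "UTC"),
                            by = "hour"))

# Date-only bounds: midnight of the first day, and midnight AFTER the last day
from <- as.POSIXct("2017-01-02", tz = "UTC")
to   <- as.POSIXct("2017-01-04", tz = "UTC")  # exclusive upper bound

# Logical mask instead of a:b, so several matching rows cause no warning
df3 <- df[df$Time >= from & df$Time < to, , drop = FALSE]

nrow(df3)        # 48 hourly rows: all of 2017-01-02 and 2017-01-03
range(df3$Time)
```

Because the upper bound is exclusive, the selection ends at 2017-01-03 23:00:00 whatever the sampling step, and no row index has to be looked up.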
Edit
I found the answer to my second question: thanks to this post I understand that when I write 1:N and N is a vector, R uses only the first element of N, whereas in my case I want the range to extend to the last element that satisfies df$Time=="2017-01-03". I could thus use any of @anonymous's suggestions. But when I test the condition df$Time=="2017-01-03", it returns TRUE only for "2017-01-03 00:00:00" and not for "2017-01-03 01:00:00" through "2017-01-03 23:00:00". Actually, this post helped: I need to specify that I am comparing the date part only, otherwise a time stamp of 00:00:00 is added to the date. So this works:
df3 <- df[c(which(df$Time=="2017-01-02"):tail(which(date(df$Time)=="2017-01-03"),n=1)),]
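For the case where the dates come from another data frame, a base-R variant could look like the sketch below. The data frame periods and its columns start and end are hypothetical stand-ins for the second data frame, and the UTC time zone is an assumption. as.Date() is used here in place of date(), with the time zone passed explicitly so that the date part is taken in the same zone as the timestamps.

```r
# Toy data with an explicit time zone (assumption: UTC)
df <- data.frame(Time = seq(as.POSIXct("2017-01-01 00:00:00", tz = "UTC"),
                            as.POSIXct("2017-01-10 23:00:00", tz = "UTC"),
                            by = "hour"))

# Hypothetical second data frame that stores plain dates only
periods <- data.frame(start = as.Date("2017-01-02"),
                      end   = as.Date("2017-01-03"))

# Date part of each timestamp, extracted in the time zone of the data
day <- as.Date(df$Time, tz = "UTC")

# Keep every row whose date falls within the period, bounds included
df3 <- df[day >= periods$start[1] & day <= periods$end[1], , drop = FALSE]

nrow(df3)
range(df3$Time)
```

Comparing Date with Date avoids the implicit conversion of "2017-01-03" to midnight, so all 24 hours of the last day are kept.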
I am still looking for the answer for question 1 though.