1

I have data with Order Id, Start Date & End Date. I have to split both the Start and End dates into intervals of 30 days, and derive two new variables “split start date” and “split end date”. Example: The below example illustrates how split dates are created when the Start Date is “01/05/2017” and the End Date is “06/07/2017” Suppose, an order have start and end dates as below

see the image for example

What is the code for this problem in R ?

N8888
  • 660
  • 2
  • 14
  • 20
S.Sur
  • 11
  • 2
  • 2
    When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Pictures of data are not particularly helpful. – MrFlick Jul 04 '18 at 17:16

1 Answers1

0

Here is a solution which should generalize to multiple order id's. I have created a sample data with two order id's. The basic idea is to calculate the number of intervals between start_date and end_date. Then we repeat the row for each order id by the number of intervals, and also create a sequence to determine which interval we are in. This is the purpose of creating functions f and g and the use of Map.

The remaining is just vector manipulations where we define split_start_date and split_end_date. The last statement is to ensure that split_end_date does not exceed end_date.

df <- data.frame(
  order_id = c(1, 2),
  start_date = c(as.Date("2017-05-01"), as.Date("2017-08-01")),
  end_date = c(as.Date("2017-07-06"), as.Date("2017-09-15"))
)
df$diff_days <- as.integer(df$end_date - df$start_date)
df$num_int <- ceiling(df$diff_days / 30)
f <- function(rowindex) {
  rep(rowindex, each = df[rowindex, "num_int"])
}
g <- function(rowindex) {
  1:df[rowindex, "num_int"]
}
rowindex_rep <- unlist(Map(f, 1:nrow(df)))
df2 <- df[rowindex_rep, ]
df2$seq <- unlist(Map(g, 1:nrow(df)))
df3 <- df2
df3$split_start_date <- df3$start_date + (df3$seq - 1) * 30
df3$split_end_date <- df3$split_start_date + 29
df3[which(df3$seq == df3$num_int), ]$split_end_date <-
  df3[which(df3$seq == df3$num_int), ]$end_date
radmuzom
  • 453
  • 2
  • 10
  • I have fixed a minor typo - (seq - 1) replaced by (df3$seq - 1) – radmuzom Jul 05 '18 at 06:37
  • when i am importing csv file & work your function on it, it is showing many errors. here is the csv file link - https://files.fm/u/rwhrg7mz – S.Sur Jul 05 '18 at 08:14
  • I have already provided the logic, which appears to work. For the remaining part, you need to learn, understand and solve it yourself. We are not here to solve your homework problems. – radmuzom Jul 05 '18 at 08:16
  • @S.Sur You were supposed to solve this question or take hints. Not copy!! – sunitprasad1 Oct 16 '18 at 10:21