I am trying to transform my original data into start-stop format for Cox regression. My original dataset looks like this:

df = data.frame(initial = c(25, 25, 20, 21, 21, 17), 
                total = c(4.25, 28, 0.5, 38, 14, 43), 
                age = c(30, 53, 20, 59, 35, 60), 
                ethanol = c(0.04, 0.306, 0.201, 0.222, 0.047, 0.085), 
                status = c(0, 0, 0, 0, 0, 1))

For example, the first observation in the original format looks like this:

    initial  total  age  ethanol  status
 1  25       4.25   30    0.04    0

The expected format for that observation is:

 id  start  stop     ethanol  status
 1   0.00   25.00    0.00     0
 1   25.00  29.25    0.04     0
 1   29.25  30       0        0

So I wrote the code below:

edf = data.frame(id = integer(), 
                 start = numeric(), 
                 stop = numeric(), 
                 ethanol = numeric(),
                 status = integer())

j = 1

for( i in 1:4){

  if( (df[i, 1] + df[i,2]) >= df[i,3] ){
    edf[j,1] = i
    edf[j,2] = 0
    edf[j,3] = df[i,"initial"]
    edf[j,4] = 0
    edf[j,5] = 0
    j = j+1
    edf[j,1] = i
    edf[j,2] = df[i,"initial"]
    edf[j,3] = df[i,"initial"] + df[i,"total"]
    edf[j,4] = df[i,"ethanol"]
    edf[j,5] = df[i,"status"]
  } else{
    edf[j,1] = i
    edf[j,2] = 0
    edf[j,3] = df[i,"initial"]
    edf[j,4] = 0
    edf[j,5] = 0

    j = j+1
    edf[j,1] = i
    edf[j,2] = df[i,"initial"]
    edf[j,3] = df[i,"initial"] + df[i,"total"]
    edf[j,4] = df[i,"ethanol"]
    edf[j,5] = 0

    j = j+1
    edf[j,1] = i
    edf[j,2] = df[i,"initial"] + df[i,"total"]
    edf[j,3] = df[i,"age"]
    edf[j,4] = 0
    edf[j,5] = df[i,"status"]
  }
}

But the data frame I get looks like this (for example, for the first observation):

 id     start    stop    ethanol  status
 1      0.00     25.00   0.00     0
 1      25.00    29.25   0.04     0

One row is missing:

id  start    stop    ethanol  status
1   29.25    30      0        0

It seems that the last part in the else-statement hasn't been executed:

    j = j+1
    edf[j,1] = i
    edf[j,2] = df[i,"initial"] + df[i,"total"]
    edf[j,3] = df[i,"age"]
    edf[j,4] = 0
    edf[j,5] = df[i,"status"]

I don't know what's wrong. Any suggestions? I use R version 3.4.4 on macOS (x86_64-apple-darwin15.6.0). Thanks in advance!

  • 3
    Please don't convey data using images: I won't spend the time to transcribe from a frame to something I can use in my R session. Please use something like the output from `dput(head(x))` in your question (in a code block). Refs: https://stackoverflow.com/questions/5963269, https://stackoverflow.com/help/mcve, and https://stackoverflow.com/tags/r/info. – r2evans Sep 23 '18 at 04:40
  • 1
    Thanks. I've edited the post to try to make it reproducible. – HokieCookie Sep 23 '18 at 17:03

1 Answer

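You are not incrementing j, the row number, before writing the first row in each iteration of the loop. Consequently, the first row of each observation overwrites the last row written for the previous observation. For observation 1 that last row is the third interval (29.25 to 30), which is why it goes missing. The following will work.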

j = 0

for( i in 1:4){
  j = j + 1
  if( (df[i, 1] + df[i,2]) >= df[i,3] ){
    edf[j,1] = i
    edf[j,2] = 0
    edf[j,3] = df[i,"initial"]
    edf[j,4] = 0
    edf[j,5] = 0
    j = j+1
    edf[j,1] = i
    edf[j,2] = df[i,"initial"]
    edf[j,3] = df[i,"initial"] + df[i,"total"]
    edf[j,4] = df[i,"ethanol"]
    edf[j,5] = df[i,"status"]
  } else{
    edf[j,1] = i
    edf[j,2] = 0
    edf[j,3] = df[i,"initial"]
    edf[j,4] = 0
    edf[j,5] = 0

    j = j+1
    edf[j,1] = i
    edf[j,2] = df[i,"initial"]
    edf[j,3] = df[i,"initial"] + df[i,"total"]
    edf[j,4] = df[i,"ethanol"]
    edf[j,5] = 0

    j = j+1
    edf[j,1] = i
    edf[j,2] = df[i,"initial"] + df[i,"total"]
    edf[j,3] = df[i,"age"]
    edf[j,4] = 0
    edf[j,5] = df[i,"status"]
  }
}
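
To see the overwrite described above without running the full loop, here is a rough sketch that only tracks which row indices each observation writes to, under the original loop (j starts at 1, no increment at the top) and under the corrected one. The names `n_rows`, `first_row_orig`, `first_row_fixed` and `trace` are made up for this illustration, and it uses all six rows of `df` rather than `1:4`.

```r
# Only the columns needed to decide how many rows each observation produces.
df <- data.frame(initial = c(25, 25, 20, 21, 21, 17),
                 total   = c(4.25, 28, 0.5, 38, 14, 43),
                 age     = c(30, 53, 20, 59, 35, 60))

# The if-branch writes 2 rows, the else-branch writes 3.
n_rows <- ifelse(df$initial + df$total >= df$age, 2, 3)

# Original loop: j is incremented only between rows, so each observation
# starts on the row where the previous observation ended.
first_row_orig <- 1 + cumsum(c(0, head(n_rows - 1, -1)))
last_row_orig  <- first_row_orig + n_rows - 1

# Corrected loop: j is incremented at the top of every iteration, so each
# observation starts on the row after the previous observation's last row.
first_row_fixed <- 1 + cumsum(c(0, head(n_rows, -1)))
last_row_fixed  <- first_row_fixed + n_rows - 1

trace <- data.frame(id = seq_len(nrow(df)), n_rows,
                    first_row_orig, last_row_orig,
                    first_row_fixed, last_row_fixed)
print(trace)
```

In the `trace` table, observation 1 ends on row 3 under the original loop and observation 2 starts on row 3, so the third interval of observation 1 is replaced. Under the corrected loop the row ranges no longer overlap.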

Edit: there are better ways of doing this. There is probably some package that can reshape the data more easily. Or you could create the three start/stop steps as separate data frames and then merge them. Failing that, you can at least simplify like this:

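# Assumes edf starts as the empty data frame defined in the question.
df$end = df$initial + df$total
for (i in seq_len(nrow(df))) {   # integer index, so the id column stays integer
    r = df[i,]
    # Interval from 0 to initial: no exposure, no event
    edf[nrow(edf) + 1,] = list(i, 0, r$initial, 0, 0)
    if (r$end >= r$age){
      # Exposure runs to the end of follow-up: this row carries the status
      edf[nrow(edf) + 1,] = list(i, r$initial, r$end, r$ethanol, r$status)
    }
    else {
      # Exposure ends earlier: the final unexposed interval carries the status
      edf[nrow(edf) + 1,] = list(i, r$initial, r$end, r$ethanol, 0)
      edf[nrow(edf) + 1,] = list(i, r$end, r$age, 0, r$status)
    }
}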
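
The "separate data frames" idea mentioned above could look roughly like the sketch below. It is only one way to do it, in base R with no loop: build one data frame per interval type, stack them with `rbind`, and sort by id and start time. The names `has_tail`, `before`, `during`, `after` and `edf2` are invented for this example and do not come from the question.

```r
df <- data.frame(initial = c(25, 25, 20, 21, 21, 17),
                 total   = c(4.25, 28, 0.5, 38, 14, 43),
                 age     = c(30, 53, 20, 59, 35, 60),
                 ethanol = c(0.04, 0.306, 0.201, 0.222, 0.047, 0.085),
                 status  = c(0, 0, 0, 0, 0, 1))

id       <- seq_len(nrow(df))
end      <- df$initial + df$total
has_tail <- end < df$age   # TRUE when a third interval (end to age) is needed

# Interval 1: from 0 to initial, no exposure, no event.
before <- data.frame(id = id, start = 0, stop = df$initial,
                     ethanol = 0, status = 0)

# Interval 2: the exposure period. It carries the status only when
# no third interval follows for that observation.
during <- data.frame(id = id, start = df$initial, stop = end,
                     ethanol = df$ethanol,
                     status = ifelse(has_tail, 0, df$status))

# Interval 3: from end of exposure to age, only for observations that need it.
after <- data.frame(id = id[has_tail], start = end[has_tail],
                    stop = df$age[has_tail],
                    ethanol = rep(0, sum(has_tail)),
                    status = df$status[has_tail])

# Stack the pieces and put each observation's intervals in time order.
edf2 <- rbind(before, during, after)
edf2 <- edf2[order(edf2$id, edf2$start), ]
rownames(edf2) <- NULL
print(edf2)
```

With the six observations from the question, only the first one gets a third interval, so `edf2` should contain the three rows shown in the expected output for id 1 followed by two rows for each remaining id.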
Stuart