I am trying to transform my data into start-stop format for Cox regression. My original dataset looks like this:
df = data.frame(initial = c(25, 25, 20, 21, 21, 17),
                total = c(4.25, 28, 0.5, 38, 14, 43),
                age = c(30, 53, 20, 59, 35, 60),
                ethanol = c(0.04, 0.306, 0.201, 0.222, 0.047, 0.085),
                status = c(0, 0, 0, 0, 0, 1))
For example, the first observation in the original data looks like this:
  initial total age ethanol status
1      25  4.25  30    0.04      0
The expected output for this observation is:
id start  stop ethanol status
 1  0.00 25.00    0.00      0
 1 25.00 29.25    0.04      0
 1 29.25 30.00    0.00      0
So I wrote the code below:
edf = data.frame(id = integer(),
                 start = numeric(),
                 stop = numeric(),
                 ethanol = numeric(),
                 status = integer())
j = 1
for (i in 1:4) {
  if ((df[i, 1] + df[i, 2]) >= df[i, 3]) {
    edf[j, 1] = i
    edf[j, 2] = 0
    edf[j, 3] = df[i, "initial"]
    edf[j, 4] = 0
    edf[j, 5] = 0
    j = j + 1
    edf[j, 1] = i
    edf[j, 2] = df[i, "initial"]
    edf[j, 3] = df[i, "initial"] + df[i, "total"]
    edf[j, 4] = df[i, "ethanol"]
    edf[j, 5] = df[i, "status"]
  } else {
    edf[j, 1] = i
    edf[j, 2] = 0
    edf[j, 3] = df[i, "initial"]
    edf[j, 4] = 0
    edf[j, 5] = 0
    j = j + 1
    edf[j, 1] = i
    edf[j, 2] = df[i, "initial"]
    edf[j, 3] = df[i, "initial"] + df[i, "total"]
    edf[j, 4] = df[i, "ethanol"]
    edf[j, 5] = 0
    j = j + 1
    edf[j, 1] = i
    edf[j, 2] = df[i, "initial"] + df[i, "total"]
    edf[j, 3] = df[i, "age"]
    edf[j, 4] = 0
    edf[j, 5] = df[i, "status"]
  }
}
But the data frame I get is (taking the first observation as an example):
id start  stop ethanol status
 1  0.00 25.00    0.00      0
 1 25.00 29.25    0.04      0
One row is missing:
id start  stop ethanol status
 1 29.25 30.00    0.00      0
It seems that the last part in the else-statement hasn't been executed:
j = j+1
edf[j,1] = i
edf[j,2] = df[i,"initial"] + df[i,"total"]
edf[j,3] = df[i,"age"]
edf[j,4] = 0
edf[j,5] = df[i,"status"]
I don't know what's wrong. Any suggestions? I am using R version 3.4.4 on macOS (x86_64-apple-darwin15.6.0). Thanks in advance!
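A likely cause, judging from the loop as posted: `j` is never advanced after the last row written in either branch, so the next iteration writes its first row to the same index and overwrites the previous observation's final row. The else-branch does run; its third row is simply replaced by the first row of observation 2. The minimal fix would be to add `j = j + 1` at the end of both branches. The sketch below is an alternative that avoids the manual counter altogether. Note that `make_intervals` is a hypothetical helper introduced here for illustration, not part of the original code, and the sketch iterates over all rows with `seq_len(nrow(df))` rather than `1:4`, on the assumption that every observation should be converted.

```r
# Original data from the question
df <- data.frame(initial = c(25, 25, 20, 21, 21, 17),
                 total   = c(4.25, 28, 0.5, 38, 14, 43),
                 age     = c(30, 53, 20, 59, 35, 60),
                 ethanol = c(0.04, 0.306, 0.201, 0.222, 0.047, 0.085),
                 status  = c(0, 0, 0, 0, 0, 1))

# Hypothetical helper (not in the original post): build the
# start-stop rows for observation i using the same rules as the loop.
make_intervals <- function(i) {
  exposure_end <- df$initial[i] + df$total[i]

  # First two intervals: before exposure, then during exposure
  out <- data.frame(id      = i,
                    start   = c(0, df$initial[i]),
                    stop    = c(df$initial[i], exposure_end),
                    ethanol = c(0, df$ethanol[i]),
                    status  = c(0, df$status[i]))

  # Same condition as the else-branch: exposure ends before current age,
  # so the status moves to a third, unexposed interval.
  if (exposure_end < df$age[i]) {
    out$status[2] <- 0
    out <- rbind(out,
                 data.frame(id      = i,
                            start   = exposure_end,
                            stop    = df$age[i],
                            ethanol = 0,
                            status  = df$status[i]))
  }
  out
}

# One small data frame per observation, stacked into the final result
edf <- do.call(rbind, lapply(seq_len(nrow(df)), make_intervals))
rownames(edf) <- NULL
print(edf)
```

Because each observation's rows are built separately and stacked at the end, there is no shared row index that can fall out of step, which is the failure mode suspected in the loop above.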