
R predict.lm function gives output of wrong size.

stocks = read.csv("some-file.csv", header = TRUE)

## 75% of the sample size
smp_size <- floor(0.75 * nrow(stocks))

## set the seed to make your partition reproducible
set.seed(123)
train_ind <- sample(seq_len(nrow(stocks)), size = smp_size)

train <- stocks[train_ind, ]
test <- stocks[-train_ind, ]

model = lm ( train$Open ~ train$Close, data=train)
model
predicted<-predict.lm(model, test$Open)
length(test$Open)
length(predicted)
length(test$Close)

> length(test$Open)
[1] 16994
> length(predicted)
[1] 50867
> length(test$Close)
[1] 16994

Why is this happening? The output length of the predict function should be equal to the length of test$Open, right?

Vishwajeet Vatharkar

2 Answers


I can't say exactly how lm will interpret your train$Open and train$Close, but I can say your data=stocks is your problem. So I can tell you where lm is getting your data from and why it isn't the length of your train set. You want model <- lm(Open ~ Close, data=train)
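
A minimal sketch of the suggested formula, for illustration only. The `stocks` data frame here is synthetic (the asker's CSV is not available), so the column values and the 200-row size are assumptions; only the column names `Open` and `Close` come from the question.

```r
# Synthetic stand-in for the stocks data (hypothetical values, not the asker's file)
set.seed(123)
stocks <- data.frame(Close = runif(200, min = 10, max = 20))
stocks$Open <- stocks$Close + rnorm(200, sd = 0.5)

# Same 75/25 split as in the question
train_ind <- sample(seq_len(nrow(stocks)), size = floor(0.75 * nrow(stocks)))
train <- stocks[train_ind, ]
test  <- stocks[-train_ind, ]

# Bare column names in the formula; lm() looks them up in `data`
model <- lm(Open ~ Close, data = train)

# newdata is a data frame that contains the predictor column `Close`
predicted <- predict(model, newdata = test)

# One prediction per test row
length(predicted) == nrow(test)
```

Because the formula refers to `Close` by name rather than to `train$Close`, predict can look that column up in whatever data frame is passed as `newdata`.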

doctorG

The problem lies in predicted<-predict.lm(model, test$Open). It should be

 predicted<-predict.lm(model, test)

The response is deleted in predict.lm anyhow, in

 line 15:       Terms <- delete.response(tt)

Actually, it should have been test$Close for your model anyhow.
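
A small sketch of why the output can end up the length of the training set, using iris as a stand-in for the asker's data. The names `bad_model` and `bad_pred` are made up for this illustration. It assumes the formula is written with `train$` prefixes, as in the question: the predictor is then stored under the literal name `train$Sepal.Width`, so predict evaluates it against the training data instead of the columns of `newdata`.

```r
set.seed(1)
train_ind <- sample(seq_len(nrow(iris)), size = 100)
train <- iris[train_ind, ]
test  <- iris[-train_ind, ]

# Formula written with `train$` prefixes, as in the question
bad_model <- lm(train$Sepal.Length ~ train$Sepal.Width, data = train)

# The only predictor term is literally named "train$Sepal.Width"
attr(terms(bad_model), "term.labels")

# `test` has no variable by that name, so the term is evaluated from `train`.
# R warns that the row counts differ; the warning is suppressed here for brevity.
bad_pred <- suppressWarnings(predict(bad_model, newdata = test))

# The predictions line up with the training rows, not the test rows
length(bad_pred) == nrow(train)
```

Rewriting the formula as `Sepal.Length ~ Sepal.Width` lets predict match the predictor by column name in `newdata`.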

What you got was the result for the training set, as effectively you weren't providing any data at all (after the code deleted the response). An example using iris:

train_ind <- sample(seq_len(nrow(iris)),size=100)
train <- iris[train_ind,]
test <- iris[-train_ind,]
model=lm(Sepal.Length ~Sepal.Width,data=train)
model
predicted1 <-predict.lm(model,test)
length(predicted1)
#predictor column only, kept in a data frame so predict.lm can find it by name
predicted2 <-predict.lm(model,data.frame(Sepal.Width=test$Sepal.Width))
length(predicted2)
predicted1-predicted2

The output of the last few lines:

> length(predicted1)
[1] 50
> predicted2 <- predict.lm(model,data.frame(Sepal.Width=test$Sepal.Width))
> length(predicted2)
[1] 50
> predicted1-predicted2
  4   5   9  10  12  17  19  25  26  32  33  36  37  40  41  47  49  53  61  67  68  69  74  76  78  79  81  83  84  85  87 
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 
 92  94  98 105 110 112 113 114 122 125 127 128 132 133 137 140 141 142 145 
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 
CAFEBABE