Scoring validation database based on estimates from test database in R

Question

I divided my dataset into Test and Validation (50-50 split).

I ran the glm function (family = binomial) on the Test dataset and got the parameter estimates.

How do I score the Validation dataset based on the parameter estimates (beta) that I got from the Test dataset? I know it has something to do with apply() but I am not sure. Please advise.

score 1 · Answer 1 · answered Feb 16 '14 at 13:05

Split-sample validation is highly unstable if $n < 20000$ and is not recommended. If you do have an enormous test sample, you can use val.prob from the R rms package to do the validation; its help file will give you some pointers. But it is better to look at the bootstrap and the rms validate and calibrate functions.
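As a minimal sketch of the scoring step the question asks about, the code below fits a glm on one half of a simulated dataset and scores the other half with predict(), then repeats the calculation by hand from the coefficients. The data, the variable names (x1, x2, y) and the object names are assumptions made up for illustration; following the question's naming, the model is fitted on "test" and scored on "valid". The rms part runs only if that package is installed, and B = 200 is an arbitrary choice.

```r
# Simulated data stand in for the asker's dataset (all names are hypothetical).
set.seed(1)
n   <- 1000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-0.5 + 1.0 * dat$x1 - 0.7 * dat$x2))

# 50-50 split, using the question's naming: fit on "test", score "valid".
idx   <- sample(n, n / 2)
test  <- dat[idx, ]
valid <- dat[-idx, ]

fit <- glm(y ~ x1 + x2, family = binomial, data = test)

# predict() applies the fitted coefficients to the new rows;
# type = "response" returns probabilities rather than log-odds.
p_valid <- predict(fit, newdata = valid, type = "response")

# The same scores by hand: inverse logit of X %*% beta.
X        <- model.matrix(~ x1 + x2, data = valid)
p_manual <- plogis(drop(X %*% coef(fit)))
stopifnot(isTRUE(all.equal(unname(p_valid), unname(p_manual))))

if (requireNamespace("rms", quietly = TRUE)) {
  library(rms)

  # Split-sample check of the predicted probabilities on the held-out half.
  print(val.prob(p_valid, valid$y, pl = FALSE))

  # Bootstrap alternative: refit on all the data and validate by resampling.
  fit_all <- lrm(y ~ x1 + x2, data = dat, x = TRUE, y = TRUE)
  print(validate(fit_all, B = 200))
  cal <- calibrate(fit_all, B = 200)
}
```

No apply() call is needed for scoring: predict() with newdata does the matrix multiplication shown in the manual check.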

Frank Harrell


Thanks Frank. Can you give me a reference to the article that states "validation is highly unstable if n < 20000"? – bison2178 Feb 17 '14 at 07:37

You can run a simple simulation or refer to my book Regression Modeling Strategies. – Frank Harrell Feb 17 '14 at 13:26
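One way such a simulation could look is sketched below. It is an illustrative assumption, not the simulation from the book: a single simulated dataset is split 50-50 many times, a glm is fitted on one half each time, and the c-index (AUC) is computed on the other half. The sample size, coefficients, number of splits and helper names (auc, one_split) are all hypothetical choices.

```r
# One fixed simulated dataset; only the random split changes between runs.
set.seed(2)
n   <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-0.5 + 1.0 * dat$x1 - 0.7 * dat$x2))

# c-index (AUC) from the Mann-Whitney rank formula.
auc <- function(p, y) {
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(rank(p)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Fit on a random half, score the other half, return the validation AUC.
one_split <- function() {
  idx <- sample(n, n / 2)
  fit <- glm(y ~ x1 + x2, family = binomial, data = dat[idx, ])
  p   <- predict(fit, newdata = dat[-idx, ], type = "response")
  auc(p, dat$y[-idx])
}

aucs <- replicate(200, one_split())

# Spread of the validation estimate across splits of the same data.
print(range(aucs))
print(sd(aucs))
```

Rerunning with larger n shows how the spread across splits changes with sample size.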
