
Background: I use R for statistical computations. I am working on data obtained from CNC lathe machining systems, in which a tool operates on a workpiece. Lathe machining is a typical example of subtractive manufacturing.

Problem: According to the literature, there are no established empirical models, so the nature of the relationship between the explanatory variables and the response variables is not known, especially with respect to the degree of the polynomial. Researchers have therefore attempted to fit linear or polynomial regression models, models with interactions, or ANNs.

The Question: The books on R that I have read compare the performance of multiple regression models using ANOVA. But suppose I wish to compare the performance of a regression model and an ANN. How do I do that? Specifically, what measures can I use for this purpose?

Dave

1 Answer


Out-of-sample testing is the standard way to do this. Train each model on most, but not all, of your data, then evaluate its predictions on the data you held out.
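A minimal sketch of such a holdout comparison in R follows. Everything specific in it is an assumption made for illustration: the data are simulated, the column names (`speed`, `feed`, `depth`, `roughness`) are hypothetical stand-ins for machining variables, `nnet` is just one of several ways to fit an ANN in R, and RMSE and MAE are two common choices of error measure rather than the only ones.

```r
# Illustrative only: simulated data stands in for real machining measurements.
library(nnet)  # single-hidden-layer network; a recommended package shipped with R

set.seed(1)
n <- 300
dat <- data.frame(speed = runif(n), feed = runif(n), depth = runif(n))
# Hypothetical response with curvature and an interaction
dat$roughness <- 1 + 2 * dat$speed - 3 * dat$feed^2 +
  1.5 * dat$speed * dat$depth + rnorm(n, sd = 0.1)

# Hold out 20% of the rows; both models see only the remaining 80%
test_idx <- sample(n, size = round(0.2 * n))
train <- dat[-test_idx, ]
test  <- dat[test_idx, ]

fit_lm <- lm(roughness ~ speed * feed * depth, data = train)
fit_nn <- nnet(roughness ~ speed + feed + depth, data = train,
               size = 5, linout = TRUE, decay = 1e-3, maxit = 500,
               trace = FALSE)

# Error measures computed on the held-out rows only
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
mae  <- function(obs, pred) mean(abs(obs - pred))

pred_lm <- as.numeric(predict(fit_lm, newdata = test))
pred_nn <- as.numeric(predict(fit_nn, newdata = test))

results <- data.frame(
  model = c("lm", "nnet"),
  rmse  = c(rmse(test$roughness, pred_lm), rmse(test$roughness, pred_nn)),
  mae   = c(mae(test$roughness, pred_lm),  mae(test$roughness, pred_nn))
)
print(results)
```

Because both models are scored on the same unseen rows with the same measure, the comparison does not depend on the models sharing a likelihood or being nested, which is what makes it usable for a regression model versus an ANN.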

Even better might be to have multiple out-of-sample groups (something like cross-validation). Benavoli et al. (2017) discuss a number of ways to do statistical inference based on model performance in such groups. While the paper argues in favor of Bayesian methods, it also discusses competing frequentist methods.
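As a rough sketch of the multiple-groups idea, the following R code runs 10-fold cross-validation and records the held-out RMSE of each model in every fold. The simulated data, the hypothetical column names, the choice of `nnet`, and the choice of 10 folds are all assumptions for illustration. The sketch stops at a descriptive summary of the per-fold differences; formal inference on such differences is the subject of Benavoli et al. (2017).

```r
# Illustrative only: simulated data and hypothetical variable names.
library(nnet)

set.seed(2)
n <- 300
dat <- data.frame(speed = runif(n), feed = runif(n), depth = runif(n))
dat$roughness <- 1 + 2 * dat$speed - 3 * dat$feed^2 +
  1.5 * dat$speed * dat$depth + rnorm(n, sd = 0.1)

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Assign every row to exactly one of k folds, in random order
k <- 10
fold <- sample(rep(seq_len(k), length.out = n))

# For each fold: fit on the other k - 1 folds, score on the held-out fold
cv <- t(sapply(seq_len(k), function(i) {
  train <- dat[fold != i, ]
  test  <- dat[fold == i, ]
  fit_lm <- lm(roughness ~ speed * feed * depth, data = train)
  fit_nn <- nnet(roughness ~ speed + feed + depth, data = train,
                 size = 5, linout = TRUE, decay = 1e-3, maxit = 500,
                 trace = FALSE)
  c(lm   = rmse(test$roughness, as.numeric(predict(fit_lm, newdata = test))),
    nnet = rmse(test$roughness, as.numeric(predict(fit_nn, newdata = test))))
}))

# Paired per-fold differences: positive values favor the network
delta <- cv[, "lm"] - cv[, "nnet"]

print(colMeans(cv))                        # average held-out RMSE per model
print(c(mean = mean(delta), sd = sd(delta)))
```

Both models are evaluated on identical folds, so the differences in `delta` are paired. The training sets of different folds overlap, so the fold scores are not independent and a plain t-test on `delta` should be read with caution.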

There are, however, a few issues with out-of-sample testing.

  1. You withhold precious training data.

  2. The results can be unstable, depending on how you split the data into training and holdout sets. This problem is worse the smaller the sample size. (Harrell (2015), for instance, recommends against using holdout sets unless there are at least 20,000 observations.)

  3. If you do this, are satisfied with your out-of-sample performance, and then train your model on all of the data combined, you have no holdout data left to validate the final model that is trained on all of the data. Harrell (2015) advocates for bootstrapping to address this.
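One way to make the bootstrapping idea in the last point concrete is an optimism-corrected estimate: fit the model on all of the data, then use bootstrap resamples to estimate how much the in-sample error flatters the model. The R code below is a sketch of that general technique, not a reproduction of any specific procedure from Harrell (2015). The simulated data, the hypothetical column names, the helper names (`fit_lm`, `fit_nn`, `optimism_corrected_rmse`), and the small number of resamples are all assumptions for illustration.

```r
# Illustrative only: simulated data, hypothetical names, small B for speed.
library(nnet)

set.seed(3)
n <- 300
dat <- data.frame(speed = runif(n), feed = runif(n), depth = runif(n))
dat$roughness <- 1 + 2 * dat$speed - 3 * dat$feed^2 +
  1.5 * dat$speed * dat$depth + rnorm(n, sd = 0.1)

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
pred <- function(fit, d) as.numeric(predict(fit, newdata = d))

# Each fitting function takes a data frame and returns a fitted model
fit_lm <- function(d) lm(roughness ~ speed * feed * depth, data = d)
fit_nn <- function(d) nnet(roughness ~ speed + feed + depth, data = d,
                           size = 5, linout = TRUE, decay = 1e-3,
                           maxit = 500, trace = FALSE)

optimism_corrected_rmse <- function(fit_fun, d, B = 50) {
  # Apparent error: model fitted on all data, scored on the same data
  apparent <- rmse(d$roughness, pred(fit_fun(d), d))
  # Optimism: for each resample, error on the original data minus
  # error on the resample the model was fitted to
  optimism <- replicate(B, {
    boot  <- d[sample(nrow(d), replace = TRUE), ]
    fit_b <- fit_fun(boot)
    rmse(d$roughness, pred(fit_b, d)) - rmse(boot$roughness, pred(fit_b, boot))
  })
  c(apparent  = apparent,
    optimism  = mean(optimism),
    corrected = apparent + mean(optimism))
}

boot_results <- rbind(
  lm   = optimism_corrected_rmse(fit_lm, dat),
  nnet = optimism_corrected_rmse(fit_nn, dat)
)
print(boot_results)
```

The `corrected` column is the figure to compare across models. No rows are withheld from the final fit, which speaks to the first and third issues in the list above. In practice one would use more resamples than the 50 assumed here.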

REFERENCES

Benavoli, Alessio, et al. "Time for a change: a tutorial for comparing multiple classifiers through Bayesian analysis." The Journal of Machine Learning Research 18.1 (2017): 2653-2688.

Harrell, Frank E. Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. Springer, 2015.

Dave