Questions tagged [model-evaluation]

On evaluating models, either in-sample or out-of-sample.

In-sample model evaluation techniques can be based on measures of goodness of fit, but note that in-sample fit will typically increase spuriously as the model becomes more complex, which is called overfitting. For this reason, in-sample fit is typically penalized based on model complexity, as with adjusted $R^2$, AIC, or BIC. AIC and BIC are examples of information criteria, which can also be used for in-sample evaluation.
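As a concrete illustration of those penalized in-sample measures, here is a minimal sketch (not drawn from any question below) that computes adjusted $R^2$, AIC, and BIC for an ordinary least-squares fit with Gaussian errors; the data are invented, and parameter-counting conventions for AIC/BIC differ slightly across software packages.

```python
import numpy as np

def in_sample_criteria(y, X):
    """Adjusted R^2, AIC and BIC for an OLS fit of y on X (X already contains an intercept column)."""
    n, k = X.shape                                   # k = number of estimated coefficients
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = resid @ resid
    tss = ((y - y.mean()) ** 2).sum()

    r2 = 1 - rss / tss
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)        # penalizes additional coefficients

    # Gaussian log-likelihood at the MLE of the error variance (rss / n)
    loglik = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)
    n_params = k + 1                                 # coefficients plus the error variance
    aic = 2 * n_params - 2 * loglik
    bic = n_params * np.log(n) - 2 * loglik
    return adj_r2, aic, bic

# Toy usage: an over-flexible cubic fit to data generated from a straight line
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 100)
y = 1 + 2 * x + rng.normal(scale=0.5, size=100)
X = np.column_stack([np.ones_like(x), x, x**2, x**3])
print(in_sample_criteria(y, X))
```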

Out-of-sample model evaluation usually relies on predictive accuracy, typically assessed on held-out data or via cross-validation. Distributional predictions can be evaluated using scoring rules.
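For the last point, a small sketch of two common proper scoring rules for probabilistic binary predictions, the Brier score and the log score, evaluated on a hypothetical held-out set (the arrays are made up for the example):

```python
import numpy as np

def brier_score(p, y):
    """Mean squared difference between predicted probability and the 0/1 outcome (lower is better)."""
    return np.mean((p - y) ** 2)

def log_score(p, y, eps=1e-15):
    """Negative mean log-probability of the observed outcomes (lower is better)."""
    p = np.clip(p, eps, 1 - eps)        # guard against log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Hypothetical held-out outcomes and predicted probabilities
y_test = np.array([1, 0, 1, 1, 0, 0, 1, 0])
p_hat  = np.array([0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.8, 0.3])
print(brier_score(p_hat, y_test), log_score(p_hat, y_test))
```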

1116 questions
2 votes · 1 answer

What is the opposite of precision called?

I know that $$\text{precision} = \frac{\text{true positives}}{\text{predicted positives}}$$ but what about $\frac{\text{true negatives}}{\text{predicted negatives}}$? What is it called? Thanks
AmirWG · 134
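The ratio of true negatives to predicted negatives is usually called the negative predictive value (NPV); a tiny worked example with invented confusion-matrix counts:

```python
# Precision and its negative-class analogue (negative predictive value, NPV),
# computed from invented confusion-matrix counts.
tp, fp, tn, fn = 40, 10, 35, 15

precision = tp / (tp + fp)   # true positives / predicted positives
npv       = tn / (tn + fn)   # true negatives / predicted negatives
print(precision, npv)        # 0.8, 0.7
```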
2 votes · 0 answers

Choose the model that minimizes the difference in performance on train and validation sets?

I understand the concept of training, validation, and testing datasets for model building. Typically when searching for the optimal hyperparameters for a given class of model, we choose the hyperparameter configuration that optimizes our chosen…
2 votes · 0 answers

Is there a specific measure of how many more classifications or signals one algorithm makes or picks up compared to another?

We all know precision and recall. What if two algorithms have the same precision and recall, but one algorithm makes more predictions with the data available? Example: "I love samsung. Apple is terrible." Algorithm 1: "I love samsung. Apple is…
iuppiter · 408
1 vote · 1 answer

How to assess a model where you are interested in the probability output

I know that we typically assess the performance of classifiers with metrics like accuracy, ROC, etc., because we want to know whether or not a classifier can accurately predict an outcome. But what if we are more interested in the…
Peter · 11
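One way to look directly at the probability output, complementary to the scoring rules mentioned in the tag description, is a reliability (calibration) check: bin the predicted probabilities and compare each bin's mean prediction with the observed event rate. A rough sketch with simulated data (the choice of 5 equal-width bins is arbitrary):

```python
import numpy as np

def reliability_table(p, y, n_bins=5):
    """Mean predicted probability vs. observed event rate per probability bin."""
    bins = np.linspace(0, 1, n_bins + 1)
    idx = np.clip(np.digitize(p, bins) - 1, 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            rows.append((bins[b], bins[b + 1], p[mask].mean(), y[mask].mean(), mask.sum()))
    return rows   # (bin lower edge, bin upper edge, mean predicted, observed rate, count)

rng = np.random.default_rng(1)
p = rng.uniform(size=1000)
y = rng.binomial(1, p)            # outcomes drawn from the stated probabilities, i.e. well calibrated
for row in reliability_table(p, y):
    print(row)
```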
1 vote · 0 answers

Why is extrinsic evaluation time-consuming?

In this slide it says that extrinsic evaluation is time-consuming, usually taking days or weeks. I have tried to understand why. Firstly, I learned from this slide that the best way to evaluate an n-gram model is extrinsic evaluation, which implies…
Lerner Zhang · 6,636
1 vote · 0 answers

Evaluating Classification Models with Class Probabilities

I'm curious to see if there are any useful metrics to evaluate classification models using numeric probabilities. Traditionally, I would train a classification model, generate factor predictions on the test set, and use a confusion matrix or ROC…
Minh · 111
1 vote · 1 answer

Possible values for sensitivity, specificity, precision and accuracy

I have some results where the tester claims the following values: Sensitivity: 0.525, Specificity: 0.925, Precision: 0.516, Accuracy: 0.907, where Sensitivity = TP/(TP+FN), Specificity = TN/(TN+FP), Precision = TP/(TP+FP), …
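One way to sanity-check such numbers, assuming they come from a single binary confusion matrix: precision pins down the implied prevalence given sensitivity and specificity, and that prevalence in turn implies an accuracy. A sketch of that check, using the definitions quoted above:

```python
def implied_accuracy(sensitivity, specificity, precision):
    """Accuracy implied by (sensitivity, specificity, precision) for one binary confusion matrix.

    From precision = se*pi / (se*pi + (1-sp)*(1-pi)), the prevalence pi satisfies
    pi / (1 - pi) = precision*(1 - sp) / (se*(1 - precision)),
    and accuracy = se*pi + sp*(1 - pi).
    """
    odds = precision * (1 - specificity) / (sensitivity * (1 - precision))
    prevalence = odds / (1 + odds)
    return sensitivity * prevalence + specificity * (1 - prevalence)

# For the values quoted in the question this returns about 0.872, which can be
# compared with the reported 0.907 (under the single-confusion-matrix assumption).
print(implied_accuracy(0.525, 0.925, 0.516))
```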
0 votes · 0 answers

A question about Dice coefficient calculation

Evaluation of a semantic segmentation task with the following properties: batch size of the test set: 4; shape of a target mask/predicted mask: (1, 512, 512); number of batches in the test set: 30; Dice score calculation formula used: Sum of [(2 x…
Cork · 3
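The recurring subtlety with Dice over a test set is whether to average per-mask scores or to pool the intersections and sums over all masks first; the two generally differ. A sketch of both, with invented binary masks (scaled down from (1, 512, 512) to keep the example light):

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    """Dice coefficient 2*|A ∩ B| / (|A| + |B|) for binary masks."""
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

# Invented masks; 120 masks stands in for 30 batches of size 4.
rng = np.random.default_rng(0)
preds   = rng.integers(0, 2, size=(120, 1, 64, 64)).astype(bool)
targets = rng.integers(0, 2, size=(120, 1, 64, 64)).astype(bool)

# (a) average of per-mask Dice scores
per_mask = np.array([dice(p, t) for p, t in zip(preds, targets)])
print("mean of per-mask Dice:", per_mask.mean())

# (b) "global" Dice, pooling intersections and sums over the whole test set
print("pooled Dice:", dice(preds, targets))
```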
0 votes · 1 answer

What performance indices are best to compare two time series with different data lengths? Can you suggest a method to do the comparison in R/Origin?

I have two data sets (observed and simulated). The observed data set is the snow depth observed at a location; the simulated one is the model-simulated snow depth. These data sets have different lengths. What are the performance indices that can be…
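The question asks about R/Origin; purely as an illustration of the usual workflow (shown in Python to keep all examples on this page in one language), one common approach is to align the two series on their shared dates and compute error measures such as bias, RMSE, and Nash-Sutcliffe efficiency on the overlap. The series below are simulated stand-ins:

```python
import numpy as np
import pandas as pd

# Hypothetical daily series; in practice these would be read from files,
# e.g. pd.read_csv("observed.csv", parse_dates=["date"], index_col="date").
dates_obs = pd.date_range("2020-01-01", periods=200, freq="D")
dates_sim = pd.date_range("2020-01-15", periods=250, freq="D")
obs = pd.Series(np.random.default_rng(0).gamma(2.0, 10.0, len(dates_obs)), index=dates_obs)
sim = pd.Series(np.random.default_rng(1).gamma(2.0, 10.0, len(dates_sim)), index=dates_sim)

# Align on the common dates; the different lengths stop mattering after this step.
df = pd.concat({"obs": obs, "sim": sim}, axis=1).dropna()

err = df["sim"] - df["obs"]
bias = err.mean()
rmse = np.sqrt((err ** 2).mean())
nse = 1 - (err ** 2).sum() / ((df["obs"] - df["obs"].mean()) ** 2).sum()   # Nash-Sutcliffe efficiency
print(bias, rmse, nse)
```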
0 votes · 0 answers

Custom classification metric to optimize the precision of only top scores

If a machine learning classification model is used to predict the binary output of 1000 observations daily, and we only care about the precision of the top 100 predictions, how can we use a custom evaluation metric? More details: For the business…
John Smith · 250
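A common name for "precision of the top 100" is precision@k: sort by predicted score and compute precision on the k highest-scoring cases only. A small sketch with invented data; most modelling frameworks allow a function like this to be plugged in as a custom evaluation metric.

```python
import numpy as np

def precision_at_k(y_true, y_score, k=100):
    """Fraction of true positives among the k highest-scoring predictions."""
    top_k = np.argsort(y_score)[::-1][:k]
    return np.asarray(y_true)[top_k].mean()

# Invented example: 1000 daily observations, scores loosely related to the labels
rng = np.random.default_rng(0)
y_true = rng.binomial(1, 0.2, size=1000)
y_score = y_true * 0.3 + rng.uniform(size=1000)
print(precision_at_k(y_true, y_score, k=100))
```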
0 votes · 2 answers

When do we need cross-validation? Is it for a lack of training data or to choose between different models?

When do we need cross-validation? Is it because of a lack of training data, or in order to choose between different models? What is the background of cross-validation? What is the goal of cross-validation?
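On the mechanics behind these questions: in k-fold cross-validation the data are split into k folds, each fold serves once as the validation set while the model is fit on the rest, and the k scores are averaged, so limited data are reused for both fitting and validation, and different models can be compared on the averaged score. A bare-bones sketch using only NumPy, with a trivial mean-only "model" to keep it self-contained:

```python
import numpy as np

def k_fold_cv_mse(y, k=5, seed=0):
    """Average validation MSE of a mean-only model across k folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        prediction = y[train].mean()            # "fit" on the training folds
        scores.append(((y[val] - prediction) ** 2).mean())
    return np.mean(scores)

y = np.random.default_rng(1).normal(size=200)
print(k_fold_cv_mse(y))
```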
0 votes · 0 answers

Monthly statistics: How to prevent a few high records from affecting many low records?

I am trying to make KPIs of a software support organisation's performance. We have support requests which are software bugs requiring a new version to be released, so the response time towards our customers is quite high (30-45 for most bugs). We have…
Alex · 131
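One standard way to keep a few very slow cases from dominating a monthly KPI is to report robust summaries such as the median or a high percentile alongside (or instead of) the mean; whether that fits this organisation's reporting needs is a judgment call. A tiny illustration with invented response times:

```python
import numpy as np

# Invented monthly response times: many quick fixes plus a few long bug-fix releases
response_days = np.array([1, 1, 2, 2, 2, 3, 3, 4, 5, 40, 45])

print("mean:  ", response_days.mean())            # pulled up by the two long cases
print("median:", np.median(response_days))        # reflects the typical request
print("90th percentile:", np.percentile(response_days, 90))
```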