Questions tagged [model-evaluation]

On evaluating models, either in-sample or out-of-sample.

In-sample model evaluation techniques can be based on measures of goodness of fit, but note that in-sample fit will typically increase spuriously as the model becomes more complex, which is called overfitting. For this reason, in-sample fit is typically penalized based on model complexity, as with adjusted $R^2$, AIC, or BIC. AIC and BIC are examples of information criteria, which can also be used for in-sample evaluation.
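As a concrete illustration of those penalized in-sample measures, here is a minimal sketch (not drawn from any question below) that computes adjusted $R^2$, AIC, and BIC for an ordinary least-squares fit with Gaussian errors; the data are invented, and parameter-counting conventions for AIC/BIC differ slightly across software packages.

```python
import numpy as np

def in_sample_criteria(y, X):
    """Adjusted R^2, AIC and BIC for an OLS fit of y on X (X already contains an intercept column)."""
    n, k = X.shape                                   # k = number of estimated coefficients
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = resid @ resid
    tss = ((y - y.mean()) ** 2).sum()

    r2 = 1 - rss / tss
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)        # penalizes additional coefficients

    # Gaussian log-likelihood at the MLE of the error variance (rss / n)
    loglik = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)
    n_params = k + 1                                 # coefficients plus the error variance
    aic = 2 * n_params - 2 * loglik
    bic = n_params * np.log(n) - 2 * loglik
    return adj_r2, aic, bic

# Toy usage: an over-flexible cubic fit to data generated from a straight line
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 100)
y = 1 + 2 * x + rng.normal(scale=0.5, size=100)
X = np.column_stack([np.ones_like(x), x, x**2, x**3])
print(in_sample_criteria(y, X))
```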

Out-of-sample model evaluation usually relies on predictive accuracy, typically assessed on held-out data or via cross-validation. Distributional predictions can be evaluated using scoring rules.
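For the last point, a small sketch of two common proper scoring rules for probabilistic binary predictions, the Brier score and the log score, evaluated on a hypothetical held-out set (the arrays are made up for the example):

```python
import numpy as np

def brier_score(p, y):
    """Mean squared difference between predicted probability and the 0/1 outcome (lower is better)."""
    return np.mean((p - y) ** 2)

def log_score(p, y, eps=1e-15):
    """Negative mean log-probability of the observed outcomes (lower is better)."""
    p = np.clip(p, eps, 1 - eps)        # guard against log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Hypothetical held-out outcomes and predicted probabilities
y_test = np.array([1, 0, 1, 1, 0, 0, 1, 0])
p_hat  = np.array([0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.8, 0.3])
print(brier_score(p_hat, y_test), log_score(p_hat, y_test))
```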

1116 questions
2 votes · 1 answer

What is the opposite of precision called?

I know that $$\text{precision} = \frac{\text{true positives}}{\text{predicted positives}}$$ but what about $\frac{\text{true negatives}}{\text{predicted negatives}}$? What is it called? Thanks
AmirWG · 134
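The ratio of true negatives to predicted negatives is usually called the negative predictive value (NPV); a tiny worked example with invented confusion-matrix counts:

```python
# Precision and its negative-class analogue (negative predictive value, NPV),
# computed from invented confusion-matrix counts.
tp, fp, tn, fn = 40, 10, 35, 15

precision = tp / (tp + fp)   # true positives / predicted positives
npv       = tn / (tn + fn)   # true negatives / predicted negatives
print(precision, npv)        # 0.8, 0.7
```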
2 votes · 0 answers

Choose the model that minimizes the difference in performance on train and validation sets?

I understand the concept of training, validation, and testing datasets for model building. Typically when searching for the optimal hyperparameters for a given class of model, we choose the hyperparameter configuration that optimizes our chosen…
2 votes · 0 answers

Is there a specific measure of how many more classifications or signals one algorithm makes or picks up compared to another?

We all know precision and recall. What if two algorithms have the same precision and recall, but one algorithm makes more predictions with the data available? Example: "I love samsung. Apple is terrible." Algorithm 1: "I love samsung. Apple is…
iuppiter · 408
1 vote · 1 answer

How to assess a model where you are interested in the probability output

I know that we typically assess the performance of classifiers with metrics like accuracy, ROC, etc., because we want to know whether or not a classifier can accurately predict an outcome. But what if we are more interested in the…
Peter · 11
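One way to look directly at the probability output, complementary to the scoring rules mentioned in the tag description, is a reliability (calibration) check: bin the predicted probabilities and compare each bin's mean prediction with the observed event rate. A rough sketch with simulated data (the choice of 5 equal-width bins is arbitrary):

```python
import numpy as np

def reliability_table(p, y, n_bins=5):
    """Mean predicted probability vs. observed event rate per probability bin."""
    bins = np.linspace(0, 1, n_bins + 1)
    idx = np.clip(np.digitize(p, bins) - 1, 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            rows.append((bins[b], bins[b + 1], p[mask].mean(), y[mask].mean(), mask.sum()))
    return rows   # (bin lower edge, bin upper edge, mean predicted, observed rate, count)

rng = np.random.default_rng(1)
p = rng.uniform(size=1000)
y = rng.binomial(1, p)            # outcomes drawn from the stated probabilities, i.e. well calibrated
for row in reliability_table(p, y):
    print(row)
```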
1 vote · 0 answers

Why is extrinsic evaluation time-consuming?

In this slide it says that extrinsic evaluation is time-consuming, usually taking days or weeks. I have tried to understand why. Firstly, I learned from this slide that the best way to evaluate an n-gram model is extrinsic evaluation, which implies…
Lerner Zhang · 6,636
1 vote · 0 answers

Evaluating Classification Models with Class Probabilities

I'm curious to see if there are any useful metrics to evaluate classification models using numeric probabilities. Traditionally, I would train a classification model, generate factor predictions on the test set, and use a confusion matrix or ROC…
Minh · 111
1 vote · 1 answer

Possible values for sensitivity, specificity, precision and accuracy

I have some results where the tester claims the following values: Sensitivity: 0.525, Specificity: 0.925, Precision: 0.516, Accuracy: 0.907, where Sensitivity = TP/(TP+FN), Specificity = TN/(TN+FP), Precision = TP/(TP+FP), …
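One way to sanity-check such numbers, assuming they come from a single binary confusion matrix: precision pins down the implied prevalence given sensitivity and specificity, and that prevalence in turn implies an accuracy. A sketch of that check, using the definitions quoted above:

```python
def implied_accuracy(sensitivity, specificity, precision):
    """Accuracy implied by (sensitivity, specificity, precision) for one binary confusion matrix.

    From precision = se*pi / (se*pi + (1-sp)*(1-pi)), the prevalence pi satisfies
    pi / (1 - pi) = precision*(1 - sp) / (se*(1 - precision)),
    and accuracy = se*pi + sp*(1 - pi).
    """
    odds = precision * (1 - specificity) / (sensitivity * (1 - precision))
    prevalence = odds / (1 + odds)
    return sensitivity * prevalence + specificity * (1 - prevalence)

# For the values quoted in the question this returns about 0.872, which can be
# compared with the reported 0.907 (under the single-confusion-matrix assumption).
print(implied_accuracy(0.525, 0.925, 0.516))
```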
0 votes · 0 answers

A question about Dice coefficient calculation

Evaluation of a semantic segmentation task with the following properties: batch size of the test set: 4; shape of a target mask/predicted mask: (1, 512, 512); number of batches in the test set: 30; Dice score calculation formula used: Sum of [(2 x…
Cork · 3
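The recurring subtlety with Dice over a test set is whether to average per-mask scores or to pool the intersections and sums over all masks first; the two generally differ. A sketch of both, with invented binary masks (scaled down from (1, 512, 512) to keep the example light):

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    """Dice coefficient 2*|A ∩ B| / (|A| + |B|) for binary masks."""
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

# Invented masks; 120 masks stands in for 30 batches of size 4.
rng = np.random.default_rng(0)
preds   = rng.integers(0, 2, size=(120, 1, 64, 64)).astype(bool)
targets = rng.integers(0, 2, size=(120, 1, 64, 64)).astype(bool)

# (a) average of per-mask Dice scores
per_mask = np.array([dice(p, t) for p, t in zip(preds, targets)])
print("mean of per-mask Dice:", per_mask.mean())

# (b) "global" Dice, pooling intersections and sums over the whole test set
print("pooled Dice:", dice(preds, targets))
```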
0 votes · 1 answer

What performance indices are best to compare two time series with different data lengths? Can you suggest a method to do the comparison in R/Origin?

I have two data sets (observed and simulated). The observed data set is the snow depth observed at a location; the simulated one is the model-simulated snow depth. These data sets have different lengths. What are the performance indices that can be…
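The question asks about R/Origin; purely as an illustration of the usual workflow (shown in Python to keep all examples on this page in one language), one common approach is to align the two series on their shared dates and compute error measures such as bias, RMSE, and Nash-Sutcliffe efficiency on the overlap. The series below are simulated stand-ins:

```python
import numpy as np
import pandas as pd

# Hypothetical daily series; in practice these would be read from files,
# e.g. pd.read_csv("observed.csv", parse_dates=["date"], index_col="date").
dates_obs = pd.date_range("2020-01-01", periods=200, freq="D")
dates_sim = pd.date_range("2020-01-15", periods=250, freq="D")
obs = pd.Series(np.random.default_rng(0).gamma(2.0, 10.0, len(dates_obs)), index=dates_obs)
sim = pd.Series(np.random.default_rng(1).gamma(2.0, 10.0, len(dates_sim)), index=dates_sim)

# Align on the common dates; the different lengths stop mattering after this step.
df = pd.concat({"obs": obs, "sim": sim}, axis=1).dropna()

err = df["sim"] - df["obs"]
bias = err.mean()
rmse = np.sqrt((err ** 2).mean())
nse = 1 - (err ** 2).sum() / ((df["obs"] - df["obs"].mean()) ** 2).sum()   # Nash-Sutcliffe efficiency
print(bias, rmse, nse)
```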
0 votes · 0 answers

Custom classification metric to optimize the precision of only top scores

If a machine learning classification model is used to predict the binary output of 1000 observations daily, and we only care about the precision of the top 100 predictions, how can we use a custom evaluation metric? More details: For the business…
John Smith · 250
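A common name for "precision of the top 100" is precision@k: sort by predicted score and compute precision on the k highest-scoring cases only. A small sketch with invented data; most modelling frameworks allow a function like this to be plugged in as a custom evaluation metric.

```python
import numpy as np

def precision_at_k(y_true, y_score, k=100):
    """Fraction of true positives among the k highest-scoring predictions."""
    top_k = np.argsort(y_score)[::-1][:k]
    return np.asarray(y_true)[top_k].mean()

# Invented example: 1000 daily observations, scores loosely related to the labels
rng = np.random.default_rng(0)
y_true = rng.binomial(1, 0.2, size=1000)
y_score = y_true * 0.3 + rng.uniform(size=1000)
print(precision_at_k(y_true, y_score, k=100))
```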
0 votes · 2 answers

When do we need cross-validation? Is it for a lack of training data or to choose between different models?

When do we need cross-validation? Is it because of a lack of training data, or in order to choose between different models? What is the background of cross-validation? What is the goal of cross-validation?
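On the mechanics behind these questions: in k-fold cross-validation the data are split into k folds, each fold serves once as the validation set while the model is fit on the rest, and the k scores are averaged, so limited data are reused for both fitting and validation, and different models can be compared on the averaged score. A bare-bones sketch using only NumPy, with a trivial mean-only "model" to keep it self-contained:

```python
import numpy as np

def k_fold_cv_mse(y, k=5, seed=0):
    """Average validation MSE of a mean-only model across k folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        prediction = y[train].mean()            # "fit" on the training folds
        scores.append(((y[val] - prediction) ** 2).mean())
    return np.mean(scores)

y = np.random.default_rng(1).normal(size=200)
print(k_fold_cv_mse(y))
```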
0 votes · 0 answers

Monthly statistics: How to prevent a few high records from affecting many low records?

I am trying to make KPIs of a software support organisation's performance. We have support requests which are software bugs requiring a new version to be released, so the response time towards our customers is quite high (30-45 for most bugs). We have…
Alex · 131
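One standard way to keep a few very slow cases from dominating a monthly KPI is to report robust summaries such as the median or a high percentile alongside (or instead of) the mean; whether that fits this organisation's reporting needs is a judgment call. A tiny illustration with invented response times:

```python
import numpy as np

# Invented monthly response times: many quick fixes plus a few long bug-fix releases
response_days = np.array([1, 1, 2, 2, 2, 3, 3, 4, 5, 40, 45])

print("mean:  ", response_days.mean())            # pulled up by the two long cases
print("median:", np.median(response_days))        # reflects the typical request
print("90th percentile:", np.percentile(response_days, 90))
```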