Questions tagged [classification]

Statistical classification is the problem of identifying the sub-population to which new observations belong, where the identity of the sub-population is unknown, on the basis of a training set of data containing observations whose sub-population is known. Therefore these classifications will show a variable behavior which can be studied by statistics.

-- Wikipedia at https://en.wikipedia.org/wiki/Statistical_classification

6881 questions
28
votes
2 answers

How large a training set is needed?

Is there a common method used to determine how many training samples are required to train a classifier (an LDA in this case) to obtain a minimum threshold generalization accuracy? I am asking because I would like to minimize the calibration time…
Lunat1c
  • 477
26
votes
3 answers

What is the difference between SVM and LDA?

What is the difference between Support Vector Machines and Linear Discriminant Analysis?
15
votes
4 answers

Classification with tall fat data

I need to train a linear classifier on my laptop with hundreds of thousands of data points and about ten thousand features. What are my options? What is the state of the art for this type of problem? It seems like stochastic gradient descent is…
carlosdc
  • 3,235
9
votes
1 answer

Categorization/Segmentation techniques

First, let me say that I am a bit out of my depth here, so if this question needs to be re-phrased or closed as a duplicate, please let me know. It may simply be that I don't have the proper vocabulary to express my question. I am working on an…
Colin K
  • 487
6
votes
1 answer

Bayes Decision Boundary and classifier

Is it correct to say that the purpose of a classifier (e.g. K-NN, logistic regression, LDA) is to approximate the Bayes decision boundary?
hans-t
  • 569
6
votes
2 answers

One class classifier vs binary classifier

When we have two classes, A and B, can we use a single classifier that is trained on class A samples and tested on both class A and class B samples? When should a binary classifier be used, and when a one-class classifier? Can I use a one-class classifier for any type of…
6
votes
1 answer

Classifier success rate and confidence intervals

Suppose we measure the classifier error on a test set and obtain a certain success rate, say 75%. Now, of course, this is only one measurement, so how do we calculate the "true" success rate? Surely it will be close to 75%, but how close? I understand…
andreister
  • 3,357
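
One common way to put bounds on a measured success rate is a binomial confidence interval. The sketch below uses the Wilson score interval; it is only an illustration, not an answer taken from this thread. The question does not state the test-set size, so the 100 test cases (75 correct) are an assumed figure, and the function name `wilson_interval` is made up for this example.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 corresponds to roughly 95% coverage.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")

    p_hat = successes / n                      # observed success rate
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    )
    return centre - half_width, centre + half_width


# Assumed test set: 100 cases, 75 classified correctly (75% success rate).
low, high = wilson_interval(75, 100)
print(f"95% interval: [{low:.3f}, {high:.3f}]")
```

The interval narrows as the test set grows, so the same 75% measured on a larger test set pins down the "true" rate more tightly.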
5
votes
2 answers

Is classification only a machine learning problem?

Wikipedia says that: In machine learning, multiclass or multinomial classification is the problem of classifying instances into one of three or more classes. It looks like the problem is strictly related to machine learning. But to me, this is…
dzieciou
  • 173
5
votes
2 answers

How can I discern whether a classifier's outcome is significantly different?

This is a question I posted here a while ago, and I would like to know whether you can think of more solutions for it from the ML perspective. Unfortunately, I can't use McNemar's test due to the small sample size (hence the values are not normally…
Dov
  • 1,810
4
votes
0 answers

Hierarchical classification

I'm currently working on a classification problem with a massive amount of data, similar to the Kaggle one. The input data consists of features and multiple labels that can be hierarchically aligned. At first I flattened the data and tried to learn multi-label…
Student
  • 41
4
votes
2 answers

Cluster data into categories; train one classifier per category

Let me first give you a hypothetical example to make things more clear: Let's say your task is to classify art as either professional or amateur work, based on image data. You extracted 100 features from each artwork using image processing. The…
Maarten
  • 253
4
votes
1 answer

Hide implicit information in input data from classifier

I have a classification task where the input data (text) contains information which I don't want to use for classification (implicit in the texts) and therefore have to "hide" from the classifier. I have something like a label for the implicit…
Baschdl
  • 143
4
votes
2 answers

LDA cutoff (decision boundary) value

I have run a linear discriminant analysis for the simple 2 categorical group case using the MASS package lda() function in R. With priors fixed at 0.5 and unequal n for the response variable of each group, the output basically provides the group…
BJessop
  • 41
4
votes
1 answer

How to assess if more data would improve classification

I am trying to make an argument that if my field collected larger samples, we would be able to make better models with higher predictive accuracy. However, there's also the possibility that we are reaching an asymptote because the quality of the…
aleph4
  • 141
4
votes
1 answer

Random Forest Usage

Random Forest Usage: I have run a random forest in R. It gave me a confusion matrix and variable importance. Variable importance can be used to rank the importance of variables in the model. My question is how I can use random forest for classification,…
Riya
  • 609