
I'll keep the introduction short: I started a machine vision project and flattened my 256x256 photos into vectors of 65,536 features.

First of all, I want to point out that I'm a total beginner when it comes to machine vision. I trained a few ML models on these 65,536-feature vectors (1130 images/vectors in total), and the best accuracy, 0.73, was obtained by Naive Bayes. Is this a good accuracy?

My other question: is it smart to do feature selection and potentially "cut/prune/remove" some of my features? If so, how could I do that? Until now I have only worked with datasets of at most 12 features, where I went feature by feature to decide which ones to remove, but going feature by feature through 65,536 features might take a year. What do you think?

anthino12
  • In computer vision this is usually done with neural networks that have a couple of convolutional layers, each reducing the dimensionality, which works well... if done right. – user2974951 Feb 22 '23 at 14:19
  • Hey, thanks for helping me. I'm aware of CNNs and I've tried them, but I wanted to compare ML models vs DL models. I'm at the step where I train ML models and I wanted to improve my accuracy, if possible. – anthino12 Feb 22 '23 at 14:23
  • 3
  • Before CNNs, researchers used specialized feature extraction for images, e.g. HOG and SIFT. Using raw pixel values directly is unlikely to produce particularly good results. Whether or not some value of accuracy is "good" depends on your criteria for quality; you'll want to research other computer vision models and see how your accuracy compares to similar models. Compared to CNNs trained on standard benchmark datasets, 73% accuracy is terrible. https://stats.stackexchange.com/questions/363640/what-are-the-current-state-of-the-art-convolutional-neural-networks – Sycorax Feb 22 '23 at 14:43

1 Answer


Each feature of yours is a pixel. Are there any pixels that you really feel could be removed without sacrificing information? It might be that you know the content of interest is always in the middle, so you might be willing to crop the images to include only the middle $128\times128$ pixels, which would be a $75\%$ reduction in the features. If, however, you could have important information anywhere in the image, then you probably want to keep it.
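As a concrete sketch of that cropping idea (using a random array as a stand-in for a real photo), taking the central $128\times128$ window of a $256\times256$ image cuts the feature count from 65,536 to 16,384:

```python
import numpy as np

# Stand-in for a real 256x256 grayscale photo.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(256, 256), dtype=np.uint8)

h, w = image.shape
ch, cw = 128, 128                          # crop size
top, left = (h - ch) // 2, (w - cw) // 2   # offsets of the central window
center_crop = image[top:top + ch, left:left + cw]

features = center_crop.ravel()             # flatten to a feature vector
print(features.shape)                      # (16384,) -- a 75% reduction
```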

Thus, a reasonable stance on feature selection is not to do it. You risk sacrificing information that determines the outcome without an obvious upside. Sure, it is possible to overfit when there are many features, but it is possible to underfit when you leave out features. Especially in a situation where the interactions between features are likely to be the determinants of the image content, it sure seems like you would be sacrificing useful information by discarding pixels that might be relevant.
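If you do want fewer dimensions without discarding specific pixels, a transform like PCA is one option: every pixel still contributes to the reduced representation. A minimal sketch with scikit-learn (on made-up data at a smaller size for speed; the same call works on a $1130\times65536$ matrix):

```python
import numpy as np
from sklearn.decomposition import PCA

# Made-up data: 200 "images" of 4,096 pixels each, standing in for
# the real 1130 x 65,536 matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4096)).astype(np.float32)

# Keep 100 components; in practice choose this by explained variance.
pca = PCA(n_components=100)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)  # (200, 100)
```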

Many approaches can extract features from images in a lower dimension than the raw pixels. The comments mention HOG (Histogram of Oriented Gradients) and SIFT (Scale-Invariant Feature Transform) as possibilities. Fourier and wavelet transforms are other options.
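To give a flavor of gradient-based features, here is a minimal NumPy-only sketch in the spirit of HOG (a real project would use an implementation such as `skimage.feature.hog`): per-cell histograms of gradient orientations, weighted by gradient magnitude.

```python
import numpy as np

# Stand-in for a real grayscale photo.
rng = np.random.default_rng(0)
image = rng.random((256, 256))

gy, gx = np.gradient(image)                      # per-pixel gradients
magnitude = np.hypot(gx, gy)
orientation = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned angles in [0, pi)

cell, bins = 32, 9
features = []
for r in range(0, 256, cell):
    for c in range(0, 256, cell):
        ang = orientation[r:r + cell, c:c + cell].ravel()
        mag = magnitude[r:r + cell, c:c + cell].ravel()
        hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
        features.append(hist)
features = np.concatenate(features)

print(features.shape)  # (576,): 8x8 cells * 9 bins, down from 65,536 raw pixels
```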

As far as whether or not your $73\%$ accuracy represents a good score, it is hard to say. If $80\%$ of the images belong to one category, then $73\%$ is quite poor. If the best accuracy anyone has achieved so far is $74\%$, then your performance seems pretty good.
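That baseline comparison is easy to check in code. With hypothetical labels where $80\%$ of images belong to one class, a $73\%$-accurate model actually loses to always predicting the majority class:

```python
import numpy as np

# Hypothetical class balance: 80% of images in class 1, 20% in class 0.
labels = np.array([1] * 80 + [0] * 20)
majority_baseline = np.bincount(labels).max() / labels.size

model_accuracy = 0.73
print(f"baseline = {majority_baseline:.2f}")                  # 0.80
print("beats baseline:", model_accuracy > majority_baseline)  # False
```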

Dave