
I work with Python and images of watches (examples: watch_1, watch_2, watch_3). My aim is to take a photo of a random watch and then find the most similar watches to it in my database. Obviously, one main feature which distinguishes the watches is their shape (square, rectangular, round, oval), but there are other features too.

For now, I am just running PCA and KNN on the raw RGB images of the watches to find the most similar ones among them. My source code is the following:

import cv2
import numpy as np
from glob import glob
from sklearn.decomposition import PCA
from sklearn import neighbors
from sklearn import preprocessing


data = []

# Read images from file
for filename in glob('Watches/*.jpg'):

    img = cv2.imread(filename)

    # Skip files that OpenCV failed to read
    if img is None:
        continue

    height, width = img.shape[:2]

    # Check that all my images are of the same resolution
    if height == 529 and width == 940:

        # Flatten each image so that it is stored in one row
        data.append(img.flatten())

# Normalise data
data = np.array(data)
norm = preprocessing.Normalizer()
data = norm.fit_transform(data)

# PCA model keeping 95% of the variance
pca = PCA(0.95)
data = pca.fit_transform(data)

# K-nearest neighbours on the reduced representation
knn = neighbors.NearestNeighbors(n_neighbors=4, algorithm='ball_tree',
                                 metric='minkowski').fit(data)
distances, indices = knn.kneighbors(data)

print(indices)

However, when I try to run this script on more than 1,500 RGB images, I get a MemoryError at the point where the data are processed by the PCA step.
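A rough back-of-the-envelope calculation suggests the data alone might already be the problem (assuming, as I believe, that sklearn converts the pixel values to float64 during normalisation):

values_per_image = 529 * 940 * 3           # 1,491,780 values per flattened image
total_bytes = 1500 * values_per_image * 8  # float64 takes 8 bytes per value
print(total_bytes / 1e9)                   # ~17.9 GB, before PCA allocates its own working copies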

Is this normal for a PC with 24 GB of RAM and a 3.6 GHz Intel Core CPU, without any discrete GPU?

How can I overcome this?

Shall I use another method like Incremental PCA (or a deep learning algorithm), or should I simply buy a discrete GPU?
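For reference, here is roughly how I imagine Incremental PCA slotting into the script above (an untested sketch: n_components=200 and batch_size=500 are placeholder values, since IncrementalPCA takes an integer number of components rather than a variance fraction, and each batch needs at least n_components samples):

import numpy as np
from sklearn.decomposition import IncrementalPCA

n_components = 200   # placeholder value
batch_size = 500     # placeholder value; must be >= n_components

ipca = IncrementalPCA(n_components=n_components)

# Fit batch by batch, so only one batch at a time is converted to float64
for start in range(0, data.shape[0], batch_size):
    ipca.partial_fit(data[start:start + batch_size])

# Transform in batches too, to avoid materialising a full float64 copy
data = np.vstack([ipca.transform(data[start:start + batch_size])
                  for start in range(0, data.shape[0], batch_size)])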

Outcast
  • Where are you getting the memory error? Is it when you are storing the images into the list data? – kingledion Feb 20 '18 at 15:52
  • Thank you for your useful note. I edited my post to answer it. I get it at the point where the data are processed by the PCA. – Outcast Feb 20 '18 at 15:56

1 Answer


KNN is instance-based, so it will store all training instances in memory. Since you are using images, this will add up quickly. KNN on untransformed images might not perform that well anyway; you could look into filter banks to transform your images into a bag-of-words representation (which is smaller and more invariant).
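A rough sketch of one common recipe (with assumptions flagged: ORB descriptors because they ship freely with OpenCV, a placeholder vocabulary of 64 visual words, and an images list of BGR arrays loaded as in the question):

import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans

orb = cv2.ORB_create()

def descriptors_of(img_bgr):
    # ORB works on grayscale; desc is None if no keypoints are found
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    _, desc = orb.detectAndCompute(gray, None)
    return desc

# Build a visual vocabulary by clustering descriptors from all images
all_desc = [descriptors_of(img) for img in images]
all_desc = np.vstack([d for d in all_desc if d is not None]).astype(np.float32)
vocab = MiniBatchKMeans(n_clusters=64).fit(all_desc)

def bow_histogram(img_bgr):
    # Map each descriptor to its nearest visual word and count occurrences
    desc = descriptors_of(img_bgr)
    if desc is None:
        return np.zeros(64)
    words = vocab.predict(desc.astype(np.float32))
    hist = np.bincount(words, minlength=64).astype(float)
    return hist / hist.sum()  # one small, fixed-length vector per image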

However, if it is accuracy you are aiming for, I would recommend skipping all that (it is very 2012 anyway) in favor of deep learning, for instance: construct an autoencoder and determine similarity on the encoded representation of an image (which could in turn be done using KNN, by the way).
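To make that concrete, a minimal sketch in Keras (everything here is an assumption, not a prescription: the 256x256 input supposes you resize the images first, and the 128-dimensional code size is arbitrary):

from tensorflow import keras
from tensorflow.keras import layers
from sklearn.neighbors import NearestNeighbors

# Encoder: compress each image down to a 128-dimensional code
inp = layers.Input(shape=(256, 256, 3))
x = layers.Conv2D(16, 3, strides=2, padding='same', activation='relu')(inp)
x = layers.Conv2D(32, 3, strides=2, padding='same', activation='relu')(x)
x = layers.Conv2D(64, 3, strides=2, padding='same', activation='relu')(x)
x = layers.Flatten()(x)
code = layers.Dense(128, name='code')(x)

# Decoder: reconstruct the image from the code
x = layers.Dense(32 * 32 * 64, activation='relu')(code)
x = layers.Reshape((32, 32, 64))(x)
x = layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu')(x)
x = layers.Conv2DTranspose(16, 3, strides=2, padding='same', activation='relu')(x)
out = layers.Conv2DTranspose(3, 3, strides=2, padding='same', activation='sigmoid')(x)

autoencoder = keras.Model(inp, out)
autoencoder.compile(optimizer='adam', loss='mse')
# autoencoder.fit(images, images, epochs=20, batch_size=32)  # images scaled to [0, 1]

# Similarity search on the learned codes instead of raw pixels
encoder = keras.Model(inp, code)
codes = encoder.predict(images)
knn = NearestNeighbors(n_neighbors=4).fit(codes)
distances, indices = knn.kneighbors(codes)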

S van Balen
  • Thanks for your interesting response (upvote). I had the bag-of-words representation in mind as more related to text than to image classification, but I may think about it. – Outcast Feb 20 '18 at 13:25
  • 1
    I am certainly down for deep learning and I have already started to code an autoencoder with Keras. However, can I get good results with my hardware specifics or will I have memory problems again? (Also what do you think about Siamese neural networks regarding my application?) – Outcast Feb 20 '18 at 13:27
  • 1
    Yes B.O.W. comes for N.L.P., hence the name. It is, however, a very common approach in C.V. too. Please note, however, that is also becoming rapidly obsolete in favor of Deep Learning – S van Balen Feb 20 '18 at 13:27
  • This definitely looks like a job for Siamese networks! – Imran Feb 20 '18 at 13:30
  • Define good :) You won't compete with the state of the art, but that might not be what you are aiming for. If you keep it relatively shallow (say, 3 conv layers) it will probably execute fine (it will take some time, though); given that your images are quite similar, that might still work reasonably well. – S van Balen Feb 20 '18 at 13:32
  • Yes, it is a matter of definition, obviously. By good I mean very close to what the human eye would evaluate as similar watches (e.g. similar shape, thickness of indices, colour, etc.). If it is too far from that, then there is an issue. (But in any case I will run it and see what happens.) – Outcast Feb 20 '18 at 13:43
  • Concerning Siamese Neural Networks (SNNs), do you suggest using two autoencoders as SNNs? Also, shall I use pairs of images, or is it better to use triplets? (There is a post about this, https://datascience.stackexchange.com/questions/27795/neural-networks-find-most-similar-images, but I have not yet figured out why triplets are necessary for deep ranking, as I have the impression that pairs of images suffice for this task.) – Outcast Feb 20 '18 at 13:46
  • SNNs are, frankly, a new concept to me; @Imran suggested them. I merely suggested an autoencoder on which you can in turn build a model that computes similarity, which could already be KNN or cosine similarity. The downside of that is that it might group features which you deem irrelevant; the upside is simplicity. – S van Balen Feb 20 '18 at 14:32
  • Oops, sorry, I did not notice that @Imran suggested this. OK, cool, thank you for your help so far! – Outcast Feb 20 '18 at 15:05
  • So @Imran, do you have any specific application or research paper regarding SNNs, or do you suggest just the basic ones? – Outcast Feb 20 '18 at 15:08
  • I really like @SvanBalen's suggestion to start simple with autoencoders and cosine similarity. For Siamese networks check out https://arxiv.org/abs/1503.03832 – Imran Feb 20 '18 at 15:10
  • OK, I will take both into account. I have two further questions: 1) Why is using triplets of images necessarily better than using pairs for similarity ranking? 2) Is there any widely-used version of unsupervised SNNs (e.g. with autoencoders, so that I can avoid labelling)? (Perhaps you can write all this up in a proper answer so that I can upvote it, etc.) – Outcast Feb 20 '18 at 15:30