
I am looking for a way to break this kind of captcha:

[captcha image]

I know that there are always 4 digits, in black and white. The image basically consists of an easily recognizable 4-digit code, mixed with a mask whose form is arbitrary.

I am pretty confident that I will manage to break the image into 4 parts containing the 4 digits. I also know that when we have a digit without mask, it is easily recognizable. But I don't see how to remove the mask...

Do you have any ideas please?

(I am doing this for a study project)

Arnaud

1 Answer


You're looking for a pipeline of machine learning methods.

As a first step, you want to segment the image into 4 parts, each one isolating a single digit: this is easily achievable through the sliding-window technique, with windows of varying sizes if the sizes of the captchas vary.
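A minimal sketch of the simplest case (equal-width vertical slices, assuming the digits are evenly spaced and the image is already loaded as a 2-D NumPy array):

```python
import numpy as np

def split_captcha(image, n_digits=4):
    """Split a captcha image into n_digits equal-width vertical slices.

    image: 2-D numpy array (rows x cols).
    Returns a list of n_digits sub-images, one per digit position.
    """
    cols = image.shape[1]
    width = cols // n_digits
    return [image[:, i * width:(i + 1) * width] for i in range(n_digits)]

# Example: a dummy 20x40 image split into four 20x10 slices.
img = np.zeros((20, 40), dtype=np.uint8)
parts = split_captcha(img)
print([p.shape for p in parts])  # [(20, 10), (20, 10), (20, 10), (20, 10)]
```

A true sliding window would slide a crop of varying width across the image and score each position with the classifier, but when the 4 digits occupy fixed positions, the fixed split above is enough.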

As a second step you should train a neural network to learn to recognize single digits. Here the hard part is data collection: you might need a large sample to properly train your model.
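To illustrate the idea, here is a small network trained on scikit-learn's bundled 8x8 digits dataset as a stand-in for your own labelled captcha crops (the dataset and architecture are placeholders, not a recommendation):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Stand-in data: 8x8 grayscale digit images, flattened to 64 features each.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

# One hidden layer of 64 units; tune this for your own captcha crops.
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.2f}")
```

Your real training set would consist of the segmented digit images from the previous step, each labelled by hand.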

A trick you can use to enlarge your training set is to apply transformations to already collected samples, such as rotations and distortions. This process is widely used in text recognition.
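One way to sketch such augmentation, assuming SciPy is available and each sample is a small 2-D array:

```python
import numpy as np
from scipy.ndimage import rotate, shift

def augment(sample, angles=(-10, -5, 5, 10), shifts=((1, 0), (0, 1))):
    """Generate extra training samples from one labelled digit image
    by applying small rotations and translations."""
    out = [sample]
    for a in angles:
        # reshape=False keeps the output the same shape as the input
        out.append(rotate(sample, a, reshape=False, order=1))
    for s in shifts:
        out.append(shift(sample, s, order=1))
    return out

digit = np.random.rand(16, 16)
variants = augment(digit)
print(len(variants))  # 1 original + 4 rotations + 2 shifts = 7
```

Every variant keeps the label of the original sample, so a handful of hand-labelled images can yield a much larger training set.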

Nick Cox
Ramalho
  • Thank you very much. However, I will not have time to annotate a large set of such images, because I assume I would need like 1 million images if I want the network to work, right? What I already have is a classical neural network that recognizes standard digits. So I thought that my goal was to transform the image before applying a neural net (other than just rotate/distort, but by changing the colors). Do you agree? – Arnaud Sep 29 '14 at 10:49
  • Not exactly a million; it depends on the way you model the problem. You should aim for the smallest number of features that still properly represents a word/image. The data pre-processing phase is critical: you shouldn't include any colors other than black and white, as the inclusion of colors will only require a more complex network (thus more examples will be needed). Re-scaling of the images to reduce the number of features is equally vital. Distorting your test images?! No! But you should include distorted copies of your training samples as additional training data. – Ramalho Sep 29 '14 at 13:59
  • Okay, thank you for these explanations, I have a better understanding now. Last question: what are the features of the network? Are they the values of the pixels (0 or 1 because it is black or white)? You say that I should reduce the number of features, but how to do this if we want to encode the values of the pixels ? Should I reduce the size of the image? And also, do you have an idea of how many training images I will need please? – Arnaud Sep 29 '14 at 17:17
  • Each feature in your dataset represents, as you say, a pixel. You need to find a way to rescale your images to a minimal size without interfering with the quality of the observation. Try to scale them to 16x16 without losing much. Try different things. I have no idea how many training images will be needed; only testing can tell. Also, keep in mind that training image quality is as important as quantity. Don't focus on a concrete number. Use cross-validation to assess the reliability of your network, and if needed collect further data. – Ramalho Sep 29 '14 at 17:34
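The preprocessing steps discussed in this thread (binarize to black/white, rescale to something like 16x16) can be sketched with plain NumPy block averaging, assuming the image dimensions are multiples of the target size:

```python
import numpy as np

def preprocess(image, size=16, threshold=128):
    """Downscale a grayscale image to size x size by block averaging,
    then binarize it to {0, 1} features (1 = ink, 0 = background).

    Assumes image height and width are multiples of `size`.
    """
    h, w = image.shape
    bh, bw = h // size, w // size
    # Average each bh x bw block, then threshold: the comment thread
    # suggests keeping only black/white values so the network stays small.
    small = image[:size * bh, :size * bw].reshape(size, bh, size, bw).mean(axis=(1, 3))
    return (small < threshold).astype(np.uint8)

img = np.full((32, 32), 255, dtype=np.uint8)
img[8:24, 8:24] = 0  # a black square on a white background
features = preprocess(img)
print(features.shape, features.sum())  # (16, 16) 64
```

The resulting 16x16 = 256 binary features would then be flattened into the input vector of the network.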