Extract text from light text on white background image

Question

I have an image like the following:

and I would like to extract the text from it, which should be ws35. I've tried the pytesseract library, using:

pytesseract.image_to_string(Image.open(path))

but it returns nothing. Am I doing something wrong? How can I recover the text using OCR? Do I need to apply a filter to the image first?

SilverMonkey · Answer 1 · 2018-08-25T12:23:22.870

You can try the following approach:

Binarize the image with a method of your choice (thresholding at 127 seems to be sufficient in this case).
Use a minimum filter to connect the loose dots into characters. A filter with r=4 seems to work quite well.
If necessary, the result can be further improved by applying a median blur (r=4).

Because I do not use Tesseract myself, I could not test it on this picture, but online OCR tools seem to identify the sequence correctly (especially with the blurred version).
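
The three steps above can be sketched with Pillow roughly as follows. This is a minimal sketch, not the answerer's own code: the function name `preprocess` is hypothetical, and reading "r=4" as a 9x9 window (2 * 4 + 1) is an assumption, since Pillow's rank filters take an odd window size rather than a radius.

```python
from PIL import Image, ImageFilter


def preprocess(img, threshold=127, radius=4, blur=True):
    """Binarize, thicken dark strokes, and optionally smooth an image for OCR.

    `preprocess` is a hypothetical helper; the defaults follow the values
    suggested in the answer (threshold 127, r=4).
    """
    gray = img.convert("L")
    # Step 1: fixed threshold. Pixels above it become white, the rest black.
    binary = gray.point(lambda p: 255 if p > threshold else 0)
    # Step 2: a minimum filter grows dark regions, so nearby dots merge.
    # Assumption: "r=4" is interpreted as a (2 * 4 + 1) = 9 pixel window.
    size = 2 * radius + 1
    result = binary.filter(ImageFilter.MinFilter(size))
    # Step 3 (optional): a median filter smooths the ragged stroke edges.
    if blur:
        result = result.filter(ImageFilter.MedianFilter(size))
    return result
```

The result could then be passed to `pytesseract.image_to_string(preprocess(Image.open(path)))`. Whether Tesseract then returns ws35 has not been checked here, and the threshold and window size will probably need tuning per image.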

score 1 · Accepted Answer · answered Aug 25 '18 at 14:29

Similar to @SilverMonkey's suggestion: Gaussian blur followed by Otsu thresholding.
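
A minimal sketch of that idea, again using Pillow so that it fits the `Image.open` call from the question. The helper names `otsu_threshold` and `blur_then_otsu` are hypothetical, and the blur radius of 2 is an assumption because the answer does not give one. Otsu's method picks the threshold that maximizes the between-class variance of the grayscale histogram; OpenCV offers the same thing through `cv2.threshold` with the `cv2.THRESH_OTSU` flag, which is likely the more convenient route if OpenCV is already installed.

```python
from PIL import Image, ImageFilter


def otsu_threshold(gray):
    """Return the Otsu threshold of a mode "L" image, from its histogram."""
    hist = gray.histogram()  # 256 counts for a mode "L" image
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of intensities of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so no valid split here
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def blur_then_otsu(img, radius=2):
    """Gaussian blur, then binarize at the Otsu threshold.

    The default radius is an assumption; tune it to the dot spacing.
    """
    gray = img.convert("L").filter(ImageFilter.GaussianBlur(radius))
    t = otsu_threshold(gray)
    return gray.point(lambda p: 255 if p > t else 0)
```

The blur merges the separate dots into grey strokes before thresholding, so a larger radius joins dots that are farther apart but also thins out small details.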

Yves Daoust


score 0 · Answer 3 · answered Aug 25 '18 at 11:16

The problem is that this picture is low quality and very noisy. Even professional and enterprise programs struggle with images like this.

You have most likely seen a CAPTCHA before. One reason they exist is that the image and your answer are sent back to a database and then used to train computers to read images like these.

Short answer: pytesseract can't read the text in this image, and most likely no other module or professional program can read it either.

score 0 · Answer 4 · answered Aug 25 '18 at 13:05

You may need to apply some image processing/enhancement to it. Look at this post, read the suggestions, and try to apply them.

user70

