I'm trying to read the number in this image using pytesseract: [image]. When I do, it prints out IL:

import pytesseract
from PIL import Image

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract'
text = pytesseract.image_to_string(Image.open("Number.jpg"))
print(text)

I've also tried converting the image to black and white: [image], but this hasn't worked either. What am I doing wrong?

2 Answers

pytesseract works best with black text on a white background. Preprocessing is the most important step for getting accurate results. In your case, simple inverse binary thresholding is enough to get the correct output, because your image contains no noise. Adaptive thresholding is only needed when the lighting is uneven.

>>> import cv2
>>> import pytesseract
>>> image = cv2.imread("14.jpg", 0)  # 0 = read as grayscale
>>> thresh = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
>>> data = pytesseract.image_to_string(thresh, config='--psm 6 digits')
>>> data
'14'
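To make the thresholding step above more concrete, here is a minimal sketch of what inverse binary thresholding with an Otsu-chosen cutoff does. It is written in plain Python so it runs without OpenCV; the function names `otsu_threshold` and `binarize_inv` are hypothetical, and the result is not guaranteed to match `cv2.threshold` value for value.

```python
def otsu_threshold(pixels):
    """Pick the grey level that best separates dark and bright pixels (Otsu)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # sum of the grey levels of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0 = sum0 / w0              # mean of the dark class
        m1 = (sum_all - sum0) / w1  # mean of the bright class
        var = w0 * w1 * (m0 - m1) ** 2  # between-class variance (unnormalised)
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize_inv(image):
    """Inverse binary threshold: bright pixels become 0, dark pixels become 255."""
    t = otsu_threshold(p for row in image for p in row)
    return [[0 if p > t else 255 for p in row] for row in image]


# Hypothetical 2x2 image: bright strokes (230) on a dark background (20)
print(binarize_inv([[20, 230], [230, 20]]))
```

On an image with bright digits on a dark background, this turns the digits black and the background white, which is the form the answer says pytesseract handles best.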

I don't think the Tesseract version is what causes the problem.

Tesseract version: v5.0.0-alpha.20200223
pytesseract version: 0.3.4

Tarun Chakitha

I think you forgot to set pytesseract's page segmentation mode (psm) to 7, which treats the image as a single text line. (source)

I also applied thresholding. My result:

[image]

And when I set psm to 7:

txt = pytesseract.image_to_string(thr, config="--psm 7 digits")
print(txt)

Result:

14

Code:
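
import cv2
import pytesseract

# Load the image and convert it to grayscale
img = cv2.imread("d3njD.jpg")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Inverse adaptive threshold (mean of an 11x11 neighbourhood, minus 4)
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                            cv2.THRESH_BINARY_INV, 11, 4)

# --psm 7 treats the image as a single text line; "digits" loads the digits config
txt = pytesseract.image_to_string(thr, config="--psm 7 digits")
print(txt)

# Show the thresholded image until a key is pressed
cv2.imshow("thr", thr)
cv2.waitKey(0)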

Please note that this solution may not work for other images. You may need additional image processing, or you may need to change the parameters.
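One way to handle the need to change parameters is to try several page segmentation modes and keep the first result that consists only of digits. The sketch below is an assumption, not part of either answer: `read_digits` is a hypothetical helper, and it takes the OCR step as a callable so the retry logic stays independent of pytesseract.

```python
def read_digits(ocr, psm_modes=(7, 6, 8, 13)):
    """Try each page segmentation mode until the OCR output is all digits.

    `ocr` is any callable that takes a Tesseract config string and returns text.
    Returns (text, psm) on success, or (None, None) if no mode produced digits.
    """
    for psm in psm_modes:
        config = f"--psm {psm} digits"
        text = ocr(config).strip()  # drop the trailing whitespace/newlines
        if text.isdigit():
            return text, psm
    return None, None
```

With the thresholded image `thr` from the code above, it could be called as `read_digits(lambda cfg: pytesseract.image_to_string(thr, config=cfg))`. If every mode fails, the preprocessing is the more likely place to look.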

  • pytesseract version: 4.1.1
Ahx