
I used the Tesseract library (3.2.0-alpha2) from NuGet. I also experimented with older versions and with the tessnet2 library, but did not get any usable results. As samples I have two images: "multiple numbers" and "single number".

When I tried to recognize the multiple-numbers image, I received only the number '541' and none of the single-character '0' values. When I tried to recognize the single-number image, I got no result at all.

My code sample is below:

        using (var engine = new TesseractEngine(@"tessdata/", "eng"))
        {
            // Restrict recognition to digits only.
            engine.SetVariable("tessedit_char_whitelist", "0123456789");

            using (var img = Pix.LoadFromFile(@"multiple_numbers.bmp"))
            using (var page = engine.Process(img))
            using (var iterator = page.GetIterator())
            {
                Console.WriteLine(page.GetText());
                iterator.Begin();

                do
                {
                    // GetText can return null or non-numeric text, so avoid int.Parse throwing.
                    var text = iterator.GetText(PageIteratorLevel.Word);
                    int number;
                    if (int.TryParse(text, out number))
                    {
                        Console.WriteLine(number);
                    }
                }
                while (iterator.Next(PageIteratorLevel.Word));
            }
        }

I experimented with PageIteratorLevel for the iterator, EngineMode for the engine, and PageSegMode for processing, all without success. Please help me fix this problem. The main goal is to read all the numbers from the image. I am willing to switch to a different recognition library if that turns out to be the simplest way.

Smoke
  • Did you try whitelisting capital O? – RamblinRose Dec 14 '16 at 23:19
  • Yes. It never sees the single-character numbers in any case. I only receive numbers that contain more than 2 chars. – Smoke Dec 14 '16 at 23:30
  • Possible duplicate of [Tesseract does not recognize single characters](http://stackoverflow.com/questions/9632044/tesseract-does-not-recognize-single-characters) – RamblinRose Dec 14 '16 at 23:32
  • With PageSegMode.SingleChar in processing, it can read the "single number" image. But in this mode I can't read the "multiple_numbers" image: it gives the output "2". :( – Smoke Dec 14 '16 at 23:39
  • psm 6 gives the correct result with the command-line version for the multiple numbers image, but I can't explain why: `tesseract.exe mgckH.png -psm 6 -c tessedit_char_whitelist=0123456789 -` – Stef Dec 29 '16 at 15:38
  • I know this is quite an old question, but I had this very problem, and I see you also used a black background and white numbers. I solved my problem by inverting the image, making the background white and the text black, and suddenly Tesseract could read everything. – WoodyDRN May 04 '22 at 10:29
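
For reference, the command line from Stef's comment (`-psm 6` with a digit whitelist) could be approximated in the .NET wrapper roughly as follows. This is only a sketch: it assumes that `PageSegMode.SingleBlock` is the wrapper's counterpart of psm 6, the class name `DigitOcrSketch` and the helper `ParseNumbers` are hypothetical names introduced for illustration, and it has not been verified against the images in the question.

```csharp
using System;
using System.Collections.Generic;
using Tesseract;

static class DigitOcrSketch
{
    // Hypothetical helper: split the OCR output on whitespace and keep
    // only the tokens that parse as integers.
    static List<int> ParseNumbers(string ocrText)
    {
        var numbers = new List<int>();
        var tokens = (ocrText ?? string.Empty)
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        foreach (var token in tokens)
        {
            int value;
            if (int.TryParse(token, out value))
            {
                numbers.Add(value);
            }
        }
        return numbers;
    }

    static void Main()
    {
        using (var engine = new TesseractEngine(@"tessdata/", "eng"))
        {
            // Same whitelist as in the question and in the command line.
            engine.SetVariable("tessedit_char_whitelist", "0123456789");

            using (var img = Pix.LoadFromFile(@"multiple_numbers.bmp"))
            // Assumption: SingleBlock corresponds to "-psm 6" (a single uniform block of text).
            using (var page = engine.Process(img, PageSegMode.SingleBlock))
            {
                foreach (var number in ParseNumbers(page.GetText()))
                {
                    Console.WriteLine(number);
                }
            }
        }
    }
}
```

Reading the whole page text and splitting it on whitespace sidesteps the word iterator, so a word that yields no text cannot make `int.Parse` throw. Whether the wrapper reproduces the command-line result for this image may depend on the wrapper and Tesseract versions in use.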
