I am trying to develop an OCR AR application. The user points the camera at text on signs, and related information is shown on the screen using augmented reality.

I developed the OCR part of the application with Tesseract. This part works perfectly fine; I tested it with saved photos that have text on them.

Now I need to use the OCR code in real time, so the user can detect text while the camera is on. But I am having issues with the output of my OCR code, and I think the cause is the resolution of the input image.

Here is my process for the real-time part.

I am using ARCore because of the AR part that I will develop later.

Everything happens in onTapPlane(...), which runs when the user taps on the screen. Inside this method I call arFragment.getArSceneView().getArFrame().acquireCameraImage() to get the current frame as an Image. I then convert the Image to a Bitmap, because my OCR method takes a Bitmap as input.

Here is the code of onTapPlane(...):

    @Override
    public void onTapPlane(HitResult hitResult, Plane plane, MotionEvent motionEvent) {
        // Image is AutoCloseable; closing it hands the camera image back to ARCore.
        try (Image capturedImage = arFragment.getArSceneView().getArFrame().acquireCameraImage()) {
            Bitmap bitmap = imageToBitmap(capturedImage); // convert Image to Bitmap
            // The image arrives in landscape orientation, so rotate it to portrait.
            Bitmap bitmap1 = rotateImage(bitmap, 90);
            doOCR(bitmap1);
        } catch (NotYetAvailableException e) {
            e.printStackTrace();
        }
    }

As for imageToBitmap(...), I took the code from this question:

    private Bitmap imageToBitmap(Image capturedImage) {
        ByteBuffer cameraPlaneY = capturedImage.getPlanes()[0].getBuffer();
        ByteBuffer cameraPlaneU = capturedImage.getPlanes()[1].getBuffer();
        ByteBuffer cameraPlaneV = capturedImage.getPlanes()[2].getBuffer();

        byte[] compositeByteArray = new byte[cameraPlaneY.capacity() + cameraPlaneU.capacity() + cameraPlaneV.capacity()];
        cameraPlaneY.get(compositeByteArray, 0, cameraPlaneY.capacity());
        cameraPlaneU.get(compositeByteArray, cameraPlaneY.capacity(), cameraPlaneU.capacity());
        cameraPlaneV.get(compositeByteArray, cameraPlaneY.capacity() + cameraPlaneU.capacity(), cameraPlaneV.capacity());

        ByteArrayOutputStream baOutputStream = new ByteArrayOutputStream();
        YuvImage yuvImage = new YuvImage(compositeByteArray, ImageFormat.NV21, capturedImage.getWidth(), capturedImage.getHeight(), null);
        yuvImage.compressToJpeg(new Rect(0, 0, capturedImage.getWidth(), capturedImage.getHeight()), 75, baOutputStream);
        byte[] byteForBitmap = baOutputStream.toByteArray();
        Bitmap bitmap = BitmapFactory.decodeByteArray(byteForBitmap, 0, byteForBitmap.length);

        return bitmap;
    }
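
One thing that may be worth checking in this conversion: the method concatenates the Y, U and V buffers as they are and labels the result NV21, while NV21 is a full Y plane followed by interleaved V/U samples. Below is a minimal off-device sketch of packing the three planes while honouring the row and pixel strides. It works on plain ByteBuffers so it can be run without a device; the class name Nv21Packer, the method toNv21 and its parameter layout are hypothetical, and it assumes even width and height and a Y pixel stride of 1.

```java
import java.nio.ByteBuffer;

/** Off-device sketch: pack YUV_420_888-style planes into an NV21 byte array. */
final class Nv21Packer {

    /**
     * y, u and v stand in for the buffers returned by Image.Plane.getBuffer();
     * the stride arguments stand in for getRowStride() and getPixelStride().
     */
    static byte[] toNv21(ByteBuffer y, int yRowStride,
                         ByteBuffer u, ByteBuffer v,
                         int uvRowStride, int uvPixelStride,
                         int width, int height) {
        byte[] out = new byte[width * height * 3 / 2];
        int pos = 0;

        // Luma: copy row by row so that any padding at the end of a row is skipped.
        for (int row = 0; row < height; row++) {
            for (int col = 0; col < width; col++) {
                out[pos++] = y.get(row * yRowStride + col);
            }
        }

        // Chroma: NV21 stores V first, then U, interleaved at half resolution.
        for (int row = 0; row < height / 2; row++) {
            for (int col = 0; col < width / 2; col++) {
                int index = row * uvRowStride + col * uvPixelStride;
                out[pos++] = v.get(index);
                out[pos++] = u.get(index);
            }
        }
        return out;
    }
}
```

The resulting array could then be passed to YuvImage with ImageFormat.NV21 in place of compositeByteArray. Whether this changes the OCR result here is untested; it only removes one possible source of distortion.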

I also created the following method, because the capturedImage was in landscape and I needed it in portrait:

    public static Bitmap rotateImage(Bitmap source, float angle) {
        Matrix matrix = new Matrix();
        matrix.postRotate(angle);
        return Bitmap.createBitmap(source, 0, 0, source.getWidth(), source.getHeight(),
                matrix, true);
    }

To check that everything was working, I debugged my code and inspected the resulting Bitmap. The resolution is not good, and for this reason the text is not detected correctly. The following screenshots show more details.

As you can see, the letters are very small in the image. I tried to go closer to the photo, but I think the capturedImage covers the whole plane, which is why the image is so wide.

[Image: Bitmap]

[Image: zoomed-in Bitmap, bad resolution]

[Image: wrong output from the OCR method]

My thoughts on fixing this

  • I thought the best option would be to zoom in before the user taps on the screen, but as far as I could find, this is not available in ARCore.

  • Another idea: once the user finds the text that he wants to detect, he selects the area/surface that he wants to get as a bitmap. In other words, the bitmap will contain only the letters and not many blank areas. Something like in the image below, but I really can't think of how to code something like this.

[Image: example of the area selection described above]
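
A rough sketch of the coordinate math behind the second idea: given a rectangle the user drew on the screen, compute the matching crop region in the rotated bitmap. This assumes the preview shows the bitmap uniformly scaled and center-cropped to fill the view, which may not hold exactly for ARCore (its Frame.transformCoordinates2d method is probably the more exact route). The class SelectionCrop and the method viewRectToBitmapRect are hypothetical names.

```java
/** Sketch: map a rectangle selected on screen to a crop region of the bitmap. */
final class SelectionCrop {

    /**
     * Returns {x, y, width, height} in bitmap pixels for a selection given in
     * view pixels. Assumption: the view shows the bitmap uniformly scaled and
     * center-cropped so that it fills the whole view.
     */
    static int[] viewRectToBitmapRect(float left, float top, float right, float bottom,
                                      int viewWidth, int viewHeight,
                                      int bitmapWidth, int bitmapHeight) {
        // Center-crop: the larger of the two scale factors fills the view.
        float scale = Math.max((float) viewWidth / bitmapWidth,
                               (float) viewHeight / bitmapHeight);
        // Part of the scaled bitmap that is cut off on the left and top edges.
        float offsetX = (bitmapWidth * scale - viewWidth) / 2f;
        float offsetY = (bitmapHeight * scale - viewHeight) / 2f;

        // Convert both corners to bitmap pixels and keep them inside the bitmap.
        int x0 = clamp(Math.round((left + offsetX) / scale), 0, bitmapWidth);
        int y0 = clamp(Math.round((top + offsetY) / scale), 0, bitmapHeight);
        int x1 = clamp(Math.round((right + offsetX) / scale), 0, bitmapWidth);
        int y1 = clamp(Math.round((bottom + offsetY) / scale), 0, bitmapHeight);

        return new int[] {x0, y0, x1 - x0, y1 - y0};
    }

    private static int clamp(int value, int min, int max) {
        return Math.max(min, Math.min(max, value));
    }
}
```

The four values could then be passed to Bitmap.createBitmap(source, x, y, width, height) before calling doOCR, provided width and height are both greater than zero. Cropping does not add detail, so it would mainly help by removing the blank areas around the text.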

Does anyone have any ideas on how to fix this? I would really appreciate it. I am really new to Android development and I have spent days trying to fix this :/

nCoder
  • What resolution are you using for ARCore? If you use a 1920 by 1080 resolution, the converted bitmap should be of higher quality. –  Oct 03 '21 at 19:10
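
On the resolution point raised in the comment: as far as I understand, ARCore lists its camera configurations through Session.getSupportedCameraConfigs(...), each CameraConfig reports its CPU image size through getImageSize(), and the chosen one is applied with Session.setCameraConfig(...). The selection step itself can be sketched off-device as below, with plain {width, height} pairs standing in for the sizes. ImageSizePicker and indexOfLargest are hypothetical names, not part of ARCore.

```java
import java.util.List;

/** Sketch: pick the entry with the most pixels from a list of {width, height} pairs. */
final class ImageSizePicker {

    /** Returns the index of the largest size, or -1 if the list is empty. */
    static int indexOfLargest(List<int[]> sizes) {
        int best = -1;
        long bestPixels = -1;
        for (int i = 0; i < sizes.size(); i++) {
            int[] size = sizes.get(i);
            // Use long so that width * height cannot overflow.
            long pixels = (long) size[0] * size[1];
            if (pixels > bestPixels) {
                best = i;
                bestPixels = pixels;
            }
        }
        return best;
    }
}
```

The returned index would identify which configuration to apply. A larger CPU image costs more memory and conversion time per tap, so the largest size is not necessarily the best choice on every device.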
