I have some doubts about how to construct the dataset to train my own attention-based OCR model. At the moment I have the images with their respective transcriptions in .txt files. For example:
img01.png img01.txt
The image contains a sign with some text, and the text file contains the transcription of that text. I have tried to follow this post: How to create dataset in the same format as the FSNS dataset?, but I don't understand very well what I need in order to generate the dataset in FSNS format. I hope someone can explain it further.
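To show my current understanding, this is a rough sketch of the per-sample feature map I believe the FSNS reader (`fsns.py` in the attention_ocr model) expects; the charset, max sequence length, and null id below are placeholders I made up, and in a real pipeline each dict would be wrapped in a `tf.train.Example` and written to a TFRecord file. Please correct me if the keys or the padding scheme are wrong:

```python
# Hypothetical charset: one id per character; FSNS uses a charset file for this.
CHARSET = {c: i for i, c in enumerate("abcdefghijklmnopqrstuvwxyz ")}
NULL_ID = len(CHARSET)   # id used to pad labels (FSNS has a "null" character)
MAX_SEQ_LEN = 37         # FSNS pads every label to a fixed length (assumed here)

def encode_transcription(text):
    """Map a transcription to class ids, padded to MAX_SEQ_LEN with NULL_ID."""
    unpadded = [CHARSET[c] for c in text.lower() if c in CHARSET]
    padded = unpadded + [NULL_ID] * (MAX_SEQ_LEN - len(unpadded))
    return padded, unpadded

def make_fsns_features(png_bytes, width, height, text):
    """Plain-dict stand-in for the tf.train.Example feature map; in the
    real dataset these keys hold tf.train.Feature values."""
    padded, unpadded = encode_transcription(text)
    return {
        "image/format": b"PNG",
        "image/encoded": png_bytes,       # raw PNG bytes of e.g. img01.png
        "image/class": padded,            # fixed-length label ids
        "image/unpadded_class": unpadded, # label ids without padding
        "image/width": width,
        "image/orig_width": width,        # FSNS stores the pre-padding width
        "image/height": height,
        "image/text": text,               # the transcription from img01.txt
    }
```

For instance, for `img01.png` with transcription "hello", `make_fsns_features(open("img01.png", "rb").read(), 150, 150, "hello")` would produce one record; is this roughly what the TFRecord writer should emit for each image/txt pair?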
Thank you very much!