Usually, when I OCR a table of contents, the columns are separated by a large space, so the output is not properly ordered. For example, for a table like this:

The output would be:
The Rank Function
Permutations of Atoms
Pure Set Theory and Axiom System ZF
3.5
3.6
3.7
I'd like it to be:
3.5 The Rank Function\112
3.6 Permutations of Atoms\116
3.7 Pure Set Theory and Axiom System ZF\118
But different TOCs produce different output patterns, so there is no way to write a single regex script that automatically fixes every book. The best approach would be to fix it in the first place. But how?
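
To make the goal concrete, here is a minimal sketch of the kind of reordering I have in mind. It assumes the OCR engine can report a bounding box for each text fragment, which I have not confirmed for every tool. The `Box` layout, the function name `rows_from_boxes`, the `tolerance` parameter, and the sample coordinates are all hypothetical, made up for illustration. The idea is to group fragments whose vertical centres are close into one visual row, then read each row from left to right.

```python
from typing import List, Tuple

# Hypothetical word box: (text, left, top, height), all in pixels.
Box = Tuple[str, float, float, float]


def rows_from_boxes(boxes: List[Box], tolerance: float = 0.5) -> List[str]:
    """Group boxes into visual rows by vertical centre, then join each row left to right."""
    rows = []  # each entry is [row_centre, [boxes in that row]]
    # Visit the boxes from top to bottom, by vertical centre.
    for box in sorted(boxes, key=lambda b: b[2] + b[3] / 2):
        _, _, top, height = box
        centre = top + height / 2
        # Same row if the centre is within a fraction of the box height
        # of the centre of the first box in the current row.
        if rows and abs(centre - rows[-1][0]) <= tolerance * height:
            rows[-1][1].append(box)
        else:
            rows.append([centre, [box]])
    # Inside each row, order the fragments by their left edge.
    return [
        " ".join(b[0] for b in sorted(members, key=lambda b: b[1]))
        for _, members in rows
    ]


# Made-up coordinates that mimic the column-by-column order shown above.
sample_boxes: List[Box] = [
    ("The Rank Function", 120, 100, 20),
    ("Permutations of Atoms", 120, 130, 20),
    ("Pure Set Theory and Axiom System ZF", 120, 160, 20),
    ("3.5", 40, 101, 20),
    ("3.6", 40, 129, 20),
    ("3.7", 40, 161, 20),
]

for line in rows_from_boxes(sample_boxes):
    print(line)
```

This only covers row grouping. Skewed scans, wrapped titles, and the page-number column would need more care, and the separator before the page number could be changed in the join step.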

