Given the following European driving license OCR text, extracted with Tesseract.js, I would like to write several regular expressions that match the different data fields on the license. The numbers below correspond to the number printed before each field on any European driving license; the rules for labelling the data fields of these documents can be checked on Wikipedia:
- 1: surname (last name)
- 2: other names (first name(s))
- 3: date of birth
- 4b: date of expiry
- 5: driving license ID
- 8: address
HR 1. UZORAK
2. SPECIMEN
3. 01011977
1 42.01.07.2013 4. PUDUBROVACKO - NERETVANSKA
e 4b. 01.07.2023
5. 1234587 S i
, E %I\\'\f Dt — |
: = L 9.8 =
D112345671234567890121012017<2
My question is:
Why does the regex `/4(\.|,)*(b|!|8)\.?\s*[0-9\.]*/u` match `4b. 01.07.2023`, while `/4(\.|,)*(b|!|8)*\.?\s*[0-9\.]*/u` (one extra asterisk after the second capture group, compared to the former regex) does not? (This can be checked on regex101.)
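
For reference, here is a minimal reproduction sketch of the two patterns run against the OCR text in plain JavaScript. The variable names (`ocrText`, `strict`, `loose`) are my own, and the two noise-only OCR lines are left out on the assumption that they do not matter, since neither contains a `4`. It prints the first match of each pattern and then every match of the second pattern with the `g` flag added, which may help show where each regex starts matching.

```javascript
// OCR text from the question; the two noise-only lines are omitted
// (assumption: they are irrelevant here because they contain no "4").
const ocrText = [
  "HR 1. UZORAK",
  "2. SPECIMEN",
  "3. 01011977",
  "1 42.01.07.2013 4. PUDUBROVACKO - NERETVANSKA",
  "e 4b. 01.07.2023",
  "5. 1234587 S i",
  "D112345671234567890121012017<2",
].join("\n");

// First pattern: one of b, ! or 8 is required after the "4".
const strict = /4(\.|,)*(b|!|8)\.?\s*[0-9\.]*/u;
// Second pattern: the extra "*" makes that character optional.
const loose = /4(\.|,)*(b|!|8)*\.?\s*[0-9\.]*/u;

// Without the "g" flag, match() returns only the leftmost match.
const strictMatch = ocrText.match(strict);
const looseMatch = ocrText.match(loose);

console.log("strict:", JSON.stringify(strictMatch[0]), "at index", strictMatch.index);
console.log("loose: ", JSON.stringify(looseMatch[0]), "at index", looseMatch.index);

// With the "g" flag, every match of the second pattern can be listed.
const allLoose = [...ocrText.matchAll(new RegExp(loose.source, "gu"))].map((m) => m[0]);
console.log("all loose matches:", allLoose);
```

In this sketch the first pattern returns `4b. 01.07.2023`, while the second returns `42.01.07.2013`, which starts earlier in the text. The expiry date only shows up for the second pattern as a later entry in the global match list.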