I am running a full-text search in Postgres 10 over texts that include a variety of alphanumeric IDs (as well as 'real' text). My clients want to be able to find those IDs without having to enter the leading zeroes. As the number of digits varies between IDs, I do not want to go down the path of adding extra zero-padded search queries.
I was wondering if there is a way to remove leading zeroes using a dictionary. But this would require some wildcards to catch an alphanumeric expression with leading zeroes. My understanding so far is that this is not supported by the ispell/hunspell dictionaries supported by Postgres 10.
I am currently thinking about using a regex like /\W0+([[:alnum:]]+)/ and appending the captured matches to the text before feeding it to the search index. This way the index would contain both the zero-padded original and the variant without leading zeroes. But it does feel a little clumsy...
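To make the idea concrete, here is a minimal sketch of that preprocessing step in Python. The function name `add_stripped_ids` and the pattern are my own illustration, not anything Postgres provides. It assumes an ID is an ASCII alphanumeric token that starts with at least one zero and contains at least one other character; tokens consisting only of zeroes are left alone.

```python
import re

# Assumption: an ID is a whole ASCII alphanumeric token with leading zeroes.
# Group 1 captures the token without those zeroes; it must start with a
# non-zero character, so "0" or "000" on their own never match.
LEADING_ZERO_TOKEN = re.compile(r"\b0+([1-9A-Za-z][0-9A-Za-z]*)\b")


def add_stripped_ids(text: str) -> str:
    """Return text with the zero-stripped variant of each padded ID appended."""
    stripped = [m.group(1) for m in LEADING_ZERO_TOKEN.finditer(text)]
    if not stripped:
        return text
    # Keep the original text intact so the padded form stays searchable too.
    return text + " " + " ".join(stripped)


if __name__ == "__main__":
    sample = "Max Mustermann 00023131021 Wunderstr. 17 Trallerhausen 002ed2323"
    print(add_stripped_ids(sample))
```

The returned string is what would then be passed to to_tsvector instead of the raw text, so both the padded and the unpadded form end up in the index.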
Does anybody have a better idea on how to do this using a full text search dictionary (or something similar)?
The texts I am searching look like this:
'Max Mustermann 00023131021 Wunderstr. 17 Trallerhausen 002ed2323'
This is only an example; the real texts are typically much longer and contain up to 50 different IDs.
Here, I would like to hit on '00023131021', '23131021', '2ed2323 & Wunder:*', etc.
0or000? – Erwin Brandstetter Apr 12 '18 at 13:51