Multiple look-behinds in Python? Regex separator error

Asked Nov 10 '20 at 03:04

Active Nov 10 '20 at 03:40

I am trying to clean some data in a CSV file.

I want to split each address into street and city, using the street suffix as the separator. I would like to run the following:

df1_splitz = pd.DataFrame(df1['owner_address'].str.split('(?<=DR|ST|PL|RD|LN|CT|CIR|AVE|HWY|WAY|BLVD|PKWY)\s',1).tolist(),columns=['street','city'])

Main problem: this raises the error "look-behind requires fixed-width pattern", which I want to avoid.

Is there a method I can use to get around this error? Any help would be greatly appreciated. Thanks in advance.
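
For reference, the error can be reproduced without pandas. The sketch below uses only the standard `re` module; the helper name `compile_error` is made up for illustration. It suggests that the problem is the mix of 2-, 3- and 4-character suffixes inside a single look-behind, since a look-behind whose alternatives all have the same width compiles fine.

```python
import re

SUFFIXES = "DR|ST|PL|RD|LN|CT|CIR|AVE|HWY|WAY|BLVD|PKWY"


def compile_error(pattern):
    """Return the re.error message for pattern, or None if it compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


# Alternatives of 2, 3 and 4 characters inside one look-behind: rejected by re.
mixed_width = compile_error(r"(?<=" + SUFFIXES + r")\s")

# Alternatives that are all 2 characters wide: accepted by re.
same_width = compile_error(r"(?<=DR|ST|PL|RD|LN|CT)\s")

print(mixed_width)
print(same_width)
```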

mexicanRmy

Nope. You can use the third-party `regex` module instead of `re`; it allows variable-width lookbehinds – dawg Nov 10 '20 at 03:48
Darn. Thank u @dawg – mexicanRmy Nov 10 '20 at 03:55
Of course it is possible to run your regex in `re`. Just split the single lookbehind into several lookbehinds, one per suffix length, joined with the alternation operator inside a non-capturing group; see [this answer of mine](https://stackoverflow.com/a/40617321/3832970). – Wiktor Stribiżew Nov 10 '20 at 09:14
Why use a lookbehind at all? How about using `groups()`, something like `re.search(r'^(.*?(?:DR|ST|PL|RD|LN|CT|CIR|AVE|HWY|WAY|BLVD|PKWY)\b)\s+(.*)', str).groups()` – bobble bubble Nov 10 '20 at 10:13
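
The two `re`-only suggestions in the comments above could look roughly like this. It is a minimal sketch, not a tested solution for the original data: the function names and the sample addresses are made up for illustration, and the first pattern is one possible reading of "several lookbehinds", grouping the suffixes by length.

```python
import re

# One look-behind per suffix width (2, 3 and 4 characters), joined by
# alternation inside a non-capturing group, so each look-behind is fixed-width.
SPLIT_AFTER_SUFFIX = re.compile(
    r"(?:(?<=DR|ST|PL|RD|LN|CT)|(?<=CIR|AVE|HWY|WAY)|(?<=BLVD|PKWY))\s"
)

# Capture-group alternative: no look-behind needed.
STREET_AND_CITY = re.compile(
    r"^(.*?(?:DR|ST|PL|RD|LN|CT|CIR|AVE|HWY|WAY|BLVD|PKWY)\b)\s+(.*)"
)


def split_with_lookbehind(address):
    """Split once at the first whitespace that directly follows a suffix."""
    return SPLIT_AFTER_SUFFIX.split(address, maxsplit=1)


def split_with_groups(address):
    """Return (street, city), or None when no suffix is found."""
    match = STREET_AND_CITY.search(address)
    return match.groups() if match else None


sample = "123 MAIN ST SPRINGFIELD"  # hypothetical address
print(split_with_lookbehind(sample))
print(split_with_groups(sample))
```

With pandas, the first pattern string could presumably be passed to `Series.str.split` with `n=1` and `expand=True` to get the two columns directly; that call is not exercised here. Neither pattern checks for a word boundary before the suffix, so a word that merely ends in a suffix (such as WEST) would also trigger a split.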