
I am trying to clean some data in a CSV file.

I am trying to split addresses using the street suffix as a delimiter/separator:

  1. 68 E 89TH ST LOS ANGELES -> [68 E 89TH ST, LOS ANGELES]
  2. 3661 SUNSWEPT PKWY STUDIO CITY -> [3661 SUNSWEPT PKWY, STUDIO CITY]

I would like to run the following:

df1_splitz = pd.DataFrame(df1['owner_address'].str.split(r'(?<=DR|ST|PL|RD|LN|CT|CIR|AVE|HWY|WAY|BLVD|PKWY)\s', n=1).tolist(), columns=['street', 'city'])

Main problem: avoiding the "look-behind requires fixed-width pattern" error, which the pattern above raises.
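
For reference, a minimal sketch of where the error comes from, using only the standard `re` module. The helper name `compiles` and the two shortened suffix lists are made up for illustration. A lookbehind whose alternatives have different lengths is rejected, while one whose alternatives all have the same length is accepted:

```python
import re

def compiles(pattern):
    """Return True if `re` accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Alternatives of width 2, 3 and 4 inside one lookbehind: rejected by `re`.
mixed_width = r"(?<=DR|ST|CIR|BLVD)\s"

# Every alternative is exactly 2 characters wide: accepted by `re`.
same_width = r"(?<=DR|ST|PL|RD)\s"

print(compiles(mixed_width))
print(compiles(same_width))
```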

Is there a method I can use to get around this error? Any help would be greatly appreciated. Thanks in advance.

mexicanRmy
  • Nope, not with `re`. You can use the third-party `regex` module instead of `re`; it allows variable-width lookbehinds. – dawg Nov 10 '20 at 03:48
  • Darn. Thank you @dawg – mexicanRmy Nov 10 '20 at 03:55
  • Of course it is possible to run your regex in `re`. Just reformat the single lookbehind into several lookbehinds separated by the alternation operator inside a non-capturing group, see [this answer of mine](https://stackoverflow.com/a/40617321/3832970). – Wiktor Stribiżew Nov 10 '20 at 09:14
  • Why use the lookbehind at all? How about using `groups()`, something like `re.search(r'^(.*?(?:DR|ST|PL|RD|LN|CT|CIR|AVE|HWY|WAY|BLVD|PKWY)\b)\s+(.*)', str).groups()` – bobble bubble Nov 10 '20 at 10:13
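
The two suggestions in the comments above can be sketched with the standard `re` module roughly as follows. This is an illustration under assumptions, not a tested answer: the function names are hypothetical, the leading `\b` before each suffix is an addition (so that "WEST" is not treated as ending in "ST"), and an address without a recognizable suffix is returned with an empty city.

```python
import re

SUFFIXES = ["DR", "ST", "PL", "RD", "LN", "CT", "CIR", "AVE",
            "HWY", "WAY", "BLVD", "PKWY"]

# Approach 1: one fixed-width lookbehind per suffix, joined by alternation
# inside a non-capturing group: (?:(?<=\bDR)|(?<=\bST)|...)\s+
LOOKBEHIND_SPLIT = re.compile(
    "(?:" + "|".join(rf"(?<=\b{s})" for s in SUFFIXES) + r")\s+"
)

def split_with_lookbehinds(address):
    """Split once on the whitespace that follows the first street suffix."""
    parts = LOOKBEHIND_SPLIT.split(address, maxsplit=1)
    return parts if len(parts) == 2 else [address, ""]

# Approach 2: no lookbehind; capture street and city as two groups.
CAPTURE = re.compile(
    r"^(.*?\b(?:" + "|".join(SUFFIXES) + r")\b)\s+(.*)$"
)

def split_with_groups(address):
    """Return [street, city] using capture groups instead of a split."""
    match = CAPTURE.match(address)
    return list(match.groups()) if match else [address, ""]

for addr in ["68 E 89TH ST LOS ANGELES", "3661 SUNSWEPT PKWY STUDIO CITY"]:
    print(split_with_lookbehinds(addr), split_with_groups(addr))
```

The same pattern strings should also be usable with pandas (`Series.str.split(..., n=1)` for the first, `Series.str.extract(...)` for the second), though that is not verified here. Both sketches split at the first suffix they find, so an address such as "ST ANDREWS PL ..." would need extra handling.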
