How to match only the first word of a string with a regex in Python

Question

I want to match the word 'St', 'St.', 'st' or 'st.', but only when it is the first word of a string. For example, ' St. Mary Church Church St.' should find ONLY the first 'St.'.

'st. Mary Church Church St.' - should find ONLY 'st.'
'st Mary Church Church St.' - should find ONLY 'st'

I want to eventually replace the first occurrence with 'Saint'.

I have spent hours trying to find a regex that solves this problem, so I have tried it myself first, and I know that for some of you it will be easy!

Why do you need a regex? Just split the string up into words by whitespace and get the first one. — Blender, Aug 28 '16 at 16:01
Does the code only have to handle strings that start with a variation of "St."? Or are there other strings that start with something else? — Dartmouth, Aug 28 '16 at 16:19

JazZ · Answer 1 · 2016-08-28T18:44:42.070

3

re.sub lets you limit the number of occurrences that are replaced in a string.

For example:

>>> import re
>>> s = "St. Mary Church Church St."
>>> new_s = re.sub(r'^(St\.|st\.|St|st)\s', r'Saint ', s, count=1) # count=1 replaces only the first occurrence; \. matches a literal period rather than any character
>>> new_s
'Saint Mary Church Church St.'

Hope it helps.

edited Aug 28 '16 at 18:44

answered Aug 28 '16 at 16:21

JazZ


This nearly works, but it needs one small fix; if you substitute strings starting with "St" or "st", there is no space after "Saint", so `re.sub(…)`ing `s = "St Mary Church Church St."` gives 'SaintMary Church Church St.' – Dartmouth Aug 28 '16 at 18:02
Thanks for pointing it out. As you can see in the example, the output was fine for "St.". Anyway, I edited my answer to require a space after the "st" expression and added a space after "Saint". Thank you. ; ) – JazZ Aug 28 '16 at 18:46
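
The spacing issue raised in these comments can also be avoided by not consuming the whitespace at all. The following is a minimal sketch, not JazZ's code: the names `expand_saint` and `_FIRST_ST` are hypothetical, and tolerating leading whitespace (as in the question's first example) is an assumption.

```python
import re

# Hypothetical helper, not taken from the answer above.
# ^(\s*)    optional leading whitespace, captured so it can be put back (assumption)
# [Ss]t     "St" or "st"
# \.?       an optional literal period
# (?=\s|$)  must be followed by whitespace or end of string, without consuming it
_FIRST_ST = re.compile(r'^(\s*)[Ss]t\.?(?=\s|$)')

def expand_saint(text):
    """Replace a leading St/St./st/st. with 'Saint'; return other strings unchanged."""
    return _FIRST_ST.sub(r'\g<1>Saint', text, count=1)

print(expand_saint("St. Mary Church Church St."))
print(expand_saint("st Mary Church Church St."))
print(expand_saint("Stone Church St."))
```

Because the lookahead does not consume the following space, the replacement needs no trailing space, and a first word such as "Stone" is left alone.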

Dartmouth · Answer 2 · 2016-08-28T16:18:52.517

2

You don't need a regex for this; just use the split() method on your string to split it by whitespace. This returns a list of every word in your string:

matches = ["St", "St.", "st", "st."]
name = "St. Mary Church Church St."
words = name.split()   # split the string into a list of words
if words and words[0] in matches:
    words[0] = "Saint"   # replace the first word in the list (here "St.") with "Saint"
new_name = " ".join(words)   # rebuild the name with a single space between words
print(new_name)   # Output: Saint Mary Church Church St.

edited Aug 28 '16 at 16:18

answered Aug 28 '16 at 16:11

Dartmouth


That's nice, but it doesn't check if the first word is St, or .. etc. – joel goldstick Aug 28 '16 at 16:15
OP isn't supplying enough information whether there are strings that don't start with a variation of "St."... I'll update my answer though. – Dartmouth Aug 28 '16 at 16:17
The [split](https://docs.python.org/2/library/stdtypes.html?highlight=str.split#str.split) method accepts a maxsplit argument. It could be nice to pass it, so that the rest of the string is not split any further once the first word has been found. – Tryph Aug 29 '16 at 13:14
@Tryph, that's quite useful, although it doesn't really simplify the current code. Besides, there is basically no processing of the rest of the string, only when `join`ing the words again. – Dartmouth Aug 29 '16 at 14:01
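
A minimal sketch of the maxsplit idea from these comments. The names `replace_first_word` and `MATCHES` are hypothetical; this is one way to apply the suggestion, not code from either commenter.

```python
MATCHES = {"St", "St.", "st", "st."}

def replace_first_word(name):
    # split(None, 1) splits on whitespace at most once: the first word is
    # separated and the remainder stays one string with its spacing intact.
    parts = name.split(None, 1)
    if parts and parts[0] in MATCHES:
        parts[0] = "Saint"
        return " ".join(parts)
    return name   # empty string or a different first word: return unchanged

print(replace_first_word("St. Mary Church Church St."))
```

Unlike splitting the whole string and joining it again, this keeps any repeated spaces in the remainder, although leading whitespace is still dropped when a replacement happens.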

score 1 · Answer 3 · answered Jun 07 '19 at 20:19

Thanks for the question! This is exactly what I was looking for to solve my issue. I wanted to share another regex trick I found while hunting around for this answer. You can pass the flags parameter to the sub function. With re.IGNORECASE you no longer have to list every upper- and lower-case variant in the pattern parameter, which makes the code a little cleaner and reduces the chance of missing a variant. Cheers!

import re
s = "St. Mary Church Church St."
# Ignoring case shortens the pattern from above slightly; \. matches a literal period
new_s = re.sub(r'^(st\.|st)\s', r'Saint ', s, count=1, flags=re.IGNORECASE)
print(new_s)  # Saint Mary Church Church St.
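
When an abbreviation with a period is written into a pattern, the period needs escaping, because a bare `.` matches any character. A small sketch of the difference; the names `loose` and `strict` and the input "Sta Maria Church" are made up for illustration.

```python
import re

# '.' is unescaped here, so it matches any character after "st"
loose = re.compile(r'^(st.|st)\s', flags=re.IGNORECASE)
# '\.?' matches an optional literal period only
strict = re.compile(r'^st\.?\s', flags=re.IGNORECASE)

s = "Sta Maria Church"   # made-up input whose first word is not an abbreviation

print(loose.sub('Saint ', s, count=1))    # the first word is replaced by mistake
print(strict.sub('Saint ', s, count=1))   # the string is left unchanged
print(strict.sub('Saint ', "ST. Mary Church", count=1))
```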

score 0 · Answer 4 · edited Aug 28 '16 at 20:45

0

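Try using the regex '^\S+' to match the first run of non-whitespace characters in your string, i.e. its first word. (re.match only matches at the start of the string, so the '^' is optional here.)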

import re 

s = 'st Mary Church Church St.'
m = re.match(r'^\S+', s)
m.group()    # 'st'

s = 'st. Mary Church Church St.'
m = re.match(r'^\S+', s)
m.group()    # 'st.'
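
To go from finding the first word to replacing it, the match can be checked against the accepted spellings and the rest of the string kept as it is. A minimal sketch: `saint_first_word` and `ABBREVIATIONS` are hypothetical names, and comparing case-insensitively is an assumption (it also accepts spellings such as 'ST.').

```python
import re

ABBREVIATIONS = {"st", "st."}   # compared in lower case (assumption)

def saint_first_word(s):
    m = re.match(r'\S+', s)   # re.match only matches at the start of the string
    if m is None or m.group().lower() not in ABBREVIATIONS:
        # empty string, leading whitespace, or a different first word
        return s
    # keep everything after the first word exactly as it was
    return "Saint" + s[m.end():]

print(saint_first_word("st Mary Church Church St."))
print(saint_first_word("st. Mary Church Church St."))
```

Note that `re.match` returns None when the string is empty or starts with whitespace, so calling `m.group()` without the check would raise an AttributeError in those cases.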

edited Aug 28 '16 at 20:45

pylang


answered Aug 28 '16 at 16:31

orz


@orz, it may be your first time, so your answer has been edited to show what might be expected next time, format-wise. Remember to format code with code blocks, use reproducible examples that run in console and briefly explain what is happening. And welcome to SO. – pylang Aug 28 '16 at 20:49

score -1 · Answer 5 · edited May 23 '17 at 12:24

-1

import re

string = "Some text"

replace = {'St': 'Saint', 'St.': 'Saint', 'st': 'Saint', 'st.': 'Saint'}
replace = dict((re.escape(k), v) for k, v in replace.items())  # .items() works in Python 2 and 3; iteritems() is Python 2 only
pattern = re.compile("|".join(replace.keys()))
for text in string.split():
    text = pattern.sub(lambda m: replace[re.escape(m.group(0))], text)

This should work I guess, please check. Source

edited May 23 '17 at 12:24

Community


answered Aug 28 '16 at 16:11

Jeril


This doesn't work, it doesn't replace anything, it just seems to remove the first word from `string` – Dartmouth Aug 28 '16 at 16:23
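
For reference, the dictionary-based idea can be made to work by anchoring a single pattern at the start of the string and returning the substituted result instead of discarding it. This is a sketch under stated assumptions, not Jeril's code: `replace_first`, `alternatives` and the lookahead are additions made for illustration.

```python
import re

replace = {'St': 'Saint', 'St.': 'Saint', 'st': 'Saint', 'st.': 'Saint'}

# Escape each key so '.' is literal, and try longer keys first so that
# 'St.' is preferred over 'St'.
alternatives = "|".join(re.escape(k) for k in sorted(replace, key=len, reverse=True))

# Anchor at the start; require whitespace or end of string after the abbreviation.
pattern = re.compile(r'^(?:' + alternatives + r')(?=\s|$)')

def replace_first(text):
    # m.group(0) is the matched text itself, so it indexes the dict directly.
    return pattern.sub(lambda m: replace[m.group(0)], text, count=1)

print(replace_first("St. Mary Church Church St."))
print(replace_first("Some text"))
```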
