Python - regex - how to find ONLY four letter words?

Question

I am searching a string of text and want to find only four-letter words. My code works, except that it also matches words with more than four letters.

import re
test = "hello, how are you doing tonight?"
total = len(re.findall(r'[a-zA-Z]{3}', text))
print(total)

It finds 15, although I am not sure how it found that many. I thought I might have to use \b to pick the beginning and the end of the word, but that didn't seem to work for me.

Show what you tried with `\b`. – Bryan Oakley, Feb 19 '18 at 22:09

score 12 · Accepted Answer · edited Feb 19 '18 at 22:19

Try this:

re.findall(r'\b\w{4}\b',text)

The regex matches:

\b, which is a word boundary. It matches the beginning or end of a word.

\w{4} matches exactly four word characters (a-z, A-Z, 0-9 or _; in Python 3, \w also matches non-ASCII letters and digits unless the re.ASCII flag is set).

\b is yet another word boundary.
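As a minimal sketch of the pattern in action, the snippet below runs it on two made-up strings (neither comes from the question). The names `FOUR_CHARS`, `FOUR_LETTERS`, `sample`, and `mixed` are illustrative assumptions, and the letters-only variant is just one possible tightening if digits and underscores should not count as letters.

```python
import re

# Pattern from the answer: exactly four word characters between two word boundaries.
FOUR_CHARS = re.compile(r'\b\w{4}\b')
# Hypothetical letters-only variant: rejects digits and underscores.
FOUR_LETTERS = re.compile(r'\b[a-zA-Z]{4}\b')

sample = "The quick brown fox jumps over the lazy dog"  # made-up sample text
print(FOUR_CHARS.findall(sample))    # ['over', 'lazy']

mixed = "abcd 1234 a_b1"             # made-up input with digits and an underscore
print(FOUR_CHARS.findall(mixed))     # ['abcd', '1234', 'a_b1']
print(FOUR_LETTERS.findall(mixed))   # ['abcd']
```

Whether `1234` should count as a "word" depends on the data; the letters-only character class is the safer choice when it should not.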

As a side note, your code contains a typo: the second argument to re.findall should be the name of your string variable, which is test, not text. Also, your string does not contain any four-letter words, so the suggested code returns an empty list and the count is 0.
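To illustrate why the word boundaries matter, here is a short sketch using the sentence from the question with the variable name corrected. The variable names `chunks` and `words` are assumptions added for illustration.

```python
import re

test = "hello, how are you doing tonight?"

# Without word boundaries, the pattern also matches chunks inside longer words.
chunks = re.findall(r'[a-zA-Z]{4}', test)
print(chunks)        # ['hell', 'doin', 'toni']

# With word boundaries, only whole four-character words qualify;
# this sentence has none.
words = re.findall(r'\b\w{4}\b', test)
print(words)         # []
print(len(words))    # 0
```

The unanchored pattern consumes letters four at a time wherever it can, which is why longer words such as "tonight" still contribute matches.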

edited Feb 19 '18 at 22:19

Wiktor Stribiżew

answered Feb 19 '18 at 22:07

diypcjourney

Yes thank you. I did notice that after. I have made the correction and added in the \b as well. Great ! – netrate Feb 19 '18 at 23:44

score 0 · Answer 2 · answered Feb 19 '18 at 22:09

Here's a way without regex:

from string import punctuation

s = "hello, how are you doing tonight?"

[i for i in s.translate(str.maketrans('', '', punctuation)).split() if len(i) == 4]

# [] (this sentence has no four-letter words)

answered Feb 19 '18 at 22:09

jpp

score 0 · Answer 3 · answered Feb 19 '18 at 22:10

You can use re.findall to locate every run of letters, and then filter the results by length:

import re
test = "hello, how are you doing tonight?"
final_words = list(filter(lambda x: len(x) == 4, re.findall('[a-zA-Z]+', test)))
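Note that filtering letter runs and matching with word boundaries can disagree when a word contains digits. The sketch below compares the two on a made-up input; the function names `by_letter_runs` and `by_word_boundaries` and the sample string are assumptions for illustration only.

```python
import re

def by_letter_runs(text):
    """Take runs of ASCII letters and keep those that are exactly 4 long."""
    return [w for w in re.findall(r'[a-zA-Z]+', text) if len(w) == 4]

def by_word_boundaries(text):
    """Match exactly four word characters between word boundaries."""
    return re.findall(r'\b\w{4}\b', text)

sample = "abcd1 test"  # made-up input: 'abcd1' mixes letters with a digit

print(by_letter_runs(sample))      # ['abcd', 'test']
print(by_word_boundaries(sample))  # ['test']
```

The letter-run approach treats `abcd` as its own word because the digit ends the run, while the boundary approach sees `abcd1` as a single five-character word. Which behavior is preferable depends on how the input should be tokenized.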

answered Feb 19 '18 at 22:10

Ajax1234
