I am trying to write a regex that matches capitalized proper nouns, such as "Oreo," "Snickers Bar," and "McFlurry".

import re

text = "George Washington, known as the \"Father of His Country,\" was an American soldier and statesman who served from 1789 to 1797 as the first President of the United States. He was commander-in-chief of the Continental Army during the American Revolutionary War and presided over the 1787 Constitutional Convention. As one of the leading Patriots, he was among the nation's Founding Fathers. Yankee Hotel Foxtrot Yankee Hotel Foxtrot."
reg = r"[A-Z]\w+(\s*[A-Z]\w+)*"

re.findall(reg, text)

gives me the output

[' Washington', '', ' Country', '', '', ' States', '', ' Army', ' War', ' Convention', '', '', ' Fathers', ' Foxtrot']

These are the kinds of matches I'm looking for, except that each one is missing its first word, and single-word matches come back as empty strings. Any idea why my regex seems to match the [A-Z]\w+ at the beginning but doesn't return it as part of the result?

edit: I should add that this expression works as intended on regex-testing sites like pythex.org, but behaves as described above in my Google Colab notebook.

  • Use `reg = "[A-Z]\w+(?:\s*[A-Z]\w+)*"` – Wiktor Stribiżew Mar 25 '19 at 16:22
  • @WiktorStribiżew Interesting, none of the python regex guides that I saw mentioned that `?:` would be necessary before defining a capture group in this way. Wonder why. Works well on my end, thanks for the help. – David Mitchell Mar 25 '19 at 16:26
  • @DavidMitchell: It's not necessary in general, but part of [`re.findall`'s specific behavior](https://docs.python.org/3/library/re.html#re.findall) is that having *any* capture groups changes the behavior: "If one or more [capture] groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group." Using `?:` means you *don't* have capture groups, so the default behavior is restored. – ShadowRanger Mar 25 '19 at 16:36
  • @WiktorStribiżew ah awesome. Didn't expect the problem to lie with `findall` – David Mitchell Mar 25 '19 at 17:41
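
To illustrate the `re.findall` behavior described in the comments, here is a minimal sketch. The sample sentence and the variable names (`capturing`, `non_capturing`, `via_finditer`) are made up for this example and are not from the question. It compares the original pattern, the `(?:...)` version suggested above, and `re.finditer` as one possible alternative when you want to keep the capture group.

```python
import re

# Hypothetical sample sentence, shorter than the one in the question.
text = "George Washington was a Patriot of the Continental Army."

# One capture group: findall returns only what group 1 captured last,
# or '' when the group did not take part in the match.
capturing = re.findall(r"[A-Z]\w+(\s*[A-Z]\w+)*", text)
print(capturing)  # [' Washington', '', ' Army']

# Non-capturing group: findall returns each whole match.
non_capturing = re.findall(r"[A-Z]\w+(?:\s*[A-Z]\w+)*", text)
print(non_capturing)  # ['George Washington', 'Patriot', 'Continental Army']

# finditer yields match objects, so group(0) is the whole match
# even when the pattern still contains a capture group.
via_finditer = [m.group(0) for m in re.finditer(r"[A-Z]\w+(\s*[A-Z]\w+)*", text)]
print(via_finditer)  # ['George Washington', 'Patriot', 'Continental Army']
```

This would also explain why a testing site can appear to "work": a tool that highlights the full match shows `group(0)`, whereas `findall` with a capture group returns only the group contents.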
