
I'm trying to find all lines that are all caps using regex, and so far I've tried this:

re.findall(r'\b\n|[A-Z]+\b', kaizoku)

My sample data is as follows:

TRAFALGAR LAW
You shall not be the pirate king.
MONKEY D LUFFY
Now!
DOFLAMINGO'S UNDERLINGS:
Noooooo!

I want it to return

TRAFALGAR LAW
MONKEY D LUFFY
DOFLAMINGO'S UNDERLINGS:

But it returns something else, namely:

TRAFALGAR
LAW
Y
MONKEY
D
LUFFY
N
DOFLAMINGO'
S
UNDERLINGS:
N
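This happens because the pattern matches words, not lines. `[A-Z]+\b` matches each run of capital letters that ends at a word boundary, so every uppercase word becomes its own match. The `\b\n` alternative only adds bare newline matches after lines that end in a letter. A quick check, assuming the sample text is stored in `kaizoku` as in the question:

```python
import re

# Sample data from the question, one entry per line
kaizoku = (
    "TRAFALGAR LAW\n"
    "You shall not be the pirate king.\n"
    "MONKEY D LUFFY\n"
    "Now!\n"
    "DOFLAMINGO'S UNDERLINGS:\n"
    "Noooooo!\n"
)

# The original pattern: '\b\n' OR 'one or more capitals followed by a word boundary'
tokens = re.findall(r'\b\n|[A-Z]+\b', kaizoku)

# Each uppercase word is a separate match, so whole lines never come back
print(tokens)
```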

EDIT: So far, I think @Jan's answer is the best fit:

rx = re.compile(r"^([A-Z ':]+$)\b", re.M)
rx.findall(string)

EDIT 2: Found out what was wrong, thanks!
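A note on the pattern quoted in the first edit: the trailing `\b` sits after `$`, so it requires a word boundary at the end of the line. That holds after a letter (`LAW`) but not after punctuation (`UNDERLINGS:`), so that line is silently dropped. This may or may not be the problem EDIT 2 refers to. The sketch below reuses the question's sample text and compares the quoted pattern with the same pattern minus `\b`:

```python
import re

text = (
    "TRAFALGAR LAW\n"
    "You shall not be the pirate king.\n"
    "MONKEY D LUFFY\n"
    "Now!\n"
    "DOFLAMINGO'S UNDERLINGS:\n"
    "Noooooo!\n"
)

# Pattern quoted in the edit: '\b' after '$' requires the line to end on a word character
with_boundary = re.compile(r"^([A-Z ':]+$)\b", re.M)
# Same pattern without the trailing '\b'
without_boundary = re.compile(r"^([A-Z ':]+$)", re.M)

print(with_boundary.findall(text))     # the line ending in ':' is missing
print(without_boundary.findall(text))  # all three all-caps lines
```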

3 Answers


Brief

There is no need for regex; Python strings have the method isupper():

Return true if all cased characters[4] in the string are uppercase and there is at least one cased character, false otherwise.

[4] Cased characters are those with general category property being one of “Lu” (Letter, uppercase), “Ll” (Letter, lowercase), or “Lt” (Letter, titlecase).
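The "at least one cased character" clause matters for lines that mix capitals with digits or punctuation, or that contain no letters at all. A few illustrative strings (hypothetical, not from the question):

```python
# Illustrative strings (hypothetical, not from the question)
samples = [
    "DOFLAMINGO'S UNDERLINGS:",  # punctuation is uncased, so it is ignored -> True
    "CHAPTER 42!",               # digits and '!' are uncased too -> True
    "42 !!!",                    # no cased character at all -> False
    "",                          # empty string -> False
    "MONKEY D Luffy",            # any lowercase letter -> False
]

results = {s: s.isupper() for s in samples}
for s, upper in results.items():
    print(repr(s), upper)
```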


Code


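a = [
    "TRAFALGAR LAW",
    "You shall not be the pirate king.",
    "MONKEY D LUFFY",
    "Now!",
    "DOFLAMINGO'S UNDERLINGS:",
    "Noooooo!",
]

# isupper() is True only if every cased character is uppercase
# and there is at least one cased character
for s in a:
    print(s.isupper())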

Result

True
False
True
False
True
False
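Since the question asks for the lines themselves rather than a True/False per line, the same check can drive a filter. A minimal sketch, assuming the data arrives as one multi-line string (the name `kaizoku` is borrowed from the question):

```python
kaizoku = """TRAFALGAR LAW
You shall not be the pirate king.
MONKEY D LUFFY
Now!
DOFLAMINGO'S UNDERLINGS:
Noooooo!
"""

# splitlines() breaks on line endings without leaving a trailing empty string
all_caps = [line for line in kaizoku.splitlines() if line.isupper()]
print(all_caps)
# ['TRAFALGAR LAW', 'MONKEY D LUFFY', "DOFLAMINGO'S UNDERLINGS:"]
```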
ctwheels

Here you go

import re

string = """TRAFALGAR LAW
You shall not be the pirate king.
MONKEY D LUFFY
Now!
DOFLAMINGO'S UNDERLINGS:
Noooooo!
"""

rx = re.compile(r"^([A-Z ':]+$)", re.M)

UPPERCASE = [line for line in string.split("\n") if rx.match(line)]
print(UPPERCASE)

Or:

rx = re.compile(r"^([A-Z ':]+$)", re.M)

UPPERCASE = rx.findall(string)
print(UPPERCASE)

Both will yield

['TRAFALGAR LAW', 'MONKEY D LUFFY', "DOFLAMINGO'S UNDERLINGS:"]
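The character class `[A-Z ':]` only admits the punctuation that happens to appear in the sample, so a line such as `NOW!` or one containing digits would be skipped. If the goal is "no lowercase letters and at least one capital" (roughly what `isupper()` checks, but ASCII-only), a pattern along these lines should work. The test lines below are hypothetical:

```python
import re

text = """TRAFALGAR LAW
You shall not be the pirate king.
NOW!
LUFFY 2
123
DOFLAMINGO'S UNDERLINGS:
"""

# Per line: any non-lowercase characters, at least one A-Z, then more non-lowercase.
# '\n' is excluded from the classes so a match can never run across a line break.
# Assumption: only ASCII lowercase a-z counts as "lowercase" here.
rx = re.compile(r"^[^a-z\n]*[A-Z][^a-z\n]*$", re.M)

print(rx.findall(text))
# ['TRAFALGAR LAW', 'NOW!', 'LUFFY 2', "DOFLAMINGO'S UNDERLINGS:"]
```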
Jan

You can use the character class [A-Z\d_\W] to match uppercase letters along with digits, underscores, and any non-alphanumeric characters:

import re
s = ["TRAFALGAR LAW", "You shall not be the pirate king.", "MONKEY D LUFFY", "Now!", "DOFLAMINGO'S UNDERLINGS:", "Noooooo!"]
# Keep lines made entirely of capitals, digits, underscores and non-word characters
new_s = [i for i in s if re.match(r'^[A-Z\d_\W]+$', i)]
print(new_s)

Output:

['TRAFALGAR LAW', 'MONKEY D LUFFY', "DOFLAMINGO'S UNDERLINGS:"]
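One consequence of this class worth knowing: because `\d`, `_` and `\W` are all allowed and no capital letter is required, a line made only of digits, symbols, or spaces also passes, whereas `isupper()` rejects it. A quick comparison on a few hypothetical lines:

```python
import re

# Hypothetical lines that contain no letters at all, plus one all-caps line
lines = ["123", "---", "   ", "LAW"]

# The answer's pattern, written as a raw string to avoid invalid-escape warnings
by_regex = [line for line in lines if re.match(r'^[A-Z\d_\W]+$', line)]
by_isupper = [line for line in lines if line.isupper()]

print(by_regex)    # ['123', '---', '   ', 'LAW']
print(by_isupper)  # ['LAW']
```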
Ajax1234
  • Wouldn't `[A-Z\d_\W]` be better as it includes digits and underscore (in the case that they may be used)? – ctwheels Dec 06 '17 at 21:57