I would like to use a regular expression to break a string into parts that each start with a capital letter. However, in Python 3.6 I get an error with the expression I thought would work.

import re

str = "abcDefgHijkLmnoPQrstUvwxYz"
pattern = "[A-Z]"
print(re.split(pattern, str))  # gives ['abc', 'efg', 'ijk', ...] but I want to keep the capitals
pattern = r"([A-Z])"
print(re.split(pattern, str))  # gives ['abc', 'D', 'efg', 'H', 'ijk', ...]
pattern = r"(?=[A-Z])"  # thought this one would keep each capital with the lowercase letters after it
print(re.split(pattern, str))  # raises an error - "requires a non-empty pattern"
# expected this to give ['abc', 'Defg', 'Hijk', 'Lmno', ...]

# I also thought that str.split(pattern) would work the same way as re.split, but it does not. Why isn't .split() consistent with re.split()?

What would be the proper regex that would provide words starting with capital letters?

Jay Mosk
  • `print (re.split(pattern, str)` Each one of these statements is missing a closing parenthesis. This code won't even run. Please post your real code. – John Gordon May 31 '22 at 23:57
  • for the `requires a non-empty pattern`, perhaps try with 3.7 https://bugs.python.org/issue43222 – jspcal May 31 '22 at 23:59
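On the str.split() question raised above: str.split() treats its argument as a literal separator string, whereas re.split() interprets it as a regular expression, so the two are not interchangeable. A minimal sketch of the difference (the sample string `text` is made up for illustration):

```python
import re

text = "abcDefgHijk"  # hypothetical sample input

# str.split looks for the literal characters "[A-Z]", which never occur here,
# so the string comes back unsplit.
literal_result = text.split("[A-Z]")

# re.split treats "[A-Z]" as a character class and splits at each capital.
regex_result = re.split("[A-Z]", text)

print(literal_result)  # ['abcDefgHijk']
print(regex_result)    # ['abc', 'efg', 'ijk']
```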

3 Answers

Use re.findall instead of re.split:

import re

s = "abcDefgHijkLmnoPQrstUvwxYz"


print(re.findall(r"([A-Z]*[a-z]+)", s))

Prints:

['abc', 'Defg', 'Hijk', 'Lmno', 'PQrst', 'Uvwx', 'Yz']
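One caveat worth checking before relying on this pattern: `[A-Z]*[a-z]+` requires at least one lowercase letter, so a run of capitals at the very end of the string is silently dropped. A small sketch, using a made-up input `sample`:

```python
import re

sample = "fooBarXY"  # hypothetical input that ends in capitals

# "XY" has no lowercase letter after it, so [a-z]+ cannot match and it is skipped.
words = re.findall(r"[A-Z]*[a-z]+", sample)
print(words)  # ['foo', 'Bar']
```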
Andrej Kesely

>>> import re
>>> s = "abcDefgHijkLmnoPQrstUvwxYz"
>>> print(re.split("(?=[A-Z])", s))
['abc', 'Defg', 'Hijk', 'Lmno', 'P', 'Qrst', 'Uvwx', 'Yz']

The pattern you expected to work does work in Python 3.10 (re.split has supported splitting on empty matches since Python 3.7). However, you ask both to "split the string" and to "provide words starting with capital letters", and splitting offers no easy way to get rid of the "abc" at the start, which doesn't begin with a capital. Use re.findall() instead to pick out the capital-then-lowercase words:

>>> print(re.findall(r"([A-Z][a-z]+)", s))
['Defg', 'Hijk', 'Lmno', 'Qrst', 'Uvwx', 'Yz']
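If the code has to run on an interpreter older than 3.7, where re.split rejects a pattern that can only match the empty string (such as a bare lookahead), the same pieces can be collected with re.findall instead. This is a sketch rather than a drop-in equivalent: the alternation below is an illustrative choice, not something from the answer above, and unlike re.split it never yields a leading empty string when the input starts with a capital.

```python
import re

s = "abcDefgHijkLmnoPQrstUvwxYz"

# First alternative: a capital followed by any run of non-capitals.
# Second alternative: a leading run of non-capitals (the "abc" at the start).
parts = re.findall(r"[A-Z][^A-Z]*|[^A-Z]+", s)
print(parts)
# ['abc', 'Defg', 'Hijk', 'Lmno', 'P', 'Qrst', 'Uvwx', 'Yz']
```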

NB. I renamed the string to s because str is a Python builtin, and it's bad form to overwrite builtins in your own code.
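To see why shadowing matters, here is a small sketch of what goes wrong once the name str is rebound to a string. It runs inside a hypothetical helper function, `shadowing_demo`, so the rebinding stays local and does not affect the rest of a session:

```python
def shadowing_demo():
    # Rebinding the name hides the builtin type within this scope.
    str = "abcDefg"
    try:
        return str(42)  # attempts to call a string object
    except TypeError as exc:
        return f"TypeError: {exc}"

result = shadowing_demo()
print(result)  # TypeError: 'str' object is not callable
```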

TessellatingHeckler

To split a camel-case string at its capital letters, use

\w+?(?=[A-Z])

Python example

import re

s = "abcDefgHijkLmnoPQrstUvwxYz"

re.findall(r"\w+?(?=[A-Z])", s)  # ['abc', 'Defg', 'Hijk', 'Lmno', 'P', 'Qrst', 'Uvwx']
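Note that the result above stops at 'Uvwx': the final 'Yz' is never followed by a capital, so the lookahead fails and it is left out. One possible adjustment, offered as a sketch rather than as part of the original answer, is to let the lookahead also accept the end of the string:

```python
import re

s = "abcDefgHijkLmnoPQrstUvwxYz"

# (?=[A-Z]|$) succeeds before a capital letter or at the end of the string,
# so the last word is captured too.
words = re.findall(r"\w+?(?=[A-Z]|$)", s)
print(words)
# ['abc', 'Defg', 'Hijk', 'Lmno', 'P', 'Qrst', 'Uvwx', 'Yz']
```

Keep in mind that \w also matches digits and underscores, so inputs containing those will be grouped with the surrounding letters.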
Artyom Vancyan