I am currently trying to locate people by their tweets.

I decided to do this by counting word frequency.

But there are some words, like 'WeWouldWin' or 'AtGEO', that I want to split up so that each part is counted individually.

Is there a Pythonic way to split them at the uppercase letters?

So those two words would be split into 'We', 'Would', 'Win' and 'At', 'GEO'.

I have tried the method from the following link:

Split a string at uppercase letters

But this splits off each uppercase letter individually (for example, 'G', 'E', 'O' instead of 'GEO').
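
For reference, here is a minimal sketch of the behaviour described above. It assumes the approach tried was the `re.findall` pattern `[A-Z][^A-Z]*`; the question does not say which answer from the linked thread was actually used, so the pattern is an assumption.

```python
import re

# Assumed pattern: one uppercase letter followed by any run of
# non-uppercase characters.
pattern = re.compile(r'[A-Z][^A-Z]*')

print(pattern.findall('WeWouldWin'))  # ['We', 'Would', 'Win']
print(pattern.findall('AtGEO'))       # ['At', 'G', 'E', 'O']
```

Every uppercase letter starts a new match, so a run of capitals such as 'GEO' is broken into single letters.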

umn
Rui

1 Answer

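You can use this script:

word = 'WeWouldWin'
start = 0
parts = []
for pos, char in enumerate(word):
    # Start a new word at an uppercase letter that follows a lowercase one
    if pos != 0 and char.isupper() and word[pos - 1].islower():
        parts.append(word[start:pos])
        start = pos
# Append whatever remains after the last split point
parts.append(word[start:])
print(parts)  # ['We', 'Would', 'Win']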
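
The loop above only splits where a lowercase letter is followed by an uppercase one, so an acronym at the start of a word (for example 'GEOAt') stays glued to what follows. As a possible alternative, here is a regex sketch that treats a run of capitals as one token wherever it appears. The helper name `split_camel` and the pattern are illustrative assumptions, not part of the original answer, and the pattern only covers ASCII letters (digits and punctuation are dropped).

```python
import re

# Hypothetical pattern, two alternatives tried in order:
#   1. a run of uppercase letters NOT followed by a lowercase letter
#      (an acronym such as 'GEO'), or
#   2. an optional uppercase letter followed by lowercase letters
#      (an ordinary word such as 'Would').
WORD_RE = re.compile(r'[A-Z]+(?![a-z])|[A-Z]?[a-z]+')

def split_camel(text):
    """Split a run-together word into its parts (illustrative helper)."""
    return WORD_RE.findall(text)

print(split_camel('WeWouldWin'))  # ['We', 'Would', 'Win']
print(split_camel('AtGEO'))       # ['At', 'GEO']
print(split_camel('GEOAt'))       # ['GEO', 'At']
```

The negative lookahead `(?![a-z])` makes the acronym alternative give back its last capital when that capital begins a normal word, which is why 'GEOAt' yields 'GEO' and 'At'.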

Regards.

Francisco Gonzalez
  • I would add a '+' after the first uppercase letter to match on words with all capital letters but then those words should be at the end of the string because in other cases it will create bad results. – Gábor Fekete May 21 '19 at 08:37
  • This doesn't solve the "GEO" case at all. – tripleee May 21 '19 at 08:37
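
The caveat in the first comment can be illustrated with a short sketch. It assumes the comment refers to a pattern of the form `[A-Z][^A-Z]*` with a `+` added after the first character class; the pattern under discussion is not shown in the thread, so this is a guess.

```python
import re

# Assumed pattern with the suggested '+': a run of uppercase letters
# followed by any run of non-uppercase characters.
pattern_plus = re.compile(r'[A-Z]+[^A-Z]*')

# All-capital word at the end of the string: works as intended.
print(pattern_plus.findall('AtGEO'))  # ['At', 'GEO']

# All-capital word elsewhere: it swallows the capital that starts the next word.
print(pattern_plus.findall('GEOAt'))  # ['GEOAt']
```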