I want to extract only the text that appears after '/g/' and before the first '+' or '?' in each URL:

import re

urls = ["https://www.google.com/es/g/Dmitry+Kharchenko?searchterm=isometrico",
       "https://www.google.com/es/g/Irina+Strelnikova?searchterm=isom%C3%A9trico",
       "https://www.google.com/es/g/ParabolStudio?searchterm=auto"]

for i in urls:
    print(re.findall(r'g/(.*)[\+|\??]', i))


Current output:

['Dmitry+Kharchenko']
['Irina+Strelnikova']
['ParabolStudio']

Desired result:

'Dmitry'
'Irina'
'ParabolStudio'
Raymont

1 Answer

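You need the non-greedy pattern .*?, which matches up to the first + or ? it encounters, rather than up to the last one as the greedy .* does. To match + or ? with a character class, [+?] is enough: inside a character class neither character needs escaping, and | is a literal character rather than alternation: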

for i in urls:
    print(re.findall(r'g/(.*?)[+?]', i))

# ['Dmitry']
# ['Irina']
# ['ParabolStudio']
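
One caveat with the pattern above: it requires a + or ? somewhere after the name, so a URL with no query string would give an empty list. A possible variation, sketched here under that assumption, uses a negated character class instead of a non-greedy quantifier. The names PATTERN and first_name are hypothetical, chosen only for this illustration:

```python
import re

urls = [
    "https://www.google.com/es/g/Dmitry+Kharchenko?searchterm=isometrico",
    "https://www.google.com/es/g/Irina+Strelnikova?searchterm=isom%C3%A9trico",
    "https://www.google.com/es/g/ParabolStudio?searchterm=auto",
]

# [^+?]* consumes characters until it reaches a '+' or '?' (or the end of
# the string), so no trailing delimiter is required for a match.
PATTERN = re.compile(r"/g/([^+?]*)")


def first_name(url):
    """Return the text between '/g/' and the first '+' or '?', or None."""
    match = PATTERN.search(url)
    return match.group(1) if match else None


names = [first_name(u) for u in urls]
print(names)
# ['Dmitry', 'Irina', 'ParabolStudio']
```

With this version, first_name("https://www.google.com/es/g/ParabolStudio") still returns 'ParabolStudio', and a URL without '/g/' returns None instead of raising.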
Psidom