0

I have a string like the one below, with multiple spaces before and after 'READY'.

All whitespace in the following example consists of space characters.

'1df34343 43434sebb              READY                     '

How can I write a regular expression which can get '1df34343 43434sebb' as result.group(1)?

victorsc
michael

5 Answers

3

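This captures the required group only when it is followed by two or more whitespace characters and then READY. It uses a positive lookahead, so the spaces and READY are checked but not included in the match.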

(\S+ \S+)(?=\s{2,}READY)
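
As a minimal sketch of how this pattern might be applied in Python (the variable names `s` and `m` are illustrative and not part of the answer above), `re.search` finds the match and `group(1)` holds the captured text:

```python
import re

# Sample string from the question: runs of spaces surround READY.
s = '1df34343 43434sebb              READY                     '

# The lookahead (?=...) requires 2+ whitespace characters and READY to follow,
# but does not consume them, so the overall match ends where the capture ends.
m = re.search(r'(\S+ \S+)(?=\s{2,}READY)', s)
if m:
    print(m.group(1))  # 1df34343 43434sebb
```

Because the lookahead consumes nothing, `m.group(0)` and `m.group(1)` are the same string here.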
garyh
1

If you understand regular expressions, you should know the following:

  • \s : whitespace characters
  • \S : non-whitespace characters
  • + : one or more of the preceding element.

Script:

>>> import re
>>> s = '1df34343 43434sebb              READY                     '
>>> ms = re.match(r"(\S+ \S+)\s+(\S+)\s+", s)
>>> ms.groups()
('1df34343 43434sebb', 'READY')
>>> ms.group(1)
'1df34343 43434sebb'
>>> ms.group(2)
'READY'

You can also use a more detailed regex with nested groups, which is useful if you ever need to parse the individual parts:

>>> ms = re.match(r"((\S+) (\S+))\s+(\S+)\s+", s)
>>> ms.groups()
('1df34343 43434sebb', '1df34343', '43434sebb', 'READY')
>>> ms.group(1)
'1df34343 43434sebb'
>>> ms.group(2)
'1df34343'
>>> ms.group(3)
'43434sebb'
>>> ms.group(4)
'READY'
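
If the numbered groups become hard to track, one possible variation is to name them. This is only a sketch: the group names `key`, `first`, `second`, and `status` are hypothetical labels chosen for illustration, not anything the question defines.

```python
import re

s = '1df34343 43434sebb              READY                     '

# Same structure as the nested pattern above, but with named groups.
# The names below are hypothetical labels for the fields.
pattern = re.compile(
    r"(?P<key>(?P<first>\S+) (?P<second>\S+))\s+(?P<status>\S+)\s+"
)

ms = pattern.match(s)
# groupdict() maps each group name to the text it captured.
fields = ms.groupdict() if ms else {}
print(fields['key'])     # 1df34343 43434sebb
print(fields['status'])  # READY
```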
Inbar Rose
1

Here is a very simple regex that captures everything up to the first two whitespace characters in a row:

In [10]: import re

In [11]: s = '1df34343 43434sebb              READY                     '

In [12]: re.match(r'(.*?)\s\s', s).groups()
Out[12]: ('1df34343 43434sebb',)

This captures your requirements as I've understood them. If something is amiss, please clarify.
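
One caveat worth sketching: `re.match` returns `None` when the string contains no pair of adjacent whitespace characters, so calling `.groups()` directly would raise an `AttributeError`. The helper name `first_field` below is hypothetical, and the second input string is made up for illustration.

```python
import re

def first_field(line):
    """Return the text before the first two adjacent whitespace characters, or None."""
    m = re.match(r'(.*?)\s\s', line)
    # Guard against a failed match instead of calling .groups() on None.
    return m.group(1) if m else None

print(first_field('1df34343 43434sebb              READY                     '))
print(first_field('no-double-space here'))  # None
```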

NPE
0

Match anything before a multi-space group:

 re.compile(r'^(.*?)(?:\s{2,})')

Output:

>>> import re
>>> multispace = re.compile(r'^(.*?)(?:\s{2,})')
>>> multispace.match('1df34343 43434sebb              READY                     ').groups()
('1df34343 43434sebb',)
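
Compiling the pattern pays off mainly when it is reused. As a rough sketch, assuming several lines share the same layout (the second sample line is invented for illustration), the compiled object can be applied to each one:

```python
import re

multispace = re.compile(r'^(.*?)(?:\s{2,})')

# Hypothetical sample lines; only the first comes from the question.
lines = [
    '1df34343 43434sebb              READY                     ',
    'a1b2c3d4 e5f6g7h8     STOPPED   ',
]

# Keep group(1) for every line that contains a run of 2+ whitespace characters.
keys = [m.group(1) for m in map(multispace.match, lines) if m]
print(keys)
```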
Martijn Pieters
0

Why not just split the string on runs of two or more spaces? You get a list, and its first element is the part you need. You don't really need a complex regex for that:

>>> import re
>>> s = '1df34343 43434sebb              READY                     '
>>> re.split(r'[ ]{2,}', s)[0]
'1df34343 43434sebb'
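
For reference, here is a small sketch of what the full split yields, plus a regex-free variant using `str.split` with a two-space separator. The variant is an alternative suggested here for comparison, not part of the answer above, and it relies on the question's statement that the whitespace is all space characters.

```python
import re

s = '1df34343 43434sebb              READY                     '

# Splitting on runs of 2+ spaces leaves a trailing empty string,
# because the input ends with a run of spaces.
parts = re.split(r'[ ]{2,}', s)
print(parts)  # ['1df34343 43434sebb', 'READY', '']

# Regex-free alternative: split once on the first pair of spaces.
first = s.split('  ', 1)[0]
print(first)  # 1df34343 43434sebb
```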
Rohit Jain
  • What is the benefit of this? I don't think it is more efficient than matching, and you lose some functionality in the long run. – Inbar Rose Nov 28 '12 at 10:47
  • @InbarRose Well, I think `split` fits best for the string the OP has posted. It's not a question of good or bad. Even `split` uses a regex, and I do love regex myself. But for this particular case, it seems overkill to build a regex that matches the complete string when the OP just wants the first part. – Rohit Jain Nov 28 '12 at 10:50