
I have some extracted text that I want to clean up with a RegEx.

I have learned basic RegEx, but I'm not sure how to build this one:

str = '''
this is 
a line that has been cut.
This is a line that should start on a new line
'''

should be converted to this:

str = '''
this is a line that has been cut.
This is a line that should start on a new line
'''

The pattern r'\w\n\w' seems to catch it, but I'm not sure how to replace the newline with a space without touching the end and beginning of the surrounding words.

Norfeldt

1 Answer


You can use this negative lookbehind regex with re.sub (after `import re`):

>>> str = '''
... this is 
... a line that has been cut.
... This is a line that should start on a new line
... '''
>>> print(re.sub(r'(?<!\.)\n', '', str))
this is a line that has been cut.
This is a line that should start on a new line
>>>


(?<!\.)\n matches all line breaks that are not preceded by a dot.
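To make it visible which line breaks the negative lookbehind actually selects, here is a minimal Python 3 sketch. It assumes the sample text from the question, including a trailing space after "this is"; the variable names `text`, `pattern`, `preceding` and `cleaned` are only illustrative.

```python
import re

# Sample text from the question; the trailing space after "this is" is an assumption.
text = (
    "\nthis is \n"
    "a line that has been cut.\n"
    "This is a line that should start on a new line\n"
)

pattern = re.compile(r'(?<!\.)\n')

# Character directly before each matched newline ('' when the newline opens the string).
preceding = [text[max(m.start() - 1, 0):m.start()] for m in pattern.finditer(text)]
print(preceding)

# Remove every newline that is not preceded by a dot.
cleaned = pattern.sub('', text)
print(repr(cleaned))
```

Note that the leading and trailing newlines are also removed, since neither of them follows a dot; only the newline after "cut." survives.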

If you don't want the match to depend on the presence of a dot, use:

re.sub(r'(?<=\w\s)\n', '', str)
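As a rough check of what this second pattern does, the Python 3 sketch below applies it to the question's sample text (again assuming the trailing space after "this is"). The second half is a hypothetical variant that is not part of the answer: it covers a cut line without a trailing space by replacing a newline that sits between two word characters with a single space.

```python
import re

# Sample text from the question; the trailing space after "this is" is an assumption.
text = (
    "\nthis is \n"
    "a line that has been cut.\n"
    "This is a line that should start on a new line\n"
)

# Pattern from the answer: a newline preceded by a word character plus one whitespace character.
joined = re.sub(r'(?<=\w\s)\n', '', text)
print(repr(joined))

# Hypothetical variant (not from the answer): no trailing space on the cut line,
# so a newline between two word characters is replaced by a space instead.
no_space = "this is\na line that has been cut.\nThis is a line\n"
joined_with_space = re.sub(r'(?<=\w)\n(?=\w)', ' ', no_space)
print(repr(joined_with_space))
```

Unlike the dot-based pattern, both of these leave the leading and trailing newlines alone. The variant would also join any line that ends in a word character with the next one, so it only fits text where complete lines end in punctuation.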


anubhava
  • hmm.. can't make it work for the case I have https://repl.it/@Norfeldt/SuperficialCumbersomeTenrec – Norfeldt Dec 05 '17 at 11:56
  • That link doesn't even show what is original string. Also note I suggested `r'(? – anubhava Dec 05 '17 at 11:59
  • I know.. sorry.. I thought that my example would cover my use case. It didn't. I had to add `\w`, else it would add some weird places.. – Norfeldt Dec 05 '17 at 12:03