20

Let's say I have:

a = r''' Example
This is a very annoying string
that takes up multiple lines
and h@s a// kind{s} of stupid symbols in it
ok String'''

I need a way to replace (or just delete) any text between "This" and "ok" so that, after the call, a equals:

a = "Example String"

I can't find any wildcards that seem to work. Any help is much appreciated.

cashman04

6 Answers

18

You need a regular expression:

>>> import re
>>> re.sub('\nThis.*?ok','',a, flags=re.DOTALL)
' Example String'
Kabie
8

Another method is to use string splits:

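def replaceTextBetween(originalText, delimiterA, delimiterB, replacementText):
    # Everything before delimiterA
    leadingText = originalText.split(delimiterA)[0]
    # Everything after delimiterB (assumes it occurs exactly once)
    trailingText = originalText.split(delimiterB)[1]

    # Keep both delimiters and swap only the text between them
    return leadingText + delimiterA + replacementText + delimiterB + trailingText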

Limitations:

  • Does not check if the delimiters exist
  • Assumes that there are no duplicate delimiters
  • Assumes that delimiters are in correct order
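
The limitations above can be partly addressed with str.partition, which splits only at the first occurrence and returns an empty separator when the delimiter is missing. A minimal sketch, assuming the first occurrence of each delimiter is the one intended; the name replace_text_between_checked is hypothetical and not part of the answer above:

```python
def replace_text_between_checked(original_text, delimiter_a, delimiter_b, replacement_text):
    # partition() splits at the first occurrence only and returns an
    # empty separator when the delimiter is missing.
    leading, sep_a, rest = original_text.partition(delimiter_a)
    if not sep_a:
        raise ValueError("first delimiter not found")
    # Search for the second delimiter only after the first one,
    # so the delimiters must appear in the expected order.
    _, sep_b, trailing = rest.partition(delimiter_b)
    if not sep_b:
        raise ValueError("second delimiter not found after the first")
    return leading + delimiter_a + replacement_text + delimiter_b + trailing


print(replace_text_between_checked("x [old] y", "[", "]", "new"))  # x [new] y
```

Raising ValueError is one possible choice; returning the input unchanged when a delimiter is missing would work just as well, depending on the caller.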
Zachary Canann
4

The DOTALL flag is the key. Ordinarily, the '.' character doesn't match newlines, so you don't match across lines in a string. If you set the DOTALL flag, re will match '.*' across as many lines as it needs to.
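
A small illustration of the difference, using a made-up three-line string (the variable names are only for this sketch):

```python
import re

text = "start\nmiddle\nend"  # hypothetical three-line input

# Without DOTALL, '.' stops at each newline, so the pattern cannot
# span from "start" to "end" and nothing is replaced.
without_flag = re.sub(r"start.*end", "X", text)

# With DOTALL, '.' also matches '\n', so the whole span is replaced.
with_flag = re.sub(r"start.*end", "X", text, flags=re.DOTALL)

print(repr(without_flag))  # 'start\nmiddle\nend'
print(repr(with_flag))     # 'X'
```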

faraday703
3
a=re.sub('This.*ok','',a,flags=re.DOTALL)
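
Note that '.*' here is greedy, and the pattern does not include the newline before "This", so on the string from the question this leaves ' Example\n String'. Greedy versus lazy matching only matters when the end delimiter occurs more than once; a short sketch on a hypothetical string with two delimiter pairs:

```python
import re

text = "A This 1 ok B This 2 ok C"  # hypothetical input with two delimiter pairs

# Greedy: '.*' runs to the LAST "ok", so both pairs and the text
# between them are replaced as one match.
greedy = re.sub(r"This.*ok", "-", text, flags=re.DOTALL)

# Lazy: '.*?' stops at the FIRST "ok" after each "This",
# so each pair is replaced separately.
lazy = re.sub(r"This.*?ok", "-", text, flags=re.DOTALL)

print(greedy)  # A - C
print(lazy)    # A - B - C
```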
Vaughn Cato
3

Use re.sub: it replaces the text between two characters, symbols, or strings with the replacement you choose.

format: re.sub('A(.*?)B', P, Q, flags=re.DOTALL)
where
A : start delimiter (character, symbol, or string)
B : end delimiter (character, symbol, or string)
P : replacement for A, B, and everything between them
Q : input string
re.DOTALL : lets '.' match newlines, so the match can span multiple lines

import re
re.sub('\nThis(.*?)ok', '', a, flags=re.DOTALL)

output : ' Example String'

Let's see an example with HTML as input:

input_string = '''<body> <h1>Heading</h1> <p>Paragraph</p><b>bold text</b></body>'''

Target : remove the <p> element

re.sub('<p>(.*?)</p>', '', input_string, flags=re.DOTALL)

output : '<body> <h1>Heading</h1> <b>bold text</b></body>'

Target : replace the <p> element with the word "test"

re.sub('<p>(.*?)</p>', 'test', input_string, flags=re.DOTALL)

output : '<body> <h1>Heading</h1> test<b>bold text</b></body>'
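
If the delimiters may contain regex metacharacters (such as '{', '(' or '?'), they should be escaped before being placed in the pattern. A minimal sketch of the format above wrapped in a helper; the name replace_between and its parameters are hypothetical, chosen only for illustration:

```python
import re


def replace_between(text, start, end, replacement=""):
    # re.escape() makes the delimiters match literally, even if they
    # contain regex metacharacters.
    pattern = re.escape(start) + r".*?" + re.escape(end)
    # Passing a function as the replacement avoids backslash processing
    # in `replacement`. The delimiters themselves are replaced too.
    return re.sub(pattern, lambda m: replacement, text, flags=re.DOTALL)


print(replace_between("keep {drop\nthis} keep", "{", "}", "X"))  # keep X keep
```

As with the examples above, a regex like this is fine for simple, flat strings; nested or repeated delimiters would need more careful handling.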
Govinda
0

If you want first and last words:

re.sub(r'^\s*(\w+).*?(\w+)$', r'\1 \2', a, flags=re.DOTALL)
JBernardo