re.sub replace with matched content

Question

While getting to grips with regular expressions in Python, I'm trying to wrap part of a URL in HTML so that it is highlighted. My input is

images/:id/size

my output should be

images/<span>:id</span>/size

If I do this in JavaScript

method = 'images/:id/size';
method = method.replace(/\:([a-z]+)/, '<span>$1</span>')
alert(method)

I get the desired result, but if I do this in Python

>>> method = 'images/:id/huge'
>>> re.sub('\:([a-z]+)', '<span>$1</span>', method)
'images/<span>$1</span>/huge'

I don't. How do I get Python to insert the captured group rather than the literal $1? Is re.sub even the right function for this?

score 122 · Accepted Answer · edited May 24 '19 at 10:10

Simply use \1 instead of $1:

In [1]: import re

In [2]: method = 'images/:id/huge'

In [3]: re.sub(r'(:[a-z]+)', r'<span>\1</span>', method)
Out[3]: 'images/<span>:id</span>/huge'

Also note the use of raw strings (r'...') for regular expressions. It is not mandatory but removes the need to escape backslashes, arguably making the code slightly more readable.
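As a minimal sketch of why the raw string matters for the replacement (the variable names `plain`, `raw` and `escaped` are just illustrative): in an ordinary string literal `'\1'` is the single character U+0001, so `re.sub` never sees a backreference at all.

```python
import re

method = 'images/:id/huge'

# Without the r prefix, '\1' is the control character U+0001,
# so the replacement contains no backreference.
plain = re.sub(r'(:[a-z]+)', '<span>\1</span>', method)

# With a raw string the backslash survives and \1 means "group 1".
raw = re.sub(r'(:[a-z]+)', r'<span>\1</span>', method)

# Doubling the backslash in an ordinary string is equivalent.
escaped = re.sub(r'(:[a-z]+)', '<span>\\1</span>', method)

print(repr(plain))     # 'images/<span>\x01</span>/huge'
print(repr(raw))       # 'images/<span>:id</span>/huge'
print(raw == escaped)  # True
```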

edited May 24 '19 at 10:10

kubanczyk

answered Aug 25 '11 at 13:32

NPE

11

For those looking for this example and wondering why it fails on your tests, make sure to add the r (character 'r') before the group string – Marcello Grechi Lins Jul 10 '15 at 17:00
4

The `r` specifier was the issue this answer helped me with as well. – kungphu Jan 29 '16 at 10:46
2

`\g<0>` works when there is no matching group, i.e. for a non-grouping regex like `':[a-z]+'`. Straight from https://docs.python.org/3/library/re.html#re.sub – ccpizza Nov 19 '17 at 15:46
is there a way to modify what's in \1 before the substitution? – gary69 Feb 09 '19 at 16:23
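
On the last comment: one way to modify the group before it is substituted is to pass a function instead of a replacement string. `re.sub` then calls it with each match object and inserts whatever it returns. A small sketch, where the helper name `highlight` and the upper-casing are only for illustration:

```python
import re

method = 'images/:id/huge'

def highlight(match):
    # match.group(1) is the text that \1 would have inserted;
    # it can be transformed freely before building the replacement.
    name = match.group(1).upper()
    return '<span>:' + name + '</span>'

result = re.sub(r':([a-z]+)', highlight, method)
print(result)  # images/<span>:ID</span>/huge
```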

score 17 · Answer 2 · answered Aug 25 '11 at 13:31

Use \1 instead of $1.

\number Matches the contents of the group of the same number.

http://docs.python.org/library/re.html#regular-expression-syntax

score 14 · Answer 3 · answered Jan 17 '19 at 11:47

A backreference to the whole match value is \g<0>, see re.sub documentation:

The backreference \g<0> substitutes in the entire substring matched by the RE.

See the Python demo:

import re
method = 'images/:id/huge'
print(re.sub(r':[a-z]+', r'<span>\g<0></span>', method))
# => images/<span>:id</span>/huge

If you need to perform a case-insensitive search, add flags=re.I:

re.sub(r':[a-z]+', r'<span>\g<0></span>', method, flags=re.I)
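
A short sketch of the effect of the flag, using an upper-case variant of the input (the value `'images/:ID/huge'` is an assumption made for the demonstration):

```python
import re

method = 'images/:ID/huge'
pattern = r':[a-z]+'
repl = r'<span>\g<0></span>'

# Without the flag, [a-z] does not match 'ID', so nothing is replaced.
unchanged = re.sub(pattern, repl, method)

# Pass flags by keyword: the fourth positional parameter of re.sub is count.
highlighted = re.sub(pattern, repl, method, flags=re.I)

print(unchanged)    # images/:ID/huge
print(highlighted)  # images/<span>:ID</span>/huge
```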

edited Jan 28 '22 at 15:29

answered Jan 17 '19 at 11:47

Wiktor Stribiżew

1

`\g<1>` etc are also valid, providing a way to replace with `\11` (\1 and the number 1) as opposed to capture group 11. – Orwellophile Jun 26 '19 at 16:02
@Orwellophile Yes, this syntax lets you use any backreference, not just group 0. – Wiktor Stribiżew Jun 26 '19 at 18:13
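
The ambiguity raised in these comments can be sketched as follows. The input `'item:7'` and the variable names are made up for the example. With a single capture group, `\g<1>0` means "group 1 followed by a literal 0", while `\10` is read as a reference to group 10 and is rejected:

```python
import re

text = 'item:7'

# \g<1>0 is unambiguous: group 1, then the literal character 0.
appended = re.sub(r':(\d)', r':\g<1>0', text)
print(appended)  # item:70

# \10 is parsed as "group 10", which this pattern does not have.
try:
    re.sub(r':(\d)', r':\10', text)
    failed = False
except re.error as exc:
    failed = True
    print('error:', exc)
```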

score 5 · Answer 4 · answered Aug 25 '11 at 13:35

For the replacement portion, Python uses \1 the way sed and vi do, not $1 the way Perl, Java, and JavaScript (amongst others) do. Furthermore, because \1 in a regular string literal is interpreted as the character U+0001, you need to use a raw string (r'\1') or escape the backslash ('\\1').

Python 3.2 (r32:88445, Jul 27 2011, 13:41:33) 
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> method = 'images/:id/huge'
>>> import re
>>> re.sub(':([a-z]+)', r'<span>\1</span>', method)
'images/<span>id</span>/huge'
>>>
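
Note that the output in this transcript has lost the colon, because the colon sits outside the capture group. A brief sketch of the difference, with illustrative variable names, and two ways to get the `:id` form the question asks for:

```python
import re

method = 'images/:id/huge'

# Colon outside the group: \1 is just 'id', so the colon is dropped.
dropped = re.sub(r':([a-z]+)', r'<span>\1</span>', method)

# Option 1: put the colon back in the replacement string.
restored = re.sub(r':([a-z]+)', r'<span>:\1</span>', method)

# Option 2: move the colon inside the group, as in the accepted answer.
grouped = re.sub(r'(:[a-z]+)', r'<span>\1</span>', method)

print(dropped)   # images/<span>id</span>/huge
print(restored)  # images/<span>:id</span>/huge
print(grouped)   # images/<span>:id</span>/huge
```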
