
I'm trying to extract/match data from a string using a regular expression, but I can't get it to work.

I want to extract the i386 part from the following string (the text between the last - and .iso):

/xubuntu/daily/current/lucid-alternate-i386.iso

This should also work in case of:

/xubuntu/daily/current/lucid-alternate-amd64.iso

The result should be i386 or amd64, respectively.

Thanks a lot for your help.

badp
user175259

7 Answers


You could also use split in this case (instead of regex):

>>> path = "/xubuntu/daily/current/lucid-alternate-i386.iso"
>>> path.split(".iso")[0].split("-")[-1]
'i386'

split returns a list of the pieces your string was split into. Python's indexing syntax ([0] for the first element, [-1] for the last) then gets you to the parts you need.
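The same split-based idea can be wrapped in a small helper and checked against both paths from the question. This is a sketch: the function name arch_from_path is hypothetical, it uses rsplit("-", 1) as a small variation so only the last dash is split on, and it assumes the architecture is always the last dash-separated field before .iso.

```python
def arch_from_path(path: str) -> str:
    """Hypothetical helper: return the text between the last '-' and '.iso'."""
    # Everything before ".iso", e.g. "/xubuntu/daily/current/lucid-alternate-i386"
    stem = path.split(".iso")[0]
    # rsplit with maxsplit=1 splits only at the last "-"
    return stem.rsplit("-", 1)[-1]


for p in ("/xubuntu/daily/current/lucid-alternate-i386.iso",
          "/xubuntu/daily/current/lucid-alternate-amd64.iso"):
    print(arch_from_path(p))
```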

Community
ChristopheD
r"/([^-]*)\.iso/"

The bit you want will be in the first capture group.

Amber
  • Were you trying to use `match()` or `search()`? Since this is a partial-match pattern, it should be used with `search()`, not `match()` (`match()` only tries to match at the beginning of the string, while `search()` scans the whole string for a match). – Amber May 27 '10 at 22:30
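A minimal illustration of the `search()`/`match()` difference from the comment above, using the pattern without the surrounding slashes (Python patterns are plain strings, so /.../ delimiters would be matched as literal slashes):

```python
import re

path = "/xubuntu/daily/current/lucid-alternate-i386.iso"
pattern = re.compile(r"([^-]*)\.iso")

# match() only tries at position 0; "/xubuntu/daily/current/lucid" is
# followed by "-", not ".iso", so there is no match there
print(pattern.match(path))            # None

# search() scans forward and finds "i386.iso"
print(pattern.search(path).group(1))  # i386
```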

First off, let's make our life simpler and only get the file name.

>>> import os
>>> os.path.split("/xubuntu/daily/current/lucid-alternate-i386.iso")
('/xubuntu/daily/current', 'lucid-alternate-i386.iso')

Now it's just a matter of catching all the characters between the last dash and the '.iso'.
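One way to finish that approach, as a sketch: os.path.splitext drops the .iso extension from the file name, and str.rpartition returns the text after the last dash. The helper name arch_from_filename is hypothetical.

```python
import os


def arch_from_filename(path: str) -> str:
    """Hypothetical helper: text between the last '-' and the extension."""
    _, filename = os.path.split(path)        # 'lucid-alternate-i386.iso'
    stem, _ext = os.path.splitext(filename)  # ('lucid-alternate-i386', '.iso')
    return stem.rpartition("-")[2]           # part after the last '-'


print(arch_from_filename("/xubuntu/daily/current/lucid-alternate-i386.iso"))
print(arch_from_filename("/xubuntu/daily/current/lucid-alternate-amd64.iso"))
```

If the file name contains no dash at all, rpartition puts the whole stem in the last element, so the helper returns the stem unchanged rather than raising.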

badp

If you will be matching many of these lines, compiling the pattern once with re.compile() and reusing the resulting regular expression object is more efficient (the re module also caches recently used patterns, so the gain is usually modest).

>>> import re
>>> s1 = "/xubuntu/daily/current/lucid-alternate-i386.iso"
>>> s2 = "/xubuntu/daily/current/lucid-alternate-amd64.iso"
>>> pattern = re.compile(r'^.+-(.+)\..+$')
>>> m = pattern.match(s1)
>>> m.group(1)
'i386'
>>> m = pattern.match(s2)
>>> m.group(1)
'amd64'
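For the many-lines case, a sketch of reusing one compiled pattern in a loop. The paths list, including the hypothetical non-matching entry, is for illustration only; the None check matters because match() returns None when a line doesn't fit the pattern, and calling .group() on None raises AttributeError.

```python
import re

# Compile once, reuse for every line
pattern = re.compile(r'^.+-(.+)\..+$')

paths = [
    "/xubuntu/daily/current/lucid-alternate-i386.iso",
    "/xubuntu/daily/current/lucid-alternate-amd64.iso",
    "no-extension-here",  # hypothetical input with no "." after the last "-"
]

archs = []
for path in paths:
    m = pattern.match(path)
    if m is None:  # line doesn't have the "-<arch>.<ext>" shape, skip it
        continue
    archs.append(m.group(1))

print(archs)  # ['i386', 'amd64']
```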
Peter McG

The expression should be written without the leading and trailing slashes (Python regular expressions don't use /.../ delimiters).

import re

line = '/xubuntu/daily/current/lucid-alternate-i386.iso'
rex = re.compile(r"([^-]*)\.iso")
m = rex.search(line)
print(m.group(1))

Prints i386.

koblas
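import re

subject = "/xubuntu/daily/current/lucid-alternate-i386.iso"
# \w+ stops at the last "-", since "-" is not a word character
reobj = re.compile(r"(\w+)\.iso$")
match = reobj.search(subject)
if match:
    result = match.group(1)
else:
    result = ""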

subject holds the full path, including the file name.

Turtle
>>> import os
>>> path = "/xubuntu/daily/current/lucid-alternate-i386.iso"
>>> file, ext = os.path.splitext(os.path.split(path)[1])
>>> processor = file[file.rfind("-") + 1:]
>>> processor
'i386'
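On Python 3.4+, the same idea can be sketched with pathlib instead of os.path. PurePosixPath is chosen here as an assumption because the paths in the question are POSIX-style; its stem attribute is the final path component without its last suffix.

```python
from pathlib import PurePosixPath

path = PurePosixPath("/xubuntu/daily/current/lucid-alternate-i386.iso")
# stem is "lucid-alternate-i386"; rpartition keeps the part after the last "-"
processor = path.stem.rpartition("-")[2]
print(processor)  # i386
```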
manifest