
Simple regex task: find an ID (and language) within a string.

import re

txt = '<OB02 ID="1099367" LANG="FR">'
pattern = r'\\ID="(.*?)\\"'

result = re.findall(pattern, txt)  # returns []

This returns an empty list, which leads to two questions:

  • How do I correctly escape \" in Python?
  • How do I extract both ID and LANG from txt?
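A minimal sketch addressing both questions, assuming the tag always lists ID before LANG. Inside a single-quoted raw string a double quote needs no escaping at all, while r'\\ID' tells the regex engine to match a literal backslash before ID, which the input never contains; that is why findall comes back empty.

```python
import re

txt = '<OB02 ID="1099367" LANG="FR">'

# In a single-quoted raw string, " needs no escape. Writing \\ here would make
# the regex require a literal backslash, which txt does not contain.
# Assumption: ID always appears before LANG, separated by whitespace.
pattern = r'ID="([^"]*)"\s+LANG="([^"]*)"'

match = re.search(pattern, txt)
id_value, lang = match.groups()
print(id_value, lang)  # 1099367 FR
```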
Wiktor Stribiżew
VengaVenga

1 Answer


Use an XML parser to parse XML, not a regex.
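Because the input is a lone start tag with no closing tag, a strict XML parser rejects it. One tolerant option from the standard library is html.parser.HTMLParser; it is swapped in here as an assumption, not something the answer names. The AttrCollector class is a hypothetical helper, and note that HTMLParser lowercases tag and attribute names (values keep their case).

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: records the attributes of the first start tag seen."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs with lowercased names
        if self.attrs is None:
            self.attrs = dict(attrs)


txt = '<OB02 ID="1099367" LANG="FR">'
collector = AttrCollector()
collector.feed(txt)
collector.close()
print(collector.attrs)  # {'id': '1099367', 'lang': 'FR'}
```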

As a workaround, you can use the following regex:

import re

txt = '<OB02 ID="1099367" LANG="FR">'
# ID=" followed by a capture of everything up to the next double quote
pattern = r'ID="([^"]*)'

result = re.findall(pattern, txt)  # ['1099367']

As said, this is a bad idea: if someone starts using single quotes or adds comments, this will break.
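If a regex really cannot be avoided, the single-quote case at least can be tolerated. A minimal sketch, assuming attribute values never contain their own quote character: capture whichever quote opens the value and require the same one to close it through the backreference \2, then collect every name/value pair into a dict. The tag_attributes function is hypothetical, and comments or other markup would still break it.

```python
import re

# name = opening quote, value, same quote again; \2 refers back to the opening quote
ATTR_RE = re.compile(r'(\w+)=(["\'])(.*?)\2')


def tag_attributes(tag_text):
    """Hypothetical helper: map attribute names to values in one tag string."""
    return {name: value for name, _quote, value in ATTR_RE.findall(tag_text)}


print(tag_attributes('<OB02 ID="1099367" LANG="FR">'))   # {'ID': '1099367', 'LANG': 'FR'}
print(tag_attributes("<OB02 ID='1099367' LANG=\"FR\">"))  # {'ID': '1099367', 'LANG': 'FR'}
```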

inetphantom
  • That was my first choice, too, but it didn't work for this string (IMHO it isn't proper XML). Still, I'd welcome it if you find a way to make xml.etree.ElementTree handle this. – VengaVenga May 08 '20 at 09:18
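One way to get xml.etree.ElementTree to accept this input, assuming each string is exactly one start tag ending in ">": rewrite it as a self-closing empty element, which is a well-formed XML document on its own, and then parse it. This is a sketch of that workaround, not a general fix for malformed XML.

```python
import xml.etree.ElementTree as ET

txt = '<OB02 ID="1099367" LANG="FR">'

# Assumption: txt is a single start tag, so "<tag ...>" becomes "<tag .../>"
fixed = txt.rstrip()
if fixed.endswith('>') and not fixed.endswith('/>'):
    fixed = fixed[:-1] + '/>'

elem = ET.fromstring(fixed)
print(elem.tag, elem.get('ID'), elem.get('LANG'))  # OB02 1099367 FR
```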