
I'm trying to parse some fields from a multi-line file, of which I'm only interested in some lines, while others I would like to skip. Here is an example of something similar to what I'm trying to do:

from pyparsing import *

string = "field1: 5\nfoo\nbar\nfield2: 42"

value1 = Word(nums)("value1")
value2 = Word(nums)("value2")
not_field2 = Regex(r"^(?!field2:).*$")

expression = "field1:" + value1 + LineEnd() + OneOrMore(not_field2)+ "field2:" + value2 + LineEnd()

tokens = expression.parseString(string)

print(tokens["value1"])
print(tokens["value2"])

Here the Regex for a line not starting with field2: is adapted from the question "Regular expression for a string that does not start with a sequence". However, running this example script raises:

pyparsing.ParseException: Expected Re:('^(?!field2:).*$') (at char 10), (line:2, col:1)

I would like value2 to end up as 42, regardless of how many other lines (foo\n and bar\n in this case) come between the two fields. How can I achieve that?

Kurt Peek

1 Answer


The '^' and '$' characters in your Regex aren't interpreted on a line-by-line basis by pyparsing, but in the context of the whole string being parsed. So '^' will match only at the very beginning of the string and '$' only at the very end.
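The same effect can be reproduced with the standard re module alone. This is only a sketch, and it assumes that pyparsing's Regex applies the compiled pattern with match() at the current parse position inside the full input string (my reading of how it works, not something stated above). The variable names are illustrative.

```python
import re

text = "field1: 5\nfoo\nbar\nfield2: 42"
pos = text.index("foo")  # char 10, the position reported in the ParseException

anchored = re.compile(r"^(?!field2:).*$")
unanchored = re.compile(r"(?!field2:).*")

# Without re.MULTILINE, '^' matches only at the real start of the string,
# so trying the anchored pattern at position 10 finds nothing.
anchored_at_foo = anchored.match(text, pos)

# Dropping the anchors lets the same lookahead match the rest of the line.
unanchored_at_foo = unanchored.match(text, pos)

# The negative lookahead still rejects a line that starts with "field2:".
unanchored_at_field2 = unanchored.match(text, text.index("field2:"))

print(anchored_at_foo)
print(unanchored_at_foo.group())
print(unanchored_at_field2)
```

Passing flags=re.MULTILINE to the pattern would be another way to make '^' match after each newline, but removing the anchors keeps the pattern independent of that flag.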

Instead you can do:

not_field2 = LineStart() + Regex(r"(?!field2:).*")
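Dropped into the question's script, the fix might look roughly like this. It is a sketch under some assumptions: Python 3, a pyparsing release that still accepts parseString, and one addition that is not part of the one-liner above, namely a trailing LineEnd() on each skipped line, so that the next LineStart() is always tested exactly at column 1 however the installed version treats newlines in front of LineStart().

```python
from pyparsing import LineEnd, LineStart, OneOrMore, Regex, Word, nums

text = "field1: 5\nfoo\nbar\nfield2: 42"

value1 = Word(nums)("value1")
value2 = Word(nums)("value2")

# A whole line that does not start with "field2:".
# Assumption (not in the answer): the trailing LineEnd() consumes the newline
# so that the next repetition begins exactly at the start of a line.
not_field2 = LineStart() + Regex(r"(?!field2:).*") + LineEnd()

expression = (
    "field1:" + value1 + LineEnd()
    + OneOrMore(not_field2)
    + "field2:" + value2 + LineEnd()
)

tokens = expression.parseString(text)

print(tokens["value1"])
print(tokens["value2"])
```

Because OneOrMore stops as soon as the lookahead rejects a line, the number of skipped lines between field1 and field2 does not matter, as long as there is at least one.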
PaulMcG