I am using this pattern:

const string ptnBodytext = @"<p>\s*(.+?)\s*</p>";

to extract the text within the <p> tags. It works fine except for text that contains newlines, e.g.:

<p>
    Lorem ipsum
    second line or
    third one?
</p>

How can I change the pattern so that it also matches newlines, tabs, and so on?

Manfred Radlwimmer
Ras

2 Answers

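You either need to enable dotall mode (RegexOptions.Singleline in .NET), so that . also matches newlines, or use a character class that matches every character: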

const string ptnBodytext = @"<p>([\s\S]+?)</p>";

See a demo on regex101.com.
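
For comparison, here is a minimal sketch of both options side by side. The class name `ParagraphText`, its two helper methods, and the sample input are made up for illustration; only `Regex`, `RegexOptions.Singleline`, and the patterns come from the discussion above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class ParagraphText
{
    // Option 1: keep the original pattern and enable dotall mode,
    // so that '.' also matches '\n'.
    public static string WithSingleline(string html) =>
        Regex.Match(html, @"<p>\s*(.+?)\s*</p>", RegexOptions.Singleline)
             .Groups[1].Value;

    // Option 2: no option flag; [\s\S] matches any character, newlines included.
    // The surrounding whitespace ends up inside the group, so trim it afterwards.
    public static string WithCharClass(string html) =>
        Regex.Match(html, @"<p>([\s\S]+?)</p>")
             .Groups[1].Value.Trim();

    public static void Main()
    {
        const string input =
            "<p>\n    Lorem ipsum\n    second line or\n    third one?\n</p>";

        Console.WriteLine(WithSingleline(input));
        Console.WriteLine(WithCharClass(input));
    }
}
```

Both helpers return an empty string when no `<p>` element is found, because a failed match yields an empty group value.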

Jan

Just remove the \s*:

const string ptnBodytext = @"<p>(.+?)</p>";
Dmitry Egorov
[**Not** true](https://regex101.com/r/yG5hW3/1) without [`DOTALL`](https://regex101.com/r/yG5hW3/2) mode. Additionally, `\s*` matches *zero* or more whitespace characters. – Jan Aug 23 '16 at 11:47
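
The point made in the comment can be checked with a small sketch like the one below. The class name `DotNewlineCheck`, the method `Matches`, and the sample input are hypothetical; the pattern is the one from this answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class DotNewlineCheck
{
    // Tests the pattern from this answer against the input with the given options.
    public static bool Matches(string input, RegexOptions options) =>
        Regex.IsMatch(input, @"<p>(.+?)</p>", options);

    public static void Main()
    {
        const string input = "<p>\n    Lorem ipsum\n    second line\n</p>";

        // By default '.' does not match '\n', so the multi-line paragraph is missed.
        Console.WriteLine(Matches(input, RegexOptions.None));       // False

        // With Singleline, '.' matches '\n' as well and the pattern succeeds.
        Console.WriteLine(Matches(input, RegexOptions.Singleline)); // True
    }
}
```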