
pdfgrep works like grep except that it acts on pages instead of lines. How can I craft a regular expression with a newline character?

I want to look for a, followed by any number of characters except line breaks, followed by b. However, pdfgrep 'a[^\n]*b' doesn't work, whereas pdfgrep 'a.*b' returns results that span multiple lines. (I've examined the output with xxd to confirm that these newlines are indeed \x0A.)

Wiktor Stribiżew
JellicleCat

1 Answer


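By default, pdfgrep uses a POSIX-compliant regex flavor in which . matches any character, including line breaks. Also, inside a POSIX bracket expression the backslash is not an escape character, so [^\n] means "any character except \ and n" rather than "any character except a newline".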

Fortunately, pdfgrep also supports the PCRE regex flavor through the -P flag. In PCRE, . matches any character except a line break by default, and \n inside a character class does mean a newline.

Thus, you can use

pdfgrep -P 'a.*b'
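The behaviors described here can be checked without a PDF using a rough sketch. It uses Python's re module as a stand-in. Python's re is neither PCRE nor pdfgrep's engine. It is used only because its rules for . and [^\n] resemble PCRE's, and re.DOTALL roughly approximates the POSIX default. The sample page text is made up for illustration.

```python
import re

# Made-up "page" text: pdfgrep matches against a whole page,
# so line breaks inside the page show up as "\n" (0x0A).
page = "a first line\nsecond line b\na then b on one line"

# Perl/PCRE-like default: "." does not match "\n", so the match stays on one line.
pcre_like = re.findall(r"a.*b", page)

# re.DOTALL lets "." match "\n" too, roughly like the POSIX default described above.
posix_like = re.findall(r"a.*b", page, flags=re.DOTALL)

# In a PCRE-style character class, \n really is a newline.
negated_class = re.findall(r"a[^\n]*b", page)

# Emulating the POSIX reading of [^\n]: "anything except a backslash or the letter n".
# Every candidate run stops at the first "n" (in "line" or "then") before any "b".
posix_bracket = re.findall(r"a[^\\n]*b", page)

print(pcre_like)
print(posix_like)
print(negated_class)
print(posix_bracket)
```

The last result is empty. This shows why a[^\n]*b can find nothing under POSIX rules even when a and b sit on the same line.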
Wiktor Stribiżew