I've got a document that looks something like this:

# Document ID 8934
# Last updated 2018-05-06
52 84 12 70 23 2 7 20 1 5
4 2 7 81 32 98 2 0 77 6
(..and so on..)

In other words, it starts off with a few comment lines, then the rest of the document is just a bunch of numbers separated by spaces.

I'm trying to write a regex that gets all digits on all lines that don't start with #, but I can't seem to get it.

I've read over answers such as

  1. Regular Expressions: Is there an AND operator?

  2. Regex: Find a character anywhere in a document but only on lines that begin with a specific word

and pawed through sites such as http://regular-expressions.info, but I still can't get an expression that works (the best I can get is a lengthy version of ^[^#].*).

So how can I match digits (or text, or whatever) in a string, but only on lines that don't start with a certain character?

Grayda

2 Answers


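Your regex ^[^#].* uses a negated character class [^#], which matches any single character other than # at the start of the string ^, and then matches any character zero or more times with .*. It therefore matches the whole line rather than just the digits, and it would also match a line without any digits, for example t test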
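To see this behaviour concretely, here is a minimal Python sketch. The sample text and the use of Python's re module with re.MULTILINE are assumptions for illustration; the question does not say which regex flavor is in use.

```python
import re

# Hypothetical sample: two comment lines, one data line, one line without digits.
sample = """# Document ID 8934
# Last updated 2018-05-06
52 84 12 70 23 2 7 20 1 5
t test
"""

# re.MULTILINE makes ^ match at the start of every line, not only the string.
matches = re.findall(r"^[^#].*", sample, flags=re.MULTILINE)

# Each match is an entire non-comment line, digits or not.
print(matches)
```

The comment lines are skipped, but every other line comes back whole, so a second step would still be needed to pull out the digits.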

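What you might do is use an alternation that either matches a whole line starting with a # using ^#.*$, or captures one or more digits in a group using (\d+). Because the first alternative consumes the entire comment line, any digits on that line never reach the capturing group.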

Your digits are in capturing group 1. To match more than only digits, you could change the (\d+) to a character class, for example ([\w+.]+).

(?:^#.*$|(\d+))

Details

  • (?: Non capturing group
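    • ^#.*$ Match from the start of the line ^ a # followed by any character zero or more times .* until the end of the line $ (in most flavors this needs the multiline flag, so that ^ and $ match at line boundaries rather than only at the start and end of the string)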
    • | Or
    • (\d+) capture one or more digits in a group
  • ) Close non capturing group
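As a rough illustration of how the pattern might be used, here is a minimal Python sketch. Python, the sample variable, and the re.MULTILINE flag are assumptions; the question does not name a language or flavor.

```python
import re

# Sample data taken from the question.
sample = """# Document ID 8934
# Last updated 2018-05-06
52 84 12 70 23 2 7 20 1 5
4 2 7 81 32 98 2 0 77 6
"""

# re.MULTILINE is needed so that ^ and $ match at every line boundary.
pattern = re.compile(r"(?:^#.*$|(\d+))", flags=re.MULTILINE)

# Comment lines match the first alternative, leaving group 1 as None,
# so only matches where group 1 is set are kept.
numbers = [
    int(m.group(1))
    for m in pattern.finditer(sample)
    if m.group(1) is not None
]
print(numbers)
```

The filter on group 1 is the important part: the overall match still succeeds on comment lines, so you have to check the group rather than the match itself.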
The fourth bird

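I think a much simpler method would be to first replace the comment lines with an empty string "", using this regex (with the multiline option enabled, so that ^ matches at the start of each line):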

^#.*

And then you can just match all the numbers with this:

-?\d+ (the -? allows for negative numbers)
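A minimal Python sketch of these two steps follows. Python and the re.MULTILINE flag are assumptions, and the sample is shortened, with one negative value added purely for illustration.

```python
import re

# Shortened sample; the -70 is a made-up value to exercise the -? part.
sample = """# Document ID 8934
# Last updated 2018-05-06
52 84 12 -70 23
4 2 7 81
"""

# Step 1: blank out every line that starts with '#'.
without_comments = re.sub(r"^#.*", "", sample, flags=re.MULTILINE)

# Step 2: collect the remaining integers, allowing a leading minus sign.
numbers = [int(n) for n in re.findall(r"-?\d+", without_comments)]
print(numbers)
```

Removing the comment lines first matters here: otherwise -?\d+ would also pick up pieces of the date, such as -05 and -06.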
Sweeper
  • My code eventually used two lines as you suggested (though slightly different, using `^[^#]+` to catch all non-hashed lines then simply `\d+` for the digits), though I was interested to see if there was a "one liner", which there was via @the fourth bird's answer. – Grayda May 06 '18 at 12:37