
My example string is as follows:

This is 02G05 a test string 20-Jul-2012

From this string I want to extract 02G05. To do that, I tried the following regex with sed:

$ echo "This is 02G05 a test string 20-Jul-2012" | sed -n '/\d+G\d+/p'

But the above command prints nothing. I believe the reason is that sed cannot match anything against the pattern I supplied.

So my question is: what am I doing wrong here, and how do I correct it?

When I try the same string and pattern in Python, I get the expected result:

>>> import re
>>> st = "This is 02G05 a test string 20-Jul-2012"
>>> re.findall(r'\d+G\d+', st)
['02G05']
RanRag

5 Answers


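Most sed implementations do not treat \d as "any digit": it is a Perl-style escape, and POSIX basic and extended regular expressions have no such shorthand. Try [0-9] or [[:digit:]] instead. Also note that in sed's default basic regular expressions, + is not a repetition operator, which is why [0-9][0-9]* is used below to mean "one or more digits".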

To only print the actual match (not the entire matching line), use a substitution.

sed -n 's/.*\([0-9][0-9]*G[0-9][0-9]*\).*/\1/p'
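A minimal sketch, assuming a POSIX `sh` and a `sed` that accepts interval expressions on groups (GNU and BSD sed both do), of why the command above prints `2G05` rather than `02G05`: the leading `.*` is greedy and leaves only one digit for `[0-9][0-9]*`. The second function uses the optional non-digit prefix idea from the comments, spelling GNU's `\?` as the portable POSIX interval `\{0,1\}`. The function names are hypothetical.

```shell
#!/bin/sh
# Greedy version: .* swallows as much as possible, so only the last digit
# before G is left for [0-9][0-9]*.
extract_greedy() {
    printf '%s\n' "$1" | sed -n 's/.*\([0-9][0-9]*G[0-9][0-9]*\).*/\1/p'
}

# Anchored version: the prefix, if present, must end in a non-digit, so the
# number cannot be split; \{0,1\} also allows a match at the start of the line.
extract_anchored() {
    printf '%s\n' "$1" | sed -n 's/^\(.*[^0-9]\)\{0,1\}\([0-9][0-9]*G[0-9][0-9]*\).*/\2/p'
}

s="This is 02G05 a test string 20-Jul-2012"
extract_greedy "$s"               # prints 2G05
extract_anchored "$s"             # prints 02G05
extract_anchored "02G05 at start" # prints 02G05
```

Only the second group is printed (`\2`); the first group exists solely to consume, and discard, whatever precedes the number.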
tripleee
  • Thanks, it worked fine. But why is `.*` necessary in your regex? When I try `sed -n 's/\([0-9]\+G[0-9]\+\)/\1/p'` it just prints the entire line. – RanRag Jul 19 '12 at 20:47
  • That's why, isn't it? Replace whatever comes before and after the match with nothing, then print the whole line. – tripleee Jul 19 '12 at 21:01
  • @tripleee This only prints `2G05` not `02G05`. The expression that works is `'s/.*\([0-9][0-9]G[0-9][0-9]*\).*/\1/p'` – Kshitiz Sharma Dec 12 '13 at 10:06
  • That hard-codes it to exactly two digits. Something like `sed -n 's/\(.*[^0-9]\)\?\([0-9][0-9]*G[0-9][0-9]*\).*/\2/p'` would be more general. (I assume your `sed` supports `\?` for zero or one occurrence.) – tripleee Dec 12 '13 at 11:53
  • See also https://stackoverflow.com/a/48898886/874188 for how to replace various other common Perl escapes like `\w`, `\s`, etc. – tripleee Aug 16 '19 at 05:28
  • @tripleee your "to only print the actual match...." was the pointer I needed for what I was trying to do – northern-bradley Mar 06 '20 at 20:29
  • @tripleee what do you want to show with `sed -n 's/\(.*[^0-9]\)\?\([0-9][0-9]*G[0-9][0-9]*\).*/\2/p'`? This is confusing, `\1` is not used. – Timo May 27 '20 at 11:53
  • Why is it confusing? I discard whatever the first group matches. The `\?` makes it optional (so it could be empty) but if there is anything before the number, we remove it. – tripleee May 27 '20 at 12:03

How about using grep -E?

echo "This is 02G05 a test string 20-Jul-2012" | grep -Eo '[0-9]+G[0-9]+'
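A short sketch, assuming GNU or BSD `grep` (both support `-o`, which is not part of POSIX): `-o` prints each match on its own line, so several matches on one line come out separately, and piping through `head -n 1` keeps only the first. The function names are hypothetical.

```shell
#!/bin/sh
# Print every [0-9]+G[0-9]+ match in the argument, one per line.
extract_all() {
    printf '%s\n' "$1" | grep -Eo '[0-9]+G[0-9]+'
}

# Keep only the first match.
extract_first() {
    extract_all "$1" | head -n 1
}

extract_all "This is 02G05 a test string 20-Jul-2012" # prints 02G05
extract_all "codes 1G2 and 33G44 on one line"         # prints 1G2, then 33G44
extract_first "codes 1G2 and 33G44 on one line"       # prints 1G2
```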
mVChr
  • +1 This is simpler, and will also correctly handle the case of multiple matches on the same line. A complex `sed` script could be devised for that case, but why bother? – tripleee Jul 20 '12 at 07:28
  • `sed` and `grep` use basic regular expressions (BRE) by default; `egrep` or `grep -e` or `sed -E` use extended regular expressions (ERE); and the Python code in the question uses Perl-style regular expressions, similar to PCRE (Perl Compatible Regular Expressions). GNU grep can use PCRE with the `-P` option. – Felipe Buccioni Aug 22 '16 at 13:46
  • @FelipeBuccioni actually that should be `egrep` or `grep -E` or `sed -r` – SensorSmith Apr 13 '18 at 15:44
  • For a single(first) match, append ` | head -1` (without backticks), as per [this answer](https://stackoverflow.com/a/14093511/3610458) to another question. – SensorSmith Apr 13 '18 at 15:55
  • @SensorSmith Some `sed` implementations use `-r`, others use `-E`; still others don't have an option to change the regex dialect. – tripleee Apr 20 '18 at 03:41
  • `grep` has `-m 1` to stop after the first match. – tripleee Apr 20 '18 at 03:42
  • Thanks a ton. Finally, a simpler and more elegant solution than `grep` / `awk` / `sed`. – Sunny Tambi May 29 '20 at 11:40

sed doesn't recognize \d; use [[:digit:]] instead. You will also need to escape the + (\+, which is a GNU sed extension in basic regular expressions) or use the -r switch (-E on OS X).

Note that [0-9] works just as well for matching the ordinary Arabic-Hindu numerals.
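A sketch of both suggestions, assuming a `sed` that accepts `-E` (modern GNU sed and BSD/macOS sed do; older GNU versions documented only `-r`). In a basic regular expression, the portable way to say "one or more" is the interval `\{1,\}`; with `-E`, a plain `+` works. The address form prints the whole matching line, so the second function uses a substitution to keep only the match. The function names are hypothetical.

```shell
#!/bin/sh
# POSIX BRE: no \d and no +, so use [[:digit:]] with the interval \{1,\}.
# This prints the entire matching line, not just the match.
match_line_bre() {
    printf '%s\n' "$1" | sed -n '/[[:digit:]]\{1,\}G[[:digit:]]\{1,\}/p'
}

# ERE via -E: + and ? work unescaped; the substitution keeps only group 2.
extract_ere() {
    printf '%s\n' "$1" | sed -E -n 's/^(.*[^[:digit:]])?([[:digit:]]+G[[:digit:]]+).*/\2/p'
}

s="This is 02G05 a test string 20-Jul-2012"
match_line_bre "$s" # prints the whole line
extract_ere "$s"    # prints 02G05
```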

Dennis Williamson
  • I tried `sed -n '/[0-9]\+G[0-9]\+/p'`. Now it just prints the whole string. – RanRag Jul 19 '12 at 20:43
  • @Noob: You will need to use substitution to [exclude the parts you don't want to print](http://stackoverflow.com/questions/2777579/sed-group-capturing/2778096#2778096). – Dennis Williamson Jul 19 '12 at 20:46

Try this instead:

echo "This is 02G05 a test string 20-Jul-2012" | sed 's/.* \([0-9]\+G[0-9]\+\) .*/\1/'

Note, however, that if there are two matches on one line, it will print the last one, because the leading .* is greedy.
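A sketch illustrating this caveat, assuming GNU sed (`\+` is a GNU extension in basic regular expressions): the greedy `.* ` makes the substitution capture the last space-delimited match, the match must be surrounded by spaces, and because there is no `-n`/`p`, lines without a match are printed unchanged. The function name is hypothetical.

```shell
#!/bin/sh
# The command from this answer, wrapped in a function for reuse.
extract_spaced() {
    printf '%s\n' "$1" | sed 's/.* \([0-9]\+G[0-9]\+\) .*/\1/'
}

extract_spaced "This is 02G05 a test string 20-Jul-2012" # prints 02G05
extract_spaced "x 1G2 y 33G44 z" # prints 33G44: the greedy .* skips past 1G2
extract_spaced "no code here"    # no match, so the line is printed unchanged
```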

Zsolt Botykai

Try using rextract. It will let you extract text using a regular expression and reformat it.

Example:

$ echo "This is 02G05 a test string 20-Jul-2012" | ./rextract '([\d]+G[\d]+)' '${1}'

2G05
Geoff