
I have been trying to extract part of a string in bash. I'm running this on a Mac.

Pattern of input string:

  • Some random word followed by a /. This part is optional.
  • A keyword (def, foo, or bar) followed by a hyphen (-) and a number of 2 to 6 digits.
  • The number is followed by another hyphen and a few hyphen-separated words.

Sample inputs and outputs:

abc/def-1234-random-words // def-1234
bla/foo-12-random-words // foo-12
bar-12345-random-words // bar-12345
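
The rules above map onto a single extended regular expression. A minimal sketch, assuming the script runs under bash rather than zsh (zsh's `=~` does not fill `BASH_REMATCH` by default) and that the installed bash honours `nocasematch` for `[[ =~ ]]`: the hypothetical function `extract_key` avoids sed entirely and gets case-insensitivity from a shell option instead of a GNU-only sed flag.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the "keyword-digits" part of one input string.
# Regex pieces:
#   ^([^/]*/)?                    optional "word/" prefix (group 1)
#   ((def|foo|bar)-[0-9]{2,6})    keyword, hyphen, 2-6 digits (group 2)
#   -                             the hyphen before the remaining words
extract_key() {
  local re='^([^/]*/)?((def|foo|bar)-[0-9]{2,6})-'
  local rc=1
  shopt -s nocasematch            # case-insensitive [[ =~ ]] matching
  if [[ $1 =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[2]}"
    rc=0
  fi
  shopt -u nocasematch            # turn the option back off
  return "$rc"
}

# Prints def-1234, foo-12 and bar-12345, one per line.
for s in abc/def-1234-random-words bla/foo-12-random-words bar-12345-random-words; do
  extract_key "$s"
done
```

Keeping the regex in a variable and leaving `$re` unquoted on the right of `=~` is what makes bash treat it as a regex rather than a literal string.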

I tried the following commands to extract it, but for some reason they return the entire string.

extractedValue=`getInputString | sed -e 's/.*\(\(def\|bar\|foo\)-[^-]*\).*/\1/g'`
# and
extractedValue=`getInputString | sed -e 's/.*\(\(def\|bar\|foo\)-\d{2,6}\).*/\1/g'`

I also tried to make it case-insensitive using the I flag, but it threw an error:

: bad flag in substitute command: 'I'


These are the references I tried:

oguz ismail
Rajesh
  • `sed` doesn't support `\d` for digits; you can use `[0-9]` instead – Barmar Oct 06 '21 at 15:19
  • @Barmar I noticed some weird behaviour around `\d`, so I moved to `[^-]*`. It used to match, but it always returned the entire string. I'll read more about it – Rajesh Oct 06 '21 at 15:22

2 Answers


This GNU sed command should work, using the ignore-case flag:

sed -E 's~^(.*/){0,1}((def|foo|bar)-[0-9]{2,6})-.*~\2~I' file

def-1234
foo-12
bar-12345

This sed command works as follows:

  • (.*/){0,1}: Optionally match everything up to and including the last / at the start
  • (: Start capture group #2
    • (def|foo|bar): Match def or foo or bar
    • -: Match a -
    • [0-9]{2,6}: Match 2 to 6 digits
  • ): End capture group #2
  • -.*: Match a - followed by anything until the end
  • The replacement is the text captured in group #2 (\2)
  • I: Make the match case-insensitive (GNU sed only)
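
BSD sed, which macOS ships, does not accept the I flag. A hedged workaround, assuming only `sed -E` support (available in both BSD and GNU sed), is to spell each keyword letter as a bracket expression so that no ignore-case flag is needed. The function name `extract_keys` is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads lines on stdin and prints the keyword-digits part.
# No GNU-only I flag: case-insensitivity comes from bracket expressions
# such as [dD][eE][fF]. Group 1 is the optional prefix, group 2 the result.
extract_keys() {
  local kw='[dD][eE][fF]|[fF][oO][oO]|[bB][aA][rR]'
  sed -E "s~^(.*/)?(($kw)-[0-9]{2,6})-.*~\2~"
}

# Prints def-1234, foo-12 and BAR-12345, one per line.
printf '%s\n' abc/def-1234-random-words bla/foo-12-random-words BAR-12345-random-words |
  extract_keys
```

As with the GNU version, lines that do not match are printed unchanged; `sed -nE` with a trailing `p` flag on the substitution would print only the matching lines.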

Or you may use this awk:

awk -v IGNORECASE=1 -F / 'match($NF, /^(def|foo|bar)-[0-9]{2,6}-/) {print substr($NF, 1, RLENGTH-1)}' file

def-1234
foo-12
bar-12345

Awk explanation:

  • -v IGNORECASE=1: Enable ignore-case matching (IGNORECASE is a GNU awk feature; other awks treat it as an ordinary variable)
  • -F /: Use / as the field separator
  • match($NF, /^(def|foo|bar)-[0-9]{2,6}-/): Match the regex ^(def|foo|bar)-[0-9]{2,6}- against $NF, the last field when / is the field separator (this skips any text before the /)
  • If the match succeeds, substr prints the text from position 1 through RLENGTH-1 (the match ends with the - after the digits, which is left out)
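
`IGNORECASE` only has an effect in GNU awk, and some other awk implementations (older mawk releases, for example) may not support the `{2,6}` interval expression. A portable sketch under those assumptions matches against a lower-cased copy of the last field using POSIX `tolower()` and checks the digit count arithmetically. The function name `extract_keys_awk` is hypothetical, and the `RLENGTH - 5` arithmetic assumes every keyword is exactly three letters, as def, foo, and bar are:

```bash
#!/usr/bin/env bash
# Hypothetical helper: awk version without gawk's IGNORECASE or {n,m}.
# The regex runs on a lower-cased copy of the last /-separated field,
# then the original text is printed with substr() to keep its case.
extract_keys_awk() {
  awk -F / '{
    s = tolower($NF)
    if (match(s, /^(def|foo|bar)-[0-9]+-/)) {
      # All three keywords are 3 letters, so the digit count is
      # RLENGTH minus 3 (keyword) minus 2 (the two hyphens).
      n = RLENGTH - 5
      if (n >= 2 && n <= 6)
        print substr($NF, 1, RLENGTH - 1)
    }
  }'
}

# Prints def-1234, foo-12 and bar-12345, one per line.
printf '%s\n' abc/def-1234-random-words bla/foo-12-random-words bar-12345-random-words |
  extract_keys_awk
```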
anubhava
  • Could you please also add an explanation? What does $NF mean, and is this case-sensitive? – Rajesh Oct 06 '21 at 15:20
  • I am going to add one. Meanwhile, check the `sed` command, which does ignore-case matching – anubhava Oct 06 '21 at 15:23
  • The weird thing is that the sed approach still throws this error: **: bad flag in substitute command: 'I'**. Is it environment-specific? I'm using zsh in the Mac terminal – Rajesh Oct 06 '21 at 16:01
  • Yes, as I mentioned, that requires GNU sed. `sed` on Mac is BSD sed, which doesn't support `/I`. I am also on a Mac but have GNU sed installed using Homebrew – anubhava Oct 06 '21 at 16:03

You can use the -E option to enable extended regular expressions; then you don't have to escape ( and |.

echo abc/def-1234-random-words  | sed -E -e 's/.*((def|bar|foo)-[^-]*).*/\1/g'
def-1234
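
Applied to all three sample inputs, the same `-E` approach can also replace `[^-]*` with `[0-9]{2,6}` (following the comment about using `[0-9]` instead of `\d`) so that only 2 to 6 digits are accepted, and can require the - that follows the number. This is a case-sensitive sketch that should behave the same under BSD and GNU sed; the function name `extract_keys_ere` is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: ERE version of the question's sed command.
# [0-9]{2,6} (instead of [^-]*) limits the number to 2-6 digits, and the
# trailing - requires the hyphen that follows the number.
extract_keys_ere() {
  sed -E 's/.*((def|bar|foo)-[0-9]{2,6})-.*/\1/'
}

# Prints def-1234, foo-12 and bar-12345, one per line.
printf '%s\n' abc/def-1234-random-words bla/foo-12-random-words bar-12345-random-words |
  extract_keys_ere
```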
Barmar