
I am struggling to implement this sed command in my pipeline. I have the following string:

>A0A7I8LN48|A0A7I8LN48_SPIIN Hypothetical protein OS=Spirodela intermedia OX=51605 GN=SI8410_18021754 PE=4 SV=1

I would like to remove everything from the "|" character up to (but not including) the first "_" that follows the pipe, so the output should look like this:

>A0A7I8LN48_SPIIN Hypothetical protein OS=Spirodela intermedia OX=51605 GN=SI8410_18021754 PE=4 SV=1

I tried the following:

sed 's/|.*_/_/'

which removed too much and returned:

>A0A7I8LN48_18021754 PE=4 SV=1

Then I tried to stop the match at the first "_" by using .*? as follows:

sed 's/|.*?_/_/'

But this did not match anything and returned the input string unchanged. Can someone help me out? What am I missing here? It would be great if the solution could be a sed command.
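A minimal sketch of the usual workaround, assuming a POSIX-compatible sed (GNU or BSD). sed regular expressions have no lazy quantifiers, so `.*` always matches as much as it can (up to the last "_" on the line), and in a basic regex `?` is an ordinary character, so `.*?_` looks for a literal "?_" that is not in the input. A negated bracket expression such as `[^_]*` cannot cross an underscore, so the match stops at the first "_" after the pipe. The helper name `strip_after_pipe` is hypothetical and only used for illustration.

```shell
#!/bin/sh
# Header line taken from the question.
line='>A0A7I8LN48|A0A7I8LN48_SPIIN Hypothetical protein OS=Spirodela intermedia OX=51605 GN=SI8410_18021754 PE=4 SV=1'

# Hypothetical helper: reads lines on stdin, deletes everything from the
# first "|" up to and including the next "_", then writes the "_" back.
# [^_]* matches only non-underscore characters, so the match cannot run
# past the first "_" after the pipe.
strip_after_pipe() {
    sed 's/|[^_]*_/_/'
}

# Greedy pattern from the question: `.*` extends to the LAST "_" on the line.
printf '%s\n' "$line" | sed 's/|.*_/_/'

# In a basic regex `?` is literal, so this searches for "?_" and the line
# comes back unchanged.
printf '%s\n' "$line" | sed 's/|.*?_/_/'

# Negated bracket expression: stops at the first "_" after the pipe.
printf '%s\n' "$line" | strip_after_pipe
```

If the input is a whole FASTA file (an assumption; only one header is shown above), the substitution can be limited to header lines with an address, e.g. `sed '/^>/ s/|[^_]*_/_/' input.fasta`, where `input.fasta` is a placeholder name.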

han5000
