I am struggling to implement this sed command in my pipeline. I have the following string:
>A0A7I8LN48|A0A7I8LN48_SPIIN Hypothetical protein OS=Spirodela intermedia OX=51605 GN=SI8410_18021754 PE=4 SV=1
I would like to remove everything from the "|" character up to (but not including) the first "_" following the pipe character. So the output should look like this:
>A0A7I8LN48_SPIIN Hypothetical protein OS=Spirodela intermedia OX=51605 GN=SI8410_18021754 PE=4 SV=1
I tried the following:
sed 's/|.*_/_/'
which removed too much, returning: >A0A7I8LN48_18021754 PE=4 SV=1
Then I tried to stop the match at the first "_" by using the lazy quantifier .*? as follows:
sed 's/|.*?_/_/'
But this did not match anything and returned the input string unchanged. Can someone help me out? What am I missing here? It would be great if the solution could be provided as a sed command.
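
One possible approach, offered as a minimal sketch rather than a definitive answer: sed's regular expressions (POSIX BRE/ERE) have no lazy quantifiers, and in the default basic syntax the "?" is an ordinary character, so .*?_ looks for a literal "?" before the "_" and finds none. The first attempt removed too much because .* is greedy and runs to the last "_" on the line. A negated bracket expression, [^_]*, stops at the first "_" instead. The variable names header and result below are illustrative only.

```shell
# Sample header taken from the question.
header='>A0A7I8LN48|A0A7I8LN48_SPIIN Hypothetical protein OS=Spirodela intermedia OX=51605 GN=SI8410_18021754 PE=4 SV=1'

# [^_]* matches any run of characters other than "_", so the match
# ends at the first "_" after the pipe. In basic regex syntax a bare
# "|" is a literal pipe and needs no escaping.
result=$(printf '%s\n' "$header" | sed 's/|[^_]*_/_/')

printf '%s\n' "$result"
```

In a pipeline the same substitution can be applied directly to a stream or file, for example sed 's/|[^_]*_/_/' input.fasta, where input.fasta is a hypothetical file name.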