Removing parts of a string that contain digits with sed/Perl

Question

I have data that looks like this:

AB208804_1 446 576 AB208804_1orf 0
AB208804_20 446 576 AB208804_20orf 0

I want to convert them into this:

AB208804 446 576 AB208804orf 0
AB208804 446 576 AB208804orf 0

That is, just remove the `_<digits>` part in columns 1 and 4.

Why doesn't this line work?

sed 's/_\d+//g'

What's the correct way to do it (one-liner)?

I have no idea why this doesn't work, but if you replace `\d` with `[0-9]` it works fine. — jtbandes, Aug 06 '10 at 05:08
In GNU `sed`, `\d` introduces a decimal character code of one to three digits in the range 0-255. For example, to remove a tab you could do: `sed 's/\d9//'` (or `09` or `009`) or replace some unprintable characters with spaces: `sed 's/[\d1-\d31]/ /g'` — Dennis Williamson, Aug 06 '10 at 06:07
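
A quick check of the escape described in the comment above. This is a sketch that assumes GNU sed; other implementations (for example BSD/macOS sed or BusyBox sed) do not necessarily support `\dNNN`. It shows that in GNU sed `\d` builds a character from its decimal code rather than acting as a digit class.

```shell
# Build a line containing a tab (decimal character code 9).
line=$(printf 'AB208804\t446')

# GNU sed only: \d9 is the character with decimal code 9 (a tab),
# not "a digit", so this deletes the tab.
no_tab=$(printf '%s\n' "$line" | sed 's/\d9//')
printf '%s\n' "$no_tab"
```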

score 7 · Accepted Answer · answered Aug 06 '10 at 05:12

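You need the -r switch (extended regular expressions, so that `+` means "one or more") and a bracket expression such as `[0-9]` instead of `\d`, which sed does not treat as a digit class.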

$ echo "AB208804_1 446 576 AB208804_1orf 0" | sed -r 's/_[0-9]+//g'
AB208804 446 576 AB208804orf 0
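
The same substitution applied to both sample lines at once. This sketch uses `-E`, which GNU sed and BSD/macOS sed both accept for extended regular expressions (`-r` is the older GNU spelling); the variable names are illustrative only.

```shell
# Sample input from the question, one record per line.
input='AB208804_1 446 576 AB208804_1orf 0
AB208804_20 446 576 AB208804_20orf 0'

# -E: extended regex, so "+" means "one or more".
# [0-9]: bracket expression matching a single decimal digit.
# g: replace every match on the line, covering columns 1 and 4.
result=$(printf '%s\n' "$input" | sed -E 's/_[0-9]+//g')
printf '%s\n' "$result"
```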

Or, since you asked, in Perl:

$ echo "AB208804_1 446 576 AB208804_1orf 0" | perl -ne 's/_\d+//g; print $_'
AB208804 446 576 AB208804orf 0

zen

score 2 · Answer 2 · answered Aug 06 '10 at 05:10

Try (GNU sed, where `\+` means "one or more" in a basic regular expression):

sed 's/_[0-9]\+//g'
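
A small comparison, assuming GNU sed, of why the backslash matters without `-r`/`-E`: in a basic regular expression a bare `+` is a literal plus sign, while `\+` is GNU's "one or more" operator.

```shell
# Without -r/-E, "+" is literal, so this looks for "_<digit>+" and
# leaves the input unchanged.
bre_plain=$(echo 'AB208804_1orf' | sed 's/_[0-9]+//g')

# GNU extension: "\+" in a BRE means "one or more of the preceding item".
bre_gnu=$(echo 'AB208804_1orf' | sed 's/_[0-9]\+//g')

printf '%s\n%s\n' "$bre_plain" "$bre_gnu"
```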

codaddict

score 1 · Answer 3 · answered Aug 06 '10 at 05:34

sed 's/_[0-9][0-9]*//g' file
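
All three answers use the `g` flag, which removes every `_<digits>` run anywhere on the line. That is correct for this sample because only columns 1 and 4 contain underscores. If other columns might also carry such a suffix, a field-aware tool can restrict the change to columns 1 and 4. Below is a sketch using awk, a swapped-in tool that none of the answers used, with a hypothetical temporary file standing in for `file`.

```shell
# Hypothetical sample file holding the question's data.
file=$(mktemp)
printf '%s\n' 'AB208804_1 446 576 AB208804_1orf 0' 'AB208804_20 446 576 AB208804_20orf 0' > "$file"

# Remove the first "_<digits>" run from fields 1 and 4 only. Changing a
# field makes awk rebuild the record joined by OFS (a single space by
# default), which matches this single-space-separated input.
awk_out=$(awk '{ sub(/_[0-9]+/, "", $1); sub(/_[0-9]+/, "", $4); print }' "$file")
printf '%s\n' "$awk_out"

rm -f "$file"
```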

ghostdog74