4

I have data that looks like this:

AB208804_1 446 576 AB208804_1orf 0
AB208804_20 446 576 AB208804_20orf 0

I want to convert it into this:

AB208804 446 576 AB208804orf 0
AB208804 446 576 AB208804orf 0

just by removing the `_` followed by digits from columns 1 and 4.

Why doesn't this line work?

sed 's/_\d+//g'

What's the correct way to do it (as a one-liner)?

Willi Mentzel
neversaint
  • 1
    I have no idea why this doesn't work, but if you replace `\d` with `[0-9]` it works fine. – jtbandes Aug 06 '10 at 05:08
  • 5
    In GNU `sed`, `\d` introduces a decimal character code of one to three digits in the range 0-255. For example, to remove a tab you could do: `sed 's/\d9//'` (or `09` or `009`) or replace some unprintable characters with spaces: `sed 's/[\d1-\d31]/ /g'` – Dennis Williamson Aug 06 '10 at 06:07

3 Answers

7

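You need the -r switch (extended regular expressions, so that + means "one or more") and a character class like [0-9], since sed does not treat \d as a digit: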

$ echo "AB208804_1 446 576 AB208804_1orf 0" | sed -r 's/_[0-9]+//g'
AB208804 446 576 AB208804orf 0

Or, since you asked, in Perl:

$ echo "AB208804_1 446 576 AB208804_1orf 0" | perl -ne 's/_\d+//g; print $_'
AB208804 446 576 AB208804orf 0
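
For reference, here is a minimal sketch that runs the same substitution over both sample lines from the question. It uses -E, which GNU sed accepts as another spelling of -r and which BSD sed also understands; the variable names `input` and `result` are just placeholders for this demo.

```shell
# Sample input from the question, held in a variable for the demo.
input='AB208804_1 446 576 AB208804_1orf 0
AB208804_20 446 576 AB208804_20orf 0'

# -E turns on extended regular expressions, so + means "one or more".
result=$(printf '%s\n' "$input" | sed -E 's/_[0-9]+//g')
printf '%s\n' "$result"
```

Note that the g flag removes every `_` followed by digits on the line, not only those in columns 1 and 4. That gives the desired result for this data because the other columns contain no underscores.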
zen
2

Try escaping the +, which GNU sed needs in its default (basic regex) mode:

sed 's/_[0-9]\+//g' 
codaddict
1
 sed 's/_[0-9][0-9]*//g' file
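
This form avoids + altogether: `[0-9][0-9]*` is one digit followed by zero or more digits, which is plain basic regex syntax and so should work without any extra switches. A rough sketch of it against the question's data, assuming the two sample lines are saved in a temporary file (`datafile` and `portable` are made-up names for illustration):

```shell
# Hypothetical temporary file holding the two sample lines from the question.
datafile=$(mktemp)
printf '%s\n' \
  'AB208804_1 446 576 AB208804_1orf 0' \
  'AB208804_20 446 576 AB208804_20orf 0' > "$datafile"

# One digit, then zero or more digits: equivalent to "one or more digits".
portable=$(sed 's/_[0-9][0-9]*//g' "$datafile")
printf '%s\n' "$portable"

# Clean up the temporary file.
rm -f "$datafile"
```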
ghostdog74