
I want to remove control characters (like ^C, ^A, and so on) from standard input and print the result to standard output, using only basic bash, perl and other standard Linux tools.

What I do right now is this:

(something) | sed 's/[[:cntrl:]]//g' | (something else)

This worked until now, but I've found that it also removes tabs, and I want to keep those.
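Tab is character 0x09, and [:cntrl:] covers 0x00-0x1F plus 0x7F, so the tab is deleted along with the real junk. A quick way to reproduce this (the sample input is made up for illustration):

```shell
# The tab between "a" and "b" is itself a control character,
# so [[:cntrl:]] deletes it together with \001 (^A).
printf 'a\tb\001c\n' | sed 's/[[:cntrl:]]//g'    # prints "abc"
```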

So, is there something else that just works?

Karel Bílek

3 Answers


Modifying the second answer from "Skip/remove non-ascii character with sed", I got this working sed script:

sed 's/[^[:print:]\t]//g'

It seems to work (although, despite the "non-ascii" in that question's title, it does not remove any Unicode characters).

For Unicode to be handled correctly, the locale environment variables have to be set, e.g. LANG=en_US.UTF-8 and LC_CTYPE="en_US.UTF-8", and exported.
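The `\t` inside the bracket expression is accepted by GNU sed as an extension; other implementations (BSD sed, for example) may read it as a literal backslash and `t`. A sketch that builds a real tab with printf instead, so it should behave the same with any POSIX sed (the sample input is illustrative):

```shell
# Build a literal tab character so the bracket expression does not
# depend on sed understanding \t inside [...].
tab=$(printf '\t')
# Delete everything that is neither printable nor a tab:
# \001 (^A) and \033 (ESC) go, the tab stays.
printf 'a\tb\001c\033d\n' | sed "s/[^[:print:]$tab]//g"
```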

Karel Bílek
It certainly does remove non-printable Unicode, but it will only treat the input as Unicode if your locale environment variables are set appropriately (e.g. `LANG=en_US.UTF-8`). – ysth Apr 17 '13 at 02:11
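To see the effect ysth describes, force the C locale: sed then works byte by byte, only ASCII 0x20-0x7E counts as [:print:], and the two bytes that encode "é" in UTF-8 (0xC3 0xA9) are deleted as non-printable. The sample word is just an illustration:

```shell
# C locale: each byte is a separate character, and bytes >= 0x80
# are not in [:print:], so the UTF-8 encoding of "é" is stripped.
printf 'caf\303\251\n' | LC_ALL=C sed 's/[^[:print:]]//g'    # prints "caf"
```

With a UTF-8 locale exported instead (e.g. LANG=en_US.UTF-8, assuming that locale is installed), the same command should leave "café" intact.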

You could just define the character class yourself, based on the definition of [:cntrl:] but leaving out the tab (\x09). This relies on GNU sed's \xHH escapes:

sed 's/[\x00-\x08\x0B-\x1F\x7F]\{1,\}//g'
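If sed's \xHH escapes are not available, tr can delete the same set of bytes using POSIX octal escapes. This swaps in a different tool rather than reproducing the answer exactly; it is a sketch assuming ASCII or UTF-8 input (every byte it removes is below 0x80, so UTF-8 multibyte sequences are left untouched):

```shell
# Octal ranges: \000-\010 = 0x00-0x08, \013-\037 = 0x0B-0x1F, \177 = 0x7F.
# Tab (\011) and newline (\012) are not listed, so they survive.
printf 'a\tb\001c\033d\r\n' | tr -d '\000-\010\013-\037\177'
```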
Tim Pote

You can try ssed (super-sed) with Perl-style regular expressions:

echo -e 'hello\tworld' | ssed -R 's/(?!\t)[[:cntrl:]]//g'
kev