How to remove control characters from standard input, except tabs?

Question

I want to remove control characters (such as ^C, ^A, and so on) from standard input and print the result to standard output, using only basic bash, perl, and other common Linux tools.

What I do right now is

(something) | sed 's/[[:cntrl:]]//g' | (something else)

This worked until now, but I have found that it also removes tabs, and I want to keep those.

So, is there something else that just works?

score 3 · Accepted Answer · edited May 23 '17 at 12:22

Modifying the second answer from "Skip/remove non-ascii character with sed", I got this working sed script:

sed 's/[^[:print:]\t]//g'

It seems to work (although the "non-ascii" part is wrong; it does not remove any Unicode).

For Unicode to work, the environment variables LANG=en_US.UTF-8 and LC_CTYPE="en_US.UTF-8" have to be set (and exported).
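A quick way to check the command is to feed it a line with known control characters and inspect the bytes that come out. The following is only a sketch: it assumes GNU sed (which understands `\t` inside a bracket expression), and the function name `strip_cntrl_keep_tabs` is made up for illustration.

```shell
# Hypothetical helper wrapping the sed command from this answer.
# Assumes GNU sed, where \t inside [...] is expanded to a tab.
strip_cntrl_keep_tabs() {
  sed 's/[^[:print:]\t]//g'
}

# Input bytes: 'a', Ctrl-A (\001), TAB, 'b', Ctrl-C (\003), 'c', newline.
# od -c makes the surviving tab visible as \t.
printf 'a\001\tb\003c\n' | strip_cntrl_keep_tabs | od -c
```

It can then be dropped into a pipeline in place of the original sed call, for example `(something) | strip_cntrl_keep_tabs | (something else)`.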

edited May 23 '17 at 12:22

Community

answered Apr 17 '13 at 01:46

Karel Bílek

It certainly does remove non-printable Unicode, but it will only expect Unicode input if your locale environment variables are set appropriately (e.g. `LANG=en_US.UTF-8`) – ysth Apr 17 '13 at 02:11

score 1 · Answer 2 · edited Apr 17 '13 at 10:19

You could just define the character class yourself, based on the definition of [:cntrl:] but leaving out the tab (\x09):

sed 's/[\x00-\x08\x0A-\x1F\x7F]\{1,\}//g'
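The same idea of spelling out the ranges by hand can also be sketched with `tr` instead of sed, which avoids sed's hex escapes. This is a swapped-in alternative, not the answer's own method. Because `tr` sees the whole stream rather than one line at a time, newline (octal 012) has to be left out of the set along with tab (octal 011). The function name `strip_cntrl_tr` is hypothetical.

```shell
# Hypothetical helper: delete the C0 control characters and DEL,
# but keep tab (\011) and newline (\012).
# Octal ranges: \000-\010, \013-\037, and \177 (DEL).
strip_cntrl_tr() {
  tr -d '\000-\010\013-\037\177'
}

# Input bytes: 'a', Ctrl-A, TAB, 'b', ESC (\033), 'c', newline.
printf 'a\001\tb\033c\n' | strip_cntrl_tr | od -c
```

Note that carriage return (octal 015) falls inside the deleted range, so CRLF line endings would come out as plain LF.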

edited Apr 17 '13 at 10:19

Borodin

answered Apr 17 '13 at 01:34

Tim Pote

`sed: -e expression #1, char 35: Invalid collation character` – Karel Bílek Apr 17 '13 at 01:38
`echo $'a\tb'|sed 's/[\x00\-\x08\x10\-\x19\x7F]\{1,\}//g'` - the result is `ab`. – Karel Bílek Apr 17 '13 at 02:23
I think you got mixed up between hex and decimal! What about `\x0A` through `\x0F`? – Borodin Apr 17 '13 at 02:24
Actually your link has it wrong too. It should read `[[:cntrl:]] - [\x00-\x1F\x7F]`. I guess that's what confused you. – Borodin Apr 17 '13 at 02:30
@Borodin Good catch. I didn't even really pay attention to what the other site put. See my edits. – Tim Pote Apr 17 '13 at 02:41
@TimPote: That looks better. Also it should be `[:cntrl:]` instead of `[:ctrl:]`. I've fixed that for you. – Borodin Apr 17 '13 at 10:19

score 1 · Answer 3 · answered Apr 17 '13 at 01:36

You can try ssed (super-sed) with a Perl regex:

echo -e 'hello\tworld' | ssed 's/(?!\t)[[:cntrl:]]//g'

answered Apr 17 '13 at 01:36

kev

You can try `perl` with a perl regex `perl -pe 's/(?!\t)[[:cntrl:]]//g'`. – TLP Apr 17 '13 at 02:07
