How to match whitespace in sed?

Question

How can I match whitespace in sed? In my data I want to match every run of 3 or more consecutive whitespace characters (tabs or spaces) and replace it with 2 spaces. How can this be done?

mrucci · Accepted Answer · 2013-09-11T08:33:00.023

339

The character class \s will match the whitespace characters <tab> and <space>.

For example:

$ sed -e "s/\s\{3,\}/  /g" inputFile

will substitute every sequence of at least three whitespace characters with two spaces.

REMARK: For POSIX compliance, use the character class [[:space:]] instead of \s, since the latter is a GNU sed extension. See the POSIX specifications for sed and BREs.
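A minimal sketch of the POSIX-portable form of the command above, assuming a POSIX shell and a standards-conforming sed. The function name squeeze_ws is made up for this illustration; only the sed expression comes from the answer.

```shell
# squeeze_ws is a hypothetical helper name, used only for this example.
# It replaces every run of 3 or more whitespace characters with two spaces,
# using the POSIX class [[:space:]] and the BRE interval \{3,\}.
squeeze_ws() {
    sed 's/[[:space:]]\{3,\}/  /g' "$@"
}

# Sample input: a single space, then space+tab+space, then four spaces.
printf 'a b \t c    d\n' | squeeze_ws
```

The single space between "a" and "b" is shorter than three characters, so it is left alone; the two longer runs each become two spaces.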

edited Sep 11 '13 at 08:33

answered Feb 24 '10 at 12:08

mrucci

9,998

9

aha! It was the missing -e switch that got me. – Stop Slandering Monica Cellio Sep 12 '11 at 14:44
32

I also had to add '-r' switch which enables extended regex's to make sed recognize '\s' as space. – HUB May 16 '12 at 15:12
71

With Apple's sed I had to use [[:space:]] because \s did not work for me. Perhaps \s is a GNU sed extension? – Jared Beck Jun 17 '13 at 23:24
3

@JaredBeck thanks, I was running out of ideas why my simple regex wasn't working. This is lame, I thought \s was standard extended regex. Also, -r doesn't work and -E did squat – Karthik T Sep 11 '13 at 04:58
2

Thanks for the feedback. I updated the answer with links to the POSIX standard. – mrucci Sep 11 '13 at 08:32
2

For me -e stopped it working, but -r made it work (Mint 16). I.e. changing from sed -e -r to sed -r was what I needed to do. However I was using [[:space:]] by this point, as I couldn't get \s to work. – Darren Cook Aug 16 '14 at 17:58
1

Much like the POSIX [:space:] character class, \s will not only match <tab> and <space>, but also the <newline> character (try sed 'N;s/\s/x/' <<<$'aaa\nbbb' in bash). – Witiko Sep 11 '16 at 18:08
1

GNU sed manual does not list \s as a GNU extension. – jarno Nov 18 '16 at 06:16
15

Instead of [[:space:]] one could use [[:blank:]], which matches horizontal tabs and spaces only (but no newlines, vertical tabs etc.). – stefanct Oct 13 '17 at 13:10
1

On my platforms -e is optional – NeilG Aug 11 '19 at 05:43
But how do you specify \s in the destination part (i.e. the replace-with) part of the regular expression? I want to avoid using keyboard spaces and/or tabs there, as well. – NYCeyes Jul 09 '21 at 20:40

score 113 · Answer 2 · answered Aug 28 '13 at 20:28

113

This works on MacOS 10.8:

sed -E "s/[[:space:]]+/ /g"
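Note that the command above collapses every run of one or more whitespace characters into a single space. A hedged sketch of the same extended-regex style applied to the original question (3 or more characters replaced by two spaces) could look like this. It assumes a sed that accepts -E; the function name squeeze_ws_ere is hypothetical.

```shell
# squeeze_ws_ere is a hypothetical helper name, used only for this example.
# With -E (extended regular expressions) the interval is written {3,}
# without backslashes.
squeeze_ws_ere() {
    sed -E 's/[[:space:]]{3,}/  /g' "$@"
}

# Sample input: two spaces between x and y, three spaces between y and z.
printf 'x  y   z\n' | squeeze_ws_ere
```

Only the three-space run is rewritten; the two-space run is too short to match.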

answered Aug 28 '13 at 20:28

some ideas

1,278

3

do you know if this works on all Linux distros ? – amphibient Feb 06 '14 at 17:26
3

Not generally, GNU sed won't have -E. From the BSD sed man page: "The -E, -a and -i options are non-standard FreeBSD extensions and may not be available on other operating systems." – Brad Koch Mar 18 '14 at 21:19
1

Why do you need the -E flag, for the + operator? Most expressions would probably be fine with * instead, then this would work on other platforms. – Samuel Mar 21 '15 at 00:05
7

@Samuel If you use *, the regex will match zero or more spaces, and you will get a space between every character, and a space at each end of each line. If you don't have the -E flag, then you want sed "s/[[:space:]]\+/ /g" to match one or more spaces. – jbo5112 Jan 20 '16 at 20:49
1

FWIW, NetBSD's sed supports the -E flag as well. – mcandre Dec 29 '17 at 21:53
@BradKoch The fact that -E is non-standard does not imply GNU sed does not have that option. Your linked document explicitly states that the -E option is available in GNU sed as well. – xuhdev Mar 06 '18 at 22:08
@xuhdev You're correct, GNU sed added support for -E in version 4.3, released in 2017. Older versions will still fail with -E. – Brad Koch Mar 06 '18 at 22:18
@BradKoch OK, I think I know what is confusing. Older versions already support -E but it is not documented. It was documented later since it seems that -E is coming to POSIX standard. See https://unix.stackexchange.com/a/310454/38242 – xuhdev Mar 06 '18 at 22:25
1

For curious readers: GNU sed has had -r since as long as I can remember (prior to 2004 switch to git). -E was added as an undocumented alias to -r in Aug 2006 (rev 3a8e165). They documented -E in Oct 2013 (rev 8b65e079, prior to v4.1; they didn't git tag prior releases). All v4.3 added w/re to -E was examples in the HTML documentation. Regardless, any GNU sed running in 2010 shouldn't have had any problems with -E, but it was undocumented at the time... git://git.sv.gnu.org/sed – bobpaul Mar 01 '19 at 19:17

score 16 · Answer 3 · answered Jul 22 '14 at 14:52

16

sed 's/[ \t]*/"space or tab"/'
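Two caveats apply to this pattern: `*` also matches the empty string, so the first match is at the very start of the line, and not every sed treats `\t` inside brackets as a tab. The sketch below illustrates the first point and shows a one-or-more variant using the POSIX class [[:blank:]]. Both function names are hypothetical.

```shell
# star_first and blank_runs are hypothetical names, used only for this example.

# With '*', the pattern matches zero spaces at the start of the line,
# so the replacement is inserted before the first character.
star_first() {
    sed 's/[ ]*/X/'
}

# Requiring at least one character (\{1,\}) avoids the empty match;
# [[:blank:]] covers space and tab without relying on \t.
blank_runs() {
    sed 's/[[:blank:]]\{1,\}/_/g'
}

printf 'a b\n' | star_first
printf 'a \t b\n' | blank_runs
```

The first command shows the replacement landing before "a" rather than on the space; the second replaces the whole space-tab-space run with one underscore.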

answered Jul 22 '14 at 14:52

Zac

427

3

Is this guaranteed to work on any version of sed on any system? If not it might be worth mentioning where this does work in a similar fashion as the other answers, just so we know the limitations and where this might not have the intended result. – Mokubai Jul 22 '14 at 20:34
3

This RE is what I use to match whitespace. It is simpler than character classes just to match tab or space. It uses only the most basic conventions of regular expressions, so it should work anywhere with a functional implementation of regular expressions. – Nate Oct 18 '14 at 04:50
4

On Mac 10.9.5 this matches for spaces and 't'. I used Michael Douma's above to match whitespace chars (it also works with -e). – Alien Life Form Jul 31 '15 at 18:32
1

Doesn't work sensibly on my SUSE system. It matches the first place on the line where there is zero or more spaces, which is before the first character. I doubt that is the intended function, and certainly wasn't the requested use case. I believe you want to change the '*' for '+' (or '{3,}' per the question) and maybe put a g at the end of the sed command to match all occurrences of the pattern. Replacing [ \t] with [[:space:]] may also be desirable as well, in case there is something else for whitespace in the line. – jbo5112 Jan 20 '16 at 20:59
1

doesn't work on my macos Catalina – Jerry Green Nov 05 '20 at 19:02

score 15 · Answer 4 · answered Apr 07 '10 at 15:12

15

Some older versions of sed may not recognize \s as a white space matching token. In that case you can match a sequence of one or more spaces and tabs with '[XZ][XZ]*' where X is a space and Z is a tab.
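One way to get the literal tab into the bracket expression without typing it is to let the shell produce it. This is a sketch under the assumption of a POSIX shell; the variable name tab and the function name squeeze_old are illustrative, and the pattern is extended to three or more characters to fit the question.

```shell
# 'tab' holds a literal tab character produced by printf.
tab=$(printf '\t')

# squeeze_old is a hypothetical helper name, used only for this example.
# Three mandatory bracket expressions followed by a starred one match
# runs of 3 or more spaces/tabs without \s, \{3,\}, or character classes.
squeeze_old() {
    sed "s/[ $tab][ $tab][ $tab][ $tab]*/  /g" "$@"
}

# Sample input: space+tab+space between a and b, two spaces between b and c.
printf 'a \t b  c\n' | squeeze_old
```

The double quotes around the sed script let the shell expand $tab before sed sees the expression.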

answered Apr 07 '10 at 15:12

Marnix A. van Ammers

2,144

1

So for the particular need here, with an older sed, you could do:
$ sed 's/[XZ][XZ][XZ][XZ]*/  /g' inputfile

where X is a tab and Z is a space.
– Marnix A. van Ammers Apr 12 '10 at 15:08

score 0 · Answer 5 · answered Oct 19 '20 at 15:59

0

None of the above worked for me, but I found a very simple solution using awk:

user@~[]$ cat /tmp/file
/nospace/in/here
/this/one space
/well/seems we have spaces
user@~[]$ cat /tmp/file |awk 'NF>1'
/this/one space
/well/seems we have spaces
user@~[]$

answered Oct 19 '20 at 15:59

user1932365

1

The OP asked for a solution for sed, not awk. – PLG Dec 17 '22 at 19:16

score 0 · Answer 6 · answered Oct 06 '22 at 17:13

0

I don't know if this helps, but here is what I just did:

MacBook-Pro-van-User:training user$ cat sed.txt

My name is Bob

MacBook-Pro-van-User:training user$ sed s/"My name is Bob"/"My Lastname is Montoya"/g sed.txt

My Lastname is Montoya

I just added double quotes ("") around the pattern and the replacement in the command.

answered Oct 06 '22 at 17:13

Bünyamin C.

1

1

Welcome to Super User! Before answering an old question having an accepted answer (look for green ✓) as well as other answers ensure your answer adds something new or is otherwise helpful in relation to them. Here is a guide on [answer]. There is also a site [tour] and a [help]. – help-info.de Oct 06 '22 at 17:15
