
I'm trying to extract a certain field (the fourth) from a column-based, space-adjusted text stream. I'm trying to use the cut command in the following manner:

cat text.txt | cut -d " " -f 4

Unfortunately, cut doesn't treat several spaces as one delimiter. I could have piped through awk

awk '{ printf $4; }'

or sed

sed -E "s/[[:space:]]+/ /g"

to collapse the spaces, but I'd like to know if there is any way to deal with several delimiters natively in cut?
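To illustrate the problem (a minimal sketch with inline input standing in for text.txt): cut treats every single space as a field separator, so consecutive spaces produce empty fields:

```shell
# Fields as cut sees them: a, "", b, "", c, "", d
printf 'a  b  c  d\n' | cut -d ' ' -f 4
# prints an empty line, because field 4 is one of the empty fields
```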

mbaitoff
    AWK is the way to go. – Dennis Williamson Nov 10 '10 at 15:10
  • Possible duplicate of [linux cut help - how to specify more spaces for the delimiter?](http://stackoverflow.com/questions/7142735/linux-cut-help-how-to-specify-more-spaces-for-the-delimiter) – Inanc Gumus Jan 13 '17 at 18:53
  • I love `awk` BUT when you are doing `kubectl ... bash -c 'awk ...'` and similar, things start to get funny with quotes, parameter references, etc. Then it's actually quite nice to whip out the old rudimentary tools from the toolbox. – sastorsl Apr 29 '22 at 06:51

5 Answers


Try:

tr -s ' ' <text.txt | cut -d ' ' -f4

From the tr man page:

-s, --squeeze-repeats   replace each input sequence of a repeated character
                        that is listed in SET1 with a single occurrence
                        of that character
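For instance (a quick check of the squeeze behaviour, using inline input rather than text.txt):

```shell
# Runs of spaces collapse to one, so cut's field numbering works as expected:
printf 'this   is    line     1 more text\n' | tr -s ' ' | cut -d ' ' -f4
# -> 1
```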
kev
    No need for `cat` here. You could pass `< text.txt` directly to `tr`. http://en.wikipedia.org/wiki/Cat_%28Unix%29#Useless_use_of_cat – arielf Aug 09 '14 at 20:10
    Not sure it is any simpler, but you are going to merge, you can forgo cut's `-d` and translate straight from multiple characters to tab. For example: I came here looking for a way to automatically export my display: `who am i | tr -s ' ()' '\t' | cut -f5` – Leo Mar 28 '16 at 23:24
  • This doesn't remove leading/trailing whitespace (which may or may not be wanted, but usually isn't), in contrast with the awk solution. The awk solution is also much more readable and less verbose. – n.caillou Apr 04 '18 at 23:31
  • -1 **WARNING: THIS IS NOT THE SAME THING AS TREATING SEQUENTIAL DELIMETERS AS ONE.** Compare `echo "a b c" | cut -d " " -f2-`, `echo "a b c" | tr -s " " | cut -d " " -f2-` – user541686 Jul 21 '19 at 10:01

As the comments on the question note, awk is really the way to go. Using cut is also possible, together with tr -s to squeeze the spaces, as kev's answer shows.

Let me however go through all the possible combinations for future readers. Explanations are in the Tests section.

tr | cut

tr -s ' ' < file | cut -d' ' -f4

awk

awk '{print $4}' file

bash

while read -r _ _ _ myfield _
do
   echo "fourth field: $myfield"
done < file

sed

sed -r 's/^([^ ]*[ ]*){3}([^ ]*).*/\2/' file

Tests

Given this file, let's test the commands:

$ cat a
this   is    line     1 more text
this      is line    2     more text
this    is line 3     more text
this is   line 4            more    text

tr | cut

$ cut -d' ' -f4 a
is
                        # it does not show what we want!


$ tr -s ' ' < a | cut -d' ' -f4
1
2                       # this makes it!
3
4
$

awk

$ awk '{print $4}' a
1
2
3
4

bash

This reads the fields sequentially. The underscore _ serves as a throwaway ("junk") variable to ignore the fields we don't need. This way, $myfield stores the 4th field of each line, no matter how many spaces appear in between.

$ while read -r _ _ _ a _; do echo "4th field: $a"; done < a
4th field: 1
4th field: 2
4th field: 3
4th field: 4
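One more property worth noting (sketched with a bash here-string instead of a file): read strips leading IFS whitespace, so unlike cut the field numbering stays stable even for indented lines:

```shell
# Leading and repeated spaces are both absorbed by word splitting:
read -r _ _ _ f _ <<< '   this is line 4 more text'
echo "4th field: $f"
# -> 4th field: 4
```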

sed

This matches three groups of non-spaces followed by spaces with ([^ ]*[ ]*){3}. Then, it captures everything up to the next space as the 4th field, which is finally printed with \2.

$ sed -r 's/^([^ ]*[ ]*){3}([^ ]*).*/\2/' a
1
2
3
4
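Note that -r is a GNU extension for extended regexes (BSD/macOS sed spells it -E). If you need portability, a POSIX basic-regex sketch of the same command would use escaped groups and braces:

```shell
# BRE equivalent: \(...\) groups and \{3\} interval instead of ERE (...){3}
sed 's/^\([^ ]*[ ]*\)\{3\}\([^ ]*\).*/\2/' a
```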
fedorqui
    `awk` is not only elegant and simple, it is also included in VMware ESXi, where `tr` is missing. – user121391 May 10 '16 at 09:19
    @user121391 yet another reason to use `awk`! – fedorqui May 10 '16 at 09:29
  • @fedorqui I've never heard of the underscore as "junk variable". Can you provide any more insight/reference on this? – BryKKan Nov 14 '17 at 16:01
    @BryKKan I learnt about it in Greg's [How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?](http://mywiki.wooledge.org/BashFAQ/001): _Some people use the throwaway variable _ as a "junk variable" to ignore fields. It (or indeed any variable) can also be used more than once in a single `read` command, if we don't care what goes into it_. It can be anything, it is just that it somehow became standard instead of `junk_var` or `whatever` :) – fedorqui Nov 15 '17 at 07:37
  • @BryKKan In Javascript it also represents a function parameter that is not meant to be used. – Adrian Sep 28 '21 at 14:55

shortest/friendliest solution

After becoming frustrated with the many limitations of cut, I wrote my own replacement, which I called cuts for "cut on steroids".

cuts provides what is likely the most minimalist solution to this and many other related cut/paste problems.

One example, out of many, addressing this particular question:

$ cat text.txt
0   1        2 3
0 1          2   3 4

$ cuts 2 text.txt
2
2

cuts supports:

  • auto-detection of most common field-delimiters in files (+ ability to override defaults)
  • multi-char, mixed-char, and regex matched delimiters
  • extracting columns from multiple files with mixed delimiters
  • offsets from end of line (using negative numbers) in addition to start of line
  • automatic side-by-side pasting of columns (no need to invoke paste separately)
  • support for field reordering
  • a config file where users can change their personal preferences
  • great emphasis on user friendliness & minimalist required typing

and much more. None of which is provided by standard cut.

See also: https://stackoverflow.com/a/24543231/1296044

Source and documentation (free software): http://arielf.github.io/cuts/

arielf

This Perl one-liner shows how closely Perl is related to awk:

perl -lane 'print $F[3]' text.txt

However, the @F autosplit array starts at index $F[0], while awk fields start with $1.

Chris Koknat

With versions of cut I know of, no, this is not possible. cut is primarily useful for parsing files where the separator is not whitespace (for example /etc/passwd) and that have a fixed number of fields. Two separators in a row mean an empty field, and that goes for whitespace too.
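For example, a sketch with a sample passwd-style line (the real file would be /etc/passwd):

```shell
# Single-character delimiter, fixed field positions: cut's home turf
printf 'root:x:0:0:root:/root:/bin/bash\n' | cut -d: -f1,7
# -> root:/bin/bash
```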

Benoit