
I'm trying to extract a certain field (the fourth) from a column-based, space-adjusted text stream. I'm trying to use the cut command in the following manner:

cat text.txt | cut -d " " -f 4

Unfortunately, cut doesn't treat several spaces as one delimiter. I could have piped through awk

awk '{ printf $4; }'

or sed

sed -E "s/[[:space:]]+/ /g"

to collapse the spaces, but I'd like to know if there is any way to deal with several delimiters natively in cut?
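To illustrate the problem (a minimal sketch with inline input standing in for text.txt): cut treats every single space as a field separator, so consecutive spaces produce empty fields:

```shell
# Fields as cut sees them: a, "", b, "", c, "", d
printf 'a  b  c  d\n' | cut -d ' ' -f 4
# prints an empty line, because field 4 is one of the empty fields
```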

mbaitoff
    AWK is the way to go. – Dennis Williamson Nov 10 '10 at 15:10
  • Possible duplicate of [linux cut help - how to specify more spaces for the delimiter?](http://stackoverflow.com/questions/7142735/linux-cut-help-how-to-specify-more-spaces-for-the-delimiter) – Inanc Gumus Jan 13 '17 at 18:53
  • I love `awk` BUT when you are doing `kubectl ... bash -c 'awk ...'` and similar, things start to get funny with quotes, parameter references, etc. Then it's actually quite nice to whip out the old rudimentary tools from the toolbox. – sastorsl Apr 29 '22 at 06:51

5 Answers


Try:

tr -s ' ' <text.txt | cut -d ' ' -f4

From the tr man page:

-s, --squeeze-repeats   replace each input sequence of a repeated character
                        that is listed in SET1 with a single occurrence
                        of that character
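For instance (a quick check of the squeeze behaviour, using inline input rather than text.txt):

```shell
# Runs of spaces collapse to one, so cut's field numbering works as expected:
printf 'this   is    line     1 more text\n' | tr -s ' ' | cut -d ' ' -f4
# -> 1
```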
kev
    No need for `cat` here. You could pass `< text.txt` directly to `tr`. http://en.wikipedia.org/wiki/Cat_%28Unix%29#Useless_use_of_cat – arielf Aug 09 '14 at 20:10
    Not sure it is any simpler, but you are going to merge, you can forgo cut's `-d` and translate straight from multiple characters to tab. For example: I came here looking for a way to automatically export my display: `who am i | tr -s ' ()' '\t' | cut -f5` – Leo Mar 28 '16 at 23:24
  • This doesn't remove leading/trailing whitespace (which may or may not be wanted, but usually isn't), in contrast with the awk solution. The awk solution is also much more readable and less verbose. – n.caillou Apr 04 '18 at 23:31
  • -1 **WARNING: THIS IS NOT THE SAME THING AS TREATING SEQUENTIAL DELIMETERS AS ONE.** Compare `echo "a b c" | cut -d " " -f2-`, `echo "a b c" | tr -s " " | cut -d " " -f2-` – user541686 Jul 21 '19 at 10:01

As the comments on the question note, awk is really the way to go. Using cut is also possible, together with tr -s to squeeze the spaces, as kev's answer shows.

Let me however go through all the possible combinations for future readers. Explanations are in the Tests section.

tr | cut

tr -s ' ' < file | cut -d' ' -f4

awk

awk '{print $4}' file

bash

while read -r _ _ _ myfield _
do
   echo "fourth field: $myfield"
done < file

sed

sed -r 's/^([^ ]*[ ]*){3}([^ ]*).*/\2/' file

Tests

Given this file, let's test the commands:

$ cat a
this   is    line     1 more text
this      is line    2     more text
this    is line 3     more text
this is   line 4            more    text

tr | cut

$ cut -d' ' -f4 a
is
                        # it does not show what we want!


$ tr -s ' ' < a | cut -d' ' -f4
1
2                       # this makes it!
3
4
$

awk

$ awk '{print $4}' a
1
2
3
4

bash

This reads the fields sequentially. The underscore _ serves as a throwaway ("junk") variable to ignore the fields we don't need. This way, $myfield stores the 4th field of each line, no matter how many spaces appear in between.

$ while read -r _ _ _ a _; do echo "4th field: $a"; done < a
4th field: 1
4th field: 2
4th field: 3
4th field: 4
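One more property worth noting (sketched with a bash here-string instead of a file): read strips leading IFS whitespace, so unlike cut the field numbering stays stable even for indented lines:

```shell
# Leading and repeated spaces are both absorbed by word splitting:
read -r _ _ _ f _ <<< '   this is line 4 more text'
echo "4th field: $f"
# -> 4th field: 4
```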

sed

This matches three groups of non-spaces followed by spaces with ([^ ]*[ ]*){3}. Then, it captures everything up to the next space as the 4th field, which is finally printed with \2.

$ sed -r 's/^([^ ]*[ ]*){3}([^ ]*).*/\2/' a
1
2
3
4
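Note that -r is a GNU extension for extended regexes (BSD/macOS sed spells it -E). If you need portability, a POSIX basic-regex sketch of the same command would use escaped groups and braces:

```shell
# BRE equivalent: \(...\) groups and \{3\} interval instead of ERE (...){3}
sed 's/^\([^ ]*[ ]*\)\{3\}\([^ ]*\).*/\2/' a
```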
fedorqui
    `awk` is not only elegant and simple, it is also included in VMware ESXi, where `tr` is missing. – user121391 May 10 '16 at 09:19
    @user121391 yet another reason to use `awk`! – fedorqui May 10 '16 at 09:29
  • @fedorqui I've never heard of the underscore as "junk variable". Can you provide any more insight/reference on this? – BryKKan Nov 14 '17 at 16:01
    @BryKKan I learnt about it in Greg's [How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?](http://mywiki.wooledge.org/BashFAQ/001): _Some people use the throwaway variable _ as a "junk variable" to ignore fields. It (or indeed any variable) can also be used more than once in a single `read` command, if we don't care what goes into it_. It can be anything, it is just that it somehow became standard instead of `junk_var` or `whatever` :) – fedorqui Nov 15 '17 at 07:37
  • @BryKKan In Javascript it also represents a function parameter that is not meant to be used. – Adrian Sep 28 '21 at 14:55

shortest/friendliest solution

After becoming frustrated with the many limitations of cut, I wrote my own replacement, which I called cuts for "cut on steroids".

cuts provides what is likely the most minimalist solution to this and many other related cut/paste problems.

One example, out of many, addressing this particular question:

$ cat text.txt
0   1        2 3
0 1          2   3 4

$ cuts 2 text.txt
2
2

cuts supports:

  • auto-detection of most common field-delimiters in files (+ ability to override defaults)
  • multi-char, mixed-char, and regex matched delimiters
  • extracting columns from multiple files with mixed delimiters
  • offsets from end of line (using negative numbers) in addition to start of line
  • automatic side-by-side pasting of columns (no need to invoke paste separately)
  • support for field reordering
  • a config file where users can change their personal preferences
  • great emphasis on user friendliness & minimalist required typing

and much more. None of which is provided by standard cut.

See also: https://stackoverflow.com/a/24543231/1296044

Source and documentation (free software): http://arielf.github.io/cuts/

arielf

This Perl one-liner shows how closely Perl is related to awk:

perl -lane 'print $F[3]' text.txt

However, the @F autosplit array starts at index $F[0], while awk fields start with $1.

Chris Koknat

With versions of cut I know of, no, this is not possible. cut is primarily useful for parsing files where the separator is not whitespace (for example /etc/passwd) and that have a fixed number of fields. Two separators in a row mean an empty field, and that goes for whitespace too.
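For example, a sketch with a sample passwd-style line (the real file would be /etc/passwd):

```shell
# Single-character delimiter, fixed field positions: cut's home turf
printf 'root:x:0:0:root:/root:/bin/bash\n' | cut -d: -f1,7
# -> root:/bin/bash
```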

Benoit