50

If I have a string variable whose value is "john is 17 years old", how do I tokenize it using spaces as the delimiter? Would I use awk?

Jake Wilson

5 Answers

71
$ string="john is 17 years old"
$ tokens=( $string )
$ echo ${tokens[*]}

For other delimiters, like ';'

$ string="john;is;17;years;old"
$ IFS=';' tokens=( $string )
$ echo ${tokens[*]}
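
Note that a line made up only of assignments, such as `IFS=';' tokens=( $string )`, leaves the new IFS in effect for the rest of the shell session. A minimal sketch of one way to keep the change scoped, assuming bash and a single-line string: prefix `read` with the assignment, so it applies to that one command only. The variable names are reused from the example above.

```bash
string="john;is;17;years;old"

# The IFS assignment applies only to this one `read` command,
# so the shell's global IFS is left untouched afterwards.
IFS=';' read -ra tokens <<< "$string"

# Print one token per line; quoting "${tokens[@]}" keeps each token intact.
printf '%s\n' "${tokens[@]}"

# Individual tokens are available by zero-based index.
echo "third token: ${tokens[2]}"
```

Because `read` stops at the first newline, this sketch only handles strings without embedded newlines.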
Diego Torres Milano
  • Very nice, feels much more like an array. – Adam Eberlin Dec 21 '13 at 21:35
  • echo ${tokens[*]} doesn't work for me I get 'bash: ${tokens[*}: bad substitution ' error. – JPM Mar 11 '20 at 16:04
  • you are missing the `*`: ```$ tokens=( a ); $ echo ${tokens[]}; -bash: ${tokens[]}: bad substitution $ echo ${tokens[*]}; a``` – Diego Torres Milano Mar 11 '20 at 21:15
  • changing `IFS` and then building array this way makes `IFS` assignment "permanent", not just for the duration of the array building. see https://stackoverflow.com/questions/62855752/bash-ifs-stuck-after-temporarily-changing-it-for-array-building – morgwai Jan 27 '22 at 18:20
68

Use the shell's automatic tokenization of unquoted variables:

$ string="john is 17 years old"
$ for word in $string; do echo "$word"; done
john
is
17
years
old

If you want to change the delimiter you can set the $IFS variable, which stands for internal field separator. The default value of $IFS is " \t\n" (space, tab, newline).

$ string="john_is_17_years_old"
$ (IFS='_'; for word in $string; do echo "$word"; done)
john
is
17
years
old

(Note that in this second example I added parentheses around the second line. This creates a sub-shell so that the change to $IFS doesn't persist. You generally don't want to permanently change $IFS as it can wreak havoc on unsuspecting shell commands.)
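
One caveat with unquoted expansion is that the shell also performs pathname expansion on the resulting words, so a token such as `*` would be replaced by matching file names. A hedged sketch of guarding against that, assuming bash or another POSIX shell: disable globbing with `set -f` inside the same sub-shell. The sample string here is made up for illustration.

```bash
string="john_*_17"

# Command substitution runs in a sub-shell, so neither the IFS change
# nor `set -f` leaks into the calling shell.
result=$(
  IFS='_'
  set -f            # disable pathname expansion (globbing)
  for word in $string; do
    echo "$word"
  done
)
echo "$result"
```

Without `set -f`, the middle token could expand to the names of the files in the current directory.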

John Kugelman
  • for your examples, how would you re-use the third token (17) for example? use the for loop and count tokens? – kurumi Mar 22 '11 at 07:31
  • 1
    @Allen, then i can do this `IFS="_";set -- $string; echo $2.` or directly set it to an array like what `dtmilano` did. There is no need to use a for loop isn't it? – kurumi Mar 24 '11 at 05:40
14
$ string="john is 17 years old"
$ set -- $string
$ echo $1
john
$ echo $2
is
$ echo $3
17
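
Since `set --` overwrites the script's own positional parameters, it may be worth wrapping the split in a function, where only the function's parameters are replaced. This is a rough sketch assuming bash; `nth_token` is a hypothetical helper name, not something from the answer above.

```bash
# Hypothetical helper: print the N-th (1-based) whitespace-separated token.
# No bounds checking: N is assumed to be between 1 and the token count.
nth_token() {
  local n=$1 string=$2
  set -- $string        # replaces only this function's positional parameters
  shift $((n - 1))      # drop the tokens before the requested one
  printf '%s\n' "$1"
}

third=$(nth_token 3 "john is 17 years old")
echo "$third"
```

The caller's `$1`, `$2`, and so on are unaffected, because positional parameters are local to each function call.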
kurumi
2

You can try something like this:

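#!/bin/bash
n=0
a=/home/file.txt
# Turn spaces into newlines, then loop over the resulting tokens
for i in $(tr ' ' '\n' < "$a"); do
   str="${str},${i}"   # accumulate the tokens as a comma-separated list
   n=$((n + 1))        # running token count
   var="var${n}"
   echo "$var is ... ${i}"
done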
harshit
  • The use of `tr` makes this the best solution. Your exemple code could be much simpler : `echo john is 17 years old | tr ' ' '\n'` – Titou May 11 '17 at 08:49
1

With sed and an extended regex (\W and \n in the replacement are GNU sed extensions rather than POSIX):

$ str='a b     c d'
$ echo "$str" | sed -E 's/\W+/\n/g' | hexdump -C
00000000  61 0a 62 0a 63 0a 64 0a                           |a.b.c.d.|
00000008

This is like Python's re.split(r'\W+', str).

\W matches any non-word character: whitespace such as space, tab, newline, and carriage return (the first three are what a bash for loop splits on by default), but also symbols such as quotes, brackets, and signs.

The one exception is the underscore _, which counts as a word character, so snake_case is one word but kebab-case is two.

Leading or trailing whitespace in the input produces an empty line in the output.
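
If those empty lines are unwanted, one alternative is to extract the word runs instead of replacing the separators. The following is only a sketch, assuming a grep that supports `-o` (GNU and BSD grep do; it is not required by POSIX); the sample string is made up.

```bash
str='  a b     c-d e_f  '

# -o prints each match on its own line; [[:alnum:]_]+ is a run of
# word characters, i.e. the complement of what \W+ matches.
words=$(printf '%s\n' "$str" | grep -oE '[[:alnum:]_]+')
echo "$words"
```

Because only the matches are printed, leading and trailing separators never produce empty lines.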

Mila Nautikus