2

What regex can I write in bash to parse a line, extract the text found between each pair of | characters (ex: 1: |hey| 2: |boy|), and keep those words in some sort of array?

Andy Lester
syker
  • Is your example "ex: 1: |hey| 2: |boy|" a sample LINE to parse or the RESULTS of parsing a line? If the latter, what is a sample line that would produce those results? I can think of a number of approaches but they depend on what your input looks like, and which approach is "best" depends on what you do next with the "array". – Stephen P Apr 08 '10 at 22:01
  • The example is a sample LINE. In fact, the example can be on new lines. – syker Apr 08 '10 at 22:02
  • What I want to do with the array is just print it out in a specially formatted order (say, with commas in between) and sort it as well. – syker Apr 08 '10 at 22:03

5 Answers

2

There is no need for a complicated regular expression. Split on "|"; every second element is then what you want.

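#!/bin/bash
declare -a array
s="|hey| 2: |boy|"
IFS="|"           # split on the pipe character
set -- $s         # unquoted on purpose, so that word splitting happens
array=("$@")      # quoted, so the empty field before the first pipe is kept
# odd indices hold the text between pipes; stop before running off the end
for ((i=1; i<${#array[@]}; i+=2))
do
 echo "${array[$i]}"
done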

output

$ ./shell.sh
hey
boy

Using awk:

$ echo "|hey| 2: |boy|" | awk -F"|" '{for(i=2;i<=NF;i+=2)print $i}'
hey
boy
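
The asker's comments mention wanting the words in an array, sorted, and printed with commas in between. A minimal sketch of that, assuming bash 4 or later (for mapfile); the variable names line, fields, words, sorted and joined are illustrative and not from the original answer:

```shell
#!/bin/bash
line='1: |hey| 2: |boy|'

# Split the line on "|". The odd-numbered fields are the ones between pipes.
IFS='|' read -ra fields <<< "$line"

words=()
for ((i = 1; i < ${#fields[@]}; i += 2)); do
  words+=("${fields[i]}")
done

# Sort by sending one word per line through sort, then read the result back.
mapfile -t sorted < <(printf '%s\n' "${words[@]}" | sort)

# Join with commas; the subshell keeps the IFS change local.
joined=$(IFS=,; echo "${sorted[*]}")
echo "$joined"   # prints: boy,hey
```

Setting IFS only for the read command and inside the subshell avoids changing field splitting for the rest of the script.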
ghostdog74
  • +1 Nice use of IFS, set and (). But, this approach won't work if the left and right delimiters differ (say, '') and the order is meaningful, or the delimiter were multi-character (say, "--"). A regex approach is more general/flexible, IMHO. – Kevin Little Apr 09 '10 at 03:59
  • to make it more flexible is not difficult either. until that is required by OP, it will be left as it is. – ghostdog74 Apr 09 '10 at 04:17
1
$ foundall=$(echo '1: |hey| 2: |boy|' | sed -e 's/[^|]*|\([^|]\+\)|/\1 /g')
$ echo $foundall
hey boy
$ for each in ${foundall}
> do
>  echo ${each}
> done
hey
boy
Stephen P
0

Use sed -e 's,.*|\(.*\)|.*,\1,'

Dennis Williamson
syker
0

In your own answer, you output what's between the last pair of pipes (assuming there are more than two pipes on a line).

This will output what's between the first pair:

sed -e 's,[^|]*|\([^|]*\)|.*,\1,'

This will output what's between the outermost pair (so it will show pipes that appear between them):

sed -e 's,[^|]*|\(.*\)|.*,\1,'
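
To make the difference concrete, the three substitutions discussed here can be run against the sample line from the question. This is only a sketch; the variable names line, last, first and outer are made up for the illustration:

```shell
#!/bin/bash
line='1: |hey| 2: |boy|'

# Greedy leading .* swallows everything up to the last pair of pipes.
last=$(printf '%s\n' "$line" | sed -e 's,.*|\(.*\)|.*,\1,')

# [^|]* cannot cross a pipe, so the match stops at the first pair.
first=$(printf '%s\n' "$line" | sed -e 's,[^|]*|\([^|]*\)|.*,\1,')

# Non-greedy start, greedy capture: first pipe through last pipe.
outer=$(printf '%s\n' "$line" | sed -e 's,[^|]*|\(.*\)|.*,\1,')

printf 'last:  %s\n' "$last"    # last:  boy
printf 'first: %s\n' "$first"   # first: hey
printf 'outer: %s\n' "$outer"   # outer: hey| 2: |boy
```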
Dennis Williamson
0
#!/bin/bash

_str="ex: 1: |hey| 2: |boy|"
_re='(\|[^|]*\|)(.*)'  # in group 1 collect 1st occurrence of '|stuff|';
                       # in group 2 collect remainder of line. 

while [[ $_str =~ $_re ]]; do   # stop as soon as no '|stuff|' remains
   echo "Next token is '${BASH_REMATCH[1]}'"
   _str=${BASH_REMATCH[2]}
done

yields

Next token is '|hey|'
Next token is '|boy|'
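
The tokens printed above still carry their surrounding pipes, and the question asks for the words in an array. A possible variation, offered as a sketch and not as the original author's code, moves the capture group inside the pipes and appends each match to an array (the names str, re and tokens are illustrative):

```shell
#!/bin/bash
str='ex: 1: |hey| 2: |boy|'
re='\|([^|]*)\|(.*)'   # group 1: text between the next pair of pipes
                       # group 2: remainder of the line

tokens=()
while [[ $str =~ $re ]]; do
  tokens+=("${BASH_REMATCH[1]}")
  str=${BASH_REMATCH[2]}
done

printf '%s\n' "${tokens[@]}"   # prints hey, then boy, one per line
```

Because the pattern names its opening and closing delimiters explicitly, the same loop could be adapted to delimiters that differ or span several characters by changing only the regex.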
Kevin Little