
I wrote a shell script to check which ".err" text files are empty. Some files have a specific repeated phrase, like this example file fake_error.err (blank lines intentional):


WARNING: reaching max number of iterations

WARNING: reaching max number of iterations

WARNING: reaching max number of iterations

WARNING: reaching max number of iterations

WARNING: reaching max number of iterations

WARNING: reaching max number of iterations

WARNING: reaching max number of iterations

WARNING: reaching max number of iterations

WARNING: reaching max number of iterations

I also want to remove files like this one, which contain nothing but that phrase and blank lines, in addition to the empty files. I wrote the following script to do so:

#!/bin/bash

for file in *error.err; do
    if [ ! -s $file ]
    then
        echo "$file is empty"
        rm $file
    else
        # Get the unique, non-blank lines in the file, sorted and ignoring blank space
        lines=$(grep -v "^$" "$file" | sort -bu "$file")
        echo $lines

        EXPECTED="WARNING: reaching max number of iterations"
        echo $EXPECTED

        if [ "$lines" = "$EXPECTED" ]
        then
            # Remove the file that only has iteration warnings
            echo "Found reached max iterations!"
            rm $file
        fi
    fi
done

However, the output of this script when run on the fake_error.err file is

WARNING: reaching max number of iterations
WARNING: reaching max number of iterations

from the two echo statements in the loop, but the file itself is not deleted and the string "Found reached max iterations!" is not printed. I think the issue is in if [ "$lines" = "$EXPECTED" ]. I've tried using double brackets [[ ]] and ==, but neither worked. I have no idea what the difference between the two printed strings is.

Why are the two variables not equal?
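
One likely explanation, assuming the real files look like the sample above: sort is given "$file" as an operand, so it reads the file directly and ignores what grep sends down the pipe. The blank lines therefore survive, sort -u collapses them into a single empty line that sorts first, and $lines starts with a newline. An unquoted echo $lines hides this because word splitting discards the newline. The sketch below reproduces the effect on a temporary demo file (the file and the variable names tmp and fixed are made up for illustration) and uses printf '%q' to make the hidden character visible.

```shell
#!/bin/bash
# Build a small demo file: blank lines mixed with the repeated warning.
tmp=$(mktemp)
printf '\nWARNING: reaching max number of iterations\n\nWARNING: reaching max number of iterations\n' > "$tmp"

# Same pipeline as in the script: sort gets a file operand, so it reads the
# file itself and the output of grep is never used.
lines=$(grep -v "^$" "$tmp" | sort -bu "$tmp")

# %q prints the value in a shell-quoted form, so a leading newline shows up.
printf '%q\n' "$lines"

# Without the file operand, sort reads the pipe and the blank lines stay filtered out.
fixed=$(grep -v "^$" "$tmp" | sort -u)
printf '%q\n' "$fixed"

rm -f "$tmp"
```

Quoting the variable when printing it, as in echo ":$lines:" with the double quotes kept, would show the same thing in the original script.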

m13op22
  • See if there are any trailing blanks? sort -bu won't delete them. E.g. echo ":$lines:" or something like that – ilkkachu Mar 29 '23 at 20:00
  • @ilkkachu that's it. The output is : WARNING: reaching max number of iterations: WARNING: reaching max number of iterations. How can I remove the blanks from this? – m13op22 Mar 29 '23 at 20:24
  • Since you're already using grep, I wonder if it would not be simpler to use the exit status of grep -qv -e '^$' -e 'WARNING: reaching max number of iterations' directly? – steeldriver Mar 29 '23 at 20:30
  • Ooh, that's an idea, @steeldriver! Something like if ! (grep -qv -e '^$' -e 'WARNING: reaching max number of iterations' $file) since it would return exit status 0 for matches found? – m13op22 Mar 29 '23 at 20:48
  • @m13op22 yes it "succeeds" if it finds any line that is neither empty nor the ignorable phrase - don't think you need the parentheses though – steeldriver Mar 29 '23 at 20:54
  • Unless you have files containing ONLY blank lines, you shouldn't need to grep for them - grepping for just "WARNING: reaching maximum...." should be enough. And if you do need to grep for only blank lines, that should be a separate command, perhaps comparing the outputs from wc -l "$file" and grep -c '^[[:blank:]]*$' "$file". BTW, you can't combine an inverted match -v with a normal match in the same grep command, the -v applies to all -e options in that command. Use awk or perl if you need to do boolean logic with regex matches like ! /^$/ && /WARNING: reaching.../. – cas Mar 30 '23 at 01:54
  • @cas good point about the quotes, thanks for making sure I'm not being sloppy! Some files are only blank lines, so it's better to use a separate command that uses awk instead? – m13op22 Mar 30 '23 at 15:22
  • I don't think so. As far as I can tell from your script above, you want to delete empty files (your -s test works well for that) AND files that contain "WARNING: reaching maximum....". Grepping for blank lines isn't needed for that. If you also want to delete files containing ONLY blank lines then yes, compare the total line count of each file against the count of empty lines in that file - if equal, then delete it. You'd only need to use awk or perl if you needed more than a simple regex match. – cas Mar 30 '23 at 16:07
  • BTW, my awk example was bogus because the ! /^$/ test is redundant, it's always going to be true if the line contains the warning. I just wanted a quick example and didn't think that one through. better would be if you wanted to check if a file contained both foo and bar on the same line, then you'd use awk '/foo/ && /bar/ { ... }' – cas Mar 30 '23 at 16:09
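
The exit-status idea from the comments could be sketched roughly as follows. This is not a tested answer from the thread: the names only_warnings and clean_err_files are hypothetical helpers, and the blank-line pattern is widened to '^[[:blank:]]*$' on the assumption that lines holding only spaces or tabs should also count as blank. With -v applied to both -e patterns, grep selects lines that match neither one, so exit status 1 means the file holds nothing but blank lines and warnings.

```shell
#!/bin/bash
WARNING='WARNING: reaching max number of iterations'

# Hypothetical helper: succeeds when every line of "$1" is blank or
# contains the warning phrase.
only_warnings() {
    # -v selects lines matching NEITHER pattern; -q only sets the exit status.
    grep -qv -e '^[[:blank:]]*$' -e "$WARNING" -- "$1"
    # 0 = some other line exists, 1 = none selected, 2 = grep error.
    [ "$?" -eq 1 ]
}

# Hypothetical wrapper: clean up *error.err files in a directory (default: .).
clean_err_files() {
    local dir=${1:-.} file
    for file in "$dir"/*error.err; do
        [ -e "$file" ] || continue      # the glob matched nothing
        if [ ! -s "$file" ]; then
            echo "$file is empty"
            rm -- "$file"
        elif only_warnings "$file"; then
            echo "$file has only iteration warnings"
            rm -- "$file"
        fi
    done
}
```

Calling clean_err_files with no argument works on the current directory. Because the phrase is matched as a substring, a line that contains the warning plus other text is also treated as ignorable, and a file made up of only blank lines is deleted too. Checking for status 1 explicitly, rather than negating grep, keeps a grep error (status 2) from being mistaken for "only warnings".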

0 Answers