
I am working on the following dataset (a sample is shown below) and would like to write a bash script that selects only the records meeting a set of conditions and collects them in another file.

1. A regex to check that the continent is Asia, Africa or Europe, discarding the rest.

2. A regex to check that "Death Percentage" is greater than 0.50.

3. A regex to check that "Survival Percentage" is greater than 2.00.

It is important that it is a bash script that uses these regular expressions in if conditions.
Country,Other names,ISO 3166-1 alpha-3 CODE,Population,Continent,Total Cases,Total Deaths,Tot Cases//1M pop,Tot Deaths/1M pop,Death percentage, Survival Percentage
Afghanistan,Afghanistan,AFG,40462186,Asia,177827,7671,4395,190,4.31,0.42
Albania,Albania,ALB,2872296,Europe,273870,3492,95349,1216,1.27,9.41
Algeria,Algeria,DZA,45236699,Africa,265691,6874,5873,152,2.58,0.57
Andorra,Andorra,AND,77481,Europe,40024,153,516565,1975,0.38,51.54

As these records are written to the new file, the difference between "Survival Percentage" and "Death Percentage" must be computed and stored in a new variable called "diff.porc.pts", which holds the difference in percentage points as an absolute value.

The code I have come up with is below, but I have less experience with bash than with other languages.


read
while IFS=, read _ _ _ Continent _ _ _ _ _ Death Percentatge Survival Percentatge; do
     if [[Continent ~ /Africa|Asia|Europe/) && (Death Percentage ~ /[0].[5-9][0-9] &&(Survival 
     Percentage ~ /[2-9].[0-9][0-9]]]
          diff.porc.pts=$($Survical Percentatge/$Death Percentatge)|sed 's/-//'
          paste -sd > new_file.txt
     fi
cat new_file.txt

I also attach a sample of what the desired output should look like.

Country,Other names,ISO 3166-1 alpha-3 CODE,Population,Continent,Total Cases,Total Deaths,Tot Cases//1M pop,Tot Deaths/1M pop,Death percentage, Survival Percentage, diff.porc.pts
Afghanistan,Afghanistan,AFG,40462186,Asia,177827,7671,4395,190,4.31,0.42,3.89
Albania,Albania,ALB,2872296,Europe,273870,3492,95349,1216,1.27,9.41,8.14
Algeria,Algeria,DZA,45236699,Africa,265691,6874,5873,152,2.58,0.57,2.01
Andorra,Andorra,AND,77481,Europe,40024,153,516565,1975,0.38,51.54,51.16

I would be grateful if you could help me to complete it.

Thank you in advance

oshiono

1 Answer

  • You cannot include whitespace in bash variable names.
  • You've misspelled Percentage as Percentatge.
  • You've miscounted the column position of Continent.
  • The regex operator in bash is =~, not ~.
  • You should not enclose the regex in slashes.
  • You will need to use bc or another external command for arithmetic on decimal numbers.
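
As a minimal sketch of the points above, the snippet below uses whitespace-free variable names, the =~ operator with an unquoted pattern and no slashes, and bc for the decimal subtraction. The variable names and the hard-coded values (copied from the Albania row of the question) are only illustrative assumptions.

```shell
#!/bin/bash
# Illustrative values taken from the Albania row in the question.
continent="Europe"
p_death="1.27"
p_survival="9.41"

# Use =~ and leave the pattern unquoted, without surrounding slashes.
if [[ $continent =~ ^(Africa|Asia|Europe)$ ]]; then
    matched="yes"
else
    matched="no"
fi

# Bash arithmetic is integer-only, so decimal numbers are handed to bc.
diff=$(echo "$p_survival - $p_death" | bc)

echo "continent matched: $matched, difference: $diff"
```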

Then would you please try the following:

#!/bin/bash

while read -r line; do
    if (( nr++ == 0 )); then            # header line
        echo "$line,diff.porc.pts"
    else                                # body
        IFS=, read _ _ _ _ Continent _ _ _ _ pDeath pSurvival <<< "$line"
        if [[ $Continent =~ ^(Africa|Asia|Europe)$ && $pDeath =~ ^(0\.[5-9]|[1-9]) && $pSurvival =~ ^([2-9]\.|[1-9][0-9]) ]]; then
            diff=$(echo "$pSurvival - $pDeath" | bc)
            diff=${diff#-}              # absolute value: drop a leading minus sign
            echo "$line,$diff"
        fi
    fi
done < input_file.txt > new_file.txt

Output:

Country,Other names,ISO 3166-1 alpha-3 CODE,Population,Continent,Total Cases,Total Deaths,Tot Cases//1M pop,Tot Deaths/1M pop,Death percentage, Survival Percentage,diff.porc.pts
Albania,Albania,ALB,2872296,Europe,273870,3492,95349,1216,1.27,9.41,8.14

It looks like only the Albania record meets the conditions, contrary to the desired output shown in the question.
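
To see why the other sample rows are dropped, here is a small sketch that applies the same three regexes to the four sample records and reports the last condition that failed for each. The rows are cut down to the four relevant fields, and the `rows`, `kept` and `reason` names are assumptions made for this illustration.

```shell
#!/bin/bash
# Sample rows reduced to: country, continent, death %, survival %.
rows=(
    "Afghanistan,Asia,4.31,0.42"
    "Albania,Europe,1.27,9.41"
    "Algeria,Africa,2.58,0.57"
    "Andorra,Europe,0.38,51.54"
)

kept=()
for row in "${rows[@]}"; do
    IFS=, read -r country continent pDeath pSurvival <<< "$row"
    reason=""    # holds the last condition that failed, if any
    [[ $continent =~ ^(Africa|Asia|Europe)$ ]] || reason="continent"
    [[ $pDeath =~ ^(0\.[5-9]|[1-9]) ]]          || reason="death percentage"
    [[ $pSurvival =~ ^([2-9]\.|[1-9][0-9]) ]]   || reason="survival percentage"
    if [[ -z $reason ]]; then
        kept+=("$country")
        echo "$country: kept"
    else
        echo "$country: rejected ($reason)"
    fi
done
```

With the sample data, Afghanistan and Algeria fail on the survival percentage (0.42 and 0.57 are below 2.00), Andorra fails on the death percentage (0.38 is below 0.50), and only Albania is kept.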

tshiono
  • However, a much better solution is to write a simple Awk script instead. See also [`while read` loop extremely slow compared to `cat`, why?](https://stackoverflow.com/questions/13762625/bash-while-read-loop-extremely-slow-compared-to-cat-why) – tripleee May 23 '22 at 09:51
  • @tripleee thank you for the suggestion. I strongly agree with your opinion. I was just honestly obeying the OP's requirement: `It is important that it is a bash script that uses these regular expressions in if conditions.` :) – tshiono May 23 '22 at 10:29
  • @tshiono your responses are always helpful. Could you take a look at this please? https://askubuntu.com/questions/1410054/creating-an-html-from-the-output-of-awk-script/1410091#1410091 – oshiono May 23 '22 at 12:04
  • I've posted an answer to the linked question. Hope it will be a help. Cheers. – tshiono May 24 '22 at 02:34
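
For reference, the Awk approach suggested in the comments could look roughly like the sketch below. It is not part of the original answer and does not satisfy the question's requirement of regexes inside bash if conditions; the function name `filter_records` and the file names in the usage line are hypothetical. Because it compares the percentages numerically, the thresholds are strict, whereas the regexes in the answer also accept exactly 0.50 and 2.00.

```shell
#!/bin/bash
# Hypothetical helper: prints the header plus the matching records of the
# CSV file given as $1, with the absolute difference appended as a column.
filter_records() {
    awk -F, -v OFS=, '
        NR == 1 { print $0, "diff.porc.pts"; next }   # header line
        $5 ~ /^(Africa|Asia|Europe)$/ && $10 > 0.50 && $11 > 2.00 {
            d = $11 - $10
            if (d < 0) d = -d                          # absolute value
            printf "%s,%.2f\n", $0, d
        }
    ' "$1"
}
```

It could then be called as `filter_records input_file.txt > new_file.txt`.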