4
#!/bin/bash
if [ ! -f numbers ]; then echo 0 > numbers; fi
count=0
touch numbers
echo $count > numbers
while [[ $count != 100 ]]; do
  if ln numbers numbers.lock
  then
    count=`expr $count + 1`
    n=`tail -1 numbers`
    expr $n + 1 >> numbers
    rm numbers.lock
  fi
done

What can I do to avoid the race condition of count=`expr $count + 1` and the n=`tail -1 numbers`, so that when I run two of the scripts at the same time, it only goes to 100, and not 200. I've researched on multiple websites, but there are no concise answers without making a huge function.

Jonathan Leffler
  • 698,132
  • 130
  • 858
  • 1,229
StingRay21
  • 342
  • 5
  • 15

2 Answers2

6

You are already safely avoiding the actual race condition with the lock file. The problem you are describing can be avoided two ways.

(1) Move the lock file outside the main loop, so that two instances of your program cannot run their main loop at the same time. If one is running, the other will have to wait until it's done, then start replacing the output file.

#!/bin/bash

# FIXME: broken, see comments

while true; do
    if ! ln numbers numbers.lock
    then
       sleep 1
    else
        if [ ! -f numbers ]; then echo 0 > numbers; fi
        count=0
        touch numbers
        #echo $count > numbers   # needless, isn't it?
        while [[ $count != 100 ]]; do
            count=`expr $count + 1`
            n=`tail -1 numbers`
            expr $n + 1 >> numbers
            rm numbers.lock
        done
        break
    fi
done

(2) Make the two instances cooperate, by examining what the contents of the file are. In other words, force them to stop looping when the number reaches 100, regardless of how many other processes are writing to this file. (I guess there is an iffy corner case when there are more than 100 instances running.)

#!/bin/bash
# FIXME: should properly lock here, too
if [ ! -f numbers ]; then echo 0 > numbers; fi
n=0
touch numbers
while [[ $n -lt 100 ]]; do
  if ln numbers numbers.lock
  then
    n=$(expr $(tail -1 numbers) + 1 | tee numbers)
    rm numbers.lock
  fi
done

Depending on your requirements, you might actually want the script to clobber any previous value in the file when a new instance of the script is starting, but if not, the echo 0 > numbers should be governed by the lock file, too.

You really want to avoid expr in a Bash script; Bash has built-in arithmetic operators. I have not attempted to refactor that part here, but you probably should. Perhaps prefer Awk so you can factor out the tail too; awk '{ i=$0 } END { print 1+i }' numbers

tripleee
  • 158,107
  • 27
  • 234
  • 292
  • The first example seems to need a `touch numbers` before the `while true` loop. The `ln` command will fail if numbers doesn't exist, so it can never be created by that code if it doesn't already exist – SpinUp Oct 27 '21 at 18:07
  • I think using `ln -s` in place of plain `ln` should also work for the 3rd line of code in the first example -- then the command will fail if `numbers.lock` exists, but merrily carry on even if `numbers` doesn't exist yet – SpinUp Oct 27 '21 at 18:19
  • What would the point of that be? The _purpose_ of the operation is to fail if something is not right. But yes, the `touch` was missing, thanks for noticing that. – tripleee Oct 28 '21 at 05:10
  • But a missing `numbers` file isn't treated as a failure in your code -- you explicitly check for its existence and create it if necessary in the `if [ ! -f numbers ]` line. If you keep the `ln` as it is (not `ln -s`) that existence check is totally pointless, since it's already contained in `if ! ln ...`. Also `rm numbers.lock` should go *outside* the loop (`while [[ $count != 100 ]]`), otherwise you're re-introducing the race condition (and generating 99 error messages to the terminal). – SpinUp Oct 28 '21 at 17:39
  • 1
    You're right ... I'm not sure what I was thinking at the time; I can't get the first one to work at all. – tripleee Oct 28 '21 at 18:00
  • fair enough -- still upvoted though because the example was close to my application and helped set me on the right track – SpinUp Oct 28 '21 at 21:39
0

I put this one-liner on the top of my script to make it race condition safe:

if [[ -d "/tmp/${0//\//_}" ]] || ! mkdir "/tmp/${0//\//_}"; then echo "Script is already running!" && exit 1; fi; trap 'rmdir "/tmp/${0//\//_}"' EXIT;

By that I don't need to think about any race conditions in my further code.

Break down the code:

  1. [[ -d "/tmp/${0//\//_}" ]] checks if the lock dir /tmp/_path_to_script_scriptname.sh/ exists. Note: $0 contains the name of the script.
  2. mkdir "/tmp/${0//\//_}" creates the dir if it does not exist
  3. then ... exit 1 abort script if lock dir already exists (which means the script is already running)
  4. trap 'rmdir "/tmp/${0//\//_}"' EXIT automatically deletes the lock dir if the script is exited (which does not hit on race conditions as the trap command is defined later.

Note: In very rare cases like server crashes the lock dir gets not deleted. For this you could think about a cronjob which checks for outdated lock dirs. If you need trap in your script (which can't be set twice), than use one of the different multi trap solutions.

mgutt
  • 5,377
  • 2
  • 49
  • 71