
I currently have the following script:

#!/bin/bash
# script.sh

for i in {0..99}; do
   script-to-run.sh input/ output/ $i
done

I wish to run it in parallel using xargs. I have tried

script.sh | xargs -P8

But doing the above still ran the jobs one at a time, and -n8 did not help either. Adding & at the end of the command inside the for loop would try to run all 100 iterations at once. How do I run the loop only 8 at a time, up to 100 in total?
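For comparison, a minimal pure-shell sketch of the & approach with a cap: it keeps the original loop but waits after every 8 background jobs. script_to_run and run_in_batches are hypothetical stand-ins (the real script-to-run.sh is not shown), and seq 0 99 replaces {0..99} only for portability.

```shell
#!/bin/bash
# Hypothetical stand-in for script-to-run.sh: args are <input-dir> <output-dir> <index>.
script_to_run() { sleep 0.1; echo "done $3"; }

# Hypothetical wrapper around the original loop, capped at 8 background jobs per batch.
run_in_batches() {
  for i in $(seq 0 99); do
    script_to_run input/ output/ "$i" &
    # After every 8th job, block until the whole batch has finished.
    if [ $(( (i + 1) % 8 )) -eq 0 ]; then
      wait
    fi
  done
  wait   # the last batch (96..99) has only 4 jobs
}

run_in_batches
```

Because each batch waits for its slowest job, the other slots sit idle in the meantime. xargs -P (or bash 4.3+'s wait -n, which returns as soon as any one background job finishes) can refill a slot as soon as it frees up.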

Olivier
  • That is what I initially wanted to do, but had to resort to xargs because I am on Windows. I was not able to get GNU Parallel running on Windows – Olivier Feb 06 '15 at 03:21
  • Is that script calling itself or did you just confuse the names when you asked here? – Etan Reisner Feb 06 '15 at 03:24
  • Sorry, it should call another script. I will fix it – Olivier Feb 06 '15 at 03:26
  • The answer to https://stackoverflow.com/questions/3321738/shell-scripting-using-xargs-to-execute-parallel-instances-of-a-shell-function is relevant here. – Etan Reisner Feb 06 '15 at 03:28
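The linked question covers the case where the per-index work is a shell function rather than a separate script. A minimal sketch of that technique, assuming bash: process_one is a hypothetical stand-in for script-to-run.sh, exported with export -f so that the bash -c processes started by xargs can call it.

```shell
#!/bin/bash
# process_one is a hypothetical function standing in for script-to-run.sh.
process_one() {
  echo "index $3: $1 -> $2"
}
# bash-only: export the function so child bash processes started by xargs can see it.
export -f process_one

# Each invocation starts a new bash that calls the function with one index.
# The `_` fills $0 inside bash -c, so the index arrives as $1.
seq 0 99 | xargs -n 1 -P 8 bash -c 'process_one input/ output/ "$1"' _
```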

3 Answers


From the xargs man page:

This manual page documents the GNU version of xargs. xargs reads items from the standard input, delimited by blanks (which can be protected with double or single quotes or a backslash) or newlines, and executes the command (default is /bin/echo) one or more times with any initial-arguments followed by items read from standard input. Blank lines on the standard input are ignored.

Which means that, in your example, the loop inside script.sh still runs script-to-run.sh 100 times one after another; xargs just waits, collects all of the script's output and then runs echo <that output>. Not exactly useful, nor what you wanted.
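A quick way to see this, with printf standing in for whatever script.sh prints (an assumption for illustration):

```shell
# Roughly what `script.sh | xargs -P8` does with the script's output:
# no command is given, so xargs runs echo once with all the items as arguments.
printf '%s\n' 0 1 2 3 | xargs -P8
# prints: 0 1 2 3
```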

The -n argument sets how many items from the input are passed to each command that gets run; by itself it has nothing to do with parallelism.
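For example, with -n 2 and no -P, xargs splits four items into two echo runs that happen one after the other:

```shell
# -n 2: each echo gets at most two items; without -P the runs are sequential.
printf '%s\n' 0 1 2 3 | xargs -n 2 echo
# prints:
# 0 1
# 2 3
```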

To do what you want with xargs you would need to do something more like this (untested):

printf %s\\n {0..99} | xargs -n 1 -P 8 script-to-run.sh input/ output/

Which breaks down like this.

  • printf %s\\n {0..99} - Print one number per line, from 0 to 99.
  • Run xargs
    • taking at most one argument per command line it runs
    • and running up to eight processes at a time
    • so each run becomes script-to-run.sh input/ output/ <number>
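To try the pipeline end to end without the real script, here is a sketch that creates a hypothetical stand-in script-to-run.sh in a temporary directory (it just writes one result file per index). It uses seq 0 99, which prints the same numbers as printf %s\\n {0..99}.

```shell
#!/bin/bash
# Try the xargs pipeline with a hypothetical stand-in for script-to-run.sh
# (the real script is not shown in the question).
set -eu

workdir=$(mktemp -d)
cd "$workdir"
mkdir -p input output

# Stand-in: arguments are <input-dir> <output-dir> <index>; it writes one file per index.
cat > script-to-run.sh <<'EOF'
#!/bin/sh
sleep 0.1
echo "processed $3" > "$2/result-$3.txt"
EOF
chmod +x script-to-run.sh

# One index per invocation (-n 1), at most 8 invocations running at once (-P 8).
seq 0 99 | xargs -n 1 -P 8 ./script-to-run.sh input/ output/

ls output | wc -l   # expect 100 result files
```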
Etan Reisner
  • Actually you don't need to put the arguments on separate lines; xargs word-splits. So `echo {0..99} |` would work just as well. `<< – rici Feb 06 '15 at 03:41
  • @rici Looks like a documentation bug then especially since the documentation for Here Documents *doesn't* mention brace expansion (and it doesn't happen there either in a quick test) though they also don't mention tilde expansion (which doesn't happen for `< – Etan Reisner Feb 06 '15 at 03:49
  • How can you separate results from different runs with e.g. newlines? – nirvana-msu Oct 08 '17 at 00:48
  • Demo: `time head -12 – Walter A Oct 16 '19 at 10:13
  • It's probably worth noting that `-P 0` will use the number of cpus on the system – slf Dec 03 '21 at 17:35

With GNU Parallel you would do:

parallel script-to-run.sh input/ output/ {} ::: {0..99}

Add in -P8 if you do not want to run one job per CPU core.

Unlike xargs, it will do The Right Thing even if the input contains spaces, ', or " (not the case here, though). It also makes sure the output from different jobs is not mixed together, so if you use the output you are guaranteed never to get half a line from one job mixed with half a line from another.
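A small demonstration of the xargs side of this, using a made-up item my file: by default xargs splits its input on blanks, while -0 (supported by GNU and BSD xargs) reads NUL-delimited items and keeps each one whole.

```shell
# A single input item that contains a space (made-up example).
item='my file'

# Default xargs splits on blanks, so echo runs twice: "my", then "file".
printf '%s\n' "$item" | xargs -n 1 echo

# NUL-delimited input with -0 keeps the item whole: prints "my file" once.
printf '%s\0' "$item" | xargs -0 -n 1 echo
```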

GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.

If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU:

[Figure: Simple scheduling]

GNU Parallel instead spawns a new process when one finishes - keeping the CPUs active and thus saving time:

[Figure: GNU Parallel scheduling]
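The two schedules can be compared with plain shell and xargs. This is a sketch with a made-up, deliberately uneven workload (job n sleeps n % 4 tenths of a second), not GNU Parallel itself. In the static version slot 3 receives every 0.3 s job, so it needs about 8 × 0.3 = 2.4 s while slot 0 is idle almost immediately; xargs -P 4 instead hands out the next job whenever a slot frees up, so the same 4.8 s of total work can be spread across all four slots.

```shell
#!/bin/bash
# 32 hypothetical jobs on 4 slots; job n sleeps (n % 4) tenths of a second.
set -eu
log=$(mktemp)
export LOG="$log"

# Static: slot s is handed jobs s, s+4, s+8, ... up front and runs them in order.
for s in 0 1 2 3; do
  (
    n=$s
    while [ "$n" -lt 32 ]; do
      sleep "0.$(( n % 4 ))"
      echo "static $n" >> "$LOG"
      n=$(( n + 4 ))
    done
  ) &
done
wait

# Dynamic: xargs -P 4 starts the next job as soon as any of the 4 slots frees up.
seq 0 31 | xargs -n 1 -P 4 sh -c 'sleep "0.$(( $1 % 4 ))"; echo "dynamic $1" >> "$LOG"' _
```

Prefixing each half with time should show the dynamic half finishing sooner; the exact numbers depend on the machine.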

Installation

If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:

$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
   fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 883c667e01eed62f975ad28b6d50e22a
12345678 883c667e 01eed62f 975ad28b 6d50e22a
$ md5sum install.sh | grep cc21b4c943fd03e93ae1ae49e28573c0
cc21b4c9 43fd03e9 3ae1ae49 e28573c0
$ sha512sum install.sh | grep da012ec113b49a54e705f86d51e784ebced224fdf
79945d9d 250b42a4 2067bb00 99da012e c113b49a 54e705f8 6d51e784 ebced224
fdff3f52 ca588d64 e75f6033 61bd543f d631f592 2f87ceb2 ab034149 6df84a35
$ bash install.sh

For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README

Learn more

See more examples: http://www.gnu.org/software/parallel/man.html

Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html

Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel

Ole Tange
  • This doesn't answer the question, nor point out why xargs cannot achieve the same thing. – 张实唯 Dec 30 '16 at 04:17
  • downvote because xargs for me does exactly as second picture shows. – noonex Feb 07 '17 at 08:24
  • @noonex Are you aware that not everyone uses the version of xargs that you use and that -P is not in all versions of xargs? – Ole Tange Feb 07 '17 at 12:32
  • It's available in the latest versions of BSD and GNU xargs. I think it's safe to say that the feature will eventually become ubiquitous. – William T Froggard Mar 12 '18 at 00:31
  • @3ED Nope: -j is an alias for -P. If you want to run on 1 CPU thread less than all (e.g. 7 out of 8), then use -j-1. GNU Parallel will autodetect the number of CPU threads. – Ole Tange Jul 23 '18 at 16:01
  • Yes. Sorry. How I RTFM? - I don't know. :/ – 3ED Jul 23 '18 at 19:37
  • Perhaps not all are aware that this answer is provided by the author of GNU parallel. – izkeros Apr 16 '19 at 15:32
  • Downvoted due to clear advertisement on a piece of software that doesn't run correctly as described on first attempts, due to an interactive prompt that messes up most scripts. – Daniel Sorichetti Mar 30 '20 at 19:32
  • @DanielSorichetti I don't think that's fair at all. The "prompt" you mentioned is probably the *message* that's output telling it's waiting for input from the terminal ... because you didn't provide any input. The command line provided in the answer works fine if you put "echo" in place of the script. Furthermore there's nothing wrong, to the contrary, for a software author, even more a software that's part of GNU, to inform people about its existence. I too do talk about software I'm involved in. My only complaint is the lack of explanation whether xargs does the job here. – Johan Boulé Dec 17 '20 at 20:36
  • @JohanBoulé: While an excellent piece of software, the issue that most people have with parallel (and its bundling with GNU packages) is the requirement to accept additional license terms (--will-cite / --citation). See https://www.gnu.org/licenses/gpl-3.0.en.html. – ives Feb 14 '21 at 22:37
  • @ives Thanks for the info. I see the problem led to a FAQ entry there https://www.gnu.org/licenses/gpl-faq.en.html#RequireCitation – Johan Boulé Feb 17 '21 at 19:08
  • @JohanBoulé See the FAQ: https://git.savannah.gnu.org/cgit/parallel.git/tree/doc/citation-notice-faq.txt – Ole Tange Feb 17 '21 at 20:03

You can use this simple one-line command:

seq 0 99 | xargs -n 1 -P 8 script-to-run.sh input/ output/
Shubham Gupta