1

There is a series of procedures, each of which falls into one of two general categories.

Most of these are of category 1: it would be best to execute them one at a time - the overhead associated with threading or even using a thread pool negates any gains.

However there are a few of category 2: it would be best to move these into a concurrency scenario, like a different thread, a task, etc. These procedures, unlike the others, do justify the overhead.

The series is made up of an unknown number and unknown order of both categories, other than the fact that most will usually be of category 1.

What are the known approaches to solving this problem? Here is a stab at some approaches, but I'm curious if there are others or if there is a comprehensive survey of the problem.

  1. Start all procedures sequentially, but determine mid-process if it should be moved to a separate concurrent process.
  2. Have a 'look-ahead' process running simultaneously that decides what category the procedure belongs to. It is referred to when a procedure is ready to be started.

Both of these ideas have a tradeoff (as I'm sure any approach will): the added intelligence costs something that adds to the overhead.

Aaron Thomas
  • 119
  • 5
  • You want the code itself to determine whether it should run multithreaded or not? – Robert Harvey Mar 15 '19 at 16:29
  • @RobertHarvey I guess I'm thrown off by the question. If you could help me understand, what would be an example of having something besides the code itself determine? – Aaron Thomas Mar 15 '19 at 16:33
  • Generally speaking, you make that decision yourself. You figure out what portions of the code will benefit from multithreading under heavy load, and you make them run multithreaded unconditionally. If you do it right, the overhead imposed by enabling concurrency is minimal, and it will still perform adequately under light load. – Robert Harvey Mar 15 '19 at 16:40
  • Beyond that, your strategy is going to depend heavily on what your specific needs are. There are frameworks like Akka that will scale up as your processing needs increase, but that's an Actor Model, a different approach than multithreading. – Robert Harvey Mar 15 '19 at 16:42
  • You can also explore async operations. These are methods that return a Promise, use continuation passing style, and don't necessarily require an additional thread to give you the "sensation" of concurrency. – Robert Harvey Mar 15 '19 at 16:51
  • In short, I believe that the cost and complexity of determining on the fly whether or not you should run code concurrently will never be worth it. But maybe your system has some special demands that you haven't told us about yet. – Robert Harvey Mar 15 '19 at 16:54
  • Do the procedures in the series have to be executed in order? – Blrfl Mar 15 '19 at 17:19
  • @RobertHarvey all good points. To answer your original question: yes, the code, not the coder, makes the determination – Aaron Thomas Mar 15 '19 at 17:34
  • @Blrfl the series does not have to be executed in order. But it might be worth pointing out that the series is of an unknown order (and length also). – Aaron Thomas Mar 15 '19 at 17:36
  • 1
    How much of a performance hit do you take on a single-threaded "series" if you just assume that it needs to be multithreaded and process it accordingly? – Robert Harvey Mar 15 '19 at 17:43
  • @RobertHarvey great question. Maybe it reveals a mistake in my thinking. So, if the series has one million procedures, and 99% of them do not need concurrency - wouldn't that mean there are 990,000 threads started up and completed (or taken from the pool that would be better used for the other 10,000 procedures) that would not be otherwise? Seems like the performance hit is related to the amount of procedures and ratio of #1 to #2 - not too bad for 10 and 51%, but significant for one million and 99%? – Aaron Thomas Mar 15 '19 at 18:38
  • You would never use that many threads. Depending on whether your task is I/O bound or not, the optimal number of threads might actually be the same as the number of processor cores you have. Concurrency abstractions like Parallel.Foreach easily let you adjust the degree of parallelism you want (e.g. a maximum of 100 threads running simultaneously). – Robert Harvey Mar 15 '19 at 18:45

2 Answers2

1

Depending on your specific needs, you should find an off-the-shelf solution that already performs this "automatic scaling" for you. These exist in abundance.

For example, Akka and Akka.net do this via an Actor model and a robust message-passing mechanism. You spin up as many actors as you need, and add more hardware as your needs grow.

Writing reliable concurrent programs using solely threads, mutexes and semaphores is hard. If you're using a threading model, take advantage of modern concurrency abstractions that take some of the guesswork out of threading and provide load balancing for free.

For example, Parallel.Foreach will only spin up as many threads as it needs, and you can limit the number of threads it uses to some maximum value.

Robert Harvey
  • 199,517
1

I think you are conflating concurrency and multithreading. Most languages have a concept that provides concurrency without the huge overhead of allocating a separate thread for each task. Some of the names I've seen that concept go by are greenlets, fibers, futures, and promises. Their overhead is generally barely more than a virtual function call. I would try that before trying to munge a thread-only model to fit your needs.

Karl Bielefeldt
  • 147,435