(Please note that although this question is about parallel programming, it is framed largely in the context of Python 3.x.)

At the moment, what I gather from reading is that:

A process is a set of instructions, along with all the resources that accompany it while it is running. It would include the following code, as well as its input/output, memory, file handles, etc. In other words, it's the whole kitchen sink.

# this script, while running as a whole, is considered a process

print('hello world')

with open('something.txt', 'a') as file_handle:
    for i in range(500):
        file_handle.write('blablabla')
print('job done!')

However, if I wanted to do more in the same amount of time, in order to make full use of my computer's processing power, I have the option to spawn more processes or threads. Which one do I choose? Compared to the simple Python script analogy above, what would they be? Is spawning another process the equivalent of just running the entire thing again while changing the filename?

# changed filename (is this "another process?")

print('hello world')

with open('something_else.txt', 'a') as file_handle:
    for i in range(500):
        file_handle.write('blablabla')
print('job done!') 

I also get the vague idea that a single process can contain multiple threads. Would that just be the equivalent of running several more "conceptual" for loops, then?

# like would this be a "thread" a barebones "subset" of an entire program?

with open('something.txt', 'a') as file_handle:
    for i in range(500):
        file_handle.write('blablabla')

How are the two really different from one another, anyway? Searching online, I get the idea that processes are more independent and heavyweight, while threads are more lightweight and "easier to share memory with each other." But what does this really mean? Why can't processes share memory with each other too? And if threads can "share memory", how come I can't access variables from different threads that are spawned from the same script (e.g. something like from thread_a import var_data)?

Lastly, what computes what, exactly? Does a CPU compute threads or processes? Or is "CPU" an overarching term encompassing multiple cores, etc.? Do cores compute processes or threads?


Summary:

1) Using a simple Python script as an example of a process, what would the equivalent of spawning another process/thread be? (e.g. a duplicate script, a subset of a script, or only some section of code)

2) How are processes fundamentally different from threads? What is an example of something that processes can do and threads cannot?

3) Why is memory/data often described as "harder to share" between processes than between threads? And how do threads share data, anyway?

4) Do CPUs compute threads or processes? Do cores compute threads or processes?

5) What are some general guidelines/examples of when to use which?

AlanSTACK

1 Answer


To start answering this, you need to understand what the Python GIL (Global Interpreter Lock) is. In CPython, the standard interpreter, any thread can access any object in memory. To avoid problems (such as several threads modifying the same memory at the same time), there is a lock that allows only one thread to execute Python bytecode at any given moment. This is why, within a single Python process, code effectively executes one task after the other.

In modern programming there is a push to make better use of multi-core processors by parallelizing programs to improve performance. Given the GIL, there are two options:

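  • threading is a module that lets you spawn multiple tasks "at the same time" in different threads. The catch is that it's not really at the same time: the interpreter switches between the threads, running a small slice of one before moving on to the next. You will NEVER have two threads executing Python code at the same instant, and all threads see the same memory, so you can still share data like usual; that's why it's simple. (A switch can still happen in the middle of a compound operation such as x += 1, so data that several threads modify may still need a lock.)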

  • multiprocessing, on the other hand, lets you spawn real processes, which work simultaneously. The price is that each process has its own memory, so you can't simply share variables between processes in the usual way. There is no problem with having multiple processes that each contain multiple threads. You are not completely on your own, though: there are a few ways to communicate safely between processes (the multiprocessing module provides Queue, Pipe, Value and Lock, for instance). You can see more on this here.
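
As a rough illustration of the memory point above, here is a minimal sketch (the names counter, bump, run_in_thread and run_in_process are made up for this example). A thread modifies the very same dictionary the main program sees, while a child process only modifies its own copy:

```python
import multiprocessing
import threading

# Hypothetical shared state, used only for this illustration.
counter = {"value": 0}


def bump():
    """Increment the counter in whichever memory space this runs in."""
    counter["value"] += 1


def run_in_thread():
    """Run bump() in a thread; the thread shares the caller's memory."""
    worker = threading.Thread(target=bump)
    worker.start()
    worker.join()
    return counter["value"]


def run_in_process():
    """Run bump() in a child process; the child works on its own copy."""
    worker = multiprocessing.Process(target=bump)
    worker.start()
    worker.join()
    return counter["value"]


if __name__ == "__main__":
    print(run_in_thread())   # the thread's increment is visible here
    print(run_in_process())  # unchanged: the child's increment stayed in the child
```

To get a result back from the child you would have to send it explicitly, for example through a multiprocessing.Queue.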

To sum up, threads and processes let you separate some tasks from others, giving you a way to improve on your basic sequential program. In some languages there is not much difference in how they work, but in Python the main things to remember are:

  • Threads: share memory, but are not really parallel. They are useful if your code has waiting times (network, disk, user input), so that other work can be done in the meantime. If your code is already using 100% of a CPU core, threads will slow it down, because execution switches frequently between tasks and that adds overhead.

  • Processes: a bit more difficult to implement, because you have to think about how data is passed between them, which you normally don't in Python. The major upside is that you can dramatically improve performance if your code can be parallelized.
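
To make the "waiting times" case concrete, here is a small sketch in which time.sleep stands in for a network or disk wait (fake_download, run_sequential and run_threaded are hypothetical names, not part of any library). Sleeping releases the GIL, so the threaded version overlaps the waits instead of adding them up:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(task_id, delay=0.2):
    """Pretend to wait on I/O; time.sleep releases the GIL while waiting."""
    time.sleep(delay)
    return task_id


def run_sequential(n):
    """Run n fake downloads one after the other and time the whole batch."""
    start = time.perf_counter()
    results = [fake_download(i) for i in range(n)]
    return results, time.perf_counter() - start


def run_threaded(n):
    """Run n fake downloads in n threads so that their waits overlap."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(fake_download, range(n)))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, sequential_time = run_sequential(4)
    _, threaded_time = run_threaded(4)
    print(f"sequential: {sequential_time:.2f}s, threaded: {threaded_time:.2f}s")
```

For CPU-heavy work the same structure should carry over by swapping ThreadPoolExecutor for concurrent.futures.ProcessPoolExecutor, at the cost of having to pickle the arguments and results that travel between processes.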

CoMartel
  • I would like to learn more about the subtle nuances of threads/processes and the more detailed mechanisms by which they share resources. Where could I go to learn this? Would this be in the domain of an introductory OS course, or parallel programming? Any recommended materials (with a focus on Python, of course)? Also, what if I had 2 cores: wouldn't having 2 threads be ACTUALLY parallel? – AlanSTACK Feb 15 '16 at 19:58
  • Threads in Python (CPython, because of the GIL) are never actually parallel, even if you have multiple cores. I think you should go for a parallel programming course rather than an OS course. Neither threads nor processes are very difficult to get started with, and after that it all depends on your application. You can find lots of documentation [here](http://sebastianraschka.com/Articles/2014_multiprocessing_intro.html), [here](https://pymotw.com/2/multiprocessing/communication.html), [here](http://eli.thegreenplace.net/2012/01/04/shared-counter-with-pythons-multiprocessing). – CoMartel Feb 16 '16 at 07:57
  • You should really define a simple application and test it with both threads and processes. If you have a question about a specific application and which method would work best, ask another question about that particular case. – CoMartel Feb 16 '16 at 08:03