Python ProcessPoolExecutor Tutorial
This tutorial has been taken and adapted from my book: Learning Concurrency in Python
Video Tutorial
Introduction
In this tutorial we will be looking at how you can utilize multiple processors within your Python Programs.
Multiprocessing vs Multithreading
Knowing when and where to use multiple threads vs multiple processes is incredibly important if you are going to be working on highly performant Python programs. Misuse of either threads or processes could lead to your systems actually seeing performance degradation.
THere are typically 2 types of performance bottleneck for most programs, this is either an I/O bottleneck or a CPU-based bottleneck.
If your program spends more time waiting on file reads or network requests or
any type of I/O
task, then it is an I/O
bottleneck and you should be looking
at using threads to speed it up.
If your program spends more time in CPU based tasks over large datasets then it
is a CPU
bottleneck. In this scenario you may be better off using multiple
processes in order to speed up your program. I say may as it’s possible that a
single-threaded Python program may be faster for CPU bound problems, it can
depend on unknown factors such as the size of the problem set and so on.
Getting Started
Let’s take a look at ProcessPoolExecutors
. ProcessPoolExecutors
can be used
and created in much the same way as your standard
ThreadPoolExecutors.
It subclasses the Executor class the same way the ThreadPoolExecutor
class
does and thus features many of the same methods within it.
Creating a ProcessPoolExecutor
The process for creating a ProcessPoolExecutor
is almost identical to that of
the ThreadPoolExecutor
except for the fact that we have to specify we’ve
imported that class from the concurrent.futures module and that we also
instantiate our executor object like so:
Executor = ProcessPoolExecutor(max_workers=3)
Example
The below example features a very simple full example of how you can instantiate
your own ProcessPoolExecutor
and submit a couple of tasks into this pool. It
should be noted that our task function here isn’t that computationally expensive
so we may not see the full benefit of using multiple processes and it could in
fact be significantly slower than your typical single-threaded process.
We’ll use the os
module to find the current PID
of each of the tasks that we
execute within our pool.
from concurrent.futures import ProcessPoolExecutor
import os
def task():
print("Executing our Task on Process {}".format(os.getpid()))
def main():
executor = ProcessPoolExecutor(max_workers=3)
task1 = executor.submit(task)
task2 = executor.submit(task)
if __name__ == '__main__':
main()
Output
When we run this you should see that both our submitted tasks are executed as well as the Process IDs in which they were executed. This is a very simple example but it’s good at verifying that we are indeed running our tasks across multiple processes.
$ python3.6 06_processPool.py
Executing our Task on Process 40365
Executing our Task on Process 40366
Context Manager
It should be noted that you can also write this in a more succinct fashion
from concurrent.futures import ProcessPoolExecutor
import os
def task():
print("Executing our Task on Process: {}".format(os.getpid()))
def main():
## executor = ProcessPoolExecutor(max_workers=3)
with ProcessPoolExecutor(max_workers=3) as executor:
task1 = executor.submit(task)
task2 = executor.submit(task)
main()
Output
When you run this you should see exactly the same output as before:
$ python3.6 test.py
Executing our Task on Process: 56335
Executing our Task on Process: 56336
Conclusion
If you found this tutorial useful or require further assistance then please do not hesitate to let me know in the comments section below!