Python ThreadPoolExecutor Tutorial
This tutorial has been taken and adapted from my book: Learning Concurrency in Python
In this tutorial we’ll be looking at Python’s ThreadPoolExecutor. This was originally introduced into the language in version 3.2 and provides a simple high-level interface for asynchronously executing input/output bound tasks.
Why Use a ThreadPoolExecutor?
ThreadPoolExecutors provide a simple abstraction around spinning up multiple threads and using these threads to perform tasks in a concurrent fashion. Adding threading to your application can drastically improve its speed when used in the right context. By using multiple threads we can speed up applications which face an input/output based bottleneck. A good example of this would be a web crawler.
Web crawlers typically do a lot of heavy I/O based tasks such as fetching and parsing websites. If we were to fetch every page in a synchronous fashion, we would find that the main bottleneck for our program would be the fetching of these pages from the internet. By using something like a ThreadPoolExecutor we can effectively mitigate this bottleneck by doing multiple fetches concurrently and processing each page as it returns.
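To make the crawler idea a little more concrete, here is a minimal sketch of fetching several pages concurrently and handling each one as it returns. The fetch() and crawl() functions and the example.com URLs are hypothetical, and time.sleep() stands in for a real network request so that the snippet runs without internet access.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch(url):
    # Hypothetical stand-in for a network request: the sleep simulates I/O wait.
    time.sleep(0.2)
    return "contents of {}".format(url)

def crawl(urls, max_workers=4):
    pages = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Remember which URL each Future belongs to.
        future_to_url = {executor.submit(fetch, url): url for url in urls}
        # as_completed() yields each Future as soon as its task has finished.
        for future in as_completed(future_to_url):
            url = future_to_url[future]
            pages[url] = future.result()
    return pages

if __name__ == '__main__':
    urls = ["https://example.com/page/{}".format(i) for i in range(4)]
    start = time.perf_counter()
    pages = crawl(urls)
    elapsed = time.perf_counter() - start
    print("Fetched {} pages in {:.2f}s".format(len(pages), elapsed))
```

Because the four simulated fetches wait at the same time rather than one after another, the total run time should be close to that of a single fetch.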
Creating a ThreadPoolExecutor
The first thing we need to know is how we can define our own ThreadPoolExecutors. This is a rather simple one-liner which looks something like so:
executor = ThreadPoolExecutor(max_workers=3)
Here we instantiate an instance of our ThreadPoolExecutor and pass in the maximum number of workers that we want it to have. In this case we've defined it as 3, which essentially means this thread pool will only have 3 concurrent threads that can process any jobs that we submit to it.
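In order to give the threads within our ThreadPoolExecutor something to do we can call the submit() function, which takes in a function as its primary parameter, followed by any arguments for that function, like so:
executor.submit(myFunction)
Note that we pass the function itself and not the result of calling it, as it is the executor's job to call the function on one of its threads.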
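As a minimal sketch of what submit() hands back, the snippet below submits a task and reads its return value from the resulting Future. The square() function is a hypothetical helper used only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def square(n):
    # Hypothetical task used only for illustration.
    return n * n

executor = ThreadPoolExecutor(max_workers=3)

# Pass the callable and its argument separately; the executor
# calls square(4) on one of its worker threads.
future = executor.submit(square, 4)

# result() blocks until the task has finished, then returns its value.
print(future.result())  # prints 16

# Release the worker threads once we are finished with the pool.
executor.shutdown()
```

If the task raises an exception, calling result() re-raises it in the calling thread, so this is also where errors from a task surface.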
Example
In this example we put together both the creation of our ThreadPoolExecutor object and the submission of tasks to this newly instantiated object. We'll have a very simple task function which will simply sum the numbers from 0 to 9 and then print out the result. Not the most cutting edge software I'm sure you'll agree, but it serves as a fairly adequate example.
Below our defined task function we have our standard main function. It's within this that we define our executor object in a similar fashion to above before then submitting two tasks to this new pool of threads.
from concurrent.futures import ThreadPoolExecutor
import threading

def task():
    print("Executing our Task")
    result = 0
    # Sum the numbers from 0 to 9
    for i in range(10):
        result = result + i
    print("I: {}".format(result))
    # Report which worker thread ran this task
    print("Task Executed {}".format(threading.current_thread()))

def main():
    executor = ThreadPoolExecutor(max_workers=3)
    task1 = executor.submit(task)
    task2 = executor.submit(task)

if __name__ == '__main__':
    main()
Output
If we were to execute our Python program above then we should see the rather bland output of both our tasks being executed and the result of our computation being printed out on the command line.
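We also utilize the threading.current_thread() function in order to determine which thread has performed each task. In the Python 3.6 run shown below, the two tasks were executed by two distinct daemon threads. Newer Python versions may reuse an idle worker for the second task, and from Python 3.9 onwards the workers are no longer daemon threads, so your output may differ slightly.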
$ python3.6 05_threadPool.py
Executing our Task
I: 45
Executing our Task
I: 45
Task Executed <Thread(<concurrent.futures.thread.ThreadPoolExecutor object at 0x102abf358>_1, started daemon 123145333858304)>
Task Executed <Thread(<concurrent.futures.thread.ThreadPoolExecutor object at 0x102abf358>_0, started daemon 123145328603136)>
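The default thread names above are not especially readable. As a small sketch, the thread_name_prefix parameter (available since Python 3.6) can be used to give the workers friendlier names. The which_thread() function and the "worker" prefix are assumptions made for this illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def which_thread():
    # Return the name of the thread that is running this task.
    return threading.current_thread().name

# Worker threads are named "<prefix>_<number>".
executor = ThreadPoolExecutor(max_workers=3, thread_name_prefix="worker")

name = executor.submit(which_thread).result()
print(name)

executor.shutdown()
```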
Context Manager
The second and possibly most popular method of instantiating a ThreadPoolExecutor is using it as a context manager like so:
with ThreadPoolExecutor(max_workers=3) as executor:
It does much the same job as the previous method we looked at, but syntactically it looks better and it also shuts the executor down for us: when the with block exits, the executor waits for all submitted tasks to finish and then frees its threads.
Context managers, if you haven't encountered them before, are an incredibly powerful concept within Python that allow us to write more syntactically beautiful code.
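To show roughly what the context manager does for us, the sketch below runs the same kind of task twice: once with an explicit try/finally and shutdown(wait=True), and once using a with block. The slow_task() function is a hypothetical helper, and its short sleep simply simulates some work.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_task(n):
    # Hypothetical task: the sleep simulates a little work.
    time.sleep(0.1)
    return n * 2

# Explicit form: we are responsible for shutting the executor down.
executor = ThreadPoolExecutor(max_workers=3)
try:
    explicit = executor.submit(slow_task, 1)
finally:
    # Blocks until all submitted tasks have finished.
    executor.shutdown(wait=True)

# Context manager form: shutdown(wait=True) is called when the block exits.
with ThreadPoolExecutor(max_workers=3) as executor:
    managed = executor.submit(slow_task, 2)

# Both futures are finished by the time we get here.
print(explicit.done(), managed.done())  # prints True True
```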
Example
This time we'll be defining a different task that takes in a variable 'n' as input, just to give you a simple demonstration of how we can do this. The task function just prints out that it's processing 'n' and nothing more.
Within our main function we utilize our ThreadPoolExecutor as a context manager and then call future = executor.submit(task, n) 3 times in order to give our thread pool something to do. Any arguments passed after the function are forwarded to it when it runs.
from concurrent.futures import ThreadPoolExecutor

def task(n):
    print("Processing {}".format(n))

def main():
    print("Starting ThreadPoolExecutor")
    with ThreadPoolExecutor(max_workers=3) as executor:
        # Arguments after the function are passed through to task()
        future = executor.submit(task, 2)
        future = executor.submit(task, 3)
        future = executor.submit(task, 4)
    # The with block only exits once every submitted task has finished
    print("All tasks complete")

if __name__ == '__main__':
    main()
Output
When we execute the above program you should see that it prints out that we are starting our ThreadPoolExecutor before going on to execute the three distinct tasks we submit to it, and then finally printing out that all tasks are complete. As the tasks run on separate threads, the order of the "Processing" lines is not strictly guaranteed.
$ python3.6 01_threadPoolExe.py
Starting ThreadPoolExecutor
Processing 2
Processing 3
Processing 4
All tasks complete
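When the same function needs to run over a series of inputs, the executor's map() method is a compact alternative to calling submit() repeatedly. The following is a minimal sketch in which task() returns a string rather than printing it, an assumption made so that the results can be collected.

```python
from concurrent.futures import ThreadPoolExecutor

def task(n):
    # Return a value instead of printing so the caller can collect it.
    return "Processed {}".format(n)

def main():
    with ThreadPoolExecutor(max_workers=3) as executor:
        # map() runs task() for each input on the pool and yields
        # the results in the same order as the inputs.
        results = list(executor.map(task, [2, 3, 4]))
    return results

if __name__ == '__main__':
    for line in main():
        print(line)
```

Unlike submit(), map() hides the individual Future objects, which keeps things simple when all we want is the return values in input order.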
Conclusion
I hope this tutorial demystified the art of working with ThreadPoolExecutors in Python. If you want to learn more about how threads work in Python then I recommend checking out my appropriately named tutorial: Threads in Python.
If you need any further assistance then please let me know by leaving a comment in the comments section below or by tweeting me: @Elliot_F!