Asynchronous Parallel Programming in Python with Multiprocessing
A flexible method to speed up code on a personal computer
Do you wish your Python scripts could run faster? Maybe they can, and you (probably) won’t have to buy a new computer or use a supercomputer. Most modern computers contain multiple processing cores but, by default, Python scripts only use a single core. Writing code that runs on multiple processors can really decrease your processing time. This article will demonstrate how to use the multiprocessing module to write parallel code that uses all of your machine’s processors and gives your script a performance boost.
Synchronous vs Asynchronous Models
An asynchronous model starts tasks as soon as new resources become available, without waiting for previously running tasks to finish. By contrast, a synchronous model waits for task 1 to finish before starting task 2. For a more detailed explanation with examples, check out this article in The Startup. Asynchronous models often offer the greatest opportunity for performance improvement, provided you can structure your code in the proper manner: that is, so tasks can run independently of one another.
For the sake of brevity, this article focuses solely on asynchronous parallelization because that is the method most likely to boost performance. Also, if you structure code for asynchronous parallelization on your laptop, it is much easier to scale up to a supercomputer.
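As a quick preview of the difference, here is a minimal sketch of the two models; the work function and the 4-worker pool are hypothetical stand-ins for your own code:
import multiprocessing as mp
import time

def work(x):
    time.sleep(1)  # simulate an expensive, independent task
    return x * x

if __name__ == '__main__':
    # Synchronous: each call waits for the previous one (~4 seconds total)
    sync_results = [work(x) for x in range(4)]
    # Asynchronous: tasks start as soon as a worker is free (~1 second on 4+ cores)
    with mp.Pool(4) as pool:
        async_results = [pool.apply_async(work, (x,)) for x in range(4)]
        print([r.get() for r in async_results])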
Installation
Since Python 2.6, multiprocessing has been included as a standard library module, so no installation is required. Simply import multiprocessing. Since ‘multiprocessing’ takes a bit to type, I prefer to import multiprocessing as mp.
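To confirm how many cores your machine has available, you can do a quick check:
import multiprocessing as mp

# Number of worker processes a pool can reasonably use on this machine
print(mp.cpu_count())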
The Problem
We have an array of parameter values that we want to use in a sensitivity analysis. The function we’re running the analysis on is computationally expensive. We can cut down on processing time by running multiple parameter sets simultaneously in parallel.
Setup
Import multiprocessing, numpy, and time. Then define a function that takes a row number, i, and three parameters as inputs. The row number is necessary so results can later be linked to the input parameters. Remember, the asynchronous model does not preserve order.
import multiprocessing as mp
import numpy as np
import time

def my_function(i, param1, param2, param3):
    result = param1 ** 2 * param2 + param3
    time.sleep(2)  # simulate a long run-time
    return (i, result)
For demonstrative purposes, this is a simple function that is not computationally expensive. I’ve added a line of code to pause the function for 2 seconds, simulating a long run-time. The function output is going to be most sensitive to param1 and least sensitive to param3. In practice, you can replace this with any function.
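For example, with some hypothetical inputs:
# 2.0 ** 2 * 3.0 + 1.0 = 13.0, returned after the 2-second pause
print(my_function(0, 2.0, 3.0, 1.0))  # (0, 13.0)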
We need a function that can take the result of my_function and add it to a results list, which is creatively named results. (When used as a callback later on, this function runs in the main process, so appending to a global list is safe.)
def get_result(result):
    # Append the (row index, result) tuple to the global results list
    global results
    results.append(result)
Let’s run this code in serial (non-parallel) and see how long it takes. Set up an array with 3 columns of random numbers between 0 and 100. These are the parameters that will get passed to my_function. Then create the empty results list. Finally, loop through all the rows in params and add the result from my_function to results. Time this to see how long it takes (it should be about 20 seconds) and print out the results list.
if __name__ == '__main__':
    params = np.random.random((10, 3)) * 100.0
    results = []
    ts = time.time()
    for i in range(0, params.shape[0]):
        get_result(my_function(i, params[i, 0], params[i, 1], params[i, 2]))
    print('Time in serial:', time.time() - ts)
    print(results)
As expected, this code took about 20 seconds to run. Also, notice how the results were returned in order.
Time in serial: 20.006245374679565
[(0, 452994.2250955602), (1, 12318.873058254741), (2, 310577.72144939064), (3, 210071.48540466625), (4, 100467.02727256044), (5, 46553.87276610058), (6, 11396.808561138329), (7, 543909.2528728382), (8, 79957.52205218966), (9, 47914.9078853125)]
Run in Parallel
Now use multiprocessing to run the same code in parallel. Simply add the following code directly below the serial code for comparison. A gist with the full Python script is included at the end of this article for clarity.
Reset the results list so it is empty, and reset the starting time. We’ll need to specify how many CPU processes we want to use. multiprocessing.cpu_count() returns the number of available CPUs on your machine. Then loop through each row of params and use multiprocessing.Pool.apply_async to call my_function and save the result. Parameters to my_function are passed using the args argument of apply_async, and the callback function is where the result of my_function is sent. The pool will start a new task as soon as a worker process becomes available, and continue doing so until the loop is complete. Then close the process pool. multiprocessing.Pool.join() waits to execute any following code until all processes have completed running. Now print the time this code took to run and the results.
    # This code continues inside the if __name__ == '__main__': block
    results = []
    ts = time.time()
    pool = mp.Pool(mp.cpu_count())
    for i in range(0, params.shape[0]):
        pool.apply_async(my_function,
                         args=(i, params[i, 0], params[i, 1], params[i, 2]),
                         callback=get_result)
    pool.close()
    pool.join()
    print('Time in parallel:', time.time() - ts)
    print(results)
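As a side note, apply_async also returns an AsyncResult handle. A minimal sketch of an alternative pattern (my own variation, not the approach used in this article) keeps those handles instead of using a callback; calling .get() blocks until each task finishes, re-raises any exception from the worker, and yields results in submission order:
    # Hypothetical alternative: collect AsyncResult handles instead of a callback
    pool = mp.Pool(mp.cpu_count())
    handles = [pool.apply_async(my_function,
                                args=(i, params[i, 0], params[i, 1], params[i, 2]))
               for i in range(params.shape[0])]
    pool.close()
    ordered_results = [h.get() for h in handles]  # blocks; preserves submission order
    pool.join()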
Notice that using apply_async decreased the run-time from 20 seconds to under 5 seconds. Also, notice that the results were not returned in order. That is why the row index was passed in and returned.
Time in parallel: 4.749683141708374
[(0, 452994.2250955602), (2, 310577.72144939064), (1, 12318.873058254741), (3, 210071.48540466625), (4, 100467.02727256044), (5, 46553.87276610058), (6, 11396.808561138329), (7, 543909.2528728382), (9, 47914.9078853125), (8, 79957.52205218966)]
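If you ever need the results back in their original order, the returned row index makes that straightforward. A small sketch, assuming the results list from above:
results.sort(key=lambda tup: tup[0])  # sort in place by row index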
Conclusion
Implementing asynchronous parallelization in your code can greatly decrease your run time. The multiprocessing module is a great option for parallelization on personal computers, and as you’ve seen in this article, it can deliver dramatic speed increases, depending on your machine’s specs. Beware that multiprocessing has limitations if you eventually want to scale up to a supercomputer. If supercomputing is where you’re headed, you’ll want to use a parallelization model compatible with the Message Passing Interface (MPI).
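For reference, MPI-style Python code is usually written with the mpi4py package (a separate install, launched with mpiexec). A minimal sketch of its model, in which every process runs the same script and branches on its rank:
from mpi4py import MPI  # assumes mpi4py is installed

comm = MPI.COMM_WORLD
rank = comm.Get_rank()  # this process's ID (0, 1, 2, ...)
size = comm.Get_size()  # total number of processes launched

# For example, each process could handle the rows where row % size == rank
print('Process', rank, 'of', size)
Run with, for example: mpiexec -n 4 python script.py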
https://gist.github.com/konradhafen/aa605c67bf798f07244bdc9d5d95ad12