
Asynchronous Parallel Programming in Python with Multiprocessing

A flexible method to speed up code on a personal computer

Do you wish your Python scripts could run faster? Maybe they can, and you (probably) won’t have to buy a new computer or use a supercomputer. Most modern computers contain multiple processing cores but, by default, Python scripts only use a single core. Writing code that can run on multiple processors can really decrease your processing time. This article will demonstrate how to use the multiprocessing module to write parallel code that uses all of your machine’s processors and gives your script a performance boost.

Synchronous vs Asynchronous Models

An asynchronous model starts tasks as soon as new resources become available, without waiting for previously running tasks to finish. By contrast, a synchronous model waits for task 1 to finish before starting task 2. For a more detailed explanation with examples, check out this article in The Startup. Asynchronous models often offer the greatest opportunity for performance improvement, provided you can structure your code so that tasks run independently of one another.

For the sake of brevity, this article focuses solely on asynchronous parallelization because that is the method most likely to boost performance. Also, if you structure code for asynchronous parallelization on your laptop, it is much easier to scale up to a supercomputer.

Installation

Since Python 2.6, multiprocessing has been included in the standard library, so no installation is required. Simply import multiprocessing. Since ‘multiprocessing’ takes a while to type, I prefer to import multiprocessing as mp.
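For example, a quick sketch to confirm the module imports and to see how many cores your machine reports:

import multiprocessing as mp

print(mp.cpu_count())  # number of processors multiprocessing can use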

The Problem

We have an array of parameter values that we want to use in a sensitivity analysis. The function we’re running the analysis on is computationally expensive. We can cut down on processing time by running multiple parameter sets simultaneously in parallel.

Setup

Import multiprocessing, numpy and time. Then define a function that takes a row number, i, and three parameters as inputs. The row number is necessary so results can later be linked to the input parameters. Remember, the asynchronous model does not preserve order.

import multiprocessing as mp
import numpy as np
import time

def my_function(i, param1, param2, param3):
    result = param1 ** 2 * param2 + param3
    time.sleep(2)  # simulate an expensive computation
    return (i, result)  # return the row index along with the result

For demonstrative purposes, this is a simple function that is not computationally expensive. I’ve added a line of code to pause the function for 2 seconds, simulating a long run-time. The function output is going to be most sensitive to param1 and least sensitive to param3. In practice, you can replace this with any function.

We need a function that can take the result of my_function and append it to a results list, which is creatively named results. The callback runs in the parent process, so it can update the global list directly.

def get_result(result):
    # result is the (i, result) tuple returned by my_function
    global results
    results.append(result)

Let’s run this code in serial (non-parallel) and see how long it takes. Set up an array of 10 rows and 3 columns of random numbers between 0 and 100. These are the parameters that will get passed to my_function. Then create the empty results list. Finally, loop through all the rows in params and add the result from my_function to results. Time this to see how long it takes (it should be about 20 seconds) and print out the results list.

if __name__ == '__main__':
    params = np.random.random((10, 3)) * 100.0
    results = []
    ts = time.time()
    for i in range(0, params.shape[0]):
        get_result(my_function(i, params[i, 0], params[i, 1], params[i, 2]))
    print('Time in serial:', time.time() - ts)
    print(results)

As expected, this code took about 20 seconds to run. Also, notice how the results were returned in order.

Time in serial: 20.006245374679565
[(0, 452994.2250955602), (1, 12318.873058254741), (2, 310577.72144939064), (3, 210071.48540466625), (4, 100467.02727256044), (5, 46553.87276610058), (6, 11396.808561138329), (7, 543909.2528728382), (8, 79957.52205218966), (9, 47914.9078853125)]

Run in Parallel

Now use multiprocessing to run the same code in parallel. Simply add the following code directly below the serial code (still inside the if __name__ == '__main__': block) for comparison. A gist with the full Python script is included at the end of this article for clarity.

Reset the results list so it is empty, and reset the start time. We need to specify how many worker processes to use: multiprocessing.cpu_count() returns the total number of processors available on your machine. Then loop through each row of params and use multiprocessing.Pool.apply_async to call my_function and save the result. Parameters to my_function are passed using the args argument of apply_async, and the callback function is where the result of my_function is sent. This starts a new task as soon as a worker becomes available, and continues doing so until the loop is complete. Then close the pool so no more tasks can be submitted. multiprocessing.Pool.join() waits to execute any following code until all processes have completed running. Now print the time this code took to run, and the results.

    results = []
    ts = time.time()
    pool = mp.Pool(mp.cpu_count())
    for i in range(0, params.shape[0]):
        pool.apply_async(my_function, args=(i, params[i, 0], params[i, 1], params[i, 2]), callback=get_result)
    pool.close()
    pool.join()
    print('Time in parallel:', time.time() - ts)
    print(results)

Notice that using apply_async decreased the run time from about 20 seconds to under 5 seconds. Also notice that the results were not returned in order; that is why the row index was passed and returned. A quick way to restore the original order is shown after the output below.

Time in parallel: 4.749683141708374
[(0, 452994.2250955602), (2, 310577.72144939064), (1, 12318.873058254741), (3, 210071.48540466625), (4, 100467.02727256044), (5, 46553.87276610058), (6, 11396.808561138329), (7, 543909.2528728382), (9, 47914.9078853125), (8, 79957.52205218966)]
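Because the order of results depends on when each task finishes, you may want to restore the original row order before analyzing them. A minimal sketch, using the (i, result) tuples that my_function already returns:

results.sort(key=lambda x: x[0])  # sort in place by the row index, i
print(results)  # now matches the row order of params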

Conclusion

Implementing asynchronous parallelization in your code can greatly decrease your run time. The multiprocessing module is a great option for parallelization on personal computers, and as you’ve seen in this article, you can get dramatic speed increases depending on your machine’s specs. Beware that multiprocessing has limitations if you eventually want to scale up to a supercomputer. If supercomputing is where you’re headed, you’ll want to use a parallelization model compatible with the Message Passing Interface (MPI).

https://gist.github.com/konradhafen/aa605c67bf798f07244bdc9d5d95ad12
