Python - multithreading Parallel / Multiprocessing Demo

1, Introduction

Multithreading and thread pools are commonly used in Java development to improve program efficiency and machine utilization. In Python, similar parallelism is available through the Parallel class of joblib and the multiprocessing module. In addition, _thread, threading, and other thread-related modules can be combined with utility modules such as os, sys, and subprocess to implement more complex operations. The following demo introduces several parallel implementations through the example of summing lists.

2, Parallel without Lock

Parallel, from the joblib library, runs a function in parallel to improve efficiency. The function to be executed in parallel is wrapped with delayed. n_jobs specifies the number of parallel workers (typically one per core), and the parentheses after delayed(add_sum) pass the arguments used by add_sum.

# -*- coding: UTF-8 -*-

from joblib import Parallel, delayed

def add_sum(_numList):
    # Sum one sub-list; each call runs in a worker
    return sum(_numList)

if __name__ == '__main__':
    numList = [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7]]
    cores = 4
    # One delayed call per sub-list
    results = Parallel(n_jobs=cores)(delayed(add_sum)(num) for num in numList)
    # Aggregate the partial sums
    print(sum(results))
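
As a rough check on what the Parallel call above should produce, here is a serial sketch in plain Python (no joblib involved). It assumes the default behaviour of Parallel, which returns a list of results in the same order as the inputs; the names expected and total are only illustrative.

```python
def add_sum(_numList):
    return sum(_numList)

numList = [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7]]

# Serial stand-in for:
# Parallel(n_jobs=cores)(delayed(add_sum)(num) for num in numList)
expected = [add_sum(num) for num in numList]
total = sum(expected)

print(expected)  # [10, 14, 18, 22]
print(total)     # 64
```

If the parallel version returns anything other than these partial sums, the problem is in how the work was split rather than in add_sum itself.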

3, MultiProcessing without Lock

Thread pools are commonly used in Java, mainly through the ThreadPoolExecutor class. In Python, multiprocessing provides a comparable pool, although its workers are processes rather than threads.

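# -*- coding: UTF-8 -*-

import multiprocessing

def add_sum(nums):
    return sum(nums)

if __name__ == '__main__':

    pool = multiprocessing.Pool(processes=2)
    numLists = [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7]]
    sub_process = []  # AsyncResult handles, one per submitted task

    for i in range(len(numLists)):
        data = numLists[i]
        # Submit asynchronously; args must be a tuple, hence the trailing comma
        process = pool.apply_async(add_sum, (data,))
        sub_process.append(process)

    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for all submitted tasks to finish

    # Aggregate the partial results
    result = 0
    for process in sub_process:
        result += process.get()
    print(result)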


-> multiprocessing.Pool creates the pool, and processes sets its degree of parallelism (the number of worker processes)
-> apply_async submits a task asynchronously. Its arguments are passed as a tuple, so a single argument needs a trailing comma, as in (data,); without the comma, (data) is just data, and the elements of the list are unpacked as separate arguments, which causes an error

-> close stops the pool from accepting new tasks. join waits for all tasks in the pool to finish before the code after it runs, so avoid infinite loops and bugs in the worker function, otherwise join never returns

-> get returns the result of a finished task. You need to aggregate the results of multiple tasks yourself
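
The trailing-comma rule can be checked without creating a pool. The sketch below assumes, as the multiprocessing documentation describes, that apply_async ends up calling the function as func(*args); call_like_apply_async is a hypothetical stand-in for that step, not part of the library.

```python
def add_sum(nums):
    return sum(nums)

def call_like_apply_async(func, args=()):
    # Hypothetical stand-in: the pool ends up calling func(*args) in a worker
    return func(*args)

data = [1, 2, 3, 4]

# (data,) is a one-element tuple, so add_sum receives the whole list
good = call_like_apply_async(add_sum, (data,))

# (data) is just the list itself, so its items are unpacked into
# four separate arguments and add_sum raises TypeError
try:
    call_like_apply_async(add_sum, (data))
    failed = False
except TypeError:
    failed = True

print(good)    # 10
print(failed)  # True
```

With a real pool the TypeError is raised inside the worker and only surfaces when get is called on the result, which makes the missing comma easy to overlook.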

4, MultiProcessing with Lock

Besides parallel execution, multithreaded programs often have to solve synchronization problems. In Java, synchronized locking controls reads and writes to shared in-memory variables. multiprocessing provides the Lock class to solve the same synchronization problem between processes.

# -*- coding: UTF-8 -*-

import multiprocessing
import time
import numpy as np

def add_sum(_initial, _numList, _lock):
    while True:
        try:
            _lock.acquire()
            value = next(_numList)
            # Read-modify-write on the shared value, protected by the lock
            _initial.value = _initial.value + value
            _lock.release()
        except StopIteration:
            # Iterator exhausted: release the lock before leaving the loop
            _lock.release()
            break

if __name__ == "__main__":
    lock = multiprocessing.Lock()
    initial = multiprocessing.Value('i', 0)
    work_nums = 4
    sub_process = []  # A collection of processes that process data
    numList = [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7]]

    st = time.time()
    for i in range(work_nums):
        sub_process_tmp = multiprocessing.Process(target=add_sum, args=(initial, iter(numList[i]), lock))
        sub_process.append(sub_process_tmp)

    # Start all worker processes
    for process in sub_process:
        process.start()

    # Wait for all worker processes to finish
    for process in sub_process:
        process.join()

    end = time.time()

    # Comparison of results
    print("Add Sum Cost :", (end - st))
    realSum = np.array(numList).sum()
    print("Real Result: ", realSum, " Multi Process Result: ", initial.value)


-> lock = multiprocessing.Lock() creates a lock that can be passed to the worker processes

-> initial = multiprocessing.Value('i', 0) represents a shared variable, similar to Spark's longAccumulator, whose value can be read through the .value attribute. 'i' is the type code; the common type codes are:

Type   C Type           Python Type     Minimum bytes
b      signed char      int             1
B      unsigned char    int             1
u      unicode          unicode char    2
h      signed short     int             2
H      unsigned short   int             2
i      signed int       int             2
I      unsigned int     int             2
l      signed long      int             4
L      unsigned long    int             4

To share a string, you can use c_char_p from ctypes as the type of the shared Value; see the ctypes documentation for details. Since numeric accumulation is the more common case, string sharing is not covered further here.

  from ctypes import c_char_p
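
A small single-process sketch of Value may help here. It only uses multiprocessing.Value, its .value attribute, and get_lock(), which returns the lock that a Value carries when created with the default lock=True; the variable names counter and ratio are illustrative.

```python
import multiprocessing

# 'i' -> signed int, 'd' -> double-precision float
counter = multiprocessing.Value('i', 0)
ratio = multiprocessing.Value('d', 0.5)

# Read and write through the .value attribute
counter.value = counter.value + 3
ratio.value = ratio.value * 2

# A Value created with the default lock=True carries its own lock,
# which can guard a read-modify-write in place of a separate Lock
with counter.get_lock():
    counter.value += 1

print(counter.value)  # 4
print(ratio.value)    # 1.0
```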

-> target is the function executed by each process, and args holds the arguments passed to that function. Tasks can be divided evenly among the processes through these arguments

-> start launches the process, similar to starting a thread built from a Runnable in Java. join waits for the process to finish running

-> Lock has acquire() and release() methods; place the synchronized logic between them to ensure consistent results

-> Remember to call lock.release() in the except branch as well, otherwise the lock is never released and the program hangs
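
One way to make the release step harder to forget is to let a with block handle it. The sketch below is an alternative to explicit acquire() and release() calls, not the original code: guarded_add is a hypothetical helper, and everything runs in a single process just to show that the lock is released even when the body raises.

```python
import multiprocessing

def guarded_add(shared, lock, iterator):
    # Hypothetical helper: add the next item of iterator to shared.value.
    # "with lock" acquires on entry and releases on exit, even on error.
    with lock:
        shared.value = shared.value + next(iterator)

lock = multiprocessing.Lock()
total = multiprocessing.Value('i', 0)
numbers = iter([1, 2, 3, 4])

while True:
    try:
        guarded_add(total, lock, numbers)
    except StopIteration:
        # The with block has already released the lock at this point
        break

# A non-blocking acquire succeeds only if the lock is currently free
lock_is_free = lock.acquire(False)
if lock_is_free:
    lock.release()

print(total.value)   # 10
print(lock_is_free)  # True
```

The with form keeps the acquire and release paired in one place, so an early exit or an exception cannot leave the lock held.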

Operation result without lock:

Add Sum Cost : 0.1731090545654297
Real Result:  64  Multi Process Result:  39

Operation result with lock:

Add Sum Cost : 0.18215608596801758
Real Result:  64  Multi Process Result:  64

Without the lock, the result varies from run to run: it may be 39, the correct value 64, or some other value. With the lock, the result is consistently 64.
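
The varying totals come from lost updates: _initial.value = _initial.value + value is a read followed by a write, and another process can write in between. The sketch below replays one such interleaving by hand in a single process. It is a deterministic illustration with a made-up step order and made-up numbers, not a measurement of the real program.

```python
shared = 0  # plays the role of initial.value

# Two workers each want to add their number to the shared total
a_add, b_add = 3, 5

# Unlocked interleaving: both workers read before either one writes
a_read = shared            # worker A reads 0
b_read = shared            # worker B reads 0
shared = a_read + a_add    # worker A writes 3
shared = b_read + b_add    # worker B writes 5, so A's update is lost
unlocked_total = shared

# Locked ordering: each read-modify-write completes before the next starts
shared = 0
shared = shared + a_add    # worker A: 0 -> 3
shared = shared + b_add    # worker B: 3 -> 8
locked_total = shared

print(unlocked_total)  # 5
print(locked_total)    # 8
```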

5, Summary

This simple list-summation demo covers both lock-free and lock-based parallel schemes. The code above can be adapted to your own needs. For more detailed parameters and implementation details, please refer to the official API documentation.

Tags: Python Multithreading lock

Posted on Wed, 20 Oct 2021 22:43:00 -0400 by peanutbutter