Python syntax - multiprocess, multithreading, coroutine (asynchronous IO)

  • Related concepts
      • Concurrency and parallelism
      • Synchronous and asynchronous
      • Blocking and non-blocking
      • CPU-intensive and I/O-intensive
      • Comparison of multiprocess, multithread and coroutine
  • Coroutines (asynchronous IO)
      • Simple example
      • Add result callback
      • asyncio.wait and asyncio.gather
      • Combination of coroutine and multithreading
  • Multithreading and multiprocessing
      • Multithreading
      • Multi process
  • Multiprocess / multithread / coroutine comparison
      • Sequential serial execution
      • Multi process
      • Multithreading
      • asyncio
      • asyncio+uvloop
      • Run time comparison
Related concepts

Concurrency and parallelism

  • Concurrency: multiple programs take turns running on one CPU (CPU core) within the same time period.
  • Parallelism: multiple programs run on multiple CPUs (CPU cores) at literally the same moment; it is limited by the number of CPUs (CPU cores).

The distinction exists because a single CPU (CPU core) can only run one program at any given instant.

Synchronous and asynchronous

  • Synchronous means that when code makes a call, it must wait for that call to finish before executing the remaining logic.
  • Asynchronous means the code goes on to execute the remaining logic without waiting for the operation to complete.
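
To make the distinction concrete, here is a minimal sketch of my own (the names `sync_fetch`/`async_fetch` are illustrative, not from the article): the synchronous call holds up its caller for the full sleep, while two asynchronous calls overlap on the event loop.

```python
import asyncio
import time

def sync_fetch():
    time.sleep(0.1)  # the caller waits here; nothing else in this thread runs
    return "done"

async def async_fetch():
    await asyncio.sleep(0.1)  # yields to the event loop, which can run other coroutines
    return "done"

async def main():
    # Both coroutines sleep at the same time, so this takes ~0.1s, not ~0.2s
    return await asyncio.gather(async_fetch(), async_fetch())

print(sync_fetch())
print(asyncio.run(main()))
```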

Blocking and non-blocking

  • Blocking means the current thread is suspended while the called function waits for its result.
  • Non-blocking means a call returns immediately instead of suspending the current thread.
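
A quick illustration using the standard-library `queue` module (my example, not the article's): `get()` is the blocking form, `get_nowait()` the non-blocking one.

```python
import queue

q = queue.Queue()

# Non-blocking: returns (or raises queue.Empty) immediately, never suspends the thread
try:
    q.get_nowait()
except queue.Empty:
    print("empty, returned immediately")

q.put(1)
# Blocking form: q.get() suspends the calling thread until an item is available.
# Here an item is already queued, so it returns at once.
print(q.get())
```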

CPU intensive and I/O intensive

CPU-intensive (CPU-bound):

CPU-intensive (also called compute-intensive) means the I/O portion completes very quickly and the CPU spends most of its time on computation; such workloads are characterized by high CPU usage.

For example: compression and decompression, encryption and decryption, regular expression search.

IO-intensive (I/O-bound):

IO-intensive means the CPU spends most of its running time waiting for I/O operations (disk / network) to complete reads and writes; such workloads are characterized by low CPU usage.

For example: file read / write, web crawler, database read / write.

Comparison of multiprocess, multithread and coroutine

| Type | Advantages | Disadvantages | Suitable for |
| --- | --- | --- | --- |
| Multiprocessing (multiprocessing) | Can use multiple CPU cores for true parallel computation | Heaviest resource usage; fewer can be started than threads | CPU-intensive computing |
| Multithreading (threading) | Lighter than processes, lower resource usage | Because of the GIL, threads only run concurrently and cannot use multiple CPUs; compared with coroutines, fewer can be started, with memory overhead and thread-switching costs | IO-intensive computing with few simultaneous tasks |
| Coroutines (asyncio) | Lowest memory overhead; can start the most tasks | Requires library support; code is more complex | IO-intensive computing with many simultaneous tasks |

GIL full name: Global Interpreter Lock

(The original post includes a figure here illustrating how the GIL is acquired and released between threads.)

Python multithreading is "pseudo" multithreading: only one thread can execute Python bytecode at any moment.

A process can start N threads; the number is limited by the system.

A thread can start N coroutines; the number is essentially unlimited.
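
A small experiment (my own sketch, assuming standard CPython with the GIL) makes the "pseudo multithreading" point visible: a pure-CPU countdown run in two threads takes about as long as running it twice serially, because only one thread holds the GIL at a time.

```python
import time
from threading import Thread

def count(n):
    # Pure CPU work: no I/O, so the GIL is never released for long
    while n > 0:
        n -= 1

N = 5_000_000

start = time.perf_counter()
count(N)
count(N)
serial = time.perf_counter() - start

start = time.perf_counter()
t1 = Thread(target=count, args=(N,))
t2 = Thread(target=count, args=(N,))
t1.start()
t2.start()
t1.join()
t2.join()
threaded = time.perf_counter() - start

# On GIL-bound CPython the two numbers are roughly equal; threads gained nothing
print(f"serial: {serial:.2f}s, threaded: {threaded:.2f}s")
```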

How to choose

In other languages, multithreading can use multiple CPUs (cores) at the same time, so it suits CPU-intensive computing. In Python, because of the GIL, multithreading is only useful for IO-intensive computing. So for Python:

  • For IO-intensive work, use coroutines where a supporting library exists; fall back to multithreading only when there is no async library support.
  • For CPU-intensive work, only multiprocessing helps.
Coroutines (asynchronous IO)

Simple example

```python
import asyncio

async def test():
    await asyncio.sleep(3)
    return "123"

async def main():
    result = await test()
    print(result)

if __name__ == '__main__':
    asyncio.run(main())
```

Add result callback

```python
import threading
import asyncio

async def myfun(index):
    # Print the task index and the thread it runs on
    print(f'myfun {index} in {threading.current_thread().name}')
    await asyncio.sleep(1)
    return index

def getfuture(future):
    print(future.result())

# Note: get_event_loop() outside a running loop is deprecated in recent Python versions
loop = asyncio.get_event_loop()
tasks = []
for item in range(3):
    future = asyncio.ensure_future(myfun(item))
    tasks.append(future)
    future.add_done_callback(getfuture)
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
```

asyncio.wait and asyncio.gather

```python
import threading
import asyncio

async def myfun(index):
    print(f'myfun {index} in {threading.current_thread().name}')
    await asyncio.sleep(1)

loop = asyncio.get_event_loop()
# asyncio.wait requires Tasks/Futures in Python 3.11+, so wrap the coroutines first
tasks = [asyncio.ensure_future(myfun(1)), asyncio.ensure_future(myfun(2))]
loop.run_until_complete(asyncio.wait(tasks))
# loop.run_until_complete(asyncio.gather(*tasks))
loop.close()
```

Differences between asyncio.gather and asyncio.wait:

Internally, wait() keeps the Task instances it creates in a set. Because a set is unordered, the tasks do not complete in the order they were passed in. wait() returns a tuple of two sets, the completed tasks and the unfinished ones. Its second parameter is a timeout; any task still unfinished when the timeout expires is returned in the pending set, and if the program exits while tasks are still pending, Python prints an error about tasks that were never completed.

gather() behaves much like wait(), with these differences:

  1. The tasks inside gather() cannot be cancelled individually.
  2. Its return value is a list of results.
  3. The results come back in the order the awaitables were passed in.
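
The differences can be seen directly in the return shapes; a short sketch of my own:

```python
import asyncio

async def work(i):
    await asyncio.sleep(0.01)
    return i

async def main():
    # wait() returns two sets of Tasks: (done, pending); no ordering guarantee
    done, pending = await asyncio.wait([asyncio.create_task(work(i)) for i in range(3)])
    wait_results = sorted(t.result() for t in done)  # must pull results out of the Tasks
    # gather() returns the results themselves, ordered like its arguments
    gather_results = await asyncio.gather(work(0), work(1), work(2))
    return wait_results, gather_results

w, g = asyncio.run(main())
print(w)  # [0, 1, 2] after sorting; wait() itself gives no ordering
print(g)  # [0, 1, 2] in the order the coroutines were passed in
```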

Combination of coroutine and multithreading

Multiple requests simultaneously

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

import requests

def myquery(url):
    r = requests.get(url)
    print(r.text)
    return r.text

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    executor = ThreadPoolExecutor(3)
    urls = ["https://www.psvmc.cn/userlist.json", "https://www.psvmc.cn/login.json"]
    tasks = []
    start_time = time.time()
    for url in urls:
        # Run the blocking requests call in the thread pool
        task = loop.run_in_executor(executor, myquery, url)
        tasks.append(task)
    loop.run_until_complete(asyncio.wait(tasks))
    print(f"Time used: {time.time() - start_time}")
```

result

```
{"code":0,"msg":"success","obj":{"name":"Xiao Ming","sex":"male","token":"psvmc"}}
{"code":0,"msg":"success","obj":[{"name":"Xiao Ming","sex":"male"},{"name":"Xiao Hong","sex":"female"},{"name":"Xiao Gang","sex":"unknown"}]}
Time used: 0.11207175254821777
```

Add callback to a single request

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor

import requests

def myquery(url):
    print(f"Request thread: {threading.current_thread().name}")
    r = requests.get(url)
    return r.text

def myfuture(future):
    print(f"Callback thread: {threading.current_thread().name}")
    print(future.result())

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    executor = ThreadPoolExecutor(3)
    url = "https://www.psvmc.cn/userlist.json"
    start_time = time.time()
    task = loop.run_in_executor(executor, myquery, url)
    future = asyncio.ensure_future(task)
    future.add_done_callback(myfuture)
    loop.run_until_complete(future)
    print(f"Time used: {time.time() - start_time}")
```

Multithreading and multiprocessing

Multithreading

Importing the module

```python
from threading import Thread

def func(num):
    return num

t = Thread(target=func, args=(100,))
t.start()
t.join()
```

Data communication

```python
import queue

q = queue.Queue()
q.put(1)
item = q.get()
```
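
For instance, a minimal producer/consumer pair (a sketch of mine, not from the article) where `Queue` handles all the locking and a `None` sentinel signals completion:

```python
import queue
import threading

q = queue.Queue()
results = []

def producer():
    for i in range(3):
        q.put(i)
    q.put(None)  # sentinel: tells the consumer there is nothing more to read

def consumer():
    while True:
        item = q.get()  # blocks until the producer has put something
        if item is None:
            break
        results.append(item)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start()
t2.start()
t1.join()
t2.join()
print(results)  # [0, 1, 2]
```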

Lock

```python
from threading import Lock

lock = Lock()
with lock:
    pass
```
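
Why the lock matters: `counter += 1` is a read-modify-write sequence, so concurrent threads can lose updates without it. A small sketch of my own:

```python
from threading import Lock, Thread

counter = 0
lock = Lock()

def add(n):
    global counter
    for _ in range(n):
        with lock:  # counter += 1 is read-modify-write; the lock makes it atomic
            counter += 1

threads = [Thread(target=add, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000
```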

Pool technology

```python
from concurrent.futures import ThreadPoolExecutor

def func(num):
    return num

with ThreadPoolExecutor() as executor:
    # Method 1
    results = executor.map(func, [1, 2, 3])
    # Method 2
    future = executor.submit(func, 1)
    result = future.result()
```

Example
```python
from concurrent.futures import ThreadPoolExecutor
import threading
import time

# Define a function ready to be a thread task
def action(num):
    print(threading.current_thread().name)
    time.sleep(num)
    return num + 100

if __name__ == "__main__":
    # Create a thread pool with 3 threads
    with ThreadPoolExecutor(max_workers=3) as pool:
        future1 = pool.submit(action, 3)
        print(f"Single task return:{future1.result()}")
        print('------------------------------')
        # Use the pool to run map calculations
        results = pool.map(action, (1, 3, 5))
        for r in results:
            print(f"Multiple task returns:{r}")
```

result

```
ThreadPoolExecutor-0_0
Single task return:103
------------------------------
ThreadPoolExecutor-0_0
ThreadPoolExecutor-0_1
ThreadPoolExecutor-0_2
Multiple task returns:101
Multiple task returns:103
Multiple task returns:105
```

Multi process

Importing the module

```python
from multiprocessing import Process

def func(num):
    return num

t = Process(target=func, args=(100,))
t.start()
t.join()
```

Data communication

```python
import multiprocessing

q = multiprocessing.Queue()
q.put(1)
item = q.get()
```

Lock

```python
from multiprocessing import Lock

lock = Lock()
with lock:
    pass
```
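
Regular Python objects are not shared between processes, so a lock alone is not enough; a shared `multiprocessing.Value` plus the lock does the job. A sketch of mine (the `add`/`run` names are illustrative):

```python
import multiprocessing

def add(counter, lock, n):
    for _ in range(n):
        with lock:  # counter.value += 1 is not atomic, even on a shared Value
            counter.value += 1

def run():
    counter = multiprocessing.Value('i', 0)  # an int living in shared memory
    lock = multiprocessing.Lock()
    procs = [multiprocessing.Process(target=add, args=(counter, lock, 1000))
             for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value

if __name__ == '__main__':
    print(run())  # 4000
```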

Pool technology

```python
from concurrent.futures import ProcessPoolExecutor

def func(num):
    return num

with ProcessPoolExecutor() as executor:
    # Method 1
    results = executor.map(func, [1, 2, 3])
    # Method 2
    future = executor.submit(func, 1)
    result = future.result()
```

Example
```python
from concurrent.futures import ProcessPoolExecutor
import multiprocessing
import time

# Define a function ready to be a process task
def action(num):
    print(multiprocessing.current_process().name)
    time.sleep(num)
    return num + 100

if __name__ == "__main__":
    # Create a process pool with 3 processes
    with ProcessPoolExecutor(max_workers=3) as pool:
        future1 = pool.submit(action, 3)
        print(f"Single task return:{future1.result()}")
        print('------------------------------')
        # Use the pool to run map calculations
        results = pool.map(action, [1, 3, 5])
        for r in results:
            print(f"Multiple task returns:{r}")
```

result

```
SpawnProcess-1
Single task return:103
------------------------------
SpawnProcess-2
SpawnProcess-3
SpawnProcess-1
Multiple task returns:101
Multiple task returns:103
Multiple task returns:105
```
Multiprocess / multithread / coroutine comparison

Asynchronous IO, multiprocessing, multithreading

In IO-intensive applications, the CPU spends far longer waiting for IO than actually computing, which wastes CPU time.

Common IO-intensive workloads include browser interaction, disk requests, web crawlers and database requests.

In the Python world there are three ways to raise concurrency in IO-intensive scenarios: multiprocessing, multithreading and coroutines.

Theoretically, asyncio has the highest performance for the following reasons:

  1. Processes and threads incur CPU context-switching costs.
  2. Processes and threads require transitions between kernel mode and user mode, which carry a large performance overhead; coroutines are transparent to the kernel and run entirely in user mode.
  3. Processes and threads cannot be created without limit; a common rule of thumb caps workers at about CPU count × 2. Coroutines scale much further: their theoretical upper limit is the number of file descriptors the operating system's IO multiplexer (epoll on Linux) can register.
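
The third point is easy to demonstrate: starting ten thousand OS threads would be problematic, but ten thousand coroutines sleeping concurrently is routine (my sketch; the 10,000 figure is arbitrary):

```python
import asyncio
import time

async def tick():
    await asyncio.sleep(0.1)

async def main():
    # 10,000 concurrent sleeps finish in roughly 0.1s of wall time,
    # because a coroutine is just a small Python object, not an OS thread
    await asyncio.gather(*(tick() for _ in range(10_000)))

start = time.perf_counter()
asyncio.run(main())
elapsed = time.perf_counter() - start
print(f"10000 coroutines took {elapsed:.2f}s")
```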

Is the actual performance of asyncio as strong as the theory suggests? How strong? I built the following test scenario:

Issue 10 requests that each sleep 1s to simulate a service query.

  • Method 1: sequential serial execution
  • Method 2: multi process
  • Method 3: multithreading
  • Method 4: asyncio
  • Method 5: asyncio+uvloop

The biggest difference between uvloop and the official asyncio is that uvloop reimplements asyncio's event loop using Cython + libuv.

According to its official benchmarks, uvloop is about twice as fast as node.js and on par with Go programs.

Sequential serial execution

```python
import time

def query(num):
    print(num)
    time.sleep(1)

def main():
    for h in range(10):
        query(h)

# main entrance
if __name__ == '__main__':
    start_time = time.perf_counter()
    main()
    end_time = time.perf_counter()
    print(f"Time difference: {end_time - start_time}")
```

Multi process

```python
from concurrent import futures
import time

def query(num):
    print(num)
    time.sleep(1)

def main():
    with futures.ProcessPoolExecutor() as executor:
        for future in executor.map(query, range(10)):
            pass

# main entrance
if __name__ == '__main__':
    start_time = time.perf_counter()
    main()
    end_time = time.perf_counter()
    print(f"Time difference: {end_time - start_time}")
```

Multithreading

```python
from concurrent import futures
import time

def query(num):
    print(num)
    time.sleep(1)

def main():
    with futures.ThreadPoolExecutor() as executor:
        for future in executor.map(query, range(10)):
            pass

# main entrance
if __name__ == '__main__':
    start_time = time.perf_counter()
    main()
    end_time = time.perf_counter()
    print(f"Time difference: {end_time - start_time}")
```

asyncio

```python
import asyncio
import time

async def query(num):
    print(num)
    await asyncio.sleep(1)

async def main():
    tasks = [asyncio.create_task(query(num)) for num in range(10)]
    await asyncio.gather(*tasks)

# main entrance
if __name__ == '__main__':
    start_time = time.perf_counter()
    asyncio.run(main())
    end_time = time.perf_counter()
    print(f"Time difference: {end_time - start_time}")
```

asyncio+uvloop

Note

uvloop is not supported on Windows.

Example

```python
import asyncio
import time

import uvloop

async def query(num):
    print(num)
    await asyncio.sleep(1)

async def main():
    tasks = [asyncio.create_task(query(host)) for host in range(10)]
    await asyncio.gather(*tasks)

# main entrance
if __name__ == '__main__':
    uvloop.install()  # replace the default asyncio event loop policy with uvloop's
    start_time = time.perf_counter()
    asyncio.run(main())
    end_time = time.perf_counter()
    print(f"Time difference: {end_time - start_time}")
```

Run time comparison

| Mode | Running time |
| --- | --- |
| Serial | 10.0750972s |
| Multi process | 1.1638731999999998s |
| Multithreading | 1.0146456s |
| asyncio | 1.0110082s |
| asyncio+uvloop | 1.01s |

As the comparison shows, multiprocessing, multithreading and asyncio can all greatly improve concurrency in IO-intensive scenarios, and asyncio+uvloop has the highest performance.

28 November 2021, 21:02 | Views: 1788
