Getting started: using Python to implement multitasking with processes

A comparison of threads and processes, with multiprocessing examples in Python

1, Process introduction

Process: a program in execution, made up of the program code, its data, and a process control block. A process is one execution of a program and the basic unit of resource allocation.

Program: static code that is not being executed.

2, Comparison between threads and processes

A look at the task manager shows the computer running 9 application processes at this moment, with each process containing multiple threads. From this we can conclude:

Process: lets the system work on multiple tasks at once, such as running several QQ clients on one computer at the same time.

Thread: lets a single process work on multiple tasks at once, such as several chat windows inside one QQ client.

Fundamental difference: a process is the basic unit of operating system resource allocation, while a thread is the basic unit of task scheduling and execution.

Advantages of using multiple processes:

1. Each process has an independent GIL

Because of the GIL, multithreading in Python cannot take full advantage of multiple cores: within one process, only one thread can execute Python bytecode at a time. With multiprocessing, each process has its own GIL, so on a multi-core processor the processes do not block one another, and multiprocessing can make much better use of the available cores.

2. High efficiency

For I/O-bound tasks such as web crawlers, there is little practical difference between multithreading and multiprocessing. For CPU-bound tasks, however, multiprocessing on a multi-core machine can be significantly faster than multithreading.
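
To see this difference in practice, here is a minimal sketch (not from the original article; the count_down function and the loop size are illustrative assumptions) that runs the same CPU-bound loop first with four threads and then with four processes. On a multi-core machine the process version usually finishes noticeably faster:

import time
from multiprocessing import Process
from threading import Thread

def count_down(n):
    # Pure CPU work: no I/O, so the GIL is the bottleneck for threads
    while n > 0:
        n -= 1

if __name__ == '__main__':
    for kind, cls in (('threads', Thread), ('processes', Process)):
        start = time.time()
        workers = [cls(target=count_down, args=(10_000_000,)) for _ in range(4)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        print(f'4 {kind}: {time.time() - start:.2f}s')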

3, Implementing multiple processes in Python

Let's start with an example:

1. Using the Process class

import multiprocessing

def process(index):
    print(f'Process: {index}')

if __name__ == '__main__':
    for i in range(5):
        p = multiprocessing.Process(target=process, args=(i,))
        p.start()

This is the most basic way to create multiple processes: instantiate a Process object for each subprocess. The target parameter takes the function to run, and args takes that function's arguments as a tuple, matched one to one with the parameters of the called function process.

Note: args must be a tuple. If there is only one argument, add a comma after it; without the comma, the parentheses are plain grouping rather than a tuple, and argument passing breaks. After creating the process, call the start method to launch it.
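
A quick illustration of the tuple rule (not part of the original code):

args_ok = (5,)       # a one-element tuple: correct for args
not_a_tuple = (5)    # just the integer 5 in parentheses, not a tuple
print(type(args_ok), type(not_a_tuple))   # <class 'tuple'> <class 'int'>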

The operation results are as follows:

Process: 0 
Process: 1
Process: 2
Process: 3
Process: 4

As you can see, we started five subprocesses, each of which calls the process function. The index argument is passed in through args, taking the values 0 through 4; each process prints its value and then exits.
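
Also note that the main process does not wait for the subprocesses in this example. If you need to continue only after they have all finished, you can keep the Process objects and call join on each one; a small sketch (a variation on the example above, not the original code):

import multiprocessing

def process(index):
    print(f'Process: {index}')

if __name__ == '__main__':
    processes = [multiprocessing.Process(target=process, args=(i,)) for i in range(5)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()   # block until each subprocess has exited
    print('All subprocesses finished')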

2. Inheriting from the Process class

from multiprocessing import Process
import time

class MyProcess(Process):
    def __init__(self, loop):
        Process.__init__(self)
        self.loop = loop

    def run(self):
        for count in range(self.loop):
            time.sleep(1)
            print(f'Pid:{self.pid} LoopCount: {count}')

if __name__ == '__main__':
    for i in range(2, 5):
        p = MyProcess(i)
        p.start()

We first define a constructor that receives a loop parameter, the number of iterations, and stores it as an instance attribute. In the run method, that attribute controls how many times the loop runs, and each iteration prints the current process id and the loop count.

When invoking it, we use range to get the numbers 2, 3 and 4, initialise a MyProcess with each of them, and call the start method to launch the process.

Note: the execution logic of the process must be implemented in the run method. To launch the process you call the start method; run is then executed inside the new process.
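
The distinction matters: calling run directly just executes the method in the current process, while start creates a new process and runs run inside it. A minimal sketch (illustrative, not from the original article):

import os
from multiprocessing import Process

class Demo(Process):
    def run(self):
        print(f'run() executing in pid {os.getpid()}')

if __name__ == '__main__':
    print(f'main pid: {os.getpid()}')
    Demo().run()      # same pid as main: no new process is created
    p = Demo()
    p.start()         # different pid: a real subprocess is spawned
    p.join()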

The operation results are as follows:

Pid:12976 LoopCount: 0
Pid:15012 LoopCount: 0
Pid:11976 LoopCount: 0
Pid:12976 LoopCount: 1
Pid:15012 LoopCount: 1
Pid:11976 LoopCount: 1
Pid:15012 LoopCount: 2
Pid:11976 LoopCount: 2
Pid:11976 LoopCount: 3

Note that Pid here is the process number; the output will differ between machines and between runs.

4, Communication between processes

1. Queue (first in, first out)

from multiprocessing import Queue
import multiprocessing

def download(p):  # Produce ("download") the data and put it on the queue
    lst = [11, 22, 33, 44]
    for item in lst:
        p.put(item)
    print('The data has been downloaded successfully....')

def savedata(p):  # Take the data off the queue and save it
    lst = []
    while True:
        data = p.get()
        lst.append(data)
        if p.empty():
            break
    print(lst)

def main():
    p1 = Queue()
    t1 = multiprocessing.Process(target=download, args=(p1,))
    t2 = multiprocessing.Process(target=savedata, args=(p1,))
    t1.start()
    t2.start()

if __name__ == '__main__':
    main()

Operation results:
The data has been downloaded successfully....
[11, 22, 33, 44]
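
Note that the stopping condition p.empty() is fragile: if savedata ever drains the queue faster than download fills it, the loop exits early and some items are lost. A more robust pattern is to send an explicit end marker; here is a sketch using a None sentinel (an alternative, not the original article's code):

import multiprocessing

def download(q):
    for item in [11, 22, 33, 44]:
        q.put(item)
    q.put(None)           # sentinel: no more data will follow

def savedata(q):
    lst = []
    while True:
        data = q.get()
        if data is None:  # stop when the sentinel arrives
            break
        lst.append(data)
    print(lst)

if __name__ == '__main__':
    q = multiprocessing.Queue()
    t1 = multiprocessing.Process(target=download, args=(q,))
    t2 = multiprocessing.Process(target=savedata, args=(q,))
    t1.start()
    t2.start()
    t1.join()
    t2.join()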

2. Global variables are not shared, so they are not suitable for multiprocess programming

import multiprocessing

a = 1

def demo1():
    global a
    a += 1

def demo2():
    print(a)

def main():
    t1 = multiprocessing.Process(target=demo1)
    t2 = multiprocessing.Process(target=demo2)
    t1.start()
    t2.start()

if __name__ == '__main__':
    main()

Operation results:

1

The result shows that the global variable is not shared: demo1 increments its own copy of a in its own process, while demo2 still sees the original value 1.
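
If you do need to share a simple value between processes, the multiprocessing module provides shared-memory objects such as Value. A minimal sketch (not from the original article) that makes the increment visible to the second process:

import multiprocessing

def increment(shared_a):
    with shared_a.get_lock():   # the Value carries its own lock
        shared_a.value += 1

def show(shared_a):
    print(shared_a.value)

if __name__ == '__main__':
    a = multiprocessing.Value('i', 1)   # 'i' = signed int, initial value 1
    t1 = multiprocessing.Process(target=increment, args=(a,))
    t2 = multiprocessing.Process(target=show, args=(a,))
    t1.start()
    t1.join()   # wait so the increment happens before printing
    t2.start()
    t2.join()   # prints 2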

5, Process pools

1. Introduction to process pools

When the number of subprocesses to create is small, you can use the Process class from multiprocessing directly. But when there are hundreds or even thousands of tasks, creating processes by hand is far too much work; in that case you can use the Pool class provided by the multiprocessing module.

from multiprocessing import Pool
import os, time, random

def worker(a):
    t_start = time.time()
    print('%s Start execution, the process number is %d' % (a, os.getpid()))
    time.sleep(random.random() * 2)
    t_stop = time.time()
    print(a, 'Execution complete, time consuming %0.2f' % (t_stop - t_start))

if __name__ == '__main__':
    po = Pool(3)  # Define a process pool with 3 worker processes
    for i in range(0, 10):
        po.apply_async(worker, (i,))  # Add the worker task to the process pool
    print('--start--')
    po.close()  # No more tasks will be added to the pool
    po.join()   # Wait for all tasks in the pool to finish
    print('--end--')

Operation results:

--start--
0 Start execution, the process number is 6664
1 Start execution, the process number is 4772
2 Start execution, the process number is 13256
0 Execution complete, time consuming 0.18
3 Start execution, the process number is 6664
2 Execution complete, time consuming 0.16
4 Start execution, the process number is 13256
1 Execution complete, time consuming 0.67
5 Start execution, the process number is 4772
4 Execution complete, time consuming 0.87
6 Start execution, the process number is 13256
3 Execution complete, time consuming 1.59
7 Start execution, the process number is 6664
5 Execution complete, time consuming 1.15
8 Start execution, the process number is 4772
7 Execution complete, time consuming 0.40
9 Start execution, the process number is 6664
6 Execution complete, time consuming 1.80
8 Execution complete, time consuming 1.49
9 Execution complete, time consuming 1.36
--end--

The pool here holds only three worker processes, so at most three tasks run at the same time; a new task starts only after one of the running tasks finishes, and the same three workers are reused until all ten tasks are done.
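
For simple cases like this, Pool.map is often a more concise alternative: it submits all the tasks, lets the three workers consume them, and returns the results in input order. A short sketch (illustrative, using a hypothetical square task rather than the worker above):

import os
from multiprocessing import Pool

def square(x):
    print(f'pid {os.getpid()} handling {x}')
    return x * x

if __name__ == '__main__':
    with Pool(3) as po:
        results = po.map(square, range(10))   # blocks until all tasks finish
    print(results)   # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]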

6, Case study: copying files in batches

The approach:

  • Get the name of the folder to copy

  • Create a new folder

  • Get all the file names to be copied in the folder

  • Create a process pool

  • Add tasks to the process pool

The code is as follows:

Imports

import multiprocessing
import os
import time

Custom file copy function

def copy_file(Q, oldfolderName, newfolderName, file_name):
    # Copy one file; nothing needs to be returned
    time.sleep(0.5)
    # print('\rCopy the %s file from the %s folder to the %s folder' % (file_name, oldfolderName, newfolderName), end='')
    old_file = open(oldfolderName + '/' + file_name, 'rb')   # File to be copied
    content = old_file.read()
    old_file.close()
    new_file = open(newfolderName + '/' + file_name, 'wb')   # New file copied out
    new_file.write(content)
    new_file.close()
    Q.put(file_name)  # Put the file name on the queue to report progress

Define main function

def main():
    # Step 1: get the name of the folder to copy (it can be created manually or by code; here we create it manually)
    oldfolderName = input('Please enter the name of the folder to copy:')
    newfolderName = oldfolderName + 'Copy'
    # Step 2: create the new folder
    if not os.path.exists(newfolderName):
        os.mkdir(newfolderName)
    # Step 3: get all the file names to be copied in the folder
    filenames = os.listdir(oldfolderName)
    # print(filenames)
    # Step 4: create the process pool
    pool = multiprocessing.Pool(5)
    # Create a queue for communication between the pool workers and the main process
    Q = multiprocessing.Manager().Queue()
    # Step 5: add the tasks to the process pool
    for file_name in filenames:
        pool.apply_async(copy_file, args=(Q, oldfolderName, newfolderName, file_name))
    pool.close()
    copy_file_num = 0
    file_count = len(filenames)
    # We do not know when all copies finish, so loop until every file name has come back
    while True:
        file_name = Q.get()
        copy_file_num += 1
        time.sleep(0.2)
        print('\rCopy progress %.2f %%' % (copy_file_num * 100 / file_count), end='')  # Copy progress bar
        if copy_file_num >= file_count:
            break

Run the program

if __name__ == '__main__':
    main()

(Figures omitted: comparison of the file directory structure before and after the run.)

