One day, while troubleshooting a production issue, we dumped the thread stacks and found a "weird" trace:
File "/usr/local/lib/python3.5/logging/__init__.py", line 1838, in info root.info(msg, *args, **kwargs) File "/usr/local/lib/python3.5/logging/__init__.py", line 1271, in info Log 'msg % args' with severity 'INFO'. File "/usr/local/lib/python3.5/logging/__init__.py", line 1279, in info self._log(INFO, msg, args, **kwargs) File "/usr/local/lib/python3.5/logging/__init__.py", line 1415, in _log self.handle(record) File "/usr/local/lib/python3.5/logging/__init__.py", line 1425, in handle self.callHandlers(record) File "/usr/local/lib/python3.5/logging/__init__.py", line 1487, in callHandlers hdlr.handle(record) File "/usr/local/lib/python3.5/logging/__init__.py", line 853, in handle self.acquire() File "/usr/local/lib/python3.5/logging/__init__.py", line 804, in acquire self.lock.acquire()
The abnormal behavior is that the logging module is stuck acquiring a lock while emitting a record, and stays blocked forever. As is well known, Python's logging module is thread safe: a handler must acquire a lock (a reentrant lock) before writing out a record, and other threads have to wait for that lock to be released before they can log. The most typical example is Handler.handle:
```python
def handle(self, record):
    """
    Conditionally emit the specified logging record.

    Emission depends on filters which may have been added to the handler.
    Wrap the actual emission of the record with acquisition/release of
    the I/O thread lock. Returns whether the filter passed the record for
    emission.
    """
    rv = self.filter(record)
    if rv:
        self.acquire()
        try:
            self.emit(record)
        finally:
            self.release()
    return rv
It follows that if a thread acquires the lock and never releases it, every other thread will block forever at the acquire step. In practice, though, this should be essentially impossible: emitting a log record is fast and does not hold the lock for long.
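To see concretely what "a lock that is never released" does to logging, here is a minimal sketch (a contrived scenario, not from the original incident; handler.acquire()/release() are the real logging handler API):

```python
import logging
import sys
import threading

handler = logging.StreamHandler(sys.stdout)
root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.INFO)

handler.acquire()   # take the handler's RLock and (for now) never release it

# Another thread tries to log: Handler.handle() blocks in self.acquire().
t = threading.Thread(target=logging.info, args=("never printed",))
t.start()
t.join(timeout=1)
print("worker still blocked:", t.is_alive())  # True

handler.release()   # let the worker finish so the script can exit
```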
Since the environment runs with multiple processes and multiple threads, it was reasonable to suspect a race condition under concurrency, so let's start with a multi-process example:
```python
import logging
import multiprocessing
import sys
from time import sleep

class CustomStreamHandler(logging.StreamHandler):
    def emit(self, record):
        sleep(0.1)
        super(CustomStreamHandler, self).emit(record)

root = logging.getLogger()
root.setLevel(logging.DEBUG)
root.addHandler(CustomStreamHandler(sys.stdout))

def g():
    logging.info(2)
    logging.info(2)

def f():
    logging.info(1)
    logging.info(1)

p = multiprocessing.Process(target=f)
p.start()
g()
```
The main process starts a child process that runs f to log "1", and in parallel calls g itself to log "2". To make the problem easier to observe, the handler sleeps for 0.1 seconds after obtaining the lock, before actually emitting.
As you can see, there is no problem with the output:
```
1
2
1
2
```
Now change one thing: instead of calling g directly from the main thread, run it in a new thread inside the main process:
```python
import logging
import multiprocessing
import sys
import threading
from time import sleep

class CustomStreamHandler(logging.StreamHandler):
    def emit(self, record):
        sleep(0.1)
        super(CustomStreamHandler, self).emit(record)

root = logging.getLogger()
root.setLevel(logging.DEBUG)
root.addHandler(CustomStreamHandler(sys.stdout))

def g():
    logging.info(2)
    logging.info(2)

def f():
    logging.info(1)
    logging.info(1)

t = threading.Thread(target=g)
p = multiprocessing.Process(target=f)
t.start()
p.start()
```
Run it, and the program blocks immediately after printing two 2s:
```
2
2
(blocked...)
```
Why do two similar examples have different results?
Look closely at this example. The thread is started before the process, so assume the thread runs first: it acquires the root logger handler's lock and immediately goes to sleep for 0.1 seconds inside emit. At that moment the child process also wants to log, so it tries to acquire the same lock, but it never gets it; the child stays blocked even after the thread exits.
The real cause behind all of this can be deduced from the above: at the moment of the fork, the thread holds the lock. The child process copies the parent's address space, including the lock in its locked state, so the child believes the lock is still occupied and keeps waiting, not knowing that the lock will never be released.
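The same effect can be reproduced without logging at all. The following is a minimal sketch (POSIX-only, since it calls os.fork() directly): a helper thread holds a plain lock across the fork, and the child inherits the lock in its locked state with no thread left to release it.

```python
import os
import threading
import time

lock = threading.Lock()

def holder():
    with lock:
        time.sleep(1)   # hold the lock while the fork happens

threading.Thread(target=holder).start()
time.sleep(0.1)         # make sure the helper thread owns the lock

pid = os.fork()
if pid == 0:
    # Child: the helper thread does not exist here, but its locked
    # lock was copied into our address space.
    acquired = lock.acquire(timeout=2)
    print("child acquired lock:", acquired)  # False: nobody will ever release it
    os._exit(0)
os.waitpid(pid, 0)
```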
Why was there no problem in the first example? As mentioned above, this lock is a reentrant lock (RLock), which is thread-level: it must be released by the same thread that acquired it. In the first example, although the main process forks a child, the acquiring thread in both processes is effectively the same (the main thread). In the second example, f and g run in completely different threads: the lock copied into the child was acquired by g's thread, a thread that does not exist in the child, so f can do nothing about it and the lock can never be released.
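The thread-level ownership of RLock is easy to demonstrate in isolation (a small sketch, independent of the examples above):

```python
import threading

rlock = threading.RLock()

def take():
    rlock.acquire()   # the RLock records this worker thread as its owner

t = threading.Thread(target=take)
t.start()
t.join()

try:
    rlock.release()   # the main thread is not the owner
except RuntimeError as exc:
    print("release failed:", exc)  # cannot release un-acquired lock
```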
What happens if we take an opportunistic shortcut and simply replace the lock?
```python
import logging
import multiprocessing
import sys
import threading
from time import sleep

class CustomStreamHandler(logging.StreamHandler):
    def emit(self, record):
        sleep(0.1)
        super(CustomStreamHandler, self).emit(record)

root = logging.getLogger()
root.setLevel(logging.DEBUG)
root.addHandler(CustomStreamHandler(sys.stdout))

def g():
    print("g", threading.get_ident())
    handler = logging.getLogger().handlers[0]
    logging.info(2)
    logging.info(2)
    print(id(handler))
    print(id(handler.lock))

def f():
    print("f", threading.get_ident())
    handler = logging.getLogger().handlers[0]
    handler.createLock()  # replace the (possibly locked) inherited lock with a fresh one
    logging.info(1)
    logging.info(1)
    print(id(handler))
    print(id(handler.lock))

print("main", threading.get_ident())
p = multiprocessing.Process(target=f)
t = threading.Thread(target=g)
t.start()
p.start()
```
Running this code produces:
```
main 140735977362240
g 123145405829120
f 140735977362240
2
1
2
1
4353914808
4353914808
4354039648
4349383520
```
This time there is no blocking, because the child process directly replaced the lock, still in its locked state, with a brand-new RLock. Looking at the printed addresses, the handler address is the same in both functions (the child inherits the parent's address space via copy-on-write), while the two locks are now different objects.
Solution
To solve this problem, Google provided a solution long ago: python-atfork.
The main idea is to monkey-patch os.fork so that three custom hooks are triggered whenever fork is called (a usage sketch follows the list below):
```python
def atfork(prepare=None, parent=None, child=None):
```
- prepare: called in the parent before the child process is forked.
- parent: called in the parent after the child process is forked.
- child: called in the child process after the fork.
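As referenced above, here is a sketch of how these hooks could be used to protect a single lock. Hedged: the atfork(...) signature is quoted from the project, but monkeypatch_os_fork_functions is written from memory of the python-atfork package and may differ; check the package itself.

```python
import threading
import atfork  # the python-atfork package

# Assumption: this installs the os.fork monkey-patch; verify against the package docs.
atfork.monkeypatch_os_fork_functions()

lock = threading.Lock()
atfork.atfork(prepare=lock.acquire,  # held during the fork, so no one else can hold it
              parent=lock.release,   # released again in the parent...
              child=lock.release)    # ...and in the child's copy
```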
python-atfork also ships a workaround for the logging module's deadlock:
```python
def fix_logging_module():
    logging = sys.modules.get('logging')
    # Prevent fixing multiple times as that would cause a deadlock.
    if logging and getattr(logging, 'fixed_for_atfork', None):
        return
    if logging:
        warnings.warn('logging module already imported before fixup.')
    import logging
    if logging.getLogger().handlers:
        # We could register each lock with atfork for these handlers but if
        # these exist, other loggers or not yet added handlers could as well.
        # Its safer to insist that this fix is applied before logging has been
        # configured.
        raise Error('logging handlers already registered.')
    logging._acquireLock()
    try:
        def fork_safe_createLock(self):
            self._orig_createLock()
            atfork.atfork(self.lock.acquire,
                          self.lock.release, self.lock.release)

        # Fix the logging.Handler lock (a major source of deadlocks).
        logging.Handler._orig_createLock = logging.Handler.createLock
        logging.Handler.createLock = fork_safe_createLock

        # Fix the module level lock.
        atfork.atfork(logging._acquireLock,
                      logging._releaseLock, logging._releaseLock)

        logging.fixed_for_atfork = True
    finally:
        logging._releaseLock()
```
Each logging lock is acquired in the prepare hook and released in both the parent and the child hooks, which guarantees the locks are in the released state after the fork.
Since Python 3.7, the standard library provides an official way to achieve the same thing (a usage sketch follows the quoted documentation below):
```
os.register_at_fork(*, before=None, after_in_parent=None, after_in_child=None)
```

Register callables to be executed when a new child process is forked using os.fork() or similar process cloning APIs. The parameters are optional and keyword-only. Each specifies a different call point.

- before is a function called before forking a child process.
- after_in_parent is a function called from the parent process after forking a child process.
- after_in_child is a function called from the child process.
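For example, here is a sketch of applying os.register_at_fork to logging's locks. The helper names are ours; logging._acquireLock/_releaseLock are private CPython helpers (present up to Python 3.12), so this is an illustration rather than a recommended public API.

```python
import logging
import os

def acquire_logging_locks():
    # Take the module-level lock plus every handler lock, so no other
    # thread can be holding them at the moment of the fork.
    logging._acquireLock()
    for handler in logging.getLogger().handlers:
        handler.acquire()

def release_logging_locks():
    # Release in reverse order, in both the parent and the child.
    for handler in reversed(logging.getLogger().handlers):
        handler.release()
    logging._releaseLock()

os.register_at_fork(before=acquire_logging_locks,
                    after_in_parent=release_logging_locks,
                    after_in_child=release_logging_locks)
```

In fact, recent CPython releases register similar hooks inside the logging module itself, so handler locks survive os.fork() safely out of the box.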