Python logging module and multi-process logging

Original text: https://www.jianshu.com/p/d615bf01e37b

This article takes a deeper look at Python's logging module. It mainly discusses how to use logging to output logs, and how to rotate log files safely in a multi-process environment.


1. Introduction to logging module

Python's logging module is a flexible standard-library module that any Python program can use to record logs. See the official python logging documentation.

The logging framework is mainly composed of four parts:

  • Loggers: the interfaces that application code calls directly
  • Handlers: dispatch log records to the appropriate destination
  • Filters: provide finer-grained control over which records are output
  • Formatters: specify the layout of the final output

2. Composition of logging

Loggers

Loggers are the log interfaces that application code calls directly; log messages are written through a logger. A logger is not instantiated directly but obtained via logging.getLogger(name). Logger objects are effectively singletons: no matter where in the program you call getLogger with the same name, you get the same object, and logging is thread-safe. Unfortunately, loggers do not support multiple processes. This is explained in a later chapter, together with some solutions.
[note] Logger objects form a parent-child hierarchy. A logger with no parent has root as its parent; once a parent object is created, the parent-child relationship is corrected. For example, logging.getLogger("abc.xyz") creates two logger entries: a parent "abc" and a child "abc.xyz", and since "abc" itself has no parent, its parent is root. In fact, "abc" is only a placeholder (a virtual logger) and can hold no handlers. root, however, is not a placeholder. When a logger emits a record, its ancestors receive the record as well. This is why some users see every message logged twice after creating a logger: the record is emitted once by the handler of the logger they created and once more by root's handler.
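
A quick sketch of both behaviors, using nothing beyond the standard library:

import logging

# getLogger returns the same (singleton) object for the same name.
a = logging.getLogger("abc.xyz")
assert a is logging.getLogger("abc.xyz")

# "abc" exists only as a placeholder at first, so the effective parent
# of "abc.xyz" is root; once "abc" is requested, the relation is corrected.
assert a.parent is logging.getLogger()      # the root logger
abc = logging.getLogger("abc")
assert a.parent is abc

# Records propagate to ancestor handlers, which is why a logger with its own
# handler plus a configured root can emit every record twice. Switching
# propagation off keeps records on this logger's handlers only.
a.propagate = False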

Each logger has a log level. The following levels are defined in logging:

Level      Numeric value
NOTSET     0
DEBUG      10
INFO       20
WARNING    30
ERROR      40
CRITICAL   50

When a logger receives a record, it first checks whether the record's level meets its own level; if it decides to process the record, it passes it on to its Handlers.
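
A minimal illustration of that check:

import logging

logger = logging.getLogger("level.demo")
logger.setLevel(logging.WARNING)

logger.info("dropped: INFO (20) is below WARNING (30)")   # never reaches a handler
logger.error("kept: ERROR (40) passes the logger's level")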

Handlers

Handlers dispatch the records sent by a logger to the right destination. For example, to the console, to a file, to both, or somewhere else (a process pipe, a socket, and so on). They determine the behavior of each log record and are the key part to configure later.

Each handler also has a log level. A logger can have multiple handlers, so it can hand records to different handlers according to their levels. Of course, the same level can also go to multiple handlers; set this up flexibly according to your needs.
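
For instance, a sketch of one logger feeding two handlers at different levels (handler names and the file path are placeholders):

import logging
import sys

logger = logging.getLogger("handler.demo")
logger.setLevel(logging.DEBUG)

# Everything from DEBUG upward goes to the console...
console = logging.StreamHandler(sys.stdout)
console.setLevel(logging.DEBUG)
logger.addHandler(console)

# ...but only ERROR and above also lands in the file.
errors = logging.FileHandler("errors.log")
errors.setLevel(logging.ERROR)
logger.addHandler(errors)

logger.debug("console only")
logger.error("console and errors.log")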

Filters

Filters provide finer-grained control over whether a record is output. In principle, a handler processes every record it receives uniformly according to level, but if the handler has a Filter, the record can undergo additional processing and judgment. For example, a Filter can intercept or modify records from a specific source, and even change a record's level (the level is judged after the modification).

Filters can be attached to both loggers and handlers, and multiple filters can be chained in series.
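
As a sketch, here is a filter that drops records by content (the filter name and match string are made up for illustration):

import logging

class DropHealthChecks(logging.Filter):
    """Drop noisy records, e.g. periodic health-check messages."""
    def filter(self, record):
        # Returning a falsy value discards the record; a filter may also
        # mutate the record (message, level, attributes) before it is emitted.
        return "health-check" not in record.getMessage()

logger = logging.getLogger("filter.demo")
handler = logging.StreamHandler()
handler.addFilter(DropHealthChecks())   # filters can attach to handlers...
logger.addFilter(DropHealthChecks())    # ...and to loggers as well
logger.addHandler(handler)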

Formatters

Formatters specify the layout of the final printed record. A Formatter assembles the passed-in information into a specific string; by default, the format is just the message itself, %(message)s. The format string can use built-in LogRecord attributes, as shown in the following table:

Attribute   Format          Description
asctime     %(asctime)s     Time of the record in human-readable form; by default '2016-02-08 12:00:00,123', precise to milliseconds
filename    %(filename)s    File name portion of the source path
funcName    %(funcName)s    Name of the function that issued the log call
levelname   %(levelname)s   Final level of the record (after any filter modifications)
message     %(message)s     The log message
lineno      %(lineno)d      Line number of the logging call
pathname    %(pathname)s    Full path of the source file
process     %(process)d     Process ID
thread      %(thread)d      Thread ID

A handler can have only one Formatter, so if you want output in multiple formats, you must use multiple handlers.
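
A short example of wiring a Formatter to a handler, reusing attributes from the table above:

import logging

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(
    fmt="%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s",
    datefmt="[%Y-%m-%d %H:%M:%S]",
))

logger = logging.getLogger("format.demo")
logger.addHandler(handler)
logger.warning("rendered through the handler's single Formatter")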

3. logging configuration

Simple configuration

As explained in the Loggers section, there is a default log object, root. The advantage of root is that we can configure it and log through the logging module directly. For example:

import logging

logging.basicConfig(level=logging.INFO, filename='logger.log')
logging.info("info message")

So the "simple configuration" here refers to the root logger, which is ready to use as-is. Every logger is a singleton, so once configured it can be called anywhere in the program. We configure root simply by calling basicConfig. This method is actually quite effective and easy to use: it ensures that at least one handler is available to process records whenever any logger is called.
The simple configuration can be set roughly as follows:

logging.basicConfig(level=logging.INFO,
         format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',
         datefmt='[%Y-%m-%d %H:%M:%S]',
         filename='../log/my.log',
         filemode='a')

Code configuration

Another, more detailed approach is to configure logging in code. This is the least-used method, since few people want to hard-code their settings, but it deserves a brief introduction for when you need it.
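
A minimal sketch of such in-code configuration (the logger name "app" and the file path are placeholders):

import logging
import sys

logger = logging.getLogger("app")
logger.setLevel(logging.DEBUG)

console = logging.StreamHandler(sys.stdout)
console.setLevel(logging.INFO)
console.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(console)

logfile = logging.FileHandler("../log/my.log")
logfile.setLevel(logging.DEBUG)
logfile.setFormatter(logging.Formatter(
    "%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s"))
logger.addHandler(logfile)

logger.info("configured entirely in code")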

Configuration file

Python's logging configuration file is based on ConfigParser functionality; in other words, the configuration file follows the ConfigParser format. Before going into detail, here is a fairly general configuration file:

##############################################
[loggers]
keys=root, log02

[logger_root]
level=INFO
handlers=handler01


[logger_log02]
level=DEBUG
handlers=handler02
qualname=log02
##############################################
[handlers]
keys=handler01,handler02

[handler_handler01]
class=FileHandler
level=INFO
formatter=form01
args=('../log/cv_parser_gm_server.log',"a")

[handler_handler02]
class=StreamHandler
level=NOTSET
formatter=form01
args=(sys.stdout,)
##############################################
[formatters]
keys=form01,form02

[formatter_form01]
format=%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(process)d %(message)s
datefmt=[%Y-%m-%d %H:%M:%S]

[formatter_form02]
format=%(message)s
##############################################

After reading it through, the pattern emerges. I used # lines to separate the major blocks. Each logger, handler, and formatter has a key name. Take loggers as an example: first add the key name under [loggers] to declare the logger, then configure it in a section named [logger_xxxx], where xxxx is the key name. For log02 I configured a level and one handler name; of course you can configure multiple handlers. The handler name is then looked up under [handlers] for its concrete configuration, and so on.
Then, in the code, load the configuration file as follows:

import logging.config

logging.config.fileConfig(log_conf_file)

The class entry in a handler section may puzzle some readers. It names one of the handler classes that ship with logging, which you can reference here directly; the class is the concrete executor of the handler's work. The logging documentation states that all of these handler classes are thread-safe, so you can use them with confidence. But then the question arises: what about multiple processes? In the next chapter, I solve the multi-process problem mainly by rewriting handler classes: we can override or write a new handler class ourselves, then point the class entry at it so the config loads our own handler.
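
For example, one way to load a custom handler (an assumption about your module layout, not the only option) is to expose the class in the logging namespace before loading the file, since fileConfig resolves the class= entry there:

import logging
import logging.config

from my_handlers import SafeFileHandler   # hypothetical module holding the class from chapter 4

# fileConfig evaluates "class=SafeFileHandler" in the logging module's
# namespace, so attach the custom class there first.
logging.SafeFileHandler = SafeFileHandler

logging.config.fileConfig("logging.conf")  # hypothetical config path
logger = logging.getLogger("log02")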

4. logging encounters multiple processes (important)

This part is my original motivation for writing this article. In Python, because of the GIL, multithreading offers little real parallelism, so Python programs generally use multiple processes for parallel work. However, logging in Python does not support multiple processes, which leads to a lot of trouble.
Take the problem of TimedRotatingFileHandler as the example. This handler's job is to cut log files by day (today's file is xxxx.log, yesterday's file is xxxx.log.2016-06-01). The benefits: you can find logs by day, individual files stay small, and expired logs can be deleted by day.
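
For reference, a minimal single-process setup of this handler might look like:

import logging
from logging.handlers import TimedRotatingFileHandler

handler = TimedRotatingFileHandler(
    "xxxx.log",          # today's file; yesterday rolls to xxxx.log.2016-06-01
    when="MIDNIGHT",     # cut once per day
    interval=1,
    backupCount=7,       # keep a week of history; older files are deleted
)
logger = logging.getLogger("daily.demo")
logger.addHandler(handler)
logger.info("written to xxxx.log until the next midnight rollover")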
The problem: when multiple processes write logs, only one process performs the rollover, and the others keep writing to the old (now renamed) file through their existing handles. It is also possible that when one process rolls over, another process has already written fresh records into the new log file, which the first process then ruthlessly deletes and recreates. Either way, the result is a mess.
So here are some ways to solve the multi-process logging problem.

Reason

Before solving the problem, let's look at its cause.
Here is the source code of TimedRotatingFileHandler; this part performs the rollover:

    def doRollover(self):
        """
        do a rollover; in this case, a date/time stamp is appended to the filename
        when the rollover happens.  However, you want the file to be named for the
        start of the interval, not the current time.  If there is a backup count,
        then we have to get a list of matching filenames, sort them and remove
        the one with the oldest suffix.
        """
        if self.stream:
            self.stream.close()
            self.stream = None
        # get the time that this sequence started at and make it a TimeTuple
        currentTime = int(time.time())
        dstNow = time.localtime(currentTime)[-1]
        t = self.rolloverAt - self.interval
        if self.utc:
            timeTuple = time.gmtime(t)
        else:
            timeTuple = time.localtime(t)
            dstThen = timeTuple[-1]
            if dstNow != dstThen:
                if dstNow:
                    addend = 3600
                else:
                    addend = -3600
                timeTuple = time.localtime(t + addend)
        dfn = self.baseFilename + "." + time.strftime(self.suffix, timeTuple)
        if os.path.exists(dfn):
            os.remove(dfn)
        # Issue 18940: A file may not have been created if delay is True.
        if os.path.exists(self.baseFilename):
            os.rename(self.baseFilename, dfn)
        if self.backupCount > 0:
            for s in self.getFilesToDelete():
                os.remove(s)
        if not self.delay:
            self.stream = self._open()
        newRolloverAt = self.computeRollover(currentTime)
        while newRolloverAt <= currentTime:
            newRolloverAt = newRolloverAt + self.interval
        #If DST changes and midnight or weekly rollover, adjust for this.
        if (self.when == 'MIDNIGHT' or self.when.startswith('W')) and not self.utc:
            dstAtRollover = time.localtime(newRolloverAt)[-1]
            if dstNow != dstAtRollover:
                if not dstNow:  # DST kicks in before next rollover, so we need to deduct an hour
                    addend = -3600
                else:           # DST bows out before next rollover, so we need to add an hour
                    addend = 3600
                newRolloverAt += addend
        self.rolloverAt = newRolloverAt

Start with the line if os.path.exists(dfn). The logic here is: if the dfn file exists, delete it, then rename the baseFilename file to dfn, then reopen baseFilename and continue writing. So the logic is clear:

  1. Assume that the current log file name is current.log, and the file name after segmentation is current.log.2016-06-01
  2. Judge whether current.log.2016-06-01 exists. If it exists, delete it
  3. Change the current log file name to current.log.2016-06-01
  4. Reopen a new current.log (the default mode in the source is "a", despite claims that it is "w")

Now consider multiple processes. One process rolls over, but the handles of the other processes still point at the renamed current.log.2016-06-01, and they keep writing into it. Or a process performing its own rollover directly deletes the current.log.2016-06-01 that another process just produced by renaming. Or one process is writing while another is mid-rollover, with unpredictable results. Or two processes roll over at the same time: the first process is on step 3 while the second has just finished step 2; the first finishes the rename but has not yet created the new current.log when the second starts its rename. The second process then either errors out because current.log cannot be found, or, if the first process has just created it, renames the nearly empty new file to current.log.2016-06-01. In that case not only are log records lost, but the first process believes it has already rolled over and will not roll over again, even though its handle now points at current.log.2016-06-01.
It all sounds complicated, but the root cause is simply that the file operations are not coordinated across processes.
So how do we solve this gracefully? I propose two schemes below, plus some further approaches you can try.

Solution 1

Previously, we found a flaw in the logic in TimedRotatingFileHandler. We only need to modify the logic slightly:

  1. Check whether the rotated file current.log.2016-06-01 exists. If it does not, rename current.log to it. (If it already exists, another process has done the rotation, so we don't need to rotate; we only need to switch our handle.)
  2. Open current.log in "a" (append) mode.

That is the whole modification. Talk is cheap, show me the code:
import os
import time
from logging.handlers import TimedRotatingFileHandler


class SafeRotatingFileHandler(TimedRotatingFileHandler):
    def __init__(self, filename, when='h', interval=1, backupCount=0, encoding=None, delay=False, utc=False):
        TimedRotatingFileHandler.__init__(self, filename, when, interval, backupCount, encoding, delay, utc)
    """
    Override doRollover
    lines commanded by "##" is changed by cc
    """
    def doRollover(self):
        """
        do a rollover; in this case, a date/time stamp is appended to the filename
        when the rollover happens.  However, you want the file to be named for the
        start of the interval, not the current time.  If there is a backup count,
        then we have to get a list of matching filenames, sort them and remove
        the one with the oldest suffix.
        
        Override,   1. if dfn not exist then do rename
                    2. _open with "a" model
        """
        if self.stream:
            self.stream.close()
            self.stream = None
        # get the time that this sequence started at and make it a TimeTuple
        currentTime = int(time.time())
        dstNow = time.localtime(currentTime)[-1]
        t = self.rolloverAt - self.interval
        if self.utc:
            timeTuple = time.gmtime(t)
        else:
            timeTuple = time.localtime(t)
            dstThen = timeTuple[-1]
            if dstNow != dstThen:
                if dstNow:
                    addend = 3600
                else:
                    addend = -3600
                timeTuple = time.localtime(t + addend)
        dfn = self.baseFilename + "." + time.strftime(self.suffix, timeTuple)
##        if os.path.exists(dfn):
##            os.remove(dfn)
            
        # Issue 18940: A file may not have been created if delay is True.
##        if os.path.exists(self.baseFilename):
        if not os.path.exists(dfn) and os.path.exists(self.baseFilename):
            os.rename(self.baseFilename, dfn)
        if self.backupCount > 0:
            for s in self.getFilesToDelete():
                os.remove(s)
        if not self.delay:
            self.mode = "a"
            self.stream = self._open()
        newRolloverAt = self.computeRollover(currentTime)
        while newRolloverAt <= currentTime:
            newRolloverAt = newRolloverAt + self.interval
        #If DST changes and midnight or weekly rollover, adjust for this.
        if (self.when == 'MIDNIGHT' or self.when.startswith('W')) and not self.utc:
            dstAtRollover = time.localtime(newRolloverAt)[-1]
            if dstNow != dstAtRollover:
                if not dstNow:  # DST kicks in before next rollover, so we need to deduct an hour
                    addend = -3600
                else:           # DST bows out before next rollover, so we need to add an hour
                    addend = 3600
                newRolloverAt += addend
        self.rolloverAt = newRolloverAt

Don't be put off by the length: the modified part is only what is marked with "##"; the rest is copied from the source. This class inherits TimedRotatingFileHandler and overrides the rollover. The solution is elegant, with very few changes, and very effective. However, as some readers have pointed out, it is still not perfect: the rename step remains racy. If, by coincidence, two or more processes pass the if check at the same time and rename one after the other, a log file can still be lost. Since the file is cut only once a day, two handlers hitting that window simultaneously is rare, but for completeness you can add a file lock: acquire the lock after the if, re-check the condition, and only then rename. A sketch of that guarded rename follows.
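
A hedged sketch of the locked re-check (the helper name and sidecar lock-file path are assumptions, and fcntl is Unix-only; on Windows something like msvcrt.locking would be needed instead):

import fcntl
import os

def rename_with_lock(src, dst, lock_path):
    """Serialize the rollover rename across processes with an exclusive
    lock on a sidecar file, re-checking the condition under the lock."""
    with open(lock_path, "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            # Another process may have rotated while we waited for the lock.
            if not os.path.exists(dst) and os.path.exists(src):
                os.rename(src, dst)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)

# Inside doRollover, in place of the bare rename:
#     rename_with_lock(self.baseFilename, dfn, self.baseFilename + ".lock")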

Solution 2

In my view this is the simplest and most effective solution. We override the FileHandler class (the base class that all file-writing handlers, including TimedRotatingFileHandler, inherit from) and add some simple checks and operations.
Our logic is as follows:

  1. Check whether the current timestamp matches the timestamp in the file name the handler points at.
  2. If not, switch the handle to the file for the current time.

Isn't that simple? Talk is cheap, show me the code:
import os
import time
from logging import FileHandler

try:
    import codecs
except ImportError:
    codecs = None


class SafeFileHandler(FileHandler):
    def __init__(self, filename, mode, encoding=None, delay=0):
        """
        Use the specified filename for streamed logging
        """
        if codecs is None:
            encoding = None
        FileHandler.__init__(self, filename, mode, encoding, delay)
        self.mode = mode
        self.encoding = encoding
        self.suffix = "%Y-%m-%d"
        self.suffix_time = ""

    def emit(self, record):
        """
        Emit a record.

        Always check time 
        """
        try:
            if self.check_baseFilename(record):
                self.build_baseFilename()
            FileHandler.emit(self, record)
        except (KeyboardInterrupt, SystemExit):
            raise
        except:
            self.handleError(record)
            
    def check_baseFilename(self, record):
        """
        Determine if a rebuild should occur.

        record is not used, as we are just comparing times,
        but it is needed so the method signatures are the same
        """
        timeTuple = time.localtime()

        # Rebuild when the date suffix has changed, or when the file we point
        # at has disappeared (after build_baseFilename has run, baseFilename
        # already carries the date suffix).
        if self.suffix_time != time.strftime(self.suffix, timeTuple) or not os.path.exists(self.baseFilename):
            return 1
        else:
            return 0

    def build_baseFilename(self):
        """
        do builder; in this case, 
        old time stamp is removed from filename and
        a new time stamp is append to the filename
        """
        if self.stream:
            self.stream.close()
            self.stream = None
        
        # remove old suffix
        if self.suffix_time != "":
            index = self.baseFilename.find("."+self.suffix_time)
            if index == -1:
                index = self.baseFilename.rfind(".")
            self.baseFilename = self.baseFilename[:index]
        
        # add new suffix
        currentTimeTuple = time.localtime()
        self.suffix_time = time.strftime(self.suffix, currentTimeTuple)
        self.baseFilename  = self.baseFilename + "." + self.suffix_time
        
        self.mode = 'a'
        if not self.delay:
            self.stream = self._open()

check_baseFilename performs the check in step 1; build_baseFilename performs step 2 and switches the handle. That's all there is to it.
The difference from the previous scheme is that the current file is always current.log.2016-06-01, and by tomorrow it is current.log.2016-06-02; nothing is renamed or deleted. It is concise, elegant, and also solves the multi-process logging problem.

Other solutions

Of course, there are other solutions. For example, a dedicated logging process can do all the logging, with the other processes sending their log records to it through a pipe or queue for it to manage. Logging over a network socket works the same way.
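
A hedged sketch of that idea with the standard QueueHandler/QueueListener pair (available since Python 3.2; function names and the file path here are placeholders):

import logging
import logging.handlers
import multiprocessing

log_queue = multiprocessing.Queue(-1)

def configure_worker(queue):
    """Called in each worker process: records go onto the shared queue."""
    root = logging.getLogger()
    root.addHandler(logging.handlers.QueueHandler(queue))
    root.setLevel(logging.INFO)

def start_listener(queue):
    """Run in a single process: the only place that touches the log file."""
    file_handler = logging.FileHandler("current.log")
    listener = logging.handlers.QueueListener(queue, file_handler)
    listener.start()          # drains the queue on a background thread
    return listener           # call listener.stop() on shutdown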

 
