[note] this article is translated from: Saving Time with Structured Logging - Reflectoring
Logging is the ultimate resource for investigating incidents and understanding what is happening in your application. Every application produces some kind of log.
However, these logs are often confusing and require a lot of effort to analyze. In this article, we will look at how to use structured logging to greatly increase the value of our logs.
We will apply some very practical techniques to improve the value of our application's log data, using Logz.io as the logging platform to query the logs.
Code example: This article is accompanied by a working code example on GitHub.
What is structured logging?
"Normal" logs are unstructured. They usually contain a message string:

```
2021-08-08 18:04:14.721  INFO 12402 --- [main] i.r.s.StructuredLoggingApplication : Started StructuredLoggingApplication in 0.395 seconds (JVM running for 0.552)
```
This message contains all the information we want when investigating an event or analyzing a problem:
- Date of the log event
- The name of the logger that created the log event, and
- The log message itself.
All the information is in the log message, but it is hard to query for it! Since everything is in one string, we have to parse and search that string if we want to get specific information out of our logs.
For example, if we only want to view the logs of a specific logger, the log server must parse all log messages, check whether they have a specific pattern that identifies the logger, and then filter the log messages according to the desired logger.
Structured logs contain the same information, but in structured form rather than unstructured strings. Typically, structured logs are presented in JSON format:
```json
{
  "timestamp": "2021-08-08 18:04:14.721",
  "level": "INFO",
  "logger": "io.reflectoring....StructuredLoggingApplication",
  "thread": "main",
  "message": "Started StructuredLoggingApplication ..."
}
```
This JSON structure allows the log server to efficiently store and, more importantly, query the logs.
The logs can now easily be filtered by timestamp or logger, for example, and this kind of search is much more efficient than parsing the strings for a certain pattern.
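To make the difference concrete, here is a minimal plain-Java sketch; the Map-based event representation stands in for what a log server stores and is not a real logging API:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class StructuredLogFilter {

    /** Filters structured log events by their logger field: a simple lookup, no string parsing. */
    static List<Map<String, String>> filterByLogger(List<Map<String, String>> events, String logger) {
        return events.stream()
            .filter(event -> logger.equals(event.get("logger")))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Two structured events; every field is directly addressable.
        List<Map<String, String>> events = List.of(
            Map.of("level", "INFO", "logger", "io.reflectoring.Application", "message", "Started ..."),
            Map.of("level", "ERROR", "logger", "io.reflectoring.DomainService", "message", "something failed"));

        // Only the DomainService event matches.
        System.out.println(filterByLogger(events, "io.reflectoring.DomainService").size());
    }
}
```

With an unstructured log line, the same filter would need a regular expression over the whole message string instead of a field lookup.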
But the value of structured logging doesn't end there: we can add any custom fields to our structured log events that we want! We can add contextual information that helps us identify issues, or we can add metrics to the logs.
With all the data we now have at our fingertips, we can create powerful log queries and dashboards, and we'll find the information we need even when we've just been woken up in the middle of the night to investigate an incident.
Now let's look at a few use cases that demonstrate the power of structured logging.
Add a code path to all log events
The first thing we'll look at is code paths. Each application usually has several different paths that incoming requests can take through the application. Consider this diagram:
(Diagram: incoming requests from a user's browser and a third-party system pass through the web controller into the domain code, while a timer triggers scheduled tasks.)
This example has (at least) three different code paths that incoming requests can take:
- User code path: a user uses the application from their browser. The browser sends a request to a web controller, and the controller calls the domain code.
- Third-party system code path: the application's HTTP API is also called by a third-party system. In this example, the third-party system calls the same web controller as the user's browser.
- Timer code path: like many applications, this application has some scheduled tasks that are triggered by a timer.
Each of these code paths can have different characteristics. The domain services are involved in all three code paths. During an incident that involves an error in the domain services, it helps greatly to know which code path caused the error!
Without knowing the code path, it's easy to make fruitless guesses during an incident investigation.
So we should add the code path to our logs! Here's how we can do that with Spring Boot.
Add a code path for the incoming Web request
In Java, the SLF4J logging library provides the MDC class (Mapped Diagnostic Context). This class allows us to add custom fields to all log events emitted in the same thread.
To add a custom field to each incoming web request, we need to build an interceptor that adds the codePath field at the start of each request, before our web controller code is even executed.
We can do this by implementing the HandlerInterceptor interface:
```java
public class LoggingInterceptor implements HandlerInterceptor {

  @Override
  public boolean preHandle(
      HttpServletRequest request,
      HttpServletResponse response,
      Object handler) throws Exception {
    if (request.getHeader("X-CUSTOM-HEADER") != null) {
      MDC.put("codePath", "3rdParty");
    } else {
      MDC.put("codePath", "user");
    }
    return true;
  }

  @Override
  public void postHandle(
      HttpServletRequest request,
      HttpServletResponse response,
      Object handler,
      ModelAndView modelAndView) {
    MDC.remove("codePath");
  }
}
```
In the preHandle() method, we call MDC.put() to add the codePath field to all log events. If the request contains a header that identifies it as coming from a third-party system, we set the code path to 3rdParty; otherwise, we assume the request comes from a user's browser.
Depending on the application, the logic here may be very different, of course; this is just an example.
In the postHandle() method, we shouldn't forget to call MDC.remove() to remove the previously set field again. Otherwise, the thread would keep that field even after it returns to the thread pool, and the next request served by that thread might still have the field set to the wrong value.
To activate the interceptor, we need to add it to the interceptor registry:
```java
@Component
public class WebConfigurer implements WebMvcConfigurer {

  @Override
  public void addInterceptors(InterceptorRegistry registry) {
    registry.addInterceptor(new LoggingInterceptor());
  }
}
```
That's it. All log events emitted in a thread serving an incoming request now have the codePath field.
If any request creates and starts a child thread, make sure to call MDC.put() at the beginning of the new thread's life cycle as well.
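As a sketch of how that could look when handing work to a thread pool (the executor setup and helper method are illustrative, not part of the article's example), we can capture the current MDC with MDC.getCopyOfContextMap() and restore it in the child thread:

```java
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.slf4j.MDC;

public class MdcPropagation {

    private static final ExecutorService executor = Executors.newFixedThreadPool(4);

    /** Submits a task that runs with the MDC fields of the calling thread. */
    static void runWithCurrentMdc(Runnable task) {
        // Capture the MDC of the calling (request) thread ...
        Map<String, String> context = MDC.getCopyOfContextMap();
        executor.submit(() -> {
            // ... and restore it at the start of the child thread's life cycle.
            if (context != null) {
                MDC.setContextMap(context);
            }
            try {
                task.run();
            } finally {
                MDC.clear(); // don't leak fields into the pooled thread
            }
        });
    }
}
```

This way, log events emitted from the child thread carry the same codePath (and other) fields as the request thread that spawned it.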
Add a code path to a scheduled job
In Spring Boot, we can easily create scheduled jobs using the @Scheduled and @EnableScheduling annotations.
To add the code path to the logs, we need to make sure that MDC.put() is called as the first thing in the scheduled method:
```java
@Component
public class Timer {

  private final DomainService domainService;

  private static final Logger logger = LoggerFactory.getLogger(Timer.class);

  public Timer(DomainService domainService) {
    this.domainService = domainService;
  }

  @Scheduled(fixedDelay = 5000)
  void scheduledHello() {
    MDC.put("codePath", "timer");
    logger.info("log event from timer");
    // do some actual work
    MDC.remove("codePath");
  }
}
```
This way, all log events emitted from the thread that executes the scheduled method will contain the codePath field. We could also create our own @Job annotation, or similar, that does this for us, but that is beyond the scope of this article.
To make the logs of our scheduled jobs even more valuable, we can add additional fields:
- job_status: indicates whether the job was successful.
- job_id: ID of the executed job.
- job_records_processed: if the job performs some batch processing, it can record the number of records processed.
- ...
With these fields in the logs, we can get a lot of useful information out of the log server!
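As an illustration of how such fields could be computed, here's a sketch that derives the job fields before they would be put into the MDC; the field names follow the list above, while the record-processing logic is a made-up stand-in:

```java
import java.util.List;
import java.util.Map;

public class BatchJobFields {

    /** Runs a batch job and returns the fields we would put into the MDC. */
    static Map<String, String> runJob(String jobId, List<String> records) {
        int processed = 0;
        String status = "SUCCESS";
        try {
            for (String record : records) {
                // stand-in for real per-record processing
                if (record == null) {
                    throw new IllegalArgumentException("bad record");
                }
                processed++;
            }
        } catch (RuntimeException e) {
            status = "ERROR";
        }
        return Map.of(
            "job_id", jobId,
            "job_status", status,
            "job_records_processed", String.valueOf(processed));
    }
}
```

Each of these fields would then be set via MDC.put() at the end of the scheduled method, so the job's final log event carries the job's outcome.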
Add a user ID to user-initiated log events
Most of the work in a typical web application is done in web requests coming from a user's browser, which trigger a thread in the application to create a response for the browser.
Imagine some error occurred, and the stack trace in the logs reveals that it has something to do with a specific user configuration. But we don't know which user the request came from!
To alleviate this, it's immensely helpful to include a user ID in all log events that have been triggered by a user.
Since we know that most incoming web requests come directly from a user's browser, we can add a username field in the same LoggingInterceptor that we created to add the codePath field:
```java
public class LoggingInterceptor implements HandlerInterceptor {

  @Override
  public boolean preHandle(
      HttpServletRequest request,
      HttpServletResponse response,
      Object handler) throws Exception {
    // may be null for unauthenticated requests
    Authentication authentication = SecurityContextHolder.getContext().getAuthentication();
    if (authentication == null) {
      return true;
    }
    Object principal = authentication.getPrincipal();
    if (principal instanceof UserDetails) {
      String username = ((UserDetails) principal).getUsername();
      MDC.put("username", username);
    } else {
      String username = principal.toString();
      MDC.put("username", username);
    }
    return true;
  }

  @Override
  public void postHandle(
      HttpServletRequest request,
      HttpServletResponse response,
      Object handler,
      ModelAndView modelAndView) {
    MDC.remove("username");
  }
}
```
This code assumes that we're using Spring Security to manage access to our web application. We use the SecurityContextHolder to get the Principal and extract the user name from it, which we then pass to MDC.put().
Every log event emitted from the thread serving the request will now contain the username field.
With this field, we can now filter the logs for the requests of specific users. If a user reports an issue, we can filter the logs by their name and reduce the amount of logs we have to sift through immensely.
Depending on regulations, you may even want to log a more opaque user ID instead of the user name.
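One possible way to derive such an opaque ID is to hash the user name, sketched here with the JDK's MessageDigest; the 16-character truncation is an illustrative choice, not from the article:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class OpaqueUserId {

    /** Derives a stable, opaque ID from the user name via SHA-256. */
    static String opaqueId(String username) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(username.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            // 16 hex characters are plenty to tell users apart in the logs
            return hex.substring(0, 16);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

Because the hash is stable, we can still filter all log events of one user, without the user name itself ever appearing in the logs.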
Add root cause for error log events
When an error occurs in our application, we usually log a stack trace. The stack trace helps us identify the root cause of the error. Without it, we wouldn't know which code was responsible!
But stack traces are very unwieldy if we want to run statistics on the errors in our application. Say we want to know how many errors our application logs per day, and how many of those were caused by which root cause exception. We'd have to export all stack traces from the logs and do some manual filtering to get an answer!
If we add the custom field rootCause to each error log event, however, we can filter the log events by that field and then create a histogram or pie chart of the different root causes in the log server's UI, without even exporting the data.
One way of doing this in Spring Boot is to create an @ExceptionHandler:
```java
@ControllerAdvice
public class WebExceptionHandler {

  private static final Logger logger = LoggerFactory.getLogger(WebExceptionHandler.class);

  @ExceptionHandler(Exception.class)
  @ResponseStatus(HttpStatus.INTERNAL_SERVER_ERROR)
  public void internalServerError(Exception e) {
    MDC.put("rootCause", getRootCause(e).getClass().getName());
    logger.error("returning 500 (internal server error).", e);
    MDC.remove("rootCause");
  }

  private Throwable getRootCause(Exception e) {
    Throwable rootCause = e;
    while (rootCause.getCause() != null && rootCause.getCause() != rootCause) {
      rootCause = rootCause.getCause();
    }
    return rootCause;
  }
}
```
We've created a class annotated with @ControllerAdvice, which means it is valid across all of our web controllers.
In that class, we've created a method annotated with @ExceptionHandler. This method is called for exceptions that come up in any of the web controllers. It sets the rootCause MDC field to the fully qualified name of the exception class that caused the error and then logs the exception's stack trace.
That's it. All log events that print a stack trace will now have the field rootCause, which we can filter by to learn about the error distribution in our application.
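To illustrate the kind of aggregation the log server can now run over the rootCause field, here's a plain-Java sketch; the Map-based event representation is a stand-in for the stored log events, not a real log-server API:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class RootCauseStats {

    /** Counts error events per rootCause: the data behind a root-cause pie chart. */
    static Map<String, Long> countByRootCause(List<Map<String, String>> events) {
        return events.stream()
            .filter(event -> event.containsKey("rootCause"))
            .collect(Collectors.groupingBy(
                event -> event.get("rootCause"),
                Collectors.counting()));
    }
}
```

Without the rootCause field, the same statistic would require parsing stack traces out of message strings instead of a single group-by over a field.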
Add trace IDs to all log events
If we run multiple services, for example in a microservice environment, things get complicated quickly when analyzing an error. One service calls another, which calls yet another, and it is very hard (if at all possible) to trace an error in one service back to an error in another service.
Trace IDs help to connect log events in one service to log events in another service:

(Diagram: service 1 receives a request, generates trace ID "1234", and passes it on to services 2 and 3.)

In the example diagram above, service 1 is called and generates the trace ID "1234". It then calls services 2 and 3, propagating the same trace ID to them so they can add it to their own log events. This way, we can connect the log events of all services by searching for a specific trace ID.
Service 1 also creates a unique span ID for each outgoing request. While a trace spans the whole request/response cycle of service 1, a span only spans the request/response cycle between one service and another.
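As a rough plain-Java sketch of this mechanism (the header names and ID format are illustrative; real implementations follow standards such as W3C Trace Context):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class Tracing {

    /** Called by the first service when a request enters the system. */
    static String newTraceId() {
        return UUID.randomUUID().toString();
    }

    /** Builds the headers for one outgoing call: same trace ID, fresh span ID. */
    static Map<String, String> outgoingHeaders(String traceId) {
        Map<String, String> headers = new HashMap<>();
        headers.put("X-Trace-Id", traceId);                     // shared across all services
        headers.put("X-Span-Id", UUID.randomUUID().toString()); // unique per outgoing call
        return headers;
    }
}
```

Every service would put both IDs into the MDC on arrival, so all its log events can be correlated by trace ID and narrowed down by span ID.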
We could implement such a tracing mechanism ourselves, but there are tracing standards and tools that implement them and integrate with tracing systems such as Logz.io's distributed tracing feature.
So let's stick with the standard tooling. In the Spring Boot world, this is Spring Cloud Sleuth, which we can integrate into our application by simply adding it to our pom.xml:
```xml
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.springframework.cloud</groupId>
      <artifactId>spring-cloud-dependencies</artifactId>
      <version>2020.0.3</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-sleuth</artifactId>
  </dependency>
</dependencies>
```
This automatically adds trace and span IDs to our logs and propagates them from one service to the next via request headers when using the supported HTTP clients. You can read more about Spring Cloud Sleuth in the article "Tracing in Distributed Systems with Spring Cloud Sleuth".
Add the duration of some code paths
The total duration our application takes to answer a request is an important metric. If it's too slow, users get frustrated.
Usually, it's a good idea to expose the request duration as a metric and create dashboards showing histograms and percentiles of the request duration, so that we know the health of our application at a glance and maybe even get alerted when a certain threshold is breached.
However, we're not always looking at dashboards, and we might be interested not only in the total request duration but also in the duration of certain code paths. When analyzing logs to investigate an issue, it can be an important clue to know how long a certain path in the code took to execute.
In Java, we might do this:
```java
void callThirdPartyService() throws InterruptedException {
  logger.info("log event from the domain service");
  Instant start = Instant.now();
  Thread.sleep(2000); // simulating an expensive operation
  Duration duration = Duration.between(start, Instant.now());
  MDC.put("thirdPartyCallDuration", String.valueOf(duration.toMillis()));
  logger.info("call to third-party service successful!");
  MDC.remove("thirdPartyCallDuration");
}
```
Say we're calling a third-party service and want to add the call's duration to the logs. Using Instant.now() and Duration.between(), we calculate the duration, add it to the MDC, and then create the log event.
The log event will now contain the field thirdPartyCallDuration, which we can filter and search for in the logs. We might, for example, search for instances where this call took extra long. Then we can use the user ID or trace ID, which we also have as fields on the log events, to find patterns of when this takes especially long.
Query structured logs in Logz.io
If we have set up logging to ship to Logz.io as described in the article about per-environment logging, we can now query the logs in the Kibana UI provided by Logz.io.
Error distribution
For example, we can query all log events with values in the rootCause field:
```
_exists_: "rootCause"
```
This displays a list of error events with root causes.
We can also create a visualization in the Logz.io UI that shows the distribution of errors within a given time frame:

(Pie chart: distribution of rootCause values over the selected time range.)

This chart shows that almost half of the errors are caused by a ThingyException, so it might be a good idea to check if this exception can be avoided somehow. If it can't be avoided, we should log it at WARN instead of ERROR to keep the ERROR logs clean.
Error distribution across code paths
Say a user complains that scheduled jobs aren't working properly. If we have added a job_status field to the scheduled method's code, we can filter the logs for those failed jobs:
```
job_status: "ERROR"
```
To get a more high-level view, we can create another pie chart visualization that shows the distribution of job_status and rootCause:
We can now see that the majority of the scheduled jobs have failed! We should add some alerting around this! We can also see which exceptions are the root causes of most of the failed scheduled jobs and start investigating.
Check for user errors
Or, say a user with the user name "user" has raised a support request, specifying the approximate date and time when the problem occurred. We can filter the logs using the query username: "user" to only show that user's logs, and can quickly zero in on the cause of the user's issue.
We can also extend the query to show only that user's log events that have a rootCause, to learn directly about what went wrong and when:
```
username: "user" AND _exists_: "rootCause"
```
Structure your logs
This article showed just a few examples of how we can add structure to our log events and make use of that structure when querying the logs. Anything that we might later want to search for in the logs should be a custom field in the log events. The fields that make sense to add depend highly on the application we're building, so while writing code, make sure to think about what information might help you analyze the logs later.
You can find the code examples discussed in this article on GitHub.