Instrumenting the application code with proper logging [ad] is crucial for the maintainability of any modern application. Quite often, though, developers are confused about how to use the log levels to provide useful information at the proper level. There are many resources that you can find on the Internet that give the general guidelines but fail to include detail. Hence, in this post, I would like to go deeper into the proper use of log levels and illustrate the guidelines with some examples.
Let’s first list some generic principles about logging!
- Application logs should be used to troubleshoot issues after the code execution is complete. The meaning of this is that you cannot stop the code execution to look at the variables and statements as you would do with a debugger, for example. That is especially important when code is executed on a server or in the cloud [ad], and you don’t have direct access to it.
- Application logs should be detailed enough to reconstruct the code execution path. What that means is, that using the logs, somebody (not necessary the developer who wrote the code) should be able to understand what steps were executed in the application and what happened (whether it succeeded, failed, or something else).
- Application logs should provide enough context for each step of the code execution. Telling somebody what happened is not enough – this is just stating the fact. Giving additional information as to why it happened is important to know when troubleshooting or fixing an issue.
- Exclude any personal information or secrets from the logs. This one is not related to this post, and I will cover it in a separate one, but I want to emphasize the need to sanitize the logs for privacy and security reasons.
OK, what are the standard log levels? Depending on the language and the framework, those can vary, but if you use a mature programming language with built-in or established logging [ad] framework, you can rely on having log levels like
TRACE. Here are (yet again) the guidelines for each level.
Log Level: CRITICAL or FATAL
FATALare errors from which the application cannot recover. That means that the application cannot continue its execution, and it needs to exit because of the error, and not because it completed its task. Such errors are called unrecoverable errors. Normally,
CRITICALerrors occur when the application cannot acquire or access the resources it needs for its normal execution. Examples of
- Windows blue-screen (we are all familiar with that). Reasons for Windows blue-screen can be corrupted memory or files on the disk. Because Windows needs those resources and they are either not available or corrupted, Windows cannot continue its execution.
- e-Commerce or web application cannot connect to the database. Such applications require a back-end database to store information like user details or orders, and if they cannot access that database, they cannot function normally.
- A mobile application that cannot connect to a cloud [ad] service or API needed for its initialization. Many mobile applications have a bootstrap code that pulls data from various APIs to initialize the user session. A good example is a sign-in service that verifies the user credentials at the start. If there is no connectivity or the APIs are failing, the application cannot initialize itself and cannot function properly.
It is always good to show a message to the user before the application exits instead of just shutting it down. Also, it is important to write a log message with
CRITICALlog level and save the context in which this happened. For the above examples, you can do the following:
- Windows blue-screen –> Log the memory block or the file that is corrupted
- Database connection –> Log the database connection (without the username and password)
- Mobile application –> Log the API endpoint that is failing
Log Level: ERROR
ERRORcan be any other error that can occur during the application execution that doesn’t require the application to exit. Such errors are called recoverable errors. Normally,
ERRORis a condition that impacts the normal execution of the application and is noticeable by the user, but the user can continue to use the application. Here are a few examples:
- Trying to open an Excel file that is deleted. Well, the application has already started, and it is now asking you what file to open. Missing a user file is not a
CRITICALcondition for Excel. The user just sees an error that the file is missing and can choose to open another file. Excel can continue to function properly with the new file.
- The famous Error 404 for web applications. If an HTTP link is broken or if the user types a wrong URL, the web [ad] server returns an Error 404 – Not found, and the browser shows the page. Neither the web server nor the browser needs to stop working because a page is missing. Also, the user can select another link or type the correct URL, and she or he can see that page.
- A mobile application cannot retrieve the user picture from the Gravatar service. Gravatar is a service that many applications (not only mobile) use for fetching profile pictures. If that service is down, the mobile application will not be able to show the user’s profile picture but will continue to function. The application can show a default picture and notify the user that it failed to fetch the profile picture, but the latter is not necessary. By seeing the default picture, the user can assume that something failed.
Once again, with
CRITICAL, the user notices that something is not right, but unlike
CRITICALshe or he can continue to use the application. In all the cases, though, you need to log the
ERRORlog level for troubleshooting purposes. Here are the things that you may want to log in the above cases:
- Missing Excel file –> log the file name the user is trying to open
- Error 404 –> log the URL of the page the user is trying to open
- Missing profile picture –> log the identifier or the URL of the picture
Log Level: WARNING
WARNINGis quite a confusing log level, especially for new developers.
WARNINGis something that happens before an error or another important event occurs in your application. The purpose of the
WARNINGis to warn you or somebody else that something more important will happen at a later time. While developing your application, you should think of possible events that can predict if something will happen and cause unexpected behavior of your application. You should log those “prediction events” as warnings. Here some examples:
- When you use your laptop for a while, and the hard disk starts filling up, you may get a warning that your free disk space is at a critically low level. This is not an error but can result in errors if you don’t have enough free space for the operating system to function properly. If you don’t clean up space on your hard drive, the OS may start failing and even crashing.
- If your web [ad] application connects to a database and the performance of the database degrades over time, you may have a threshold set for the latency to read data from the database. If your application detects that the latency is higher than the threshold, you may raise a
- We have all seen the Low Battery warnings on our smartphones. Of course, we always continue to use them till we deplete the battery, but we will for sure be warned that soon our phone will go out of battery and it will stop working.
Of course, the question is, why should you log those and what to write to the log? It is quite obvious – you should log the
WARNINGmessages because you need to be able to troubleshoot what happened after that. Also, a lot of alerting and notification systems are triggered by the log messages, and you can use the
WARNINGas a trigger for a different type of notification. And what should you write to the log? Here are things that you can write based on our examples above:
- Log disk space –> log the time the threshold was reached. If there are multiple thresholds, you should log each one
- Slow connection to the database –> log the time the slow connection was detected and how much the latency was at that time. You can use the consecutive
WARNINGmessages and metrics logged to determine if this was a one-time condition or if there is a progressive degradation of the latency
- Low battery lever –> this is similar to the disk space; log the time when the condition was detected and what was the battery level
Log Level: INFO
INFOlog level is used for… well, information purposes. Every application executes certain steps, and some of those steps can be considered more important than others. You can think of those as milestones in your application. Two important milestones are when your application starts and stops. You may want to wite an
INFOlog message every time the application starts and stops to persist this in a record. Let’s look at a few examples:
- When you start your laptop, it will need to run some bootstrap code first, then initialize the operating system, and then you can use it to start working. Initialization is a very important process, and you may want to capture the time it started and the time it ended. If something happened in between, you would know that the problem is in the initialization and not somewhere else. The start and the completion of the initialization process can be logged with
- Similar to the laptop scenario above, web applications also have an initialization phase. Let’s say one step of this process is to establish a connection to the database. If the connection is successfully established, you may want to log this event with an
- If you are implementing a Discord or Slack-like messaging application for mobile phones, you may allow users to connect to different servers. When a connection to a particular chat server is established, you may want to log it with an
What should you write in the log for those events?
- Laptop initialization –> write the start of the initialization process and the completion of the initialization process. Also, you may want to log the initialization of each OS component
- Database connection –> write the time the connection was established but also to name of the server and the database you connected to. Make sure that you don’t log any credentials though
- Messaging app –> write the time the connection was established and the name of the server the app connected to
One note here! In normal execution conditions, this is the highest (or lowest, depending on the logging [ad] framework and language:)) log level configured for an application. This means that you will see messages with log levels
CRITICALin your log files.
Log Level: DEBUG or TRACE
All log levels listed so far can be used during the normal execution of your application. However, normal is a temporary condition for an application. Sooner or later, you will need to troubleshoot issues with your application. If it was running on your local machine, it would be relatively easy because you can attach a debugger and step through each line of code. But if it is running on a remote machine like a server or somebody else’s machine, that won’t be so easy.
TRACElog levels are intended to help you debug your application if you don’t have a debugger attached (hence the name
Some logging frameworks have both
TRACEbut some don’t, and you have to simulate one with the other or just standardize in using only one. If you want to use both log levels, here are my suggestions on how to use them:
How to Use the TRACE Log Level?
Instrument every block of code with a
TRACEmessage at the beginning and the end of the block. For example, blocks of code can be:
- A function or a method
- A loop
- A condition like
TRACEmessages, you will be able to uniquely identify what steps are executed (hence
TRACE) and where in the program execution, an error eventually happened.
Normally, when I write
TRACEmessages, those do not have dynamic data, and their sole purpose is to inform me where in the program execution I am.
How to Use DEBUG Log Level?
Instrument every place of your code where a change in the data happens with a
DEBUGmessage. For example, such places can be:
- Beginning and the end of a function or a method. You will want to know what data was passed to the function and what data was returned from it
- Inside a block of code where data is modified. You will want to know what the state of the data was before the modification and after the modification
DEBUGmessages, you will be able to follow the modification of data throughout the execution of your program.
Normally, when I write
DEBUGmessages, those contain clarifying text as well as dynamic data like the variables that are modified.
As mentioned above, not all languages and logging [ad] framework have both
TRACElog levels but you can use one to simulate the other.
How About ALL and OFF Log Levels?
Well, there are some languages and frameworks that allow you to completely turn off logging by using the
OFFlog level. I certainly have not seen a good reason to do that and will not recommend it because you will lose any information if an error happens. However, there may be a valid reason that I am not aware of.
On the opposite side, log level
ALLallows you to log everything including any custom levels that you have created (and, yes, you are able to create your own log levels in almost every language or logging framework). However, custom log levels are rarely used and so far I have not seen a reason to deviate from the above five or to need more.
Here is a table that summarizes what log levels are supported in popular development languages:
|Log Level||C# (ASP .NET Core)||Java (Log4J)||Python||Rust||Go (go-logging)||Ruby on Rails||PHP (Log4PHP)|
Here are some language and framework-specific links that you may find helpful:
- Logging in .NET Core
- Apache Log4J Introduction
- Python logging module
- Rust log facade
- go-logging GitHub Project
- Rails Logger and Rails Logging Best Practices
- Apache Log4PHP Quick Start
In the next post from the series, we will look at a Python example and explain how different log levels can be used.