About Logging

This blog post covers the DOs and DON'Ts of logging that I picked up along the way in my coding career.

I don't have to emphasize the importance of logging; if you think logging is not important at all, you should stop reading now. 😀

First off, the DOs and DON'Ts

This is a quick list, a "cheat sheet" if you will, which I will expand on down below.

  • DO log your program output to stderr. This stream is meant for both errors and diagnostics; logs are classified as "diagnostic messages of a program".

  • DO ensure that stderr is the only place your program logs to: not a file, a DB, or a third-party service.

  • DO adhere to structured logging. JSON is the go-to format for the majority of use cases, but if you're working with simple logs, logfmt works fine too.

  • DO use log levels. The default level should be INFO in production and DEBUG in development.

  • DO NOT log sensitive details such as PII - stick to using IDs (e.g. a user ID).

  • DO NOT log and handle an error simultaneously.

The Deep Dive

Logging to stderr

No matter the log level, logs should be streamed to stderr. The documentation on stderr puts it like this:

Under normal circumstances every UNIX program has three streams opened for it when it starts up, one for input, one for output, and one for printing diagnostic or error messages.

Logs are diagnostics and are not necessarily meant to be consumed by the end user of the program.

Keep in mind that many production systems do not strictly adhere to this original POSIX convention; that's fine, as long as the output stream is consistent.
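
As a minimal sketch of what this looks like in practice, here is a Go program using the standard library's log/slog package, pointed at stderr:

```go
package main

import (
	"log/slog"
	"os"
)

func main() {
	// Send all diagnostic output to stderr, leaving stdout
	// free for the program's actual output.
	logger := slog.New(slog.NewJSONHandler(os.Stderr, nil))

	logger.Info("server started", "port", 8080)
}
```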

Log to stderr and nowhere else

It's not the job of your program to log to a file, to a database, or to a 3rd party service. You will be making your program more complex if you do.

Of course, there are exceptions, for example when logging to the standard streams is not possible.

In all other cases, let other tools handle that for you, such as a log scraper that ships the logs to Grafana Loki, Datadog, etc.

Adhere to structured logging

Structured logs are way easier for machines to process than for humans to read, and that is exactly the point.

You shouldn't need to drill down into every log line separately; instead, you want an aggregate overview of the logs your program is emitting, based on which you can make meaningful decisions.

Use JSON because it is widely supported. I haven't encountered a log management service that doesn't support JSON, but if you prefer another format, like XML or logfmt, use it as long as you are consistent. Pick one format and adhere to it.
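
To make the two formats concrete, here is a small Go sketch showing both with log/slog; the output lines in the comments are abbreviated:

```go
package main

import (
	"log/slog"
	"os"
)

func main() {
	// JSON handler: one self-describing object per log line.
	jsonLogger := slog.New(slog.NewJSONHandler(os.Stderr, nil))
	jsonLogger.Info("record added", "user_id", 456, "table", "user_photos")
	// {"time":"...","level":"INFO","msg":"record added","user_id":456,"table":"user_photos"}

	// Text handler: logfmt-style key=value pairs, fine for simpler setups.
	textLogger := slog.New(slog.NewTextHandler(os.Stderr, nil))
	textLogger.Info("record added", "user_id", 456, "table", "user_photos")
	// time=... level=INFO msg="record added" user_id=456 table=user_photos
}
```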

Log levels

This is a tricky one. Every developer has an opinion on which log levels should be used.

I have personally found that most programs do not need more than three levels: INFO, ERROR, and DEBUG. Even DEBUG is unnecessary in most cases.

So let me explain these levels:

  • DEBUG: This should be the default in development. In production, this level is enabled only as a last resort, when everything else has failed, which is also a good opportunity to improve your log messages. Usually, metrics and development events are logged at this level (see the sketch after these lists for wiring up the defaults).

  • INFO: Represents normal operation with sufficient metadata, e.g. "user_id added record into DB", but not as exhaustive as DEBUG: anything that helps you pinpoint the execution flow of your program.

  • ERROR: Usually indicates an error that cannot be recovered from, e.g. "cannot insert data into the db", with enough metadata that you can reconstruct the error when debugging.

You don't need these log levels:

  • WARNING: these are basically INFO logs with a label attached to them. They are definitely not ERROR logs, but they flag something you should keep an eye on, e.g. "database query took 3 seconds" when that is above the threshold considered normal. It could signal a lot of things out of the ordinary.

  • NOTICE, FATAL, ALERT, CRITICAL, etc.: these are all just variations of either INFO or ERROR where a label would suffice, e.g. "INFO alert message alert=true".
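
Wiring up these defaults is straightforward; here is a minimal Go sketch, where APP_ENV is a hypothetical environment variable your deployment would set:

```go
package main

import (
	"log/slog"
	"os"
)

func main() {
	// A LevelVar holds the minimum level; its zero value is INFO,
	// so production gets INFO by default and development gets DEBUG.
	level := new(slog.LevelVar)
	if os.Getenv("APP_ENV") == "development" { // hypothetical env variable
		level.Set(slog.LevelDebug)
	}

	logger := slog.New(slog.NewJSONHandler(os.Stderr, &slog.HandlerOptions{Level: level}))

	logger.Debug("cache warmed", "entries", 1024) // dropped in production
	logger.Info("user added record", "user_id", 456)
}
```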

Do not log sensitive data

I shouldn't have to state the obvious: no system is 100% hackproof.

Logging sensitive data such as names and, OMG, passwords creates yet another attack vector.

The best way to filter out sensitive data is to have an ETL (Extract-Transform-Load) function execute on the log messages before they are written to the destination stream.
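
As a minimal sketch of such a transform in Go, log/slog's ReplaceAttr hook can redact attributes before they are written; the denylist of keys here is a made-up example:

```go
package main

import (
	"log/slog"
	"os"
)

// Hypothetical denylist of attribute keys that must never reach the logs.
var sensitiveKeys = map[string]bool{"password": true, "email": true, "name": true}

func main() {
	opts := &slog.HandlerOptions{
		// ReplaceAttr runs on every attribute before it is written,
		// acting as the transform step described above.
		ReplaceAttr: func(groups []string, a slog.Attr) slog.Attr {
			if sensitiveKeys[a.Key] {
				return slog.String(a.Key, "[REDACTED]")
			}
			return a
		},
	}
	logger := slog.New(slog.NewJSONHandler(os.Stderr, opts))

	logger.Info("login attempt", "user_id", 456, "password", "hunter2")
	// ... "msg":"login attempt","user_id":456,"password":"[REDACTED]"
}
```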

Do not handle errors and log them too

Logging an error is itself a way of handling it. Either handle the error, or log it if it's unrecoverable; don't do both.

For example, if a database insert fails, the code could retry the query a couple more times; that's handling it. If the insert fails every time, the code should log that so somebody can inspect it later on.

If you handle the error (e.g. retry the insert query) and log it too, that log line will be meaningless to whoever is analyzing the logs, because ultimately it wasn't an error that the program couldn't recover from.
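
Here is a small Go sketch of that flow, where insertRecord stands in for a real database call:

```go
package main

import (
	"errors"
	"log/slog"
	"os"
)

// insertRecord stands in for a real database call.
func insertRecord() error { return errors.New("connection reset") }

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stderr, nil))
	const maxAttempts = 3

	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = insertRecord(); err == nil {
			return // handled: the retry succeeded, nothing to log
		}
		// No log here: retrying *is* the handling, not the reporting.
	}

	// Only the unrecoverable case is logged, with enough metadata
	// to reconstruct the failure later.
	logger.Error("cannot insert data into the db", "attempts", maxAttempts, "err", err)
}
```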

The art of writing good logs

Answering a few questions each time you're about to write a log line is a good way to decide whether that log is worth writing at all.

  1. Can the event be reproduced from the logs? If the event, be it a warning or an error, cannot be reproduced just from the logs, it's not worth logging. For example, "cannot update table". Which table? Where in the code? What data was used?

  2. Are the logs rich with enough metadata to understand what happened? This is why structured logging is so important: you can attach metadata to each log. An example of a bad log is "fetching data from URL failed" that includes neither which URL nor what request variables were sent.

  3. Can multiple log lines be correlated? This is most prevalent in Web API services, where multiple requests can be handled at the same time, intertwining their log lines. It is good practice to attach a so-called "request_id" to individual logs so that you can track down and correlate them later on (see the sketch after this list).

  4. Is the error clear just by reading the error logs? The example above, "cannot update table", while simple, is not clear. It does not help in any way; it can even be considered noise. A better message would be "cannot update table user_photos: user does not have access to group" with metadata such as group_id: 123 and user_id: 456.

  5. Is it clear from the log when and where the event occurred? A program's execution can take many code paths that result in the same error; for example, multiple functions can update the same database table. Do you know which code section produced the error, and do you know when it occurred? Save a timestamp and the "file:line" location with every log, e.g. "cannot update table user_photos" with metadata t=1697880685 and user_group_photos.php:38, or split it up as filename=user_group_photos.php, line=38.
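
Here is a minimal Go sketch covering points 3 and 5, using log/slog's AddSource option and a per-request logger; the request ID is a made-up value:

```go
package main

import (
	"log/slog"
	"os"
)

func main() {
	// AddSource attaches the file and line of each logging call.
	logger := slog.New(slog.NewJSONHandler(os.Stderr, &slog.HandlerOptions{AddSource: true}))

	// Derive a per-request logger; "req-42" stands in for an ID you would
	// generate (or read from a header) at the start of each request.
	reqLogger := logger.With("request_id", "req-42")

	reqLogger.Error("cannot update table user_photos: user does not have access to group",
		"group_id", 123, "user_id", 456)
	// Every line now carries request_id, a timestamp, and a "source"
	// object with file and line, ready for correlation later.
}
```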

Individual log lines vs wide events

There has always been a debate on whether log lines or wide events are a better choice.

Every function follows an execution path. The question is: do you emit multiple logs along this path, or one big one at the end of the execution?

In my experience, individual log lines find their utility in CLI applications, where the execution path is short and clear enough, while web projects and other request/response-based applications prefer wide events: metadata is added throughout the execution path, and a single enriched log line is emitted at the end.
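
As a rough Go sketch of the wide-event approach, the steps and the request ID below are made up; the point is that metadata accumulates along the way and only one line is emitted:

```go
package main

import (
	"context"
	"log/slog"
	"os"
	"time"
)

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stderr, nil))
	handleRequest(logger)
}

// handleRequest collects metadata along the execution path and emits
// a single wide event at the end; "req-42" is a hypothetical request ID.
func handleRequest(logger *slog.Logger) {
	start := time.Now()
	attrs := []slog.Attr{slog.String("request_id", "req-42")}

	// ... authentication step ...
	attrs = append(attrs, slog.Int("user_id", 456))

	// ... database step ...
	attrs = append(attrs, slog.String("table", "user_photos"))

	attrs = append(attrs, slog.Duration("duration", time.Since(start)))
	logger.LogAttrs(context.Background(), slog.LevelInfo, "request completed", attrs...)
}
```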

Conclusion

There you have it. My thoughts on the "art and science" of logging.

It's not just about jotting down what your program is up to, but about analyzing it so you can make clear decisions about what to improve in your code.

After all, what is a log good for if nobody reads it?


Photo credits: Niklas Jeromin