Logging

Logging is worth getting right. It is your window into what your application is doing right now and what it was doing days ago. Keeping a close eye on the error logs lets you find bugs that users haven't reported, or haven't even noticed. Getting logging right means combining a few different strategies.

UTC All The Things (TM)

Your server and application are both running in UTC, right? If not, you're going to have a hard time (or a harder time, depending on your timezone) matching up log messages from syslog, your web server and your application. Sometimes you need to find the HTTP request in the Apache access logs that led to a particular error message in the application log. During BST it's easy to forget that the time you're grepping for is offset by an hour! When you're investigating an issue in production, the last thing you want is the extra cognitive load of remembering which set of logs you need to subtract an hour from. It's worse still if you're in a +3h45m timezone.
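
On the application side, here is a minimal sketch of UTC timestamps using Python's standard logging module (Python is an assumption; the article doesn't name a language). Setting a formatter instance's converter attribute to time.gmtime makes it format timestamps in UTC whatever the host's timezone is; the logger name "app" and the format string are illustrative choices. Putting the server itself on UTC is a separate step (for example, timedatectl set-timezone UTC on systemd-based hosts).

```python
import logging
import time


def make_utc_formatter() -> logging.Formatter:
    # ISO 8601-style timestamp; the literal "Z" marks it as UTC.
    formatter = logging.Formatter(
        fmt="%(asctime)s.%(msecs)03dZ %(levelname)s %(name)s: %(message)s",
        datefmt="%Y-%m-%dT%H:%M:%S",
    )
    # Convert record timestamps with gmtime (UTC) instead of localtime.
    formatter.converter = time.gmtime
    return formatter


handler = logging.StreamHandler()
handler.setFormatter(make_utc_formatter())
logger = logging.getLogger("app")  # illustrative logger name
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("order placed")
```

Ending the timestamp with an explicit "Z" means nobody has to guess which timezone a given log file is in when lining it up against syslog or the web server logs.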

Detail

The level of detail to include in the application logs is a balancing act. Each error message needs enough detail to answer three questions:

  1. What's the error? Include the exact exception message or error code, if there is one.

  2. Where in the code did it originate? A stack trace helps here.

  3. What parameters were passed to the method that failed? Was there a particular database record that caused a loop to crash? You need to be able to reproduce the exact error conditions.
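
A minimal Python sketch of capturing all three in a single log entry. logger.exception logs at ERROR level and appends the traceback, which shows where the error originated and ends with the exact exception message; putting the record id and contents in the message covers the parameters. The import_record and import_all functions and the record shape are hypothetical.

```python
import logging

logger = logging.getLogger("app.import")  # illustrative logger name


def import_record(record: dict) -> None:
    # Hypothetical processing step that fails on bad input.
    if record.get("amount") is None:
        raise ValueError("amount is missing")


def import_all(records: list[dict]) -> int:
    """Import every record, logging (not raising) on failure; return failure count."""
    failures = 0
    for record in records:
        try:
            import_record(record)
        except ValueError:
            # 1. exact message and 2. origin: both come from the appended traceback.
            # 3. parameters: the record id and full contents go in the message.
            logger.exception("import_record failed for record id=%s: %r",
                             record.get("id"), record)
            failures += 1
    return failures
```

Continuing past a bad record is a design choice for this sketch: one broken row gets logged with everything needed to reproduce it, rather than hiding every row after it.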

RFC 5424 (The Syslog Protocol) is handy here, specifically its table of message severities:

    Numerical Code   Severity

    0                Emergency: system is unusable
    1                Alert: action must be taken immediately
    2                Critical: critical conditions
    3                Error: error conditions
    4                Warning: warning conditions
    5                Notice: normal but significant condition
    6                Informational: informational messages
    7                Debug: debug-level messages
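
If your logging library doesn't use these codes directly, you may need to map them. Here is a sketch in Python, whose standard logging levels run the other way (a higher number means more severe) and have no Emergency, Alert or Notice. The Severity enum mirrors the table above. The NOTICE value of 25 and the decision to fold Emergency and Alert into CRITICAL are assumptions on my part, not something RFC 5424 prescribes.

```python
import logging
from enum import IntEnum


class Severity(IntEnum):
    """RFC 5424 severity codes: a lower number is more severe."""
    EMERGENCY = 0
    ALERT = 1
    CRITICAL = 2
    ERROR = 3
    WARNING = 4
    NOTICE = 5
    INFORMATIONAL = 6
    DEBUG = 7


# Assumed custom level sitting between INFO (20) and WARNING (30).
NOTICE = 25
logging.addLevelName(NOTICE, "NOTICE")

# One possible mapping onto Python's standard levels.
TO_PYTHON_LEVEL = {
    Severity.EMERGENCY: logging.CRITICAL,
    Severity.ALERT: logging.CRITICAL,
    Severity.CRITICAL: logging.CRITICAL,
    Severity.ERROR: logging.ERROR,
    Severity.WARNING: logging.WARNING,
    Severity.NOTICE: NOTICE,
    Severity.INFORMATIONAL: logging.INFO,
    Severity.DEBUG: logging.DEBUG,
}


def is_enabled(threshold: Severity, severity: Severity) -> bool:
    # A message passes if it is at least as severe as the threshold,
    # i.e. its RFC 5424 code is less than or equal to the threshold's code.
    return severity <= threshold
```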

Hopefully the logging library for your language of choice supports writing log messages at these severities. Done right, this lets you turn the verbosity up or down on the fly in production, without changing the code itself.
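
Here is one way to change verbosity without a code change or a restart, sketched in Python for Unix-like systems. It reads a starting level from an environment variable and flips between INFO and DEBUG when the process receives SIGUSR1. The APP_LOG_LEVEL variable name and the choice of signal are assumptions for illustration. The signal handler only changes the level and does not emit log records, because the Python logging docs warn against logging from inside signal handlers.

```python
import logging
import os
import signal

logger = logging.getLogger("app")  # illustrative logger name


def toggle_debug(signum=None, frame=None) -> None:
    """Flip the "app" logger between INFO and DEBUG (no logging in here)."""
    if logger.level == logging.DEBUG:
        logger.setLevel(logging.INFO)
    else:
        logger.setLevel(logging.DEBUG)


def configure_logging() -> None:
    # Starting level comes from the hypothetical APP_LOG_LEVEL variable.
    name = os.environ.get("APP_LOG_LEVEL", "INFO").upper()
    # getLevelName returns an int for known level names, else a "Level X" string.
    level = logging.getLevelName(name)
    logger.setLevel(level if isinstance(level, int) else logging.INFO)
    # SIGUSR1 only exists on Unix-like systems.
    if hasattr(signal, "SIGUSR1"):
        signal.signal(signal.SIGUSR1, toggle_debug)


configure_logging()
```

With this in place, `kill -USR1 <pid>` switches a running process to DEBUG, and sending the same signal again switches it back.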

Logrotate

You will be grepping, catting and tailing these logs, and messages from months ago are probably not relevant. Decide how many days of logs you want to keep and use logrotate to delete anything older. If logrotate compresses the older files with gzip, zgrep and zcat let you search and read them without decompressing them first.
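
zgrep handles this from the shell. If you'd rather search from a script, here is a rough Python equivalent that walks the live log and its rotations in chronological order and decompresses the .gz files transparently. It assumes logrotate's default numbered naming with compress enabled (possibly with delaycompress, which leaves app.log.1 uncompressed). Date-stamped names from the dateext option are not handled. search_logs, open_log and the file names are illustrative.

```python
import gzip
import re
from pathlib import Path
from typing import Iterator, TextIO


def open_log(path: Path) -> TextIO:
    # Files ending in .gz were compressed by logrotate; read them as text.
    if path.suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


def search_logs(directory: Path, base_name: str, pattern: str) -> Iterator[tuple[str, str]]:
    """Yield (file name, line) for lines matching pattern, oldest file first.

    Assumes names like app.log, app.log.1, app.log.2.gz, where a higher
    number means an older file.
    """
    name_re = re.compile(re.escape(base_name) + r"(?:\.(\d+))?(?:\.gz)?")
    rotations = []
    for path in directory.iterdir():
        match = name_re.fullmatch(path.name)
        if match:
            # The live file has no number; treat it as rotation 0 (newest).
            rotations.append((int(match.group(1) or 0), path))
    regex = re.compile(pattern)
    # Highest rotation number first gives chronological order.
    for _, path in sorted(rotations, key=lambda item: item[0], reverse=True):
        with open_log(path) as fh:
            for line in fh:
                if regex.search(line):
                    yield path.name, line.rstrip("\n")
```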

Signal to noise

All of the strategies above are about improving the signal-to-noise ratio of your logs. The balancing act is making sure there's enough detail when you need it, but not so much flying past your face while you're running tail -f that you can't spot what you're looking for when it happens.