
Logging in Observability - Part 1

by Denis Matveev, August 30th, 2022

Too Long; Didn't Read

What is logging, and how does it help make a system more observable? This article explains how logging works on Linux and why it is a good idea to ship logs to remote storage.


What is logging?

Today I want to look at one important component of observability. Monitoring is fairly well understood, so here I want to focus on logging: how to use the information in logs, and how to work with and aggregate events.

In my previous article, we discussed the difference between observability and monitoring. You can find it here: //gzht888.com/observability-vs-monitoring-whats-the-difference

Let us make a brief review of how Linux (and other Unix-like systems too) writes messages into files.

Logs are text information generated by a running program. Imagine you run your own program, written in any programming language, and you want to see what it is doing at the moment. For this purpose you can add statements like:

printf("Hello World\n");

in C, or

print("Hello world")

in Python, and so on.

Real programs have hundreds of such print statements and output a lot of information.

That's fine when you run the program yourself: you see whatever you passed to print() on the terminal. But what about daemons? They run detached from any terminal, and their stdout and stderr are typically not connected to anything you can watch. All interesting information should be written to a file called a log file. Traditionally, Linux has a special system for logging: syslog.

To write to it, use the syslog() library function in C (it is a libc function, see syslog(3), not a raw system call), the syslog module in Python, or the logger command in shell scripts.
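As a small illustration, here is how writing to syslog might look from Python with the standard syslog module. This is a minimal sketch: the identifier swd-demo, the helper name send_to_syslog, and the choice of the local0 facility are my assumptions for the example, and where the messages end up depends on your syslog configuration.

```python
import syslog


def send_to_syslog(ident, messages):
    """Send each message to the local syslog daemon; return how many were sent."""
    # LOG_PID adds the process ID to every entry. LOG_LOCAL0 is one of the
    # facilities reserved for local applications (an assumption for this demo).
    syslog.openlog(ident=ident, logoption=syslog.LOG_PID, facility=syslog.LOG_LOCAL0)
    sent = 0
    for text in messages:
        # The facility comes from openlog(); here we only pass the severity.
        syslog.syslog(syslog.LOG_INFO, text)
        sent += 1
    syslog.closelog()
    return sent


count = send_to_syslog("swd-demo", ["Port number = 80", "Started OK"])
print(count)
```

The program does not choose a file name or format a timestamp; the syslog daemon takes care of both.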

Facilities, severity levels, and so on are used to differentiate messages. It's a pretty powerful system.

If necessary, you can find a detailed description in man 3 syslog. Most logs are stored under the /var/log directory.

/var/log is not a requirement; logs can be written anywhere. Many applications write logs to their own locations, but syslog is a convenient approach: it lets you split messages into separate files and locations, and even transfer logs to remote storage over a network. (Honestly, this depends on the implementation. Many modern distributions ship rsyslog, which has such support.)

Because logs are text, Linux's many text utilities work on them: grep, uniq, sed, awk, tail, head, etc. You are probably familiar with them.

This is very nice: with this set of utilities we can analyze logs, search for the necessary info, and build various top-N lists. But understand that such an analysis is a one-off; next time you need it, you have to build it again. It is annoying.
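To show what such a "top of something" looks like in code, here is a minimal Python sketch that counts the most frequent lines, roughly what sort | uniq -c | sort -rn | head does in a shell. The function name top_messages and the sample lines are made up for the illustration.

```python
from collections import Counter


def top_messages(lines, n=3):
    """Return the n most frequent non-empty lines with their counts."""
    counts = Counter(line.strip() for line in lines if line.strip())
    return counts.most_common(n)


# Made-up sample; in practice the lines would come from a log file.
sample = [
    "Listen to 0.0.0.0",
    "Port number = 80",
    "Listen to 0.0.0.0",
    "Started OK",
    "Listen to 0.0.0.0",
    "Port number = 80",
]

for message, count in top_messages(sample, n=2):
    print(count, message)
```

A function like this can at least be reused, but it still runs on one machine against one file, which is the limitation the rest of the article is about.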

Syslog

As mentioned above, Linux traditionally has a logging system called syslog. Syslog is a Unix subsystem for delivering messages to files. The name also refers to the main system journal. Depending on the Linux flavor, it is located at

/var/log/syslog (for Debian based distros)

/var/log/messages (for Red Hat-like distros)

Beyond that, there are many other predefined files:

/var/log/auth.log (Debian-based) or /var/log/secure (Red Hat-like) - authorization-related messages
/var/log/dmesg - kernel messages
/var/log/cron - cron jobs

and others.

Let’s take a closer look at syslog, because this is the most well-known place for logging.

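The syslog() library function lets developers not think about timestamps or about which file the logs are written to:

syslog(LOG_LOCAL0 | LOG_ERR, "%s: %s", strerr, strerror(err));

The first argument combines a facility (LOG_LOCAL0) with a severity level (LOG_ERR). With a facility alone, the level would be 0, which is LOG_EMERG.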

By default, messages are written into syslog with a prefix of timestamp, hostname, and application name:

Aug 23 13:28:17 vds swd: Parsing config file /etc/swd/swd.cfg
Aug 23 13:28:17 vds swd: Port number = 80
Aug 23 13:28:17 vds swd: Setting rootdir = /var/www
Aug 23 13:28:17 vds swd: Listen to 0.0.0.0
Aug 23 13:28:17 vds swd: Number of workers = 2
Aug 23 13:28:17 vds swd: Started OK, My PID = 26385
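Since every entry starts with the same prefix, it is easy to split a line back into its fields. Below is a minimal Python sketch for the traditional format shown above; the regular expression and the name parse_syslog_line are my own, and other syslog templates (for example, ISO timestamps) would need a different pattern.

```python
import re

# Traditional format: "Mon DD HH:MM:SS host app[pid]: message" (pid is optional).
SYSLOG_LINE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<app>[^:\[\s]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)


def parse_syslog_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = SYSLOG_LINE.match(line)
    return match.groupdict() if match else None


entry = parse_syslog_line("Aug 23 13:28:17 vds swd: Port number = 80")
print(entry)
```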

Of course, even though each entry carries an application name, the file sometimes becomes hard to read and grows too fast. There are at least two ways to deal with this:

Redirect a specific application's log into its own file
Use logrotate to rotate and compress logs

The best practice is to do both: give every application its own file, then rotate it.

You can find good examples (on Debian-based systems with rsyslog) in

/etc/rsyslog.d/50-default.conf

like this:

kern.*                          -/var/log/kern.log

which means: write all messages with facility kern, at every level, into /var/log/kern.log. The leading - tells syslog not to sync the file after every write.

Severity levels, from most to least severe:

emerg
alert
crit
err
warning
notice
info
debug

For your application, the most suitable facilities are:

local0 - local7
user
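A facility and a severity level travel together: the syslog protocol (RFC 5424) encodes them in one number, the priority value, computed as facility code times 8 plus severity code. Here is a minimal sketch of that arithmetic in Python. The helper names encode_pri and decode_pri are mine, and the facility table is deliberately limited to the facilities mentioned in this article.

```python
# Numeric facility codes from the syslog protocol; only a subset is listed.
FACILITIES = {"kern": 0, "user": 1}
FACILITIES.update({f"local{i}": 16 + i for i in range(8)})

# Severity names in order of their numeric codes, 0 (emerg) to 7 (debug).
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def encode_pri(facility, severity):
    """Combine a facility name and a severity name into a priority value."""
    return FACILITIES[facility] * 8 + SEVERITIES.index(severity)


def decode_pri(pri):
    """Split a priority value back into (facility, severity) names."""
    facility_code, severity_code = divmod(pri, 8)
    names = {code: name for name, code in FACILITIES.items()}
    # Facilities missing from the small table above are returned as numbers.
    return names.get(facility_code, str(facility_code)), SEVERITIES[severity_code]


print(encode_pri("local0", "info"))
print(decode_pri(3))
```

This is why a selector such as kern.* can match on the facility alone: both parts can be recovered from the single number.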

Messages may arrive at syslog asynchronously.

You can still create custom logs wherever you prefer (even in a home directory).

Even so, the recommended place for custom logs is /var/log/.

Logrotate keeps logs from eating all the disk space when you store files locally. Nowadays, many systems transfer their logs to remote storage, for many reasons. For now, we only note that rsyslog (a modern implementation of syslog) can also send logs over the network to remote storage.
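To make the idea of sending logs over the network concrete, here is a minimal sketch using Python's standard logging.handlers.SysLogHandler, which sends syslog-formatted messages over UDP. It is not rsyslog itself. The logger name swd, the helper make_remote_logger, and the stand-in collector socket on localhost are assumptions for the demo; in a real setup the address would point at your log server.

```python
import logging
import logging.handlers
import socket


def make_remote_logger(host, port, name="swd"):
    """Create a logger that sends syslog-formatted datagrams to host:port over UDP."""
    handler = logging.handlers.SysLogHandler(
        address=(host, port),
        facility=logging.handlers.SysLogHandler.LOG_LOCAL0,
    )
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # keep the demo output out of the root logger
    return logger


# Stand-in for a remote collector: a UDP socket on a free local port.
collector = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
collector.bind(("127.0.0.1", 0))
collector.settimeout(2.0)
port = collector.getsockname()[1]

logger = make_remote_logger("127.0.0.1", port)
logger.info("Started OK")

datagram, _ = collector.recvfrom(4096)
print(datagram)
collector.close()
```

UDP is fire-and-forget: if nobody is listening, the message is silently lost. This is one reason the choice of transport matters when logs leave the machine.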

RSyslog

RSyslog stands for ‘Rocket-fast SYStem for LOG processing’. It is very advanced in log processing:

Multithreaded
Supports TCP, UDP, TLS
Can store logs in databases and search engines such as MySQL, PostgreSQL, Oracle, Elasticsearch
Filter any part of log
Customizable output format

To enhance functionality, Rsyslog has modules:

input - collect info from different sources
output - redirect messages, destination may be either local file or remote storage
parse - parse messages
modification - modify messages
string generator - generates string based on message

Moreover, rsyslog allows creating rules based on filters and actions:

:msg,contains,"[UFW " /var/log/ufw.log

which filters messages (:msg property in syslog) containing [UFW and writes such messages to a specific file /var/log/ufw.log
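The logic of such a rule is simple enough to sketch in a few lines of Python. This only illustrates the "contains" comparison and is not how rsyslog is implemented; the function name matching_destinations and the sample firewall line are made up.

```python
def matching_destinations(message, rules):
    """Return the destination of every rule whose substring occurs in the message."""
    return [destination for needle, destination in rules if needle in message]


# One rule, mirroring the example above: substring to look for, then target file.
RULES = [("[UFW ", "/var/log/ufw.log")]

firewall_line = "[UFW BLOCK] IN=eth0 SRC=203.0.113.7"  # made-up sample message
print(matching_destinations(firewall_line, RULES))
print(matching_destinations("Started OK", RULES))
```

The function returns a list because a message is not consumed by the first rule it matches; unless the configuration says otherwise, later rules still see it.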

There are a huge number of ways to create rulesets (or just rules). You can modify the logs of your application, whether it writes messages to syslog or just to a file, as you need.

For full flexibility, Rsyslog has scripting, which allows creating complex rules for processing messages.

Rsyslog has queues inside its architecture to improve performance in multithreaded mode, and allows creating queues for actions in config files. On the one hand, such an approach can increase performance significantly; on the other hand, it can also cause performance degradation.

Conclusion on using Rsyslog

As shown above, rsyslog is a high-performance, advanced logging system. It lets developers focus on writing programs while relying on a dependable subsystem for logging. For administrators, modern syslog is a useful tool for configuring log flow the way distributed systems need it, including shipping to popular storage such as Elasticsearch for further analysis.

Journald

Most people already use journald without suspecting it. Look at the command:

systemctl status nginx

It shows the status of the nginx web server and the tail of its log. This is an example of using the journal.

Most major Linux distros have used systemd instead of System V init for many years and ship journald as the default tool for working with logs. journald is a part of systemd.

Features

Binary logs (forgery protection)
Does not require special set-up
Supports multi-line, multi-field logs
Indexed data
Centralized storage
Supports two kinds of local storage: disk and memory
Journald has very rich functionality to work with compressing, freeing space, forwarding messages
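To give a feel for what "multi-field" means, here is a minimal Python sketch that parses one journal entry in the JSON form produced by journalctl -o json. The field names (MESSAGE, PRIORITY, _SYSTEMD_UNIT, __REALTIME_TIMESTAMP) are standard journal fields, but the sample entry is made up, and the parser ignores complications such as binary message payloads.

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_journal_entry(line):
    """Extract a few well-known fields from one JSON-formatted journal entry."""
    entry = json.loads(line)
    # The journal stores the timestamp as microseconds since the Unix epoch.
    micros = int(entry["__REALTIME_TIMESTAMP"])
    priority = entry.get("PRIORITY")
    return {
        "time": EPOCH + timedelta(microseconds=micros),
        "unit": entry.get("_SYSTEMD_UNIT"),
        "priority": int(priority) if priority is not None else None,
        "message": entry.get("MESSAGE"),
    }


# Made-up sample entry with only the fields used above.
sample_entry = (
    '{"__REALTIME_TIMESTAMP": "1661261297000000", '
    '"_SYSTEMD_UNIT": "nginx.service", "PRIORITY": "6", "MESSAGE": "Started OK"}'
)
parsed = parse_journal_entry(sample_entry)
print(parsed["time"].isoformat(), parsed["unit"], parsed["message"])
```

Because each entry is a set of named fields rather than a line of text, there is no need for the kind of regular expression that plain syslog files require.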

What types of logs does journald take?

syslog
systemd units logs
auditd logs
submitting logs via Journal API
kernel logs kmsg

Auditd

There is another important source of logs: auditd. This system registers kernel events (configured in special files) and writes them into a log. There are many use cases for auditd. For more details I invite you to read my article at

Log shippers

We’ve considered two modern logging systems in Linux that can transfer data to centralized storage: rsyslog and journald. Both are present by default on many distributions, and both have pros and cons.

Many resources compare rsyslog and journald in detail in terms of remote data transfer, performance, and so on.

But I would like to focus on a newer approach to storing logs remotely for further analysis: log shippers. These are lightweight processes that take log files as input, process them if necessary, extract the required info and/or transform it into a specific format, then ingest the result into remote storage. A well-known example is Filebeat by Elastic. Filebeat is not the only one; there are many implementations from different developers. If you ask me about rsyslog vs journald vs Filebeat for transferring the messages of a specific application, I’ll reply: “My choice is Filebeat”.

In my opinion, it is easier to configure and does only one thing. Of course, it is not a universal solution; you should find the one that fits your tasks.
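The core loop of a log shipper can be sketched in a few lines of Python: remember how far the file has been read, pick up only new complete lines, wrap each one in a structured event, and hand it to a sender. This is a toy under stated assumptions, not how Filebeat works internally. The function ship_new_lines, the event layout, and the in-memory list standing in for remote storage are all made up for the illustration.

```python
import json
import os
import tempfile


def ship_new_lines(path, offset, send):
    """Send lines appended to path since byte offset; return the new offset."""
    with open(path, "rb") as log_file:
        log_file.seek(offset)
        for raw in log_file:
            if not raw.endswith(b"\n"):
                break  # partial line: leave it for the next call
            offset += len(raw)
            text = raw.decode("utf-8", errors="replace").rstrip("\n")
            if text:
                send(json.dumps({"source": path, "message": text}))
    return offset


shipped = []  # stand-in for remote storage
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "app.log")
    with open(log_path, "w") as f:
        f.write("Port number = 80\nStarted OK\n")
    position = ship_new_lines(log_path, 0, shipped.append)

    # The application keeps writing; the next pass ships only the new line.
    with open(log_path, "a") as f:
        f.write("Listen to 0.0.0.0\n")
    position = ship_new_lines(log_path, position, shipped.append)

print(len(shipped))
```

A real shipper also has to persist the offset across restarts, follow rotated files, and retry when the remote side is down, which is where most of the actual complexity lives.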

Pros and cons of remote storing

Pros of storing logs remotely:

Systems don’t spend disk space on log files
With logs stored remotely, we can conveniently analyze logs from all servers and build dashboards
Logs are protected from being removed accidentally or tampered with

If logs are stored locally:

They are inconvenient to analyze, especially if the system has hundreds of distributed application instances
There is a risk that logs are removed if a server is compromised
Logs create additional load on the system

Cons of storing logs remotely:

If the remote server is inaccessible, there is a risk of losing messages (depending on the implementation)