What is logging?
Today I want to look at one important component of observability. Monitoring is a fairly clear topic, so here I want to focus on logging: how to use the information in logs, and how to work with and aggregate events.
In a previous article, we discussed the difference between observability and monitoring. You can find it here: //gzht888.com/observability-vs-monitoring-whats-the-difference
Let us briefly review how Linux (and other Unix-like systems) writes messages to files.
Logs are text information generated by a running program. Imagine you run your own program, written in any programming language, and you want to see what it is doing at the moment. For this purpose you can add lines like:
printf("Hello World\n");
in C, or
print("Hello world")
in Python, and so on.
Real programs have hundreds of such print calls and output a lot of information.
That is fine when you run the program yourself: you see whatever you passed to print(). But what about daemons? They run detached from a terminal, so their stdout and stderr are usually not visible to you. All interesting information should therefore be written to a file called a log file. Traditionally, Linux has a special system for logging: syslog. To write to it, use the C library function syslog(), the syslog module in Python, or the logger command in bash.
There are special facilities, severity levels and so on that are used to differentiate messages. It's a pretty powerful system.
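As a small illustration of facilities and severity levels, here is a minimal sketch using Python's standard syslog module (Unix only). The tag myapp and the choice of the local0 facility are assumptions made for the example.

```python
import syslog


def make_priority(facility: int, severity: int) -> int:
    """Combine a facility and a severity into one syslog priority value."""
    return facility | severity


def log_startup(ident: str = "myapp") -> None:
    """Send two messages to the local syslog daemon."""
    # "myapp" is a hypothetical application name used as the message tag.
    syslog.openlog(ident=ident, logoption=syslog.LOG_PID, facility=syslog.LOG_LOCAL0)
    # Only a severity is given here, so the facility from openlog() applies.
    syslog.syslog(syslog.LOG_INFO, "service started")
    # Here facility and severity are passed together as one priority value.
    syslog.syslog(make_priority(syslog.LOG_LOCAL0, syslog.LOG_ERR), "cannot open config file")
    syslog.closelog()


if __name__ == "__main__":
    log_startup()
```

Where the messages end up depends on how the local syslog daemon is configured for the chosen facility.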
If necessary, you can find a detailed description in man 3 syslog. Most logs are stored under the /var/log directory.
Using /var/log as the place for logs is not a requirement; logs can be written anywhere. Many applications write logs to their own locations, but syslog is a convenient approach that lets you separate files and locations, including transferring logs to remote storage over a network. (Honestly, this depends on the implementation. Most modern distributions use rsyslog, which has such support.)
Because logs are text, Linux offers many utilities for working with them: grep, uniq, sed, awk, tail, head, etc. You should be familiar with them.
This is very nice: we have a set of utilities and can analyze logs, search for the necessary information, and build various top-N reports. But keep in mind that such a pipeline is a one-off effort; next time you have to build it again, which is annoying.
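One way to avoid rebuilding a pipeline such as awk | sort | uniq -c | sort -rn | head every time is to keep it as a small script. The sketch below is a rough Python equivalent; the function name and the sample lines (including the sshd line) are made up for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_programs(lines: Iterable[str], n: int = 3) -> List[Tuple[str, int]]:
    """Count how many lines each program wrote and return the n most frequent."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        # Traditional syslog lines look like: "Aug 23 13:28:17 host program[pid]: text"
        if len(fields) < 5:
            continue  # skip lines that do not match the expected layout
        program = fields[4].rstrip(":").split("[")[0]
        counts[program] += 1
    return counts.most_common(n)


# Made-up sample data in the traditional syslog layout.
sample = [
    "Aug 23 13:28:17 vds swd: Port number = 80",
    "Aug 23 13:28:18 vds sshd[811]: Accepted publickey for admin",
    "Aug 23 13:28:19 vds swd: Started OK, My PID = 26385",
]
print(top_programs(sample))
```

To run it against a real file, pass an open file object instead of the sample list.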
Syslog
As mentioned above, Linux traditionally has a logging system called syslog. Syslog is a Unix subsystem for delivering messages to files. The name syslog also refers to the main system log file.
Depending on the Linux flavor, the main system log is located at /var/log/syslog (Debian-based distros) or /var/log/messages (Red Hat-like distros).
For completeness, there are many other predefined log files:
- /var/log/auth.log or /var/log/security.log - authorization-related messages
- /var/log/dmesg - kernel messages
- /var/log/cron - cron jobs
and others.
Let’s take a closer look at syslog, because this is the most well-known place for logging.
The syslog() function frees developers from thinking about timestamps or which file the logs end up in. Its first argument combines a facility with a severity level:
syslog(LOG_LOCAL0 | LOG_ERR, "%s%s%s\n", strerr, ": ", strerror(err));
By default, messages are written into syslog with a prefix of timestamp, hostname, and application name:
Aug 23 13:28:17 vds swd: Parsing config file /etc/swd/swd.cfg
Aug 23 13:28:17 vds swd: Port number = 80
Aug 23 13:28:17 vds swd: Setting rootdir = /var/www
Aug 23 13:28:17 vds swd: Listen to 0.0.0.0
Aug 23 13:28:17 vds swd: Number of workers = 2
Aug 23 13:28:17 vds swd: Started OK, My PID = 26385
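The prefix shown above (timestamp, hostname, application name) is regular enough to be split into fields. The following is a minimal sketch, assuming the traditional line layout from the sample; the regular expression and field names are my own and do not cover every format a syslog daemon can be configured to write.

```python
import re
from typing import Dict, Optional

# Traditional syslog prefix: "Mon DD HH:MM:SS host program[pid]: message" (pid is optional).
LINE_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<program>[^\s:\[]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)


def parse_line(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Split one syslog line into its fields, or return None if it does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


record = parse_line("Aug 23 13:28:17 vds swd: Port number = 80")
print(record)
```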
Even though each line carries the application name, the file sometimes becomes hard to read and grows too fast. To deal with this, there are at least two options:
- Redirect the log of a specific application into its own file
- Use logrotate to rotate and compress logs
The best practice is to do both: redirect each application's messages to a separate file and then rotate it.
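logrotate itself is configured outside the application, but the idea of rotation can be shown in-process. The sketch below uses Python's RotatingFileHandler, which is a different tool from logrotate (it rotates by size and does not compress); the logger name, file name and size limit are arbitrary values chosen for the demo.

```python
import logging
import logging.handlers
import os
import tempfile


def make_rotating_logger(path: str, max_bytes: int, backups: int) -> logging.Logger:
    """Create a logger whose file is rotated before it would exceed max_bytes."""
    handler = logging.handlers.RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(logging.Formatter("%(asctime)s %(name)s: %(message)s"))
    logger = logging.getLogger("rotation-demo")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger


# Write into a temporary directory so the demo does not touch /var/log.
log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "app.log")
logger = make_rotating_logger(log_path, max_bytes=200, backups=2)
for i in range(20):
    logger.info("message number %d", i)

# The current file plus at most `backups` older files are kept; older ones are deleted.
print(sorted(os.listdir(log_dir)))
```

The same principle applies to logrotate: a bounded number of old files is kept, so disk usage stays limited.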
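You can find good examples in /etc/rsyslog.d/50-default.conf (the path used on Ubuntu), like this:
kern.* -/var/log/kern.log
This means: write all messages with facility kern, at any level, to /var/log/kern.log. The leading - before the path is legacy syntax that tells the daemon not to sync the file after every write.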
Severity levels, from most to least severe, are:
- emerg
- alert
- crit
- err
- warning
- notice
- info
- debug
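The levels above correspond to the numbers 0 (emerg) through 7 (debug), and a traditional selector such as kern.err matches err and everything more severe. The sketch below models that behaviour in a simplified form; the function name is hypothetical and only the plain facility.level and * forms are handled.

```python
# Severity names in order of decreasing urgency; the index is the numeric value (emerg=0, debug=7).
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def selector_matches(selector: str, facility: str, severity: str) -> bool:
    """Return True if a 'facility.level' selector would pick up the given message.

    Simplified model: handles a single facility (or '*') and a single level
    (or '*'); real syslog selectors also support '=', '!', 'none' and lists.
    """
    sel_facility, sel_level = selector.split(".", 1)
    if sel_facility not in ("*", facility):
        return False
    if sel_level == "*":
        return True
    # A plain level matches that level and every more severe (numerically lower) one.
    return SEVERITIES.index(severity) <= SEVERITIES.index(sel_level)


print(selector_matches("kern.*", "kern", "debug"))    # True
print(selector_matches("kern.err", "kern", "crit"))   # True
print(selector_matches("kern.err", "kern", "info"))   # False
```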
For your application the most suitable facilities are **
Messages may arrive at syslog asynchronously.
Anyway, you can still create custom logs wherever you prefer (even in a home directory).
Despite this, the recommended place for custom logs is /var/log/.
Logrotate is useful for keeping logs from eating all the disk space when you store files locally. Nowadays, most systems transfer their logs to remote storage, for many reasons. For now, we only note that rsyslog (a modern implementation of syslog) can also send logs over the network to remote storage.
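To give a feeling for what sending logs over the network looks like on the wire, here is a minimal sketch that uses Python's SysLogHandler over UDP. This is the application sending directly to a syslog server, not rsyslog forwarding; the tag myapp and the local UDP socket that stands in for the remote server are assumptions for the demo.

```python
import logging
import logging.handlers
import socket


def make_remote_logger(host: str, port: int) -> logging.Logger:
    """Build a logger that sends each record to a syslog server over UDP."""
    handler = logging.handlers.SysLogHandler(
        address=(host, port),
        facility=logging.handlers.SysLogHandler.LOG_LOCAL0,
        socktype=socket.SOCK_DGRAM,
    )
    # "myapp" is a hypothetical tag; servers usually treat the text before ':' as the program name.
    handler.setFormatter(logging.Formatter("myapp: %(message)s"))
    logger = logging.getLogger("remote-demo")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger


# Stand-in for a remote syslog server: a local UDP socket on an ephemeral port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)

logger = make_remote_logger("127.0.0.1", receiver.getsockname()[1])
logger.info("service started")

# The datagram starts with "<PRI>", where PRI = facility * 8 + severity.
data, _ = receiver.recvfrom(4096)
receiver.close()
print(data)
```

With UDP there is no delivery guarantee, which is one reason TCP or TLS transports are often preferred for remote logging.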
RSyslog
RSyslog stands for ‘Rocket-fast system for log processing’.
This system is very advanced in log processing:
- Multithreaded
- Supports TCP, UDP, TLS
- Can store logs in databases and search engines such as MySQL, PostgreSQL, Oracle, Elasticsearch
- Can filter on any part of a log message
- Customizable output format
To enhance functionality, Rsyslog has modules:
- input - collect info from different sources
- output - redirect messages, destination may be either local file or remote storage
- parse - parse messages
- modification - modify messages
- string generator - generates string based on message
Moreover, rsyslog allows you to create rules based on filters and actions:
:msg,contains,"[UFW " /var/log/ufw.log
This rule selects messages whose msg property contains [UFW and writes them to the file /var/log/ufw.log.
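The filter-plus-action idea can be modelled in a few lines. The sketch below imitates only the contains comparison on the message text and returns the routing decision instead of writing files; the function name, the default destination and the sample messages are illustrative assumptions, not rsyslog internals.

```python
from typing import Dict, Iterable, List, Tuple

# Each rule is (substring, destination), mimicking ':msg,contains,"substring" destination'.
Rule = Tuple[str, str]


def route(messages: Iterable[str], rules: List[Rule], default: str) -> Dict[str, List[str]]:
    """Group messages by destination file according to substring rules."""
    routed: Dict[str, List[str]] = {}
    for msg in messages:
        destinations = [dest for needle, dest in rules if needle in msg]
        # In this simplified model, a message that matches no rule goes to the default file.
        for dest in destinations or [default]:
            routed.setdefault(dest, []).append(msg)
    return routed


rules = [("[UFW ", "/var/log/ufw.log")]
messages = ["[UFW BLOCK] IN=eth0 SRC=203.0.113.7", "Port number = 80"]
result = route(messages, rules, default="/var/log/syslog")
print(result)
```

Unlike this model, real rsyslog keeps evaluating later rules after a match unless the rule is followed by a stop action, so a matched message can end up in several files.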
There are a huge number of different ways to create rulesets (or just rules). You can modify the logs of your application, whether it writes messages to syslog or just to a file, as you need. For full flexibility, Rsyslog has a scripting language (RainerScript), which allows creating complex rules for processing messages.
Rsyslog has queues inside its architecture to improve performance in multithreaded mode, and it allows creating queues for actions in config files. On the one hand, such an approach can increase performance significantly; on the other hand, it can also cause performance degradation.
Conclusion of using Rsyslog
As shown above, rsyslog is a high-performance, advanced logging system. It lets developers focus on writing programs while relying on a dependable system for logging. For administrators, modern syslog is a useful tool for configuring log flow as needed in distributed systems, including shipping logs to popular storage such as Elasticsearch for further analysis.
Journald
Most people already use journald without realizing it. Consider the command:
systemctl status nginx
It shows the status of the nginx web server along with the tail of its log. That log tail comes from the journal.
Almost all Linux distros have used systemd instead of SysV init for many years and ship with journald as the default tool for working with logs. journald is a part of systemd.
Features
- Binary logs (with optional sealing that makes tampering detectable)
- Does not require special set-up
- Supports multi-line, multi-field logs
- Indexed data
- Centralized storage
- Supports both kinds of local storage: disk and memory
- Rich functionality for compression, freeing space, and forwarding messages
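Because journal entries are multi-field records, they are easiest to consume as JSON. The sketch below builds a journalctl call with JSON output and extracts a few well-known fields; the helper names are hypothetical and the sample entry is made up (real entries carry many more fields).

```python
import json
from typing import Dict, Iterable, Iterator, List


def journalctl_command(unit: str, lines: int = 10) -> List[str]:
    """Build a journalctl call that prints the last entries of one unit, one JSON object per line."""
    return ["journalctl", "--no-pager", "-u", unit, "-n", str(lines), "-o", "json"]


def parse_entries(output: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield a few well-known fields from each JSON line produced by `journalctl -o json`."""
    for line in output:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        yield {
            "unit": entry.get("_SYSTEMD_UNIT", ""),
            "priority": entry.get("PRIORITY", ""),
            "message": entry.get("MESSAGE", ""),
        }


# Made-up sample line in the shape journalctl emits.
sample = ['{"_SYSTEMD_UNIT": "nginx.service", "PRIORITY": "6", "MESSAGE": "started"}']
print(list(parse_entries(sample)))
print(" ".join(journalctl_command("nginx")))
```

On a host with systemd, the command list can be passed to subprocess.run(..., capture_output=True, text=True) and its stdout split into lines for parse_entries.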
What types of logs does journald take?
- syslog
- systemd unit logs
- auditd logs
- logs submitted via the Journal API
- kernel logs (kmsg)
Auditd
There is another important log source: auditd. This system records kernel events (selected by rules configured in special files) and writes them to a log. There are many use cases for auditd. For more details I invite you to read my article at
Log shippers
We’ve considered two modern log systems in Linux that allow transferring data to centralized storage: rsyslog and journald. Both are present by default, and both have pros and cons.
There are many resources that compare rsyslog and journald in detail in terms of remote data transfer, performance and so on.
But I would like to focus on a newer approach to storing logs remotely for further analysis: log shippers. These are lightweight processes that take log files as input, process them if necessary, extract the required information and/or transform it to a specific format, and then ingest it into remote storage. A well-known example is filebeat by Elastic. Filebeat is not the only one; there are many implementations from different developers. If you ask me about rsyslog vs journald vs filebeat for transferring the messages of a specific application, I’ll reply as follows: “My choice is filebeat”.
In my opinion, it is easier to configure and does only one thing. Of course, it is not a universal solution; you should find the one that fits your tasks.
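The read, transform, ingest loop of a log shipper can be sketched in a few lines. This is a toy model and not how filebeat is implemented; the function names, the event fields and the path /var/log/swd.log are hypothetical, and the remote storage is replaced by a plain list.

```python
import json
from typing import Callable, Dict, Iterable, List, Optional


def transform(line: str, source: str) -> Optional[Dict[str, str]]:
    """Turn one raw log line into a structured event, or None to drop it."""
    line = line.rstrip("\n")
    if not line:
        return None
    return {"source": source, "message": line}


def ship(lines: Iterable[str], source: str, send: Callable[[str], None]) -> int:
    """Read lines, transform each into JSON and hand it to `send`; return the number shipped."""
    shipped = 0
    for line in lines:
        event = transform(line, source)
        if event is None:
            continue
        send(json.dumps(event, sort_keys=True))
        shipped += 1
    return shipped


# Stand-in for remote storage: collect the outgoing JSON documents in a list.
outbox: List[str] = []
count = ship(["Port number = 80\n", "\n", "Started OK\n"], "/var/log/swd.log", outbox.append)
print(count, outbox)
```

A production shipper additionally has to remember file offsets, follow rotated files, and retry when the remote side is unavailable, which is where most of the real complexity lives.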
Pros and cons of remote storing
Pros of storing logs remotely:
- Systems don’t spend disk space on log files
- With logs stored remotely, we can conveniently analyze logs from all servers and build dashboards
- Logs are protected from being removed accidentally or forged
Cons of storing logs locally:
- Inconvenient to analyze, especially if the system has hundreds of distributed application instances
- Logs may be removed if a server is compromised
- Logs create additional load on the system
Cons of storing logs remotely:
- If the remote server is inaccessible, messages may be lost (depending on the implementation)