TL;DR: This blog post explains what UNIX domain sockets are, how to index data sent to a UNIX domain socket into the Elastic Stack, and which use-cases exist for this.
UNIX domain sockets - a short history
If you want processes to communicate with each other, you have several possibilities:
- Files: one component writes, the other reads once finished. Requires synchronization (e.g. file locking)
- Signals (kill -HUP): do not carry payloads, just a trigger
- Shared memory, memory mapped files
- Network socket: requires networking
- UNIX domain socket: similar to a network socket, but not reachable from the outside
From an implementation perspective, a UNIX domain socket is a socket that is addressed by a path in the file system. That makes it easy to implement for consumers and producers, as you just need to connect() to that path and write data to it. The interesting part is that you write the data as if you were writing to a network socket, using the sendmsg() and recvmsg() system calls.
As there is no tool like mkfifo to create a UNIX domain socket, you can use something like socat to create it. We’ll do this in a second.
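To make the mechanics a bit more concrete, here is a minimal Python sketch (not taken from any particular tool) that creates a UNIX domain socket by binding to a path, and then exchanges data via sendmsg() and recvmsg(). The path and file name are made up for illustration, and everything runs in a single process to keep it short.

```python
import os
import socket
import stat
import tempfile

# Hypothetical socket path inside a fresh temporary directory.
sock_path = os.path.join(tempfile.mkdtemp(dir="/tmp"), "demo.sock")

# bind() is what creates the socket file in the file system.
server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(sock_path)
server.listen(1)
is_socket = stat.S_ISSOCK(os.stat(sock_path).st_mode)

# The producer connects to the path and sends data with sendmsg().
client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
client.connect(sock_path)
client.sendmsg([b"Hello ", b"Unix Domain Socket\n"])
client.close()

# The consumer accepts the connection and reads with recvmsg().
conn, _ = server.accept()
data, ancdata, flags, addr = conn.recvmsg(1024)
print(is_socket, data)

# Clean up: the socket file stays around until it is unlinked.
conn.close()
server.close()
os.unlink(sock_path)
```

In a real setup the consumer and the producer would of course be separate processes.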
One interesting aspect of creating UNIX domain sockets within the file system is permissions. In order for other processes to send data to the receiver, you need to make sure that the socket file can be written to, which is easy to check and cater for. This can be modeled rather easily with group permissions.
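As a rough sketch of the group permission idea, the following Python snippet creates a socket, restricts it to owner and group, and checks the resulting mode bits. The path is an assumption, and the group name in the comment is purely hypothetical.

```python
import os
import socket
import stat
import tempfile

# Hypothetical socket path inside a fresh temporary directory.
sock_path = os.path.join(tempfile.mkdtemp(dir="/tmp"), "logs.sock")

listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
listener.bind(sock_path)

# Owner and group may read and write, everyone else gets nothing.
os.chmod(sock_path, 0o660)
# Optionally hand the socket to a shared group (hypothetical group name):
# shutil.chown(sock_path, group="logwriters")

mode = stat.S_IMODE(os.stat(sock_path).st_mode)
group_can_write = bool(mode & stat.S_IWGRP)
others_can_write = bool(mode & stat.S_IWOTH)
print(oct(mode), group_can_write, others_can_write)

listener.close()
os.unlink(sock_path)
```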
Ordering of the data is guaranteed, and you can access UNIX domain sockets from within sandboxed processes, making them a great fit for tools like virus scanners, as you can also pass file descriptors over the socket.
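Passing a file descriptor may sound abstract, so here is a small sketch using Python's socket.send_fds() and socket.recv_fds() (available since Python 3.9 on UNIX systems). The names helper and scanner are made up, and a socket pair within one process stands in for two separate processes.

```python
import os
import socket
import tempfile

# A connected pair of UNIX domain sockets stands in for two processes.
helper, scanner = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

with tempfile.TemporaryFile() as f:
    f.write(b"file content to scan")
    f.flush()
    f.seek(0)  # the received descriptor shares this file offset

    # The helper opens the file and passes the descriptor along.
    socket.send_fds(helper, [b"scan"], [f.fileno()])

    # The scanner receives a descriptor for a file it never opened itself.
    msg, fds, flags, addr = socket.recv_fds(scanner, 1024, 1)
    with os.fdopen(fds[0], "rb") as received:
        content = received.read()

print(msg, content)
helper.close()
scanner.close()
```

The receiving side only ever sees an already opened descriptor, which is what makes this attractive for sandboxed processes.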
I don’t use domain sockets - what’s wrong?
Let me assure you, absolutely nothing 😀
All good if you don’t need them. However, there are a couple of use-cases we will shed some light on. If you have short-lived processes, this might be an interesting option for logging. If you used files, you would either need to synchronize with other processes to make sure writes are ordered, or create a new file for every short-lived process. Think of a short-running PHP invocation logging something during its request/response cycle.
This may also be a way to collect the logs of several components running on a system locally before shipping them to a central place, as there is only one component to configure. Nowadays, however, this is less often the case, because you usually go with a one-process-per-container approach like with Docker, where your logging just goes to stdout and is collected by the Docker logging driver.
A few services use UNIX domain sockets if they don’t need a network connection, for example MySQL, or, as already mentioned, when processes need to communicate with the web server, for example PHP or uWSGI based applications (common in the Python world).
Today we will focus on the logging use-case.
A simple example
So, let’s get up and running with a logging use case.
First, let’s listen on a UNIX domain socket using socat.
socat - UNIX-LISTEN:./my-socket.sock
Not much will happen, except that the socat process starts running. Let’s take a look at the my-socket.sock file:
file my-socket.sock
my-socket.sock: socket
So this is a socket. With my default umask, everyone on that system can read that file. Let’s write to it:
echo "Hello Unix Domain Socket" | socat - UNIX-CONNECT:./my-socket.sock
This will show the above message in the terminal and exit socat. If you want to keep it around, start the socat listen process with the fork option:
socat UNIX-LISTEN:./my-socket.sock,fork -
In many programming languages, connecting to a UNIX domain socket is rather straightforward. Take this Crystal example:
require "socket"
sock = UNIXSocket.new(ARGV.shift)
sock.puts ARGV.join " "
sock.close
You can run this via crystal write-to-socket.cr ./my-socket.sock this is a test and see the data being sent to the socket.
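For comparison, a roughly equivalent client in Python could look like the following sketch. The function name write_to_socket is made up for this post; it sends the words joined by spaces and terminated with a newline.

```python
import socket
import sys


def write_to_socket(path: str, words: list[str]) -> None:
    """Connect to the UNIX domain socket at `path` and send one line."""
    line = " ".join(words) + "\n"
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(line.encode("utf-8"))


# Only runs when called as a script with a path and at least one word.
if __name__ == "__main__" and len(sys.argv) > 2:
    write_to_socket(sys.argv[1], sys.argv[2:])
```

Saved as a file of your choice, this can be invoked with the socket path followed by the words to send, just like the Crystal version.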
So, back to our logging problem, let’s use Filebeat to read from a UNIX domain socket next…
Configuring Filebeat
Let’s start with a configuration that simply dumps the events, similar to what socat did, by creating a filebeat-unix-domain-socket.yml configuration file:
filebeat.inputs:
- type: unix
  path: "/tmp/socket.sock"

output.console:
  pretty: true
Start Filebeat like this:
./filebeat -c filebeat-unix-domain-socket.yml -e
Again, use socat to connect to the UNIX domain socket:
echo "Hello Unix Domain Socket" | socat - UNIX-CONNECT:/tmp/socket.sock
This will show an event like this in the terminal where you started Filebeat:
{
"@timestamp": "2021-07-26T15:24:59.361Z",
"@metadata": {
"beat": "filebeat",
"type": "_doc",
"version": "7.13.4"
},
"message": "Hello Unix Domain Socket",
"input": {
"type": "unix"
},
"ecs": {
"version": "1.8.0"
},
"host": {
"name": "rhincodon"
},
"agent": {
"ephemeral_id": "8174514b-d651-47cb-8fda-e8998cb6215f",
"id": "9ba7dc8c-a9ce-43a8-8c39-a306060abef5",
"name": "rhincodon",
"type": "filebeat",
"version": "7.13.4",
"hostname": "rhincodon"
}
}
The UNIX input in Filebeat has a couple more interesting configuration options:
- group: for configuring the proper group of the socket for easier access from other applications
- mode: for file system permissions (the socket most likely needs to be group writable if you use group above)
- socket_type: one of stream and datagram
- line_delimiter: if your data format uses a custom separator between events
- framing: to support length-prefixed frames for splitting incoming events rather than a separator
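To illustrate the difference behind the socket_type option, here is a small Python sketch of a datagram UNIX domain socket, where every send arrives as one self-contained message. This only shows the socket semantics, it does not claim anything about how Filebeat processes datagrams internally. The path is made up.

```python
import os
import socket
import tempfile

# Hypothetical socket path inside a fresh temporary directory.
sock_path = os.path.join(tempfile.mkdtemp(dir="/tmp"), "dgram.sock")

# The receiver binds a datagram socket, there is no listen()/accept().
receiver = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
receiver.bind(sock_path)

# Each sendto() is delivered as exactly one message.
sender = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
sender.sendto(b"first event", sock_path)
sender.sendto(b"second event", sock_path)

events = [receiver.recv(1024), receiver.recv(1024)]
print(events)

sender.close()
receiver.close()
os.unlink(sock_path)
```

With a stream socket the two writes could arrive glued together, which is why delimiters or framing matter there.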
On top of that, you could even use SSL over a UNIX domain socket if you wanted.
One disadvantage of sending text based data is the inability to distinguish data being indexed from different applications, unless you check for the format of a message. A workaround might be to define several inputs in Filebeat, each with its own UNIX domain socket, so you could set custom tags in the input in order to differentiate.
Another alternative is to create the required metadata within your application and maybe even send it as JSON directly. Let’s do this with a final configuration that sends data over to Elasticsearch:
filebeat.inputs:
- type: unix
  path: "/tmp/socket.sock"

processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~
  - decode_json_fields:
      fields: ["message"]
      target: ""
      expand_keys: true
      overwrite_keys: true

output.elasticsearch:
  hosts: ["http://localhost:9200"]
Index a document into our index
echo '{"message": "Hello", "first.second.third" : "somevalue"}' \
| socat - UNIX-CONNECT:/tmp/socket.sock
The resulting document will look like this:
{
"_index": "filebeat-7.13.4",
"_type": "_doc",
"_id": "meb75noBCiP7EU2rUrgq",
"_score": 1,
"_source": {
"@timestamp": "2021-07-27T08:01:06.859Z",
"message": "Hello",
"input": {
"type": "unix"
},
"ecs": {
"version": "1.8.0"
},
"host": {
"hostname": "rhincodon",
"architecture": "x86_64",
"name": "rhincodon",
"os": {
"kernel": "20.5.0",
"build": "20F71",
"type": "macos",
"platform": "darwin",
"version": "10.16",
"family": "darwin",
"name": "Mac OS X"
},
"id": "C28736BF-0EB3-5A04-BE85-C27A62C99316",
"ip": [ "192.168.X.Y" ],
"mac": [ "98:ef:98:e5:00:64" ]
},
"agent": {
"id": "9ba7dc8c-a9ce-43a8-8c39-a306060abef5",
"name": "rhincodon",
"type": "filebeat",
"version": "7.13.4",
"hostname": "rhincodon",
"ephemeral_id": "959eb361-07d9-45a1-a748-158f7538922f"
},
"first": {
"second": {
"third": "somevalue"
}
}
}
}
See the overwritten message field, which only contains the message from the JSON document, as well as the first field at the end, which now has a proper JSON structure.
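From within an application, sending such a JSON event could be sketched like this in Python. The helper names build_event and send_event are invented for this example, and the socket path is the one assumed in the Filebeat configuration above.

```python
import json
import socket


def build_event(message: str, **fields) -> bytes:
    """Serialize one log event as a single newline-terminated JSON line."""
    event = {"message": message, **fields}
    return (json.dumps(event) + "\n").encode("utf-8")


def send_event(path: str, payload: bytes) -> None:
    """Send an already serialized event to the UNIX domain socket at `path`."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(payload)


payload = build_event("Hello", **{"first.second.third": "somevalue"})
print(payload)
# With Filebeat listening, this would ship the event:
# send_event("/tmp/socket.sock", payload)
```

Any metadata your application knows about (a service name, a request id) can be added as extra fields in the same way.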
Batching messages via framing
If you want to send several messages at once, you can use framing. All it needs is a slight configuration change plus a small modification when sending messages. This is the configuration change for the unix input:
filebeat.inputs:
- type: unix
  path: "/tmp/socket.sock"
  framing: "rfc6587"
Whenever you send a message now, you need to specify the length of the next message, followed by a space and then the message itself. Like this:
echo -e "6 Hello18 Hello123" | socat - UNIX-CONNECT:/tmp/socket.sock
This indicates a length of 6 for the first message Hello1 and a length of 8 for the second message Hello123. This is the so-called octet counting framing. An alternative, also defined in the above-mentioned RFC, is non-transparent framing based on delimiters, which is a newline by default:
echo -e "Hello1\nHello2\n" | socat - UNIX-CONNECT:/tmp/socket.sock
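The octet counting variant described above can be sketched in a few lines of Python. The function names are made up, and the decoder assumes a well-formed stream without trailing bytes. Note that the length counts bytes, not characters, which matters for non-ASCII content.

```python
def frame_octet_counting(messages: list[str]) -> bytes:
    """Prefix every message with its length in bytes and a space."""
    out = b""
    for message in messages:
        data = message.encode("utf-8")
        out += str(len(data)).encode("ascii") + b" " + data
    return out


def split_octet_counting(stream: bytes) -> list[str]:
    """Split a well-formed octet-counted stream back into messages."""
    messages = []
    pos = 0
    while pos < len(stream):
        space = stream.index(b" ", pos)
        length = int(stream[pos:space].decode("ascii"))
        start = space + 1
        messages.append(stream[start:start + length].decode("utf-8"))
        pos = start + length
    return messages


framed = frame_octet_counting(["Hello1", "Hello123"])
print(framed)
print(split_octet_counting(framed))
```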
You can also specify a custom delimiter, which allows you to send events that contain a newline themselves (think of multi-line exceptions):
filebeat.inputs:
- type: unix
  path: "/tmp/socket.sock"
  framing: "rfc6587"
  line_delimiter: "\0"
echo -e '{"message":"value\\n\\n\\nvalue"}\0{"message":"value"}\0' \
| socat - UNIX-CONNECT:/tmp/socket.sock
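On the sending side, delimiter-based framing with a NUL byte might look like this Python sketch. Again the function names are invented, and the sketch simply refuses messages that contain the delimiter themselves.

```python
def frame_delimited(messages: list[str], delimiter: bytes = b"\0") -> bytes:
    """Terminate every message with the delimiter."""
    out = b""
    for message in messages:
        data = message.encode("utf-8")
        if delimiter in data:
            raise ValueError("message contains the delimiter")
        out += data + delimiter
    return out


def split_delimited(stream: bytes, delimiter: bytes = b"\0") -> list[str]:
    """Split a stream into messages, ignoring the empty tail after the last delimiter."""
    parts = stream.split(delimiter)
    return [part.decode("utf-8") for part in parts if part]


# A multi-line message (think stack trace) survives as a single event.
framed = frame_delimited(["Exception\n  at line 1\n  at line 2", "single line"])
print(split_delimited(framed))
```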
So sending several events over a single connection works as well, and might speed up your short-lived processes.
Summary
Probably 99% of you will not need this. But: Never underestimate the existence of legacy software…
One more important note: Not every logging library properly supports logging via UNIX domain sockets. For example, log4j does not seem to be able to log into a UNIX domain socket. I could not find any definite info for log4j2. However, most Java programs are not considered short-lived, so this may be less of a problem.
Java got support for UNIX domain sockets with JEP 380, which was added in Java 16, so rather recently. According to Reddit discussions, one important feature is missing: the ability to pass file descriptors. There is junixsocket, which supports that, however.
That said, if you have the possibility to log into files instead of sockets, you might want to go with that. Replaying events is easier if the logging component was unavailable, because the data is still in the file. The UNIX domain socket does not vanish when its listener goes away, but writes to it fail, so those events are lost unless your application handles that. Also make sure that you test the broken case, where the socket to log to does not exist or cannot be written to, and see what happens within your application.
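One way to think about the broken case is a sketch like the following, which tries the socket first and falls back to a file. The function name log_line and the fallback strategy are assumptions for illustration, not a recommendation for every application.

```python
import os
import socket
import tempfile


def log_line(sock_path: str, line: str, fallback_file: str) -> str:
    """Try the UNIX domain socket first, fall back to appending to a file."""
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
            sock.connect(sock_path)
            sock.sendall((line + "\n").encode("utf-8"))
        return "socket"
    except (FileNotFoundError, ConnectionRefusedError, PermissionError):
        # Socket missing, nobody listening, or not writable for this user.
        with open(fallback_file, "a", encoding="utf-8") as f:
            f.write(line + "\n")
        return "file"


# Demo: the socket path does not exist, so the line ends up in the file.
demo_dir = tempfile.mkdtemp(dir="/tmp")
result = log_line(
    os.path.join(demo_dir, "missing.sock"),
    "socket was not there",
    os.path.join(demo_dir, "fallback.log"),
)
print(result)
```

A socket file whose listener has gone away is covered by ConnectionRefusedError, which is easy to miss when only testing with a missing path.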
Do you have an interesting use-case for UNIX domain sockets with Filebeat? If so, please drop me an email or ping me on Twitter!
Happy logging!
Resources
- Docs: Filebeat Unix Input
- Julia Evans and her awesome wizard zines have a UNIX domain socket special
- socat tutorial
- socat homepage
- JEP 380
- Talking to Postgres Through Java 16 Unix-Domain Socket Channels by Gunnar Morling
- junixsocket
- Netty Unix domain socket javadocs
- RFC 6587