Alexander Reelsen

Backend developer, productivity fan, likes the JVM, full text search, distributed databases & systems

Indexing Data From Unix Domain Sockets Into The Elastic Stack
Aug 4, 2021
8 minutes read

TL;DR: This blog post explains what UNIX domain sockets are, shows how to index data sent to them into the Elastic Stack, and covers the different use-cases that exist for this.

Unix Domain Socket with Filebeat and Elasticsearch

UNIX domain sockets - a short history

If you want processes to communicate with each other, you have several possibilities:

  • Files: one component writes, the other reads once the writer is finished. Requires synchronization (e.g. file locking)
  • Signals (kill -HUP): do not carry payloads, they are just a trigger
  • Shared memory, memory mapped files
  • Network socket: requires networking
  • Unix domain socket: similar to a network socket, but not reachable from outside the machine

From an implementation perspective, a UNIX domain socket is a socket that shows up as a file in the file system. That makes it easy to implement for consumers and producers, as you just need to connect() to its path and write data to it. The interesting part is that you can write to it as if it were a network socket, using the sendmsg() and recvmsg() system calls.
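
To make this concrete, here is a minimal Python sketch of a producer and a consumer talking over a UNIX domain socket within one script. The function names and the temporary socket path are made up for this illustration; in practice the two ends would usually be separate processes.

```python
import os
import socket
import tempfile
import threading


def receive_once(server: socket.socket, received: list) -> None:
    """Accept a single connection and collect everything the client sends."""
    conn, _ = server.accept()
    with conn:
        chunks = []
        while chunk := conn.recv(4096):
            chunks.append(chunk)
    received.append(b"".join(chunks))


def roundtrip(message: bytes) -> bytes:
    """Send message through a UNIX domain socket and return what arrived."""
    # Throwaway location for this sketch; any writable directory would do.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "my-socket.sock")
        received = []
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
            server.bind(path)  # bind() creates the socket file
            server.listen(1)
            worker = threading.Thread(target=receive_once, args=(server, received))
            worker.start()
            with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
                client.connect(path)  # address is a file system path, not host:port
                client.sendall(message)
            worker.join()
        return received[0]


if __name__ == "__main__":
    print(roundtrip(b"Hello Unix Domain Socket\n"))
```

Apart from the AF_UNIX address family and the path used as address, this is the same code you would write for a TCP socket.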

There is no tool like mkfifo to create a UNIX domain socket, but you can use something like socat to create one. We’ll do this in a second.

One interesting aspect of creating UNIX domain sockets within the file system is permissions. For other processes to send data to the receiver, you need to make sure that the socket file can be written to, which is easy to check and cater for. This can be modeled rather easily with group permissions.
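
As a rough sketch of what the receiving side could do, the following Python snippet binds a socket and then restricts it to owner and group. The helper names are hypothetical, and assigning a specific group (for example via os.chown) is left out because it depends on the groups that exist on your system.

```python
import os
import socket
import stat
import tempfile


def create_group_writable_socket(path: str) -> socket.socket:
    """Bind a listening UNIX domain socket and restrict it to owner and group."""
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    # 0o660: owner and group may read and write, everyone else gets nothing.
    os.chmod(path, 0o660)
    server.listen(1)
    return server


def describe(path: str) -> str:
    """Return file type and permissions in ls style; 's' marks a socket."""
    return stat.filemode(os.stat(path).st_mode)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "my-socket.sock")
        with create_group_writable_socket(path):
            print(describe(path))
```

Note that there is a short window between bind() and chmod() in which the default umask applies, so a stricter setup might set the umask before binding.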

Ordering of the data is guaranteed, and you can access UNIX domain sockets from within sandboxed processes. That makes them a great fit for tools like virus scanners, as you can also pass file descriptors over the socket.
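
Passing a file descriptor can be sketched in Python as follows, assuming Python 3.9 or newer for socket.send_fds() and socket.recv_fds(). Both ends live in the same process here to keep the example short; in a real setup the receiving end would be another process, such as the scanner.

```python
import os
import socket
import tempfile


def read_via_passed_descriptor(content: bytes) -> bytes:
    """Pass an open file descriptor over a UNIX socket pair and read through it."""
    # socketpair() returns two already connected UNIX domain sockets.
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with sender, receiver, tempfile.TemporaryFile() as tmp:
        tmp.write(content)
        tmp.flush()
        tmp.seek(0)
        # The descriptor travels as ancillary data next to one byte of payload.
        socket.send_fds(sender, [b"x"], [tmp.fileno()])
        _, fds, _, _ = socket.recv_fds(receiver, 1, 1)
        # The received descriptor is a duplicate referring to the same open file.
        with os.fdopen(fds[0], "rb") as passed:
            return passed.read()


if __name__ == "__main__":
    print(read_via_passed_descriptor(b"file content to scan"))
```

The receiver never needs to know the path of the file or have permission to open it, which is what makes this attractive for sandboxed processes.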

I don’t use domain sockets - what’s wrong?

Let me assure you, absolutely nothing 😀

All good if you don’t need them. However, there are a couple of use-cases we will shed some light on. If you have short-lived processes, this might be an interesting option for logging. If you used files, you would either need to synchronize with other processes to make sure writes are ordered, or create a new file for every short-lived process. Think of a short-running PHP invocation logging something during its request/response cycle.

This may also be a way to collect the logs of several components running on one system locally before shipping them to a central place, as there is only one component to configure. Nowadays, however, this is less often the case, because you usually go with a one-process-per-container approach like with Docker, where your logging just goes to stdout and is collected by the Docker logging driver.

A few services use UNIX domain sockets if they don’t need a network connection, for example MySQL, or when processes need to communicate with the web server, for example PHP or uWSGI based applications (common in the Python world).

Today we will focus on the logging use-case.

A simple example

So, let’s get up and running with a logging use case.

First, let’s listen on a UNIX domain socket using socat.

socat - UNIX-LISTEN:./my-socket.sock

Not much will happen, except that the socat process starts running. Let’s take a look at the my-socket.sock file:

file my-socket.sock

my-socket.sock: socket

So this is a socket. Based on my default umask, everyone can read that file on that system. Let’s write to it:

echo "Hello Unix Domain Socket" | socat - UNIX-CONNECT:./my-socket.sock

This will show the above message in the terminal of the listening socat process, which then exits. If you want to keep it around, start the socat listen process with the fork option:

socat UNIX-LISTEN:./my-socket.sock,fork -

In many programming languages connecting to a UNIX domain socket is rather straightforward. Take this Crystal example:

require "socket"

sock = UNIXSocket.new(ARGV.shift)
sock.puts ARGV.join " "
sock.close

You can run this via crystal write-to-socket.cr ./my-socket.sock this is a test and see the data being sent to the socket.

So, back to our logging problem, let’s use Filebeat to read from a UNIX domain socket next…

Configuring Filebeat

Let’s start with a configuration that simply dumps the events to the console, similar to what socat did. Create a filebeat-unix-domain-socket.yml configuration file:

filebeat.inputs:
- type: unix
  path: "/tmp/socket.sock"

output.console:
  pretty: true

Start Filebeat like this:

./filebeat -c filebeat-unix-domain-socket.yml -e

Again, use socat to connect to the UNIX domain socket:

echo "Hello Unix Domain Socket" | socat - UNIX-CONNECT:/tmp/socket.sock

This will show an event like this in the terminal where you started Filebeat:

{
  "@timestamp": "2021-07-26T15:24:59.361Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "7.13.4"
  },
  "message": "Hello Unix Domain Socket",
  "input": {
    "type": "unix"
  },
  "ecs": {
    "version": "1.8.0"
  },
  "host": {
    "name": "rhincodon"
  },
  "agent": {
    "ephemeral_id": "8174514b-d651-47cb-8fda-e8998cb6215f",
    "id": "9ba7dc8c-a9ce-43a8-8c39-a306060abef5",
    "name": "rhincodon",
    "type": "filebeat",
    "version": "7.13.4",
    "hostname": "rhincodon"
  }
}

The UNIX input in Filebeat has a few more interesting configuration options:

  • group for configuring the proper group of the socket for easier access from other applications
  • mode for file system permissions (the group most likely needs write access if you use group above)
  • socket_type, one of stream and datagram
  • line_delimiter, if your data format uses a custom separator between events
  • framing to choose how incoming events are split: by delimiter, or via rfc6587 with length-prefixed frames

On top of that, you could even use SSL over a UNIX domain socket if you wanted.
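
To illustrate what the socket_type choice means at the socket level, here is a small Python sketch that is independent of Filebeat. With a datagram socket every send arrives as one separate message, whereas a stream socket delivers a continuous byte stream that the receiver has to split itself (which is what the delimiter and framing options are for). The function name is made up for this example.

```python
import socket


def datagram_messages(messages: list) -> list:
    """Send each message as one datagram and receive them one by one."""
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    with left, right:
        for message in messages:
            left.send(message)
        # Each recv() returns exactly one datagram, boundaries are preserved.
        return [right.recv(4096) for _ in messages]


if __name__ == "__main__":
    print(datagram_messages([b"first event", b"second event"]))
```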

One disadvantage of sending text based data is the inability to distinguish data being indexed from different applications, unless you check for the format of a message. A workaround might be to define several inputs in Filebeat, each with its own UNIX domain socket, so you could set custom tags in the input in order to differentiate.

Another alternative is to create the required metadata within your application and maybe even send it as JSON directly. Let’s do this with a final configuration that sends data over to Elasticsearch:

filebeat.inputs:
  - type: unix
    path: "/tmp/socket.sock"

processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~
  - decode_json_fields:
      fields: ["message"]
      target: ""
      expand_keys: true
      overwrite_keys: true

output.elasticsearch:
  hosts: ["http://localhost:9200"]

Index a document into our index:

echo '{"message": "Hello", "first.second.third" : "somevalue"}' \
  | socat - UNIX-CONNECT:/tmp/socket.sock

The resulting document will look like this:

{
  "_index": "filebeat-7.13.4",
  "_type": "_doc",
  "_id": "meb75noBCiP7EU2rUrgq",
  "_score": 1,
  "_source": {
    "@timestamp": "2021-07-27T08:01:06.859Z",
    "message": "Hello",
    "input": {
      "type": "unix"
    },
    "ecs": {
      "version": "1.8.0"
    },
    "host": {
      "hostname": "rhincodon",
      "architecture": "x86_64",
      "name": "rhincodon",
      "os": {
        "kernel": "20.5.0",
        "build": "20F71",
        "type": "macos",
        "platform": "darwin",
        "version": "10.16",
        "family": "darwin",
        "name": "Mac OS X"
      },
      "id": "C28736BF-0EB3-5A04-BE85-C27A62C99316",
      "ip": [ "192.168.X.Y" ],
      "mac": [ "98:ef:98:e5:00:64" ]
    },
    "agent": {
      "id": "9ba7dc8c-a9ce-43a8-8c39-a306060abef5",
      "name": "rhincodon",
      "type": "filebeat",
      "version": "7.13.4",
      "hostname": "rhincodon",
      "ephemeral_id": "959eb361-07d9-45a1-a748-158f7538922f"
    },
    "first": {
      "second": {
        "third": "somevalue"
      }
    }
  }
}

Note the overwritten message field, which now only contains the message from the JSON document, as well as the first field at the end, which now has a proper nested JSON structure.
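
From within an application, sending such a JSON event could look roughly like the following Python sketch. The helper names and the app.name field are hypothetical and only serve as an example of application-provided metadata; the dotted key would be expanded into a nested object by the expand_keys setting, just like first.second.third above.

```python
import json
import socket


def build_event(message: str, app: str) -> bytes:
    """Serialize a log event as a single line of JSON."""
    # "app.name" is an illustrative field name, not something Filebeat requires.
    event = {"message": message, "app.name": app}
    return json.dumps(event).encode("utf-8") + b"\n"


def send_event(path: str, payload: bytes) -> None:
    """Connect to the UNIX domain socket at path and write the payload."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(path)
        client.sendall(payload)
```

With Filebeat listening as configured above, send_event("/tmp/socket.sock", build_event("Hello", "billing")) would be the equivalent of the echo and socat pipeline.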

Batching messages via framing

If you want to send several messages at once, you can use framing. All it needs is a slight configuration change plus a small modification when sending messages. This is the configuration change for the unix input:

filebeat.inputs:
  - type: unix
    path: "/tmp/socket.sock"
    framing: "rfc6587"

Whenever you send a message now, you need to specify the length of the next message, followed by a space and then the message itself. Like this:

echo -e "6 Hello18 Hello123" | socat - UNIX-CONNECT:/tmp/socket.sock

This indicates a length of 6 for the first message Hello1 and a length of 8 for the second message Hello123. This is the so-called octet counting framing. An alternative, also defined in RFC 6587, is non-transparent framing based on delimiters, with a newline as the default delimiter:

echo -e "Hello1\nHello2\n" | socat - UNIX-CONNECT:/tmp/socket.sock

You can also specify a custom delimiter, which allows you to send events that contain newlines (think of multi-line exceptions):

filebeat.inputs:
  - type: unix
    path: "/tmp/socket.sock"
    framing: "rfc6587"
    line_delimiter: "\0"

echo -e '{"message":"value\\n\\n\\nvalue"}\0{"message":"value"}\0' \
  | socat - UNIX-CONNECT:/tmp/socket.sock

So sending several events over a single connection works as well, and might speed up your short-lived processes.
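
If you build these payloads in code rather than with echo, the two framing variants from this section can be sketched in Python like this. The function names are made up for this example, and messages are handled as bytes because octet counting counts bytes, not characters.

```python
def octet_counting(messages: list) -> bytes:
    """Prefix every message with its length in bytes and a single space."""
    return b"".join(str(len(m)).encode("ascii") + b" " + m for m in messages)


def delimited(messages: list, delimiter: bytes = b"\n") -> bytes:
    """Terminate every message with the delimiter (non-transparent framing)."""
    for message in messages:
        if delimiter in message:
            # A delimiter inside a message would split it into two events.
            raise ValueError("message contains the delimiter")
    return b"".join(m + delimiter for m in messages)


if __name__ == "__main__":
    print(octet_counting([b"Hello1", b"Hello123"]))
    print(delimited([b"line one\nline two", b"next event"], delimiter=b"\0"))
```

The check in delimited() shows the trade-off between the two: octet counting can carry any content, while delimiter based framing needs a separator that never occurs inside an event.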

Summary

Probably 99% of you will not need this. But: Never underestimate the existence of legacy software…

One more important note: Not every logging library properly supports logging via UNIX domain sockets. For example, log4j does not seem to be able to log into a UNIX domain socket, and I could not find any definite info for log4j2. However, most Java programs are not considered short-lived, so this may be less of a problem.

Java got support for UNIX domain sockets in JEP 380, which was added in Java 16, so rather recently. According to Reddit discussions one important feature is missing: the ability to pass file descriptors. However, there is junixsocket, which supports that.

That said, if you have the possibility to log into files instead of sockets, you might want to go with that. If the logging component is not available, you can replay things more easily, as the data in a file does not vanish. Also make sure that you test the broken case, where the socket to log to does not exist or is not writable, and see what happens within your application.
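
One way to handle the broken case is sketched below in Python: if the socket cannot be reached, the line goes to stderr instead of crashing the application. The function name, the one second timeout and the stderr fallback are assumptions for this example; your application may need a different strategy, such as buffering or writing to a file.

```python
import socket
import sys


def log_line(path: str, line: str) -> bool:
    """Try to send one log line to the socket; fall back to stderr on failure."""
    payload = line.encode("utf-8") + b"\n"
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
            client.settimeout(1.0)  # do not block the application forever
            client.connect(path)
            client.sendall(payload)
        return True
    except OSError as error:
        # Covers a missing socket file (FileNotFoundError), a socket file
        # without a listener (ConnectionRefusedError), missing write
        # permission (PermissionError) and timeouts.
        sys.stderr.write(f"log socket unavailable ({error}): {line}\n")
        return False


if __name__ == "__main__":
    log_line("/tmp/socket.sock", "Hello Unix Domain Socket")
```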

Do you have an interesting use-case for UNIX domain sockets with Filebeat? If so, please drop me an email or ping me on twitter!

Happy logging!
