Alexander Reelsen

Backend developer, productivity fan, likes the JVM, full text search, distributed databases & systems

Using seccomp - Making your applications more secure
Oct 27, 2020
21 minutes read

TLDR; This blog post will introduce seccomp and how you can leverage its features in higher level languages, and what you gain from that. It is loosely based on my presentation Seccomp for developers - Making your apps more secure.

Before diving into the gory details, let’s start with a longer introduction to seccomp and to why developers should be aware of the security features of the operating system, so that you do not reinvent the wheel in your application code. After that it gets interesting, with a few samples of how to use seccomp from various high-level programming languages. Lastly, we will show how to monitor seccomp violations using the Elastic Stack.

Security as a non-functional requirement

No one states this in a software project, but it’s always implicit: all the code you write has to be secure. Because it is implicit, only projects in higher-security environments state what this actually means. What does it mean to you? Should your software only be hacked once a year? What are the economic trade-offs of investing more in security - can you come up with a number? Are you having external audits of your code - not for liability reasons, but to really increase the security of your code?

The way your software is run probably also changes the attack vector and your thoughts about security. Take software that runs locally and cannot be reached from the internet: you probably want to protect it against insider threats and make sure ACLs work as expected, as only company employees can log in. That is different from software like Elasticsearch, which runs all across the internet and sometimes even unprotected. Hell, you really want to make sure no one can execute any code from within Elasticsearch. There is also a difference between providing a downloadable and executable package like Elasticsearch and running software-as-a-service or a web application, where no one can take a look at the source. I will use Elasticsearch quite a bit as an example during this article.

Elasticsearch cannot make any assumptions about the runtime environment and how secure it is. With a Java product you cannot even assume the operating system or its configuration. If Elasticsearch relied on AppArmor or SELinux as a security feature, what would happen if neither of those were available? Should Elasticsearch refuse to start?

Elasticsearch also relies on more than one layer to improve security, using the Java Security Manager as well as seccomp. If you want to know more about that, take a look at my blog post about securing a search engine while maintaining usability.

What is seccomp?

The question is not so much what seccomp is, but what problem seccomp is trying to solve. Seccomp tries to limit the scope of what a program can execute by limiting the syscalls that it is allowed to run. It’s not trying to virtualize anything or run code in a sandbox; it allows or rejects parts of a program. In short, this allows you to

Run untrusted code in your system

If you do not know what code is doing but would still like to be able to run it, you should be able to sandbox it and define yourself what is OK or not OK to run. A modern use-case is your browser. You are downloading JavaScript all the time from every website you visit, without even looking at it - yet you would like to execute it locally in a safe way. This is the reason browsers have their own JavaScript engines, trying to ensure that no files can be opened and no network requests can be made across the internet.

Many moons ago, in 2005, the Linux kernel added the first capability for restricting what a process can do. Back then you could enable this by echoing a value into a file in the proc file system within the process. From then on, only read, write, sigreturn (or rt_sigreturn) and exit could be used. This meant that sockets needed to be open already, so that one could read from them. The initial idea of the author was to rent out CPU cycles after securing a program that way. As far as I remember, this idea never had much success.

This functionality was at some point implemented as a prctl() call, which came with the same limitations.

In 2012 things moved forward again with a major change. By allowing more fine-grained and custom configurations via a BPF filter, seccomp users could create their own policies to filter syscalls. This was a game changer. By using BPF, the Berkeley Packet Filter, users could now filter syscalls and their arguments.

So, why was BPF used? Well, if you are reading this now, you may have heard that BPF has pretty much permeated the kernel space in 2020, allowing you to do many things that were considered really hard a few years ago. The possibility of creating dynamic programs that run securely in the kernel space has been an absolute game changer for many use-cases, seccomp being only one of many. There is a great blog post by Glauber Costa explaining how io_uring and eBPF will revolutionize programming in Linux.

So, why is BPF so great? You may remember BPF from tools like tcpdump or Wireshark, where the expression that you used to filter in a PCAP dump was actually a Berkeley Packet Filter. Having this filtering ability in kernel space is even better: in the case of seccomp it avoids the performance hit you would take if this kind of security check had to be executed in user space. By running these checks in kernel space, you get a really fast decision on whether a system call is supposed to be executed.

By running custom code in the kernel space, debugging, monitoring and Observability (there, I said it!) will become different over the next months, and I expect the leading companies in that space to jump on that wagon pretty soon (a few companies have been hiring folks in that space very recently or published open source libraries). There are already tools like bcc out there allowing you to do Linux I/O analysis plus networking and monitoring.

But let’s get back to seccomp! After prctl() gained the ability to install a seccomp filter, a dedicated seccomp() syscall was added as well, available from Linux 3.17 onwards.


So, how does this work from a process perspective? Basically, the process tells the operating system to limit its own abilities by applying a seccomp-bpf filter with a list of blocked or allowed system calls and a default policy. A management process like systemd could do the same.

A few common seccomp users are:

There are many many more though!

Using seccomp with firejail

Let’s demo a simple example. If you want to follow along but you are not running a Linux system, check out my GitHub repository, which brings up a Vagrant VM that you can SSH into and test everything.

git clone
cd seccomp-samples
# aaaaand wait a bit
vagrant up
# ssh into it
vagrant ssh

This will install a Debian Linux VM, install a couple of dependencies as well as Elasticsearch, Kibana and auditbeat. We will get to that later!

From now on, I assume with every command that you are running under Linux or in that VM, which includes the code samples that are part of that GitHub repository. Time to run a command to see seccomp in action. firejail is a great helper for this. Firejail is a sandboxing tool that utilizes not only seccomp-bpf but also Linux namespaces. For now, we take a look at the seccomp feature by running:

firejail --noprofile --seccomp.drop=bind -c strace nc -v -l -p 8000

So, this tries to run netcat listening on port 8000. However, the strace output looks like this:

setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
setsockopt(3, SOL_SOCKET, SO_REUSEPORT, [1], 4) = 0
bind(3, {sa_family=AF_INET, sin_port=htons(8000), sin_addr=inet_addr("")}, 16) = ?
+++ killed by SIGSYS +++

As firejail was instructed to drop bind() calls via seccomp, this is exactly what happened and the process was killed when trying to bind to the port.

In order to not miss such a seccomp violation, a message has been written to the kernel log, which we can take a look at via dmesg.

Parsing the audit output

You will see something like this

[   54.372855] audit: type=1326 audit(1603120490.865:8): auid=1000 uid=0
    gid=0 ses=3 subj==unconfined pid=1376 comm="nc"
    exe="/usr/bin/nc.traditional" sig=31 arch=c000003e syscall=49 
    compat=0 ip=0x7f9ad10d7497 code=0x0

You can also use the ausearch tool, which is part of the auditd userspace tools.

/usr/sbin/ausearch --syscall bind
time->Mon Oct 19 15:17:23 2020
type=SECCOMP msg=audit(1603120643.421:21): auid=1000 uid=0 gid=0 ses=3 
  subj==unconfined pid=1630 comm="nc" exe="/usr/bin/nc.traditional"
  sig=31 arch=c000003e syscall=49 compat=0 ip=0x7fbca3dc1497 code=0x0
  • type: type of event
  • msg: timestamp and uniqueid (can be shared among several records)
  • auid: audit user id (kept the same even when using su -)
  • uid: user id
  • gid: group id
  • ses: session id
  • subj: SELinux context
  • pid: process id
  • comm: command name
  • exe: path to the executable
  • sig: 31 aka SIGSYS
  • arch: cpu architecture
  • syscall: syscall (49 is bind()), see ausyscall --dump
  • compat: syscall compatibility mode
  • ip: instruction pointer at the time of the syscall (not a network address)
  • code: seccomp action
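
To make the field list above a bit more tangible, here is a minimal Python sketch that splits such a SECCOMP record into key/value pairs and decodes the numeric fields. The function name parse_seccomp_record is made up for this example, the syscall and signal tables only cover the values that show up in this article (x86-64 numbers, ausyscall --dump gives you the full list), and the action values are the constants from the kernel's seccomp header.

```python
# x86-64 syscall numbers used in this article; see `ausyscall --dump` for all.
SYSCALLS_X86_64 = {49: "bind", 59: "execve"}

# Only the signal values that appear in this article's records.
SIGNALS = {0: "none", 31: "SIGSYS"}

# seccomp return actions from <linux/seccomp.h> (upper 16 bits of "code").
SECCOMP_ACTIONS = {
    0x00000000: "kill thread",
    0x00030000: "trap",
    0x00050000: "errno",
    0x7FFC0000: "log",
    0x7FFF0000: "allow",
    0x80000000: "kill process",
}


def parse_seccomp_record(line: str) -> dict:
    """Split an audit SECCOMP record into fields and decode the numeric ones."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not key=value pairs
            fields[key] = value.strip('"')
    nr = int(fields["syscall"])
    sig = int(fields["sig"])
    code = int(fields["code"], 16)
    fields["syscall_name"] = SYSCALLS_X86_64.get(nr, f"unknown ({nr})")
    fields["signal_name"] = SIGNALS.get(sig, f"signal {sig}")
    fields["action"] = SECCOMP_ACTIONS.get(code & 0xFFFF0000, "unknown")
    return fields


record = (
    'type=SECCOMP msg=audit(1603120643.421:21): auid=1000 uid=0 gid=0 ses=3 '
    'subj==unconfined pid=1630 comm="nc" exe="/usr/bin/nc.traditional" '
    'sig=31 arch=c000003e syscall=49 compat=0 ip=0x7fbca3dc1497 code=0x0'
)
parsed = parse_seccomp_record(record)
# prints: nc bind SIGSYS kill thread
print(parsed["comm"], parsed["syscall_name"], parsed["signal_name"], parsed["action"])
```

Splitting on whitespace is good enough for this record; a real parser would probably lean on the auditd userspace tools instead of reimplementing their decoding.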

While I’m sure you found this list as spectacular as I did, we should probably find an easier way to take a look at this data than CLI tools - especially at scale with many systems involved. I’m quite sure you get where this is going in combination with the Elastic Stack, and we’ll get to that later. 😀

How about, after all this fancy introduction, we take another step back and look at this blog post from high above? The question potentially lingering in your head is WHY the hell I am writing all of this. Well, how about you start considering the following:

Your code is untrusted code

And therefore your code should also use something like seccomp, if possible. As software developers we tend to think of our code as secure once it passes all the tests, all its bad cases, all the negative numbers and lizards you passed into it. But what if someone finds a security issue or an attack vector you have not thought about at all?

Let’s do an artificial example from my grey-bearded generation, who grew up with cgi-bin and FTP uploads of non-fancy Perl & PHP scripts into a directory - if you didn’t understand this, read on and ignore me ranting about the past.

Let’s imagine having a Perl script, that takes an IP address as an argument like this:

perl -e 'print `ping -c 1 $ARGV[0]`'

That snippet doesn’t even look innocent at first sight, because the back ticks are spawning a sub shell. Let’s have some more fun

perl -e 'print `ping -c 1 $ARGV[0]`'
perl -e 'print `ping -c 1 $ARGV[0]`' " ; ls -al"
perl -e 'print `ping -c 1 $ARGV[0]`' " || ls -al"
perl -e 'print `ping -c 1 $ARGV[0]`' " && ls -al"
perl -e 'print `ping -c 1 $ARGV[0]`' " -c 100000"
perl -e 'print `ping -c 1 $ARGV[0]`' " -c 100000 > /tmp/foo"

As you can see, this is not even trying hard to resist exploitation. All of these could have been caught by proper input validation, for example by checking that the argument is a valid host name. You could get more fancy with funny DNS records, but let’s not go there. Still, there are different attack vectors: executing other commands, as well as DoS possibilities through long-running executions and overflowing disk space.
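
To show what such input validation could look like, here is a small Python sketch of the same ping wrapper. It is an illustration under a few assumptions, not a hardened implementation: it only accepts literal IP addresses (no host names), and the function names build_ping_command and ping_once are made up for this example. The two ideas are to validate the argument before using it and to pass the command as an argument list, so no shell ever gets to interpret characters like ; or >.

```python
import ipaddress
import subprocess
from typing import List


def build_ping_command(target: str) -> List[str]:
    """Return the argv for a single ping, or raise ValueError for a non-IP."""
    # ip_address() raises ValueError for anything that is not a plain address,
    # which includes " ; ls -al" as well as "127.0.0.1 -c 100000".
    address = ipaddress.ip_address(target)
    return ["ping", "-c", "1", str(address)]


def ping_once(target: str) -> str:
    """Run ping without a shell, so ';', '&&' or '>' are never interpreted."""
    result = subprocess.run(
        build_ping_command(target), capture_output=True, text=True, timeout=5
    )
    return result.stdout
```

Calling ping_once("127.0.0.1") runs a single ping, while any of the arguments from the Perl examples above ends in a ValueError before a process is started. Note that this only deals with the input, you still own the ping binary.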

Also there is one more detail that you should not forget about: the ping binary. Once you incorporate it in your code, you own it, and all of its security issues. On older Linux distributions this used to have the setuid flag set, which meant it was executed as root. Newer distributions tend to use a different ping binary, that is not dependent on that flag. But still the cost of ownership should not be underestimated.

Try not to own other programs, unless you have audited them.

So, now we have established why even reducing the privileges of the code you write makes a lot of sense. Once this is understood, let’s return to seccomp.

There is an easy way to see which processes make use of seccomp:

$ grep -E '^Seccomp:[[:space:]]+[12]' /proc/*/status | cut -d '/' -f3

OK, sooooo, let’s figure out the processes

ps hww $(grep -E '^Seccomp:[[:space:]]+[12]' /proc/*/status \
  | cut -d '/' -f3 | tr '\n' ' ' )

The output looks like this

  219 ?        Ss     0:00 /lib/systemd/systemd-journald
  245 ?        Ss     0:00 /lib/systemd/systemd-udevd
  381 ?        Ss     0:00 /lib/systemd/systemd-logind
  384 ?        Ssl    2:13 /usr/share/elasticsearch/jdk/bin/java ...
 1345 ?        Ssl    0:23 /usr/share/auditbeat/bin/auditbeat ...

The elasticsearch/auditbeat process lines were cut a little to fit the formatting; usually you would see all the arguments when running that ps call. This shows that systemd, Elasticsearch and auditbeat (the latter two being part of the Elastic Stack) make use of seccomp. The number 2 in the Seccomp field of the /proc/[pid]/status file indicates that a seccomp filter has been set - this does not mean it’s a good filter, just that setting it was successful.
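
If you prefer to do this check from code, the following Python sketch reads the same status files and decodes the Seccomp field (0 means disabled, 1 strict mode, 2 filter mode). The function names are made up for this example. It compares the value of the Seccomp field itself, so a 0 somewhere else in the line, like in a PID, cannot confuse the check.

```python
from pathlib import Path
from typing import Dict, Optional

SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text: str) -> Optional[int]:
    """Return the value of the 'Seccomp:' line of a /proc/[pid]/status file."""
    for line in status_text.splitlines():
        # "Seccomp_filters:" on newer kernels does not match this prefix.
        if line.startswith("Seccomp:"):
            return int(line.split(":", 1)[1])
    return None


def processes_using_seccomp(proc: Path = Path("/proc")) -> Dict[int, str]:
    """Map PID to seccomp mode name for every process with seccomp enabled."""
    result = {}
    for status in proc.glob("[0-9]*/status"):
        try:
            mode = seccomp_mode(status.read_text())
        except OSError:  # the process exited while we were scanning
            continue
        if mode:  # skips 0 (disabled) and None (field not present)
            result[int(status.parent.name)] = SECCOMP_MODES.get(mode, "unknown")
    return result
```

Running processes_using_seccomp() inside the VM should list the same PIDs as the ps call, each mapped to "filter".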

So, what makes a seccomp filter?

Using seccomp filters

A seccomp filter is a set of rules running in kernel space that every system call is checked against. Because the filter is written in BPF, it comes with a couple of restrictions: there are no loops and no way to jump backwards, so every program is guaranteed to terminate - it’s basically a DAG (directed acyclic graph).

Each system call is checked against that filter, and there are a few possible outcomes.

  • a system call is allowed
  • the thread or even the whole process is killed
  • the event is logged
  • an error is returned to the caller

I will not cover how a seccomp filter looks in C, as there are plenty of resources for this. Instead we will take a direct look at the implementation in Elasticsearch, which is written in Java.


All the seccomp logic is written in a single class in Elasticsearch, named SystemCallFilter. This class not only supports seccomp under Linux, but also more protection mechanisms from other operating systems in order to prevent the ability to fork or run a process.

While seccomp is much more powerful than what Elasticsearch does with it, the idea within Elasticsearch was to only reject some system calls. Other means of protection, like not being able to read arbitrary files, are handled by the Java Security Manager.

In order to execute system calls like prctl() and seccomp(), Elasticsearch uses a library called JNA.

Using JNA you can write rather low-level Java code that looks a bit like C. Take a look at this BPF filter

// BPF installed to check arch, limit, then syscall.
// See for details.
SockFilter insns[] = {
  // load the architecture into the accumulator
  BPF_STMT(BPF_LD  + BPF_W   + BPF_ABS, SECCOMP_DATA_ARCH_OFFSET),
  // if (arch != audit) goto fail;
  BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K,   arch.audit,     0, 7),
  // load the syscall number into the accumulator
  BPF_STMT(BPF_LD  + BPF_W   + BPF_ABS, SECCOMP_DATA_NR_OFFSET),
  // if (syscall > LIMIT) goto fail;
  BPF_JUMP(BPF_JMP + BPF_JGT + BPF_K,   arch.limit,     5, 0),
  // if (syscall == FORK) goto fail;
  BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K,   arch.fork,      4, 0),
  // if (syscall == VFORK) goto fail;
  BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K,   arch.vfork,     3, 0),
  // if (syscall == EXECVE) goto fail;
  BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K,   arch.execve,    2, 0),
  // if (syscall == EXECVEAT) goto fail;
  BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K,   arch.execveat,  1, 0),
  // pass: return OK;
  BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_ALLOW),
  // fail: return EACCES;
  BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_ERRNO | (EACCES & SECCOMP_RET_DATA)),
};
// seccomp takes a long, so we pass it one explicitly to keep the JNA simple
SockFProg prog = new SockFProg(insns);
long pointer = Pointer.nativeValue(prog.getPointer());

What this basically states is that if one of the fork or exec system calls is executed, the seccomp filter returns EACCES and does not allow it, but keeps Elasticsearch up and running - which makes sense in order to keep servicing requests, instead of killing the process.
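
If the jump offsets look cryptic, it may help to step through the filter by hand. The following Python sketch is a toy interpreter for just the handful of BPF operations used here, not a real BPF implementation. A few assumptions are baked in: the Insn type and the run_filter function are made up for this example, the constants are the x86-64 values standing in for the arch.* fields, and the program starts each comparison with a load step, because the jump offsets only add up if the architecture and the syscall number are loaded into the accumulator before they are compared.

```python
import errno
from typing import List, NamedTuple

# Constants from <linux/seccomp.h> and <linux/audit.h>.
SECCOMP_RET_ALLOW = 0x7FFF0000
SECCOMP_RET_ERRNO = 0x00050000
SECCOMP_RET_DATA = 0x0000FFFF
AUDIT_ARCH_X86_64 = 0xC000003E

# x86-64 values standing in for the arch.* fields of the Java snippet.
LIMIT = 0x3FFFFFFF  # assumption: anything above is an x32 ABI syscall
READ, FORK, VFORK, EXECVE, EXECVEAT = 0, 57, 58, 59, 322


class Insn(NamedTuple):
    op: str      # "ld_arch", "ld_nr", "jeq", "jgt" or "ret"
    k: int = 0   # constant operand
    jt: int = 0  # instructions to skip when the comparison is true
    jf: int = 0  # instructions to skip when the comparison is false


FAIL = SECCOMP_RET_ERRNO | (errno.EACCES & SECCOMP_RET_DATA)

PROGRAM: List[Insn] = [
    Insn("ld_arch"),
    Insn("jeq", AUDIT_ARCH_X86_64, 0, 7),  # if (arch != audit) goto fail
    Insn("ld_nr"),
    Insn("jgt", LIMIT, 5, 0),              # if (syscall > LIMIT) goto fail
    Insn("jeq", FORK, 4, 0),
    Insn("jeq", VFORK, 3, 0),
    Insn("jeq", EXECVE, 2, 0),
    Insn("jeq", EXECVEAT, 1, 0),
    Insn("ret", SECCOMP_RET_ALLOW),        # pass
    Insn("ret", FAIL),                     # fail
]


def run_filter(program: List[Insn], arch: int, nr: int) -> int:
    """Evaluate the program for one syscall and return the seccomp action."""
    acc, pc = 0, 0
    while True:
        insn = program[pc]
        pc += 1  # jump offsets are relative to the next instruction
        if insn.op == "ld_arch":
            acc = arch
        elif insn.op == "ld_nr":
            acc = nr
        elif insn.op == "jeq":
            pc += insn.jt if acc == insn.k else insn.jf
        elif insn.op == "jgt":
            pc += insn.jt if acc > insn.k else insn.jf
        elif insn.op == "ret":
            return insn.k
        else:
            raise ValueError(f"unknown operation {insn.op}")


for name, nr in [("read", READ), ("execve", EXECVE)]:
    print(name, hex(run_filter(PROGRAM, AUDIT_ARCH_X86_64, nr)))
```

Every jump only moves forward, which is exactly the DAG property mentioned earlier: no matter which syscall comes in, the program reaches one of the two return statements after at most ten steps.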

So, how to install the above policy? Elasticsearch tries this in two ways, first with the seccomp syscall and if that fails with the prctl syscall in order to support older kernels - something that was needed back then when this feature was added due to the LTS kernels of some distributions. I removed a little bit of code to make it more readable, you can check the full version here.

// check for GET_NO_NEW_PRIVS
switch (linux_prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0)) {

 // check for SECCOMP
switch (linux_prctl(PR_GET_SECCOMP, 0, 0, 0, 0)) {

if (linux_prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, 0, 0, 0) != 0) {

// ok, now set PR_SET_NO_NEW_PRIVS
// needed to be able to set a seccomp filter as ordinary user
if (linux_prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) {
    throw new UnsupportedOperationException(...);

// check it worked
if (linux_prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0) != 1) {
    throw new UnsupportedOperationException(...);

// if the seccomp syscall does not work, fall back to prctl
if (linux_syscall(arch.seccomp, SECCOMP_SET_MODE_FILTER,
  SECCOMP_FILTER_FLAG_TSYNC, new NativeLong(pointer)) != 0) {
    if (linux_prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, pointer, 0, 0) != 0) {

// check for seccomp policy
if (linux_prctl(PR_GET_SECCOMP, 0, 0, 0, 0) != 2) {

And that’s about it. The whole class taking care of five different implementations for different operating systems to limit executions is about 600 lines long. This means, there basically is no excuse not to add this kind of feature to your own code from a complexity perspective!


The author of libseccomp also provides a Go library. Elastic has released a go-seccomp-bpf library to load seccomp profiles based on a YAML configuration file. That library also ships with a profiler to automatically generate a whitelist.

Each beat ships with its own seccomp policy, that can be tweaked if needed, but there is a default policy that looks like this

package seccomp

import (

func init() {
  defaultPolicy = &seccomp.Policy{
    DefaultAction: seccomp.ActionErrno,
    Syscalls: []seccomp.SyscallGroup{
        Action: seccomp.ActionAllow,
        Names: []string{


Now on to one of my favorite examples - mainly because I really like Crystal as a language and the idea behind it. In case you don’t know, Crystal is a relatively new programming language that looks a little bit like Ruby, but compiles down to an optionally static binary using LLVM.

I will not delve into the details of how to set up a Crystal project, but there is a seccomp library for Crystal available. Even though it is already a little bit older, you can use it to get up and running.

So, let’s take a look at the seccomp policy setup within Crystal

SCMP_ACT_LOG = 0x7ffc0000

class SeccompClient < Seccomp

  def initialize(@log : Bool)
  end

  def run : Int32
    ctx = uninitialized ScmpFilterCtx

    ctx = seccomp_init(SCMP_ACT_ALLOW)

    # stop executions
    action = @log ? SCMP_ACT_LOG : SCMP_ACT_ERRNO
    seccomp_rule_add(ctx, action, seccomp_syscall_resolve_name("execve"), 0)
    seccomp_rule_add(ctx, action, seccomp_syscall_resolve_name("execveat"), 0)
    seccomp_rule_add(ctx, action, seccomp_syscall_resolve_name("fork"), 0)
    seccomp_rule_add(ctx, action, seccomp_syscall_resolve_name("vfork"), 0)

    ret = seccomp_load(ctx)

    # optional, dump policy on stdout
    #ret = seccomp_export_pfc(ctx, STDOUT_FILENO)
    #printf("seccomp_export_pfc result: %d\n", ret)
    ret < 0 ? -ret : ret
  end
end

In order to set this up, the client needs to be initialized in your code

if seccomp

The sample application is a small webserver trying to run /bin/ls. If the webserver is started with seccomp enabled via ./bin/webserver -s, this exception shows up in the logs

root@contrib-buster:/vagrant/crystal-seccomp# ./bin/webserver -s
Listening on
Incoming request: path /
2020-10-20T11:35:07.028709Z  ERROR - http.server: Unhandled exception on HTTP::Handler
Error executing process: '/bin/ls': Operation not permitted (IO::Error)
  from Exception::CallStack::unwind:Array(Pointer(Void))


The Python seccomp library is also maintained by Paul Moore, the libseccomp maintainer, which means it is usually up to date.

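import errno

from seccomp import ALLOW, ERRNO, LOG, SyscallFilter

def setup_seccomp(log):
    f = SyscallFilter(ALLOW)
    action = LOG if log else ERRNO(errno.EACCES)
    # stop executions
    f.add_rule(action, "execve")
    f.add_rule(action, "execveat")
    f.add_rule(action, "vfork")
    f.add_rule(action, "fork")
    # install the filter, otherwise the rules above have no effect
    f.load()
    print('Seccomp enabled...')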

As you can see, there is no need to resolve the system call names to numbers yourself, as the Python library is doing that for you.

So, both code snippets either log a seccomp violation or return an error, so that the caller knows that the operation was not executed successfully.

This however means that when an error is returned, the seccomp violation is not logged. Let’s fix that for the Python implementation

import errno

from seccomp import ALLOW, ERRNO, LOG, Attr, SyscallFilter

def setup_seccomp(log_only):
    f = SyscallFilter(ALLOW)
    # always log, even when returning an error
    f.set_attr(Attr.CTL_LOG, 1)
    action = LOG if log_only else ERRNO(errno.EACCES)
    # stop executions
    f.add_rule(action, "execve")
    f.add_rule(action, "execveat")
    f.add_rule(action, "vfork")
    f.add_rule(action, "fork")
    # install the filter, otherwise the rules above have no effect
    f.load()
    print('Seccomp enabled...')

Now run the webserver via python3 and trigger another exception by running curl localhost:8081. You should now see the operation being rejected and running /usr/sbin/ausearch --syscall execve will return the seccomp violation event.

I also added a logging-only mode to both applications, where the execution of /bin/ls still works in the Python or Crystal code, but is still logged.

This is a powerful mechanism for running in a transition mode, where your existing application can be run with a logging-only seccomp filter to see if running your application triggers any violations. If it doesn’t, you can just enforce your seccomp policy after some time running in production. This way you can also retrofit a seccomp policy easily to existing applications.

Monitoring seccomp violations

Before we finish this article, let’s take a look at how to monitor seccomp violations. You already saw the ausearch command line utility, which is great for searching for seccomp violations. However, what about running this at a bigger scale?

Remember how the Vagrant image also ships with Elasticsearch, Kibana and auditbeat? This is the reason: auditbeat logs seccomp violations and sends them to Elasticsearch. Using Kibana you can get a neat dashboard overview of your seccomp violations.

Auditbeat is already running on the vagrant VM, so it should have caught all the violations that you triggered so far. You can open http://localhost:5601, click on Kibana > Dashboard and then select the [Auditbeat Auditd] Overview ECS dashboard.

Your dashboard will look like this

[Image: Kibana Auditbeat dashboard]

If you are interested in what the JSON looks like, you can run this search for a seccomp policy violation in the Kibana Dev Tools

GET auditbeat-*/_search
{
  "size": 1,
  "query": {
    "term": {
      "event.action": {
        "value": "violated-seccomp-policy"
      }
    }
  },
  "sort": [
    {
      "@timestamp": {
        "order": "desc"
      }
    }
  ]
}
Let’s take a quick look at a few parts of such a document

{
  "@timestamp" : "2020-10-20T12:45:45.311Z",
  "ecs" : {
    "version" : "1.5.0"
  },
  "host" : { ... },
  "agent" : { ... },
  "event" : {
    "action" : "violated-seccomp-policy"
  },
  "user" : { ... },
  "process" : {
    "pid" : 2097,
    "name" : "",
    "executable" : "/usr/bin/python3.7"
  },
  "auditd" : {
    "summary" : {
      "object" : {
        "primary" : "59",
        "type" : "process"
      },
      "how" : "",
      "actor" : {
        "primary" : "vagrant",
        "secondary" : "vagrant"
      }
    },
    "message_type" : "seccomp",
    "sequence" : 325,
    "result" : "unknown",
    "data" : {
      "sig" : "0",
      "compat" : "0",
      "syscall" : "59",
      "code" : "0x50000",
      "ip" : "0x7f5ce5131a07",
      "arch" : "c000003e"
    },
    "session" : "2"
  },
  "service" : {
    "type" : "auditd"
  }
}

Let’s skip the generic information from host, agent, event and the user triggering this action. The process field contains information about the process triggering the violation, but the most interesting part is probably within auditd.data, as this allows you to figure out the syscall.
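
Once you have more than one of these documents, you probably want a summary instead of reading JSON. As a rough sketch, the following Python function takes an Elasticsearch search response (hits.hits[]._source is the standard response layout) and counts violations per executable and syscall. The function name, the small x86-64 syscall table and the sample response are made up for this example; only the first sample hit mirrors the document shown above.

```python
from collections import Counter
from typing import Any, Dict

# x86-64 syscall numbers for the calls blocked in this article.
SYSCALLS_X86_64 = {49: "bind", 57: "fork", 58: "vfork", 59: "execve", 322: "execveat"}


def count_violations(response: Dict[str, Any]) -> Counter:
    """Count seccomp violations per (executable, syscall name)."""
    counts: Counter = Counter()
    for hit in response["hits"]["hits"]:
        source = hit["_source"]
        nr = int(source["auditd"]["data"]["syscall"])
        syscall = SYSCALLS_X86_64.get(nr, f"syscall {nr}")
        counts[(source["process"]["executable"], syscall)] += 1
    return counts


def sample_hit(executable: str, syscall: str) -> Dict[str, Any]:
    """Build a stripped-down hit containing only the fields used above."""
    return {
        "_source": {
            "process": {"executable": executable},
            "auditd": {"data": {"syscall": syscall}},
        }
    }


# Hypothetical response, for illustration only.
sample_response = {
    "hits": {
        "hits": [
            sample_hit("/usr/bin/python3.7", "59"),
            sample_hit("/usr/bin/python3.7", "59"),
            sample_hit("/usr/bin/nc.traditional", "49"),
        ]
    }
}

for (executable, syscall), count in count_violations(sample_response).most_common():
    print(f"{count} x {syscall} from {executable}")
```

For real monitoring you would rather let Elasticsearch do the counting with an aggregation, but a few lines like these are handy for a quick check.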

In case you are wondering about the structure of the JSON, this basically adheres to the Elastic Common Schema, a specific way of writing event data to Elasticsearch.


Almost done! Let’s summarize what you hopefully learned during this article. First, seccomp is a great and battle-tested mechanism. It’s Linux only, but if you only need to prevent the execution of other processes, other operating systems have similar features. That said, thanks to BPF, seccomp filters can be really powerful.

Integrating seccomp into your application can be done in C of course, but I hope you got a glimpse that other more or less high-level languages also allow you to do this. Nothing to be afraid of.

Apart from Python, Crystal, Java and Go there are also packages for Rust and even Perl. There are some node packages, but I am not sure how well maintained those are. Ruby does not seem to have any packages at the moment.

Even if there is no support for your programming language, you can still use something like firejail and provide a profile for your application, but

Integrate seccomp natively in your application

From a developer perspective there are a couple of advantages in doing so. First, there is no way of disabling this. This way, you always know that this security feature is enabled. Again, when running in environments that you do not control, this will come in really handy. In case the seccomp filter cannot be installed, you can always abort the start up of your application.

At the risk of repeating myself and the other millions of people on the internet

Do not roll your own security

Of course you could have a fancy idea of creating a Java agent that wraps around the Runtime class and ensures no code gets executed. But if you can achieve the same with seccomp, it probably makes sense not to reinvent the wheel, and not to consider yourself smarter than anyone trying to hack your code.

In the end you may want to rethink your design as well. If you need to call binaries in your application, maybe it makes sense to have another daemon with a different seccomp policy calling this binary, and to use a Unix domain socket to communicate. This might make your architecture more complex, especially from an operational perspective, and you have to decide if the additional security is worth it. I hope your answer is mostly yes on that one.
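
To give an idea of what such a split could look like, here is a minimal Python sketch of the helper side and the application side talking over a Unix socket. Everything in it is an assumption made for illustration: the command name list-root, the function names and the one-request-per-connection protocol are made up. The important design choice is that the application only sends the name of a command, and the helper maps that name to a fixed argument list, so the application never gets to pass a command line.

```python
import socket
import subprocess
from typing import Callable, Dict, List

# The only commands the helper is willing to run, keyed by a fixed name.
ALLOWED_COMMANDS: Dict[str, List[str]] = {"list-root": ["/bin/ls", "/"]}


def run_command(argv: List[str]) -> bytes:
    """Default runner: execute the binary without a shell, return its stdout."""
    return subprocess.run(argv, capture_output=True, timeout=5, check=False).stdout


def serve_one(conn: socket.socket,
              runner: Callable[[List[str]], bytes] = run_command) -> None:
    """Helper side: read one command name, reply with its output or an error."""
    with conn:
        name = conn.recv(64).decode("ascii", errors="replace").strip()
        argv = ALLOWED_COMMANDS.get(name)
        conn.sendall(runner(argv) if argv else b"ERROR unknown command\n")


def request(conn: socket.socket, name: str) -> bytes:
    """Application side: send a command name and read the complete reply."""
    with conn:
        conn.sendall(name.encode("ascii") + b"\n")
        conn.shutdown(socket.SHUT_WR)  # signal that the request is complete
        return b"".join(iter(lambda: conn.recv(4096), b""))
```

In a real setup the helper would be its own process listening on a socket path, the application could then load a filter that rejects execve and fork like the ones above, and the helper would get a policy of its own. For trying it out in a single process, socket.socketpair() gives you two connected Unix sockets, one for serve_one() in a thread and one for request().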


Final remarks

If you made it down here, wooow! Thanks for sticking with me. You can follow or ping me on Twitter or GitHub, or reach me via email (just to tell me you read this whole thing :-).

If there is anything to correct, drop me a note, and I am happy to do so and append to this post!

The same applies to questions. If you have a question, go ahead and ask!

If you want me to speak about this, drop me an email!
