Alexander Reelsen

Backend developer, productivity fan, likes the JVM, full text search, distributed databases & systems

Writing a plugin for the Elastic APM Java Agent
Jul 27, 2021
16 minutes read

TLDR; This blog post goes through the process of creating an APM agent plugin in order to support more frameworks within the Elastic Java Agent.

Introduction to APM

All of this started with me writing a Javalin based Java web application. After getting it up and running properly I needed to figure out how to monitor it, and I started taking a look at different monitoring agents. One of my motivations was to take a look at the Elastic competition - always stay curious!

Just to explain: When you want to monitor your application and not only the metrics and logs it emits, you need to instrument it. Instrumenting means that you can analyze the run times of different parts of the application, for example by monitoring database queries or the handling of HTTP requests and responses. This is known as APM - application performance management - even though this is mainly monitoring.

I was however pretty disappointed by the whole Java agent observability landscape. I would not have expected the self-proclaimed thought leaders to have pretty bare-bones Java agents - up to the point that there is no automatic instrumentation at all, with the exception of Spring Boot applications.

From a company perspective it makes sense to support Spring Boot first and then think carefully about the next choice, because it is so much more popular than anything else. However, if you proclaim yourself to be a leader, you may want to do more than just implement existing Spring Boot interfaces for JDBC queries.

This is where instrumentation via agents comes into play. Java has a standardized interface to modify the code of an application before it is executed (or even while the application is running via attaching such an agent). This is a lot better than with many other programming languages, where you kind of monkey patch the runtime to add APM support.
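
To make that standardized interface a bit more tangible, here is a minimal sketch built on the JDK's java.lang.instrument package. The class names LoggingAgent and RecordingTransformer are made up for this example, and the transformer only records class names instead of rewriting bytecode, so this is nowhere near what the Elastic agent does.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical minimal agent: it only records which classes pass through it.
final class LoggingAgent {

    // Names of the classes seen so far, in internal form (e.g. "io/javalin/Javalin").
    static final List<String> SEEN = new CopyOnWriteArrayList<>();

    // Called by the JVM before main() when started with -javaagent:agent.jar
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new RecordingTransformer());
    }

    // Called by the JVM when the agent is attached to an already running process.
    public static void agentmain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new RecordingTransformer());
    }

    static final class RecordingTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            if (className != null) {
                SEEN.add(className);
            }
            // Returning null tells the JVM to keep the original bytecode.
            return null;
        }
    }
}
```

A real agent would return modified bytecode from transform() for the classes it cares about. To be picked up by the JVM, the agent jar also needs a Premain-Class (and, for attaching at runtime, an Agent-Class) entry in its manifest.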

Transaction and span structure

Still, this implies that you have to figure out where and how to hook into an application, for example in order to trace the execution of a database query. With such great power comes great responsibility: you can easily do something wrong here, bring the whole application into a corrupted state, or even replace data before it is persisted and leave your database in a bad state.

This means a Java agent needs to do its work as unobtrusively as possible, while still supporting many frameworks in order to properly create spans and transactions for the code being executed. The overhead compared to running the same application without an agent also needs to be as small as possible (run times, garbage collections, etc.).

Because of that power of modifying the user’s application code - a big difference compared to just collecting logs and metrics - most of those agents are open source, so that users can figure out what is going on.

That said, don’t underestimate the complexity of such agents and the hoops they have to jump through in order to not change the classloading behavior of applications and to not interfere with regular execution. I completely underestimated that, as I did not have any prior experience.

But back to my application and my willingness to properly monitor it. After taking a look at some agents I was pretty disappointed.

Javalin is based on Jetty - a common servlet based web server - and itself had 600k downloads in the first half of this year and almost 5k stars on GitHub. It’s not big, but not super small either. Yet the only instrumentation I could find was that of the underlying Servlet API, where one could not even differentiate between different endpoints, only between HTTP methods. So one basically had transactions for GET and POST, but not for individual endpoints, as the servlet API basically has a service method with the request and the response as arguments, and that’s it.

That is not acceptable for a proper overview. So after trying four different agents I figured out it is time to get my hands dirty and create a proper instrumentation for the Elastic agent.

My goal for proper instrumentation was:

  • Set the transaction name to the HTTP endpoint like GET / or POST /app/:organization/update - where the organization here is a dynamic path parameter
  • Instrument each handler as its own span. This means any before/after handler should be its own span as well

This is basically the way I would like to see it in the Elastic APM UI

Sample Transaction with Spans

Our first step is to figure out what exactly needs to be instrumented.

Understanding Javalin Request Execution

Before we delve into the details of Javalin, let me explain why I like it so much.

First, very few dependencies. Second, no magic. This sounds trivial, but there are so many frameworks doing fancy things at build time (Quarkus, Micronaut) or at runtime (Spring) that I find the idea of having a framework I can step debug through just insanely appealing. And yes, I know that I may leave some performance on the table, but that’s a good deal for developer productivity. Third, I enjoy its startup times, usually taking less than 100ms until everything is ready. I do run two Spring Boot applications in production that require several seconds on my notebook. While I am sure this could be tuned, I enjoy the out-of-the-box speed.

The good part is that writing such an instrumentation implies that you have understood the relevant part of the framework. With Javalin this means digging through some Kotlin code - which I am admittedly a little rusty on.

Every HTTP invocation in Javalin is done via a Handler functional interface. Instrumenting this means figuring out when Handler.handle is executed and wrapping a span around the execution. Let’s take a look at the agent API to see how we can do this.

The Elastic Agent API

Each instrumentation requires a definition of when and how it should be applied, even before defining what should be done during that instrumentation.

Let’s take a look at the JavalinInstrumentation class

public class JavalinInstrumentation extends TracerAwareInstrumentation {

    @Override
    public ElementMatcher<? super TypeDescription> getTypeMatcher() {
        return hasSuperType(named("io.javalin.http.Handler")).and(not(isInterface()));
    }

    @Override
    public ElementMatcher.Junction<ClassLoader> getClassLoaderMatcher() {
        return classLoaderCanLoadClass("io.javalin.http.Handler");
    }

    @Override
    public ElementMatcher<? super MethodDescription> getMethodMatcher() {
        return named("handle").and(takesArgument(0, named("io.javalin.http.Context")));
    }

    @Override
    public Collection<String> getInstrumentationGroupNames() {
        return Collections.singleton("javalin");
    }

    @Override
    public String getAdviceClassName() {
        return "co.elastic.apm.agent.javalin.JavalinInstrumentation$HandlerAdapterAdvice";
    }
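
The matchers above are Byte Buddy element matchers. As a rough illustration of the three filtering stages they express, here is a sketch using plain reflection. The Context, Handler and HelloHandler types are stand-ins invented for this example, and loading the class via Class.forName is a simplification I chose for the sketch, not necessarily how the agent's classLoaderCanLoadClass works.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Stand-ins for io.javalin.http.Context and io.javalin.http.Handler (assumptions for this sketch).
interface Context {}
interface Handler { void handle(Context ctx) throws Exception; }

final class HelloHandler implements Handler {
    @Override public void handle(Context ctx) {}
    public void unrelated() {}
}

final class MatcherSketch {

    // Stage 1: cheapest check; can this class loader see the framework at all?
    static boolean classLoaderCanLoad(ClassLoader loader, String className) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Stage 2: concrete (non-interface) types that have the given super type.
    static boolean typeMatches(Class<?> candidate, Class<?> superType) {
        return superType.isAssignableFrom(candidate) && !candidate.isInterface();
    }

    // Stage 3: methods named "handle" whose first argument is the context type.
    static List<Method> matchingMethods(Class<?> type, Class<?> firstArgType) {
        List<Method> result = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            Class<?>[] params = m.getParameterTypes();
            if (m.getName().equals("handle") && params.length > 0 && params[0] == firstArgType) {
                result.add(m);
            }
        }
        return result;
    }
}
```

With these stand-ins, HelloHandler passes the type check while the Handler interface itself does not, and only handle(Context) is selected, not unrelated().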

These methods are defined in the ElasticApmInstrumentation class, which also includes some javadocs for your reading pleasure.

The basic idea is to figure out as efficiently as possible whether the instrumentation needs to be applied. The first test, in the getClassLoaderMatcher() method, is whether the Handler class can be loaded. The combination of getTypeMatcher() and getMethodMatcher() defines which classes and which method within those classes should be instrumented. Lastly, getAdviceClassName() specifies the advice class to run. In this example it is an inner class within the same class. The class looks like this

public static class HandlerAdapterAdvice {

    @Nullable
    @Advice.OnMethodEnter(suppress = Throwable.class, inline = false)
    public static Object setSpanAndTransactionName(@Advice.This Handler handler,
                                                   @Advice.Argument(0) Context ctx) {

    }


    @Advice.OnMethodExit(suppress = Throwable.class, onThrowable = Throwable.class, inline = false)
    public static void onAfterExecute(@Advice.Enter @Nullable Object spanObj,
                                      @Advice.Argument(0) Context ctx,
                                      @Advice.Thrown @Nullable Throwable t) {

    }
}

This advice consists of two methods. The first, annotated with OnMethodEnter, runs before the method itself, the second after it - so we can neatly wrap the Handler.handle() method. Before diving into details, we need to expose some more information on the Javalin side.
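
If the enter/exit pairing feels abstract, the following sketch shows roughly how an instrumented method behaves, written as ordinary Java instead of woven bytecode. AdviceSketch and its methods are hypothetical names for this example; the real weaving is done by Byte Buddy and does not look like this in source form.

```java
import java.util.ArrayList;
import java.util.List;

final class AdviceSketch {

    static final List<String> EVENTS = new ArrayList<>();

    // Plays the role of the @Advice.OnMethodEnter method: its return value is handed to the exit advice.
    static Object onEnter(String handlerName) {
        EVENTS.add("enter " + handlerName);
        return "span-for-" + handlerName;
    }

    // Plays the role of the @Advice.OnMethodExit method; thrown is null on a normal return.
    static void onExit(Object enterValue, Throwable thrown) {
        EVENTS.add("exit " + enterValue + (thrown == null ? "" : " failed: " + thrown.getMessage()));
    }

    // Roughly how an instrumented handle() behaves once the advice is applied.
    static void instrumentedHandle(String handlerName, Runnable originalBody) {
        Object enterValue = onEnter(handlerName);
        Throwable thrown = null;
        try {
            originalBody.run();
        } catch (Throwable t) {
            thrown = t;
            throw t; // the application still sees its own exception
        } finally {
            onExit(enterValue, thrown);
        }
    }
}
```

The value returned on enter is what @Advice.Enter hands to the exit method, and onThrowable = Throwable.class is what makes the exit advice run even when the handler throws.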

Extending Javalin

Javalin’s Handler class can also be used as a hook that runs before or after a request is processed. This might be useful to check if the user is logged in already, or if you want to execute some code after processing the request.

Each handler is registered with a handler type, which can be one of

GET, POST, PUT, PATCH, DELETE, HEAD, TRACE, CONNECT, OPTIONS, BEFORE, AFTER, INVALID;

So this contains all the HTTP methods, plus three special ones at the end. Unfortunately I was not able to access the handler type via the Javalin Context that is passed as the argument to each handler, as it was not exposed via the Java API. This would mean I could not properly name a span to mark a handler as a before or after handler. Instead of doing byte code magic I figured it might be easier to fix this upstream, and did so in a pull request.

With the handler type exposed it was easy to come up with the following scheme for transaction and span names.

Transaction and span structure

APM Agent Check For Recent Javalin Version

Now that new Javalin versions were released with the handler type exposed, there had to be some kind of check whether instrumentation should happen. This was the most delicate part, as I did not have any clue what needed to be checked for, and you do not want to introduce slow parts here, for example by using reflection.

@Nullable
@Advice.OnMethodEnter(suppress = Throwable.class, inline = false)
public static Object setSpanAndTransactionName(@Advice.This Handler handler,
                                           @Advice.Argument(0) Context ctx) {

    HandlerType handlerType = getHandlerType(ctx);

    if (handlerType == null) {
        return null;
    }

    ...

The magic happens in the getHandlerType() method. If that one returns null, then an older Javalin version is used. The core of this method is the following line of code:

handlerTypeMethodHandle = MethodHandles.lookup().findVirtual(
    context.getClass(), 
    "handlerType", 
    MethodType.methodType(HandlerType.class)
);

MethodHandles is a feature introduced in Java 7 that works alongside the reflection API, and is also considered a more powerful alternative to it.

The above code tries to figure out if the Context class has a method named handlerType that returns a HandlerType. If it does, we know that Javalin supports what we need for our instrumentation.

Feel free to check the full implementation of the getHandlerType() method in the source code.
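
To show the idea in isolation, here is a self-contained sketch of such a version check. OldContext, NewContext and this HandlerType enum are stand-ins I made up to simulate two library versions, and HandlerTypeLookup is not the agent's actual implementation. For brevity the sketch looks the handle up on every call, where a real implementation would want to cache it.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Stand-ins for two library versions (assumptions for this sketch, not Javalin classes).
enum HandlerType { GET, POST, BEFORE, AFTER }

class OldContext {
    public String path() { return "/"; }
}

class NewContext extends OldContext {
    public HandlerType handlerType() { return HandlerType.GET; }
}

final class HandlerTypeLookup {

    // Returns a handle for handlerType(), or null if this context class does not offer it.
    static MethodHandle find(Class<?> contextClass) {
        try {
            return MethodHandles.lookup().findVirtual(
                contextClass, "handlerType", MethodType.methodType(HandlerType.class));
        } catch (NoSuchMethodException | IllegalAccessException e) {
            return null;
        }
    }

    // Returns the handler type, or null when running against the "old" version.
    static HandlerType getHandlerType(Object ctx) {
        MethodHandle handle = find(ctx.getClass());
        if (handle == null) {
            return null;
        }
        try {
            return (HandlerType) handle.invoke(ctx);
        } catch (Throwable t) {
            return null;
        }
    }
}
```

Calling getHandlerType(new NewContext()) yields the handler type, while getHandlerType(new OldContext()) returns null, which is the signal to skip the instrumentation.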

The next step is to figure out if the transaction name should be set

Figuring Out The Transaction Name

Going back to our image, we can only infer the transaction name once the code in a handler for an HTTP method is executed.

Transaction and span structure

Luckily there is a helper method we can use in the HandlerType code, so setting the transaction name looks like this:

if (handlerType.isHttpMethod()) {
  final StringBuilder name = 
      transaction.getAndOverrideName(PRIO_HIGH_LEVEL_FRAMEWORK, false);
  if (name != null) {
    transaction.setFrameworkName(FRAMEWORK_NAME);
    transaction.setFrameworkVersion(
          VersionUtils.getVersion(Handler.class, "io.javalin", "javalin"));
    transaction.withType("request");

    name.append(handlerType.name()).append(" ")
      .append(ctx.endpointHandlerPath());
  }
}

Transaction names have a priority assigned to them. The idea is that even if another instrumentation has already set a transaction name, one with a higher priority will overwrite it. This makes a lot of sense: in this example the servlet instrumentation has already set a transaction name, but because Javalin as a framework uses the higher PRIO_HIGH_LEVEL_FRAMEWORK priority, the name gets overwritten.

In addition, the transaction name is built from the handler type (which is an HTTP verb, because of the initial if check) and the endpoint handler path. That endpoint handler path is not the concrete request path, but the one the handler was registered with.
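
The priority rule can be modelled in a few lines. The following is a simplified sketch of how I understand it. NamedTransaction is a hypothetical class, and the numeric priority values are invented for the example; only their ordering matters here.

```java
final class NamedTransaction {

    // Made-up priority values for this sketch; only their ordering matters.
    static final int PRIO_LOW_LEVEL_FRAMEWORK = 10;
    static final int PRIO_HIGH_LEVEL_FRAMEWORK = 100;

    private final StringBuilder name = new StringBuilder();
    private int namePriority = Integer.MIN_VALUE;

    // Returns an emptied builder to write the new name into, or null if the caller may not rename.
    StringBuilder getAndOverrideName(int priority, boolean overrideIfSamePriority) {
        boolean allowed = priority > namePriority
            || (overrideIfSamePriority && priority == namePriority);
        if (!allowed) {
            return null;
        }
        namePriority = priority;
        name.setLength(0);
        return name;
    }

    String getNameAsString() {
        return name.toString();
    }
}
```

In this model a servlet-level instrumentation could first name the transaction "POST" with the low priority. The Javalin instrumentation then gets a non-null builder because of its higher priority and writes "POST /app/:organization/update", the registered path rather than a concrete one like /app/elastic/update. Any later caller with the same or a lower priority receives null and leaves the name alone.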

This way the transaction name is set. Now we need to create the spans.

Creating Spans

Spans are created for every handler, regardless of whether a new transaction name was set earlier.

// create own span for all handlers including after/before
final AbstractSpan<?> parent = tracer.getActive();
if (parent == null) {
    return null;
}

Span span = parent.createSpan().activate();
span.withType("app").withSubtype("handler");
span.appendToName(handlerType.name())
    .appendToName(" ")
    .appendToName(ctx.matchedPath());
return span;

This is rather straightforward to read: a span gets created and set up with a little metadata, which makes it easier to search for. The interesting part is the return statement, which returns the created span. This makes it possible to use that span in the second method of this advice, when the execution of the handler has finished.

Finishing Each Span

Once the handler is done, we need to make sure we close the span, so we know its execution has finished. This is done via the second method that is annotated with @Advice.OnMethodExit:

@Advice.OnMethodExit(...)
public static void onAfterExecute(@Advice.Enter @Nullable Object spanObj,
                                  @Advice.Argument(0) Context ctx,
                                  @Advice.Thrown @Nullable Throwable t) {
  if (spanObj != null) {
    final Span span = (Span) spanObj;
    span.deactivate();

    final CompletableFuture<?> responseFuture = ctx.resultFuture();
    if (responseFuture == null) {
      // sync request
      span.captureException(t).end();
    } else {
      responseFuture.whenComplete(
        (o, futureThrowable) -> span.captureException(futureThrowable).end());
    }
  }
}

There are a couple of interesting things here. First, the argument annotated with @Advice.Enter is the span that was created in the previous method. Second, the Context passed to the Handler.handle() method is the second argument, annotated with @Advice.Argument. This context is needed to figure out if the result was a future.

The instrumentation needs two ways of ending the span. The easy way is for a synchronous request, where one can call Span.end() right in this code. The more complex way applies when a result future was set in the handler. In that case we need to wait for the future to finish and then call Span.end(). Luckily Javalin uses a CompletableFuture, so it is easy to attach to its life cycle.
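
The two ways of ending a span can be tried out without the agent. In this sketch FakeSpan is a hypothetical stand-in that only remembers whether it was ended, and SpanEnding.finish mirrors the branching of the advice above.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for the agent's span; it only remembers how it ended.
final class FakeSpan {
    volatile boolean ended;
    volatile Throwable captured;

    FakeSpan captureException(Throwable t) {
        if (t != null) {
            captured = t;
        }
        return this;
    }

    void end() {
        ended = true;
    }
}

final class SpanEnding {

    // resultFuture is null for a synchronous handler, non-null when the handler set a future result.
    static void finish(FakeSpan span, CompletableFuture<?> resultFuture, Throwable thrown) {
        if (resultFuture == null) {
            // sync request: the handler is done, so the span can end right away
            span.captureException(thrown).end();
        } else {
            // async request: end the span only once the future completes, normally or exceptionally
            resultFuture.whenComplete(
                (value, futureThrowable) -> span.captureException(futureThrowable).end());
        }
    }
}
```

With a pending future the span stays open after finish() returns and only ends once complete() or completeExceptionally() is called on that future, so the span duration covers the asynchronous work as well.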

Wrapping Lambdas

Javalin allows you to use lambdas when registering handlers. The code looks like this

Javalin app = Javalin.create();
app.get("/", ctx -> ctx.status(200).result("Hello World"));

The Handler interface is marked as a @FunctionalInterface, which is great for writing short programs, but introduces a new issue: Lambdas are not instrumented at all!

So what can be done about this? Even though it’s not super nice, there is a workaround: wrapping the lambda in another handler. To do this, a second instrumentation was added that wraps a lambda in another class. This needs to be done at a different stage than the other instrumentation:

public class JavalinHandlerLambdaInstrumentation extends TracerAwareInstrumentation {

    @Override
    public ElementMatcher<? super TypeDescription> getTypeMatcher() {
        return named("io.javalin.Javalin");
    }

    @Override
    public ElementMatcher<? super MethodDescription> getMethodMatcher() {
        return named("addHandler")
            .and(takesArgument(0, named("io.javalin.http.HandlerType")))
            .and(takesArgument(1, String.class))
            .and(takesArgument(2, named("io.javalin.http.Handler")))
            .and(takesArguments(4));
    }

    @Override
    public Collection<String> getInstrumentationGroupNames() {
        return Collections.singleton("javalin");
    }

    @Override
    public ElementMatcher.Junction<ClassLoader> getClassLoaderMatcher() {
        return classLoaderCanLoadClass("io.javalin.http.Handler");
    }

    @Override
    public String getAdviceClassName() {
        return "co.elastic.apm.agent.javalin.JavalinHandlerLambdaInstrumentation$HandlerWrappingAdvice";
    }

    /* ... */

Some checks are the same, as the instrumentation should only be applied if the Handler class can be loaded. The method to instrument is a bit different. In this case it is the addHandler() method of the io.javalin.Javalin class, specifically the overload taking four arguments, with the types of the first three defined.

This is the method definition in the Javalin class:

public Javalin addHandler(@NotNull HandlerType handlerType,
                          @NotNull String path,
                          @NotNull Handler handler,
                          @NotNull Set<Role> roles) {

This method basically adds a handler to the list of handlers to process. So, if we figure out at this stage that the passed handler is a lambda, we can wrap it. The implementation for this is rather short:

public static class HandlerWrappingAdvice {

    @Nullable
    @AssignTo.Argument(2)
    @Advice.OnMethodEnter(inline = false)
    public static Handler beforeAddHandler(@Advice.Argument(2) @Nullable Handler original) {
        if (original != null && original.getClass().getName().contains("/")) {
            return new WrappingHandler(original);
        }
        return original;
    }
}

The @AssignTo.Argument(2) annotation defines that the returned value should be assigned to the argument at index 2, that is the third parameter (the handler) of the matched method. If a slash occurs in the class name, we assume a lambda was passed and thus return a WrappingHandler, which looks like this:

class WrappingHandler implements Handler {

    private final Handler wrappingHandler;

    public WrappingHandler(Handler wrappingHandler) {
        this.wrappingHandler = wrappingHandler;
    }

    @Override
    public void handle(@Nonnull Context ctx) throws Exception {
        wrappingHandler.handle(ctx);
    }
}

This one took me a while to figure out, but works well.
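
The slash check is easy to try out in isolation. In the following sketch Context, Handler and NamedHandler are stand-ins for the Javalin types, and LambdaWrapping is a hypothetical helper. The heuristic relies on how HotSpot-based JDKs name the classes generated for lambdas, which as far as I know is an implementation detail and not something the language specification guarantees.

```java
// Stand-ins for Javalin's types (assumptions for this sketch).
interface Context {}
interface Handler { void handle(Context ctx) throws Exception; }

// A regular named implementation; its class name contains no slash.
final class NamedHandler implements Handler {
    @Override public void handle(Context ctx) {}
}

// A named class that the type matcher can see, delegating to the wrapped lambda.
final class WrappingHandler implements Handler {
    final Handler delegate;

    WrappingHandler(Handler delegate) {
        this.delegate = delegate;
    }

    @Override public void handle(Context ctx) throws Exception {
        delegate.handle(ctx);
    }
}

final class LambdaWrapping {

    // Heuristic from the advice above: generated lambda classes carry a '/' in their name.
    static boolean looksLikeLambda(Object o) {
        return o != null && o.getClass().getName().contains("/");
    }

    // Wraps lambdas only; named handlers are returned untouched.
    static Handler wrapIfLambda(Handler original) {
        return looksLikeLambda(original) ? new WrappingHandler(original) : original;
    }
}
```

Since WrappingHandler is a regular class implementing Handler, the first instrumentation matches its handle() method and creates the span, and the lambda simply runs inside of it.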

Supporting Javalin 4

Let’s go back to one snippet from above, the method matcher for the lambda instrumentation

@Override
public ElementMatcher<? super MethodDescription> getMethodMatcher() {
    return named("addHandler")
        .and(takesArgument(0, named("io.javalin.http.HandlerType")))
        .and(takesArgument(1, String.class))
        .and(takesArgument(2, named("io.javalin.http.Handler")))
        .and(takesArguments(4));
}

So, this matcher requires four arguments, but I am only checking the types of three of them. Why is that? Well, let me introduce you to the brittleness of this method matching approach: refactoring and major versions! All of those matchers of course only work if the matched signatures are stable across versions. Otherwise they do not match and no instrumentation happens - which still means your app will work fine, but you will not get a nice overview in your APM UI.

In the case of Javalin, there are currently preparations to release the next major version, Javalin 4. Unfortunately the addHandler method differs between Javalin 3 and Javalin 4. This is the Javalin 3 version:

public Javalin addHandler(@NotNull HandlerType handlerType,
                          @NotNull String path,
                          @NotNull Handler handler,
                          @NotNull Set<Role> roles) {

This is the Javalin 4 version

public Javalin addHandler(@NotNull HandlerType handlerType,
                          @NotNull String path, 
                          @NotNull Handler handler, 
                          @NotNull RouteRole... roles) {

As you can see, the fourth parameter is different. In order to cater for that, the matcher does not check its type, but only the number of arguments. If we had checked its type, then upgrading to Javalin 4 would have resulted in this instrumentation not working anymore. This means you have to keep up to date with upstream changes and build a good testing infrastructure for your APM plugin. Let’s do that now.
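
Before moving on to the tests, here is a small reflection-based sketch of why matching on the argument count survives this change. JavalinV3, JavalinV4, Role and RouteRole are simplified stand-ins for the two signatures shown above, and ArityMatcher is a hypothetical helper, not the Byte Buddy matcher itself.

```java
import java.lang.reflect.Method;
import java.util.Set;

// Stand-ins for the two signatures shown above (assumptions for this sketch).
interface Handler {}
enum HandlerType { GET, POST }
interface Role {}
interface RouteRole {}

class JavalinV3 {
    public JavalinV3 addHandler(HandlerType type, String path, Handler handler, Set<Role> roles) {
        return this;
    }
}

class JavalinV4 {
    public JavalinV4 addHandler(HandlerType type, String path, Handler handler, RouteRole... roles) {
        return this;
    }
}

final class ArityMatcher {

    // Matches addHandler by its first three parameter types and its total parameter count only.
    static Method findAddHandler(Class<?> type) {
        for (Method m : type.getDeclaredMethods()) {
            Class<?>[] p = m.getParameterTypes();
            if (m.getName().equals("addHandler")
                    && p.length == 4
                    && p[0] == HandlerType.class
                    && p[1] == String.class
                    && p[2] == Handler.class) {
                return m;
            }
        }
        return null;
    }
}
```

Both stand-in classes match, even though the fourth parameter is a Set in one and a varargs array in the other. The price is that a future overload with four arguments and the same first three types would match as well.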

Unit Testing The Javalin Instrumentation

The APM agent has a nice testing infrastructure, so that the tests are surprisingly short

public class JavalinInstrumentationTest extends AbstractInstrumentationTest {

  private static final Javalin app = Javalin.create();
  private static String baseUrl;
  private final HttpClient client = HttpClient.newHttpClient();

  @BeforeAll
  public static void startJavalin() {
    app.start(0);
    baseUrl = "http://localhost:" + app.port();
  }

  @AfterAll
  public static void stopJavalin() {
    app.stop();
  }

  @Test
  public void testJavalinSimpleGet() throws Exception {
    app.get("/", ctx -> ctx.status(200));

    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create(baseUrl + "/")).build();
    final HttpResponse<String> mainUrlResponse = client.send(request, 
        HttpResponse.BodyHandlers.ofString());
    assertThat(mainUrlResponse.statusCode()).isEqualTo(200);
    assertThat(reporter.getFirstTransaction().getNameAsString())
        .isEqualTo("GET /");
    assertThat(reporter.getFirstSpan().getNameAsString()).isEqualTo("GET /");
  }

The testing infrastructure can also wait a certain time for spans if your code executes asynchronously, and lets you check all reported spans, so writing the tests was not much of a problem.

Summary

If you want to take a look at the plugin, check it out within the apm-agent-java repository on GitHub. If you check out the plugins directory, you will see that there are already more than 50 plugins for different frameworks, so there is plenty of inspiration to draw from.

Sample Transaction with Spans

If you want to try the Javalin instrumentation out yourself, go ahead and install the Java agent, in version 1.25 or later. See the installation instructions here.

This also made me think a little bit more about frameworks and programming languages. Instead of instrumenting frameworks, are there possibilities to integrate this better into a programming language or a framework in general? Using JFR events or something? If you had to write a framework or programming language from scratch, would you provide hooks for this? Spring Boot seems to be one of the few that provide interfaces for SQL communication, which allow you to implement classes collecting data; otherwise this seems to be quite rare.

Now, if your favorite framework is not supported yet, you have two options. First, there is always the possibility to manually instrument the code using the public API of the Java agent - this requires including the agent as a dependency in your code. The other way is of course to get familiar with the agent itself and contribute an implementation for a framework, or at least file an issue stating that there is interest in it. You can also always ask on the discuss forum for the APM Java agent.

Last but not least, this feature would not have been possible without the help of the maintainers of the Java Agent: Eyal (who helped me a ton in understanding what to pay attention to), Sylvain and Felix.

Final remarks

If you made it down here, wooow! Thanks for sticking with me. You can follow or ping me on twitter, GitHub or reach me via Email (just to tell me, you read this whole thing :-).

If there is anything to correct, drop me a note, and I am happy to do so and append to this post!

The same applies to questions. If you have a question, go ahead and ask!

If you want me to speak about this, drop me an email!

