Microservice Best Practices: Scale your Java microservices using virtual threads & async programming

Overview

This article is part of the Microservices Best Practices series. The goal of this Tech Community article series is to share best practices for microservices development.

One major pattern of a Cumulocity IoT microservice is a server-side agent (check out the Getting Started guide) that should handle thousands of devices at the same time. Most likely the microservice does not communicate with each device continuously but offers a client or endpoints that receive updates pushed by the devices at an interval. The logic of such a microservice can roughly be described as:

Process incoming message → transform it into the c8y domain model → send it to a Cumulocity tenant the device is registered to.

If you implement such logic in just a single blocking thread, it will not be available for other devices during that time, so your whole microservice only scales up to your maximum number of available threads. Luckily there is a solution for that, called asynchronous programming, which parallelizes logic as much as possible.

The main goal of such a microservice is to achieve maximum throughput of concurrent tasks so it can handle thousands of devices in parallel with just a single instance.
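To make the pattern concrete, the three steps above could be sketched roughly like this. DeviceMessage, C8yEvent, and the method names are purely illustrative and not part of the Microservice SDK:

```java
// Hypothetical sketch of the agent logic described above. DeviceMessage,
// C8yEvent and all method names are illustrative only, not SDK types.
record DeviceMessage(String deviceId, String payload) {}
record C8yEvent(String source, String text) {}

class AgentPipeline {
    // process incoming message -> transform into the c8y domain model -> send
    C8yEvent handle(DeviceMessage msg) {
        C8yEvent event = transform(msg);
        send(event);
        return event;
    }

    C8yEvent transform(DeviceMessage msg) {
        // map the device payload onto the Cumulocity domain model
        return new C8yEvent(msg.deviceId(), msg.payload().trim());
    }

    void send(C8yEvent event) {
        // placeholder: a real agent would call the Cumulocity REST API here
        System.out.println("Sending " + event + " to the device's tenant");
    }

    public static void main(String[] args) {
        new AgentPipeline().handle(new DeviceMessage("device-1", " temp=21 "));
    }
}
```

If each call to handle runs on its own blocking thread, the thread count quickly becomes the limiting factor, which is exactly the problem the rest of this article addresses.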

But first, let’s find out how threading works in Java applications.

About Cores & Threads

As you might know, your CPU has a limited number of cores. Modern CPUs have multiple cores; in a desktop environment most likely around 8 cores and 16 hardware threads. The main difference between cores and threads is that cores are physical units while threads are virtual constructs that allow a physical core to run more than one process in parallel, in our example 2 threads per physical core.

The operating system (OS) manages the available threads and assigns tasks to them. That is the reason they are also called OS threads. In our example, we can have a maximum of 16 concurrent processes running on our CPU.
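You can check how many logical processors, and thus how many OS threads can truly run in parallel, the JVM sees on your machine:

```java
public class CoreCount {
    public static void main(String[] args) {
        // Number of logical processors (cores x hardware threads per core)
        // visible to the JVM -- 16 in the 8-core/16-thread example above
        int logicalProcessors = Runtime.getRuntime().availableProcessors();
        System.out.println("Logical processors: " + logicalProcessors);
    }
}
```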

The Java Virtual Machine (JVM) uses the available OS threads in a 1:1 relation and makes them available to applications as so-called platform threads. As a programmer, you can implement code to be processed in just one thread (single-threaded). After the code execution is completed, the JVM releases the thread and other code can be executed. In the worst case, all available threads are busy and new tasks have to wait until a thread is released before they can be executed.

As a developer, you can optimize this by writing non-blocking code that is executed in parallel by multiple threads instead of blocking one dedicated thread to execute your code sequentially. This is called asynchronous programming.

Async Programming with Java

Asynchronous programming was mainly introduced in Java 8 back in 2014. Even before that, Java 5 introduced a basic Future interface in combination with an ExecutorService, which allowed first asynchronous code by submitting tasks that return a Future.

Executor Service

The ExecutorService is a service that lets you run tasks asynchronously. It is very limited in regard to flexibility and chaining of tasks.

ExecutorService executor = Executors.newCachedThreadPool();
executor.submit(() -> {
    System.out.println("Do something async!");
});

Callable<String> callableTask = () -> {
    return "Result of async Task";
};
Future<String> exFuture = executor.submit(callableTask);
System.out.println(exFuture.get());

You can define Runnable and Callable tasks that can be submitted to the executor. A basic Future is returned, which allows blocking the current thread with .get() to retrieve the result for further processing.

Completable futures

With Java 8, completable futures were introduced, which give much more flexibility in how to execute code asynchronously. For example, the API contains methods like supplyAsync, runAsync, and thenApplyAsync to easily execute any code block asynchronously. All asynchronous code is executed in the ForkJoinPool.commonPool() if you don’t provide an explicit executor.

CompletableFuture<Void> future = CompletableFuture.runAsync(() -> {
    System.out.println("Do something async!");
});

In the example above we just use runAsync to execute some logic asynchronously. We can also get a return value if necessary by using supplyAsync:

CompletableFuture<String> futureString = CompletableFuture.supplyAsync(() -> {
    return "Result of async processing";
});
futureString.join();

Here we don’t get the string back but a CompletableFuture of type String. With join() (or get()) we can force the current thread to block and wait until the async processing is completed, but there is a better approach. We can chain operations:

futureString.thenApply(result -> {
   return result.toUpperCase();
}).thenAccept(upperCaseString -> {
    System.out.println(upperCaseString);
});

After the initial future is completed, one chained operation makes the result uppercase, while another prints it out. This is one way to do it. You can also use the methods complete() and completeExceptionally() to define in your code block when the future should be completed.

CompletableFuture<String> future1 = new CompletableFuture<>();
CompletableFuture.runAsync(() -> {
    future1.complete("Operation is finished!");
});

Especially the chaining of completable futures is very powerful. It allows you to write your code fully asynchronously, with the drawback of making it more complex and less readable. They are also powerful enough to implement efficient code that scales across all available OS threads managed by the ForkJoinPool.
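A minimal sketch of such a chain, including an error path via exceptionally (the strings are just placeholders):

```java
import java.util.concurrent.CompletableFuture;

public class ChainingDemo {
    public static void main(String[] args) {
        String result = CompletableFuture
                .supplyAsync(() -> "result of async processing")
                .thenApply(String::toUpperCase)                // runs once the supplier is done
                .exceptionally(ex -> "fallback: " + ex.getMessage()) // error path of the chain
                .join();                                       // block only at the very end
        System.out.println(result); // prints RESULT OF ASYNC PROCESSING
    }
}
```

The whole pipeline runs asynchronously; only the final join() blocks, and a real application would often replace even that with another chained stage.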

As you have seen, completable futures require you to change your coding style from synchronous to handler-based code. As already explained, this can lead to less readable and more complex code.

But with Java 21 there is a new kid on the asynchronous block: virtual threads.

Virtual Threads in Java 21+

In Java 21, virtual threads were introduced. While the ExecutorService and completable futures leverage and optimize the use of available OS threads, virtual threads abstract them one level further. Virtual threads are not tied to one specific OS thread but run on so-called carrier threads, which can multiplex many virtual threads onto one OS thread. Once a virtual thread completes, the JVM suspends (or reuses) it and executes another one on the same carrier thread and its assigned OS thread. Likewise, blocked virtual threads are unmounted from their carrier threads so others can be processed. With that, you can theoretically have thousands, even millions of (blocking) virtual threads running on a limited number of OS threads.

Virtual threads also consume far fewer resources and can be created much more quickly than platform threads. As a result, you can scale your applications much better, executing even more asynchronous logic on the same number of available OS threads.

Some numbers:

  • A platform thread takes about 1 ms to be created; a virtual thread can be created in less than 1 µs
  • A platform thread reserves ~1 MB of memory for its stack; a virtual thread starts with ~1 KB
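You can verify this scaling behavior yourself. The sketch below starts 10,000 virtual threads that all block in Thread.sleep, a load that would quickly exhaust a typical platform-thread pool (requires Java 21):

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class ManyVirtualThreads {
    public static void main(String[] args) {
        AtomicInteger completed = new AtomicInteger();
        // one new virtual thread per task; 10,000 of them block concurrently
        // on just a handful of carrier threads
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(Duration.ofMillis(100)); // blocking is cheap here
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    completed.incrementAndGet();
                });
            }
        } // close() waits for all submitted tasks to finish
        System.out.println("Completed tasks: " + completed.get()); // prints Completed tasks: 10000
    }
}
```

Running the same loop with Executors.newFixedThreadPool(16) would serialize the sleeps across 16 platform threads and take orders of magnitude longer.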

So are they in general faster than platform threads? It depends. Virtual threads are ideal for executing code blocks that include blocking operations but don’t hold any thread-local variables. A virtual thread is not faster than a platform thread if it only uses the CPU for computation without blocking much. Also, if you have already implemented your code mostly asynchronously, you will not benefit much from virtual threads.

There are also some risks when using virtual threads. Because virtual threads are practically unlimited in number, you risk running into the memory limits of the JVM. You can mitigate that by using immutable objects where possible, which can be shared across threads. Also, thread-local variables shouldn’t be used extensively in virtual threads.

If you keep that in mind, you can build very efficient, high-throughput applications without changing your coding style. It is even encouraged to implement your code in a synchronous, blocking style, as virtual threads scale with their numbers and no longer block any OS threads.

Please note: Combining them with completable future async chains is less efficient, as virtual threads would not benefit much from it, and completable futures might use thread-local variables in the background, consuming a lot of memory.

All of this makes them a perfect fit for applications with a lot of concurrency that want to achieve a high throughput, which is normally the case for server-side agents that need to handle thousands of concurrent requests.

Virtual Threads are supported by Spring Boot 3.2. If we enable them via

spring.threads.virtual.enabled=true

the embedded Tomcat and Jetty will automatically handle requests on virtual threads. Spring MVC also benefits from this setting, e.g. for the @Async annotation.

Unfortunately, the Microservice SDK currently uses Spring Boot 2.7.x. Still, we can use virtual threads natively by simply raising the Java version to 21.

Using virtual threads

You can easily create a new virtual thread:

Thread.startVirtualThread(() -> {
    System.out.println("Do something async using a virtual Thread!");
});

Or you can use the Thread.Builder and define a task.

Please note: Virtual threads don’t have names by default, which makes them hard to monitor. On creation, you can assign a thread name with .name("threadName").

Thread.Builder builder = Thread.ofVirtual().name("virtThread");
Runnable task = () -> {
    System.out.println("Do something async using a virtual Thread!");
};
Thread t = builder.start(task);
t.join();

If you used the ExecutorService before, you can easily change it to use virtual threads:

//ExecutorService using Virtual Threads
ExecutorService virtExecutorService = Executors.newVirtualThreadPerTaskExecutor();

It follows the same concept for submitting tasks etc.
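For example, submitting a Callable works exactly as with a platform-thread executor; only the factory method changed:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class VirtualExecutorDemo {
    public static void main(String[] args) throws Exception {
        // try-with-resources: close() waits for submitted tasks (Java 19+)
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            // a Callable task, submitted exactly as before -- it just runs
            // on a fresh virtual thread instead of a pooled platform thread
            Future<String> future = executor.submit(() -> "Result from a virtual thread");
            System.out.println(future.get()); // prints Result from a virtual thread
        }
    }
}
```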

Let’s now check how completable futures and virtual threads can be used for the Microservice SDK.

Use in Microservice SDK

Please note: The examples below are based on Microservice SDK 1020.107.0 using Spring Boot 2.7.17 and Java 21. They might look different when using Spring Boot 3.2, as it supports virtual threads and enables them for Spring MVC out of the box. When the Microservice SDK is updated, I will also update this article.

In our examples, we use the Microservice SDK to retrieve events from the platform and return them as the response to a REST request. In addition, I added Thread.sleep to simulate blocking in the service, like the following:

Simulate Blocking Method

public void simulateBlocking() {
    try {
        log.info("Simulating blocking for 2s...");
        Thread.sleep(2000);
        log.info("Blocking completed!");
    } catch (InterruptedException e) {
        throw new RuntimeException(e);
    }
}

Let’s start with the most common but inefficient implementation.

1. Blocking REST Controller calling blocking service method:

RestController:

@GetMapping(path = "/asyncEvents0", produces = MediaType.APPLICATION_JSON_VALUE)
public ResponseEntity<List<EventRepresentation>> getEvents0() {
    log.info("Calling getEvents0!");
    return new ResponseEntity<>(eventService.getEvents0(), HttpStatus.OK);
}

EventService:

public List<EventRepresentation> getEvents0() {
    simulateBlocking();
    List<EventRepresentation> eventList = eventApi.getEvents().get(10).getEvents();
    log.info("# Events found: {}", eventList.size());
    log.info("Returning result!");
    return eventList;
}

Very simple and clean code, but also very inefficient. Each REST request blocks a platform thread until simulateBlocking and the request to Cumulocity have completed sequentially. This way our microservice only scales until the number of concurrent requests hits the available thread limit, and each request takes very long to return a response.

Unfortunately, this blocking style is what I see most often in the microservice implementations I have reviewed.

This can be improved, of course. Let’s check the next example:

2. Blocking REST Controller calling async service using completable future

RestController:

@GetMapping(path = "/asyncEvents1", produces = MediaType.APPLICATION_JSON_VALUE)
public ResponseEntity<List<EventRepresentation>> getEvents1() {
    log.info("Calling getEvents1!");
    return new ResponseEntity<>(eventService.getEvents1().join(), HttpStatus.OK);
}

EventService:

public CompletableFuture<List<EventRepresentation>> getEvents1() {
    var eventFuture = new CompletableFuture<List<EventRepresentation>>();

    CompletableFuture.runAsync(() -> {
        simulateBlocking();
    });
    CompletableFuture.runAsync(() -> {
        subscriptionsService.runForEachTenant(() -> {
            try {
                List<EventRepresentation> eventList = eventApi.getEvents().get(10).getEvents();
                log.info("# Events found: {}", eventList.size());
                eventFuture.complete(eventList);
            } catch (Exception e) {
                log.error("Retrieving events failed!", e);
                eventFuture.completeExceptionally(e);
            }
        });
    });
    log.info("Returning result!");
    return eventFuture;
}

The RESTController is still very clean, but the service method is now much more complex. First we need to define the CompletableFuture we want to return. We create a new thread with CompletableFuture.runAsync(...) for the simulated blocking code block and another for the call to Cumulocity. As we lose the context of the RESTController in that new thread, we need to acquire the tenant context again using the subscriptionsService. Having the context, we call the eventApi and complete the eventFuture.
For exception handling, we complete the future exceptionally via completeExceptionally so the caller is notified of the error.

From an efficiency and concurrency perspective, this is better. For each request received by the REST Controller, we create two new asynchronous threads to process the blocking code and call Cumulocity in parallel. This helps scale out our microservice to handle many more concurrent requests and to parallelize the event API calls. Still, the REST Controller blocks a thread by calling join on the future returned by getEvents1, so it has to wait until the async operation is completed.

This brings us to the next example.

3. Non-Blocking REST Controller calling async service using @Async annotation

RestController:

@GetMapping(path = "/asyncEvents2", produces = MediaType.APPLICATION_JSON_VALUE)
public DeferredResult<ResponseEntity<List<EventRepresentation>>> getEvents2() {
    log.info("Calling getEvents2!");
    DeferredResult<ResponseEntity<List<EventRepresentation>>> result = new DeferredResult<>();
    eventService.getEvents2().thenApply(events -> result.setResult(new ResponseEntity<>(events, HttpStatus.OK)));
    return result;
}

EventService:

@Async
public CompletableFuture<List<EventRepresentation>> getEvents2() {
    var eventFuture = new CompletableFuture<List<EventRepresentation>>();
    // Calling an @Async method from within the same bean bypasses the proxy,
    // so we have to use runAsync here.
    CompletableFuture.runAsync(() -> {
        simulateBlocking();
    });
    subscriptionsService.runForEachTenant(() -> {
        try {
            List<EventRepresentation> eventList = eventApi.getEvents().get(10).getEvents();
            log.info("# Events found: {}", eventList.size());
            eventFuture.complete(eventList);
        } catch (Exception e) {
            eventFuture.completeExceptionally(e);
        }
    });
    log.info("Returning result!");
    return eventFuture;
}

The REST Controller looks slightly different, as we use DeferredResult as the return type, which is very similar to a future. This way the REST Controller releases the http-worker thread immediately, allowing it to accept other requests in parallel. In the REST Controller we also chain async operations using thenApply(). With that, there is no blocking code left in our controller. By calling the async eventService we immediately get the future back. With thenApply we define that, once the future is completed, the result of the DeferredResult is set asynchronously. In other words, the DeferredResult is returned immediately and the response is delivered when our async thread completes.

We also use Spring’s @Async annotation, so we don’t have to create a new thread manually. Unfortunately, we still need to call runAsync inside the method to run the blocking code in another thread, because a call to an @Async method from within the same @Async-annotated bean bypasses Spring’s proxy and runs synchronously.

With this, the REST Controller scales even better by releasing http-worker threads and delivering the result asynchronously. We still run the blocking code and the call to the Cumulocity API asynchronously, each in a new thread. Now we can potentially receive more concurrent requests with an async controller and send more concurrent requests to Cumulocity.

Let’s evaluate another option you have.

4. Non-Blocking REST Controller calling async service using Executor Service

RESTController:

ExecutorService executorService = Executors.newCachedThreadPool();

@GetMapping(path = "/asyncEvents3", produces = MediaType.APPLICATION_JSON_VALUE)
public DeferredResult<ResponseEntity<List<EventRepresentation>>> getEvents3() {
    log.info("Calling getEvents3!");
    DeferredResult<ResponseEntity<List<EventRepresentation>>> result = new DeferredResult<>();
    executorService.submit(() -> {
        try {
            result.setResult(new ResponseEntity<>(eventService.getEvents3().get(), HttpStatus.OK));
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    });
    return result;
}

EventService:

public Future<List<EventRepresentation>> getEvents3() throws ExecutionException, InterruptedException {
    executorService.submit(() -> {
       simulateBlocking();
    });
    log.info("Returning result!");
    return executorService.submit(new EventRetrievalTask<>());
}

public class EventRetrievalTask<T> implements Callable<List<EventRepresentation>> {

    @Override
    public List<EventRepresentation> call() throws Exception {
        List<EventRepresentation> eventList = new ArrayList<>();
        subscriptionsService.runForEachTenant(() -> {
            eventList.addAll(eventApi.getEvents().get(10).getEvents());
            log.info("# Events found: {}", eventList.size());
        });
        return eventList;
    }
}

We are already familiar with DeferredResult for implementing a non-blocking REST Controller. The only difference is that we now use the ExecutorService to create two new threads and asynchronously execute the blocking code and the call to the EventService. We have to call get() in the REST Controller, which unfortunately blocks one thread. This is because the EventService also uses the ExecutorService to asynchronously call the event API of Cumulocity and only returns a Future, which does not allow any chaining.

In summary, it does its job, but I would consider it less optimal, as one thread is blocked until another asynchronously delivers the result.
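One possible improvement would be to have the service return a CompletableFuture built with supplyAsync instead of a plain Future, so the controller could chain with thenAccept rather than block on get(). A minimal sketch, with a hypothetical fetchEvents method standing in for the event API call:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class NonBlockingSketch {
    static final ExecutorService executor = Executors.newCachedThreadPool();

    // hypothetical stand-in for eventApi.getEvents().get(10).getEvents()
    static List<String> fetchEvents() {
        return List.of("event1", "event2");
    }

    public static void main(String[] args) {
        // supplyAsync returns a CompletableFuture, so the caller can chain
        // instead of blocking on get()
        CompletableFuture<List<String>> future =
                CompletableFuture.supplyAsync(NonBlockingSketch::fetchEvents, executor);

        CompletableFuture<Void> printed = future.thenAccept(events ->
                System.out.println("# Events found: " + events.size()));

        printed.join(); // only for the demo; a DeferredResult controller would not block
        executor.shutdown();
    }
}
```

In the controller, the thenAccept stage would set the DeferredResult, keeping the whole path non-blocking.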

5. Non-Blocking REST Controller calling async service using virtual threads

RESTController:

ExecutorService virtExecutorService = Executors.newVirtualThreadPerTaskExecutor();

@GetMapping(path = "/asyncEvents4", produces = MediaType.APPLICATION_JSON_VALUE)
public DeferredResult<ResponseEntity<List<EventRepresentation>>> getEvents4() throws InterruptedException, ExecutionException {
    log.info("Calling getEvents4!");
    DeferredResult<ResponseEntity<List<EventRepresentation>>> result = new DeferredResult<>();
    virtExecutorService.submit(() -> {
        try {
            result.setResult(new ResponseEntity<>(eventService.getEvents4().get(), HttpStatus.OK));
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    });
    return result;
}

EventService:

public Future<List<EventRepresentation>> getEvents4() throws InterruptedException {
    virtExecutorService.submit(() -> {
        simulateBlocking();
    });
    log.info("Returning result!");
    return virtExecutorService.submit(new EventRetrievalTask<>());
}

You might notice that this is almost the same code as in example 4, with one important change: we are now using an ExecutorService with virtual threads. This is actually a big difference, as blocking a virtual thread doesn’t hurt at all compared to blocking a platform thread. Why is that? Because virtual threads are not bound 1:1 to OS threads, we can create as many as we want without the risk of blocking our limited OS threads.

The REST Controller is still fully async, creating a virtual thread for each request, which scales much better than creating platform threads as in the previous example. The Event Service scales as well, as we directly create virtual threads for executing the blocking code block and calling the Cumulocity API.

The code looks pretty synchronous and clean. The biggest benefit is that with a tiny code change, switching the executor to use virtual threads, we can scale our microservice much better. No massive code changes are needed, as would be the case when switching from a blocking synchronous code style to the async completable future style.

Summary & Outlook

In this article I described multiple ways to scale your microservice using asynchronous programming. Virtual threads are a huge improvement for scaling server-side agents without changing the programming style. If you are already familiar with async programming, you might also consider using completable futures, as they are very powerful and more flexible. Both should do the job of enabling a high throughput.

For me personally, the real power behind virtual threads is that you don’t have to change your coding style to scale. This should hopefully increase their adoption.

All code examples can be found here:

For further reads check out the following articles:

Also please check out the Microservice Best Practices series:

Which approach do you prefer? Looking forward to reading your feedback and comments!
