Implementing client-side rate limiting (throttling) of API calls in Java projects with Resilience4j

In the previous article in this series, we learned about Resilience4j and how to use its Retry module. Now let's learn about the RateLimiter: what it is, when and how to use it, and what to watch out for when implementing rate limiting (or "throttling").

Code example

A working code example on GitHub is attached to this article.

What is Resilience4j?

Please refer to the description in the previous article for a quick introduction to how Resilience4j works in general.

What is rate limiting?

We can look at rate limiting from two perspectives: as a service provider and as a service consumer.

Server-side rate limiting

As a service provider, we implement rate limits to protect our resources from overload and denial of service (DoS) attacks.

To meet our service level agreements (SLAs) with all consumers, we want to ensure that one consumer causing a traffic spike does not degrade the quality of our service to others.

We do this by setting a limit on how many requests a consumer is allowed to make in a given unit of time. We reject any requests above the limit with an appropriate response, such as HTTP status 429 (Too Many Requests). This is called server-side rate limiting.
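As a rough illustration of the idea, here is a minimal sketch of a per-consumer counter that answers 429 once a consumer exceeds its quota for the current time window. The class and method names are hypothetical, and it uses a simple fixed window; real gateways typically use more elaborate algorithms and shared storage.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative per-client fixed-window counter; all names here are hypothetical. */
final class ServerSideLimitSketch {
    private final int limitPerWindow;
    private final long windowMillis;
    // clientId -> {window index, requests seen in that window}
    private final Map<String, long[]> counters = new HashMap<>();

    ServerSideLimitSketch(int limitPerWindow, long windowMillis) {
        this.limitPerWindow = limitPerWindow;
        this.windowMillis = windowMillis;
    }

    /** Returns the HTTP status the server would answer with: 200 or 429. */
    synchronized int statusFor(String clientId, long nowMillis) {
        long window = nowMillis / windowMillis;
        long[] state = counters.computeIfAbsent(clientId, id -> new long[] {window, 0});
        if (state[0] != window) {
            // A new window has started for this client: reset its count.
            state[0] = window;
            state[1] = 0;
        }
        if (state[1] >= limitPerWindow) {
            return 429; // Too Many Requests
        }
        state[1]++;
        return 200;
    }

    public static void main(String[] args) {
        ServerSideLimitSketch limiter = new ServerSideLimitSketch(2, 1000);
        System.out.println(limiter.statusFor("client-a", 0));    // 200
        System.out.println(limiter.statusFor("client-a", 10));   // 200
        System.out.println(limiter.statusFor("client-a", 20));   // 429
        System.out.println(limiter.statusFor("client-b", 30));   // 200 (separate quota)
        System.out.println(limiter.statusFor("client-a", 1000)); // 200 (new window)
    }
}
```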

The rate limit is specified in requests per second (rps), requests per minute (rpm), or similar. Some services have multiple rate limits for different durations (e.g., 50 rpm and no more than 2500 rph) and for different times of the day (e.g., 100 rps during the day and 150 rps at night). The limit may apply to a single user (identified by user ID, IP address, API access key, etc.) or to a tenant in a multi-tenant application.

Client-side rate limiting

As consumers of services, we want to ensure that we do not overload service providers. In addition, we do not want to incur unexpected costs - whether in terms of money or service quality.

This can happen if the service we consume is elastic. The service provider may not throttle our requests, but charge us extra for the additional load instead. Some even ban misbehaving clients for a short time. Rate limiting implemented by a consumer to prevent such problems is called client-side rate limiting.

When to use RateLimiter?

resilience4j-ratelimiter is intended for client-side rate limiting.

Server-side rate limiting requires things such as caching and coordination between multiple server instances, which Resilience4j does not support. For server-side rate limiting, there are API gateways and API filters, such as Kong API Gateway and Repose API Filter. Resilience4j's RateLimiter module is not intended to replace them.

Resilience4j RateLimiter concept

The thread that wants to call the remote service first requests permission from the RateLimiter. If the RateLimiter allows, the thread continues. Otherwise, the RateLimiter will park the thread or put it in a waiting state.

RateLimiter periodically creates new permissions. When permissions are available, threads are notified and can continue.

The number of calls allowed over a period of time is called limitForPeriod. The frequency at which RateLimiter refreshes permissions is specified by limitRefreshPeriod. timeoutDuration specifies how long a thread can wait to obtain permissions. If no permissions are available at the end of the waiting time, RateLimiter will throw a RequestNotPermitted runtime exception.
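To make the interplay of these three settings concrete, here is a simplified, single-threaded model of the bookkeeping. It is only a sketch for illustration, not Resilience4j's actual implementation: the class name, the `reserve` method, and the explicit clock argument are assumptions made up for this example.

```java
import java.time.Duration;

/**
 * Simplified, single-threaded model of the permission bookkeeping described
 * above. NOT Resilience4j's implementation; all names are illustrative.
 */
final class PermitModel {
    private final int limitForPeriod;
    private final long periodNanos;
    private final long timeoutNanos;
    private long cycle = 0; // index of the current refresh period
    private int permits;    // may go negative: permits reserved from future periods

    PermitModel(int limitForPeriod, Duration limitRefreshPeriod, Duration timeoutDuration) {
        this.limitForPeriod = limitForPeriod;
        this.periodNanos = limitRefreshPeriod.toNanos();
        this.timeoutNanos = timeoutDuration.toNanos();
        this.permits = limitForPeriod;
    }

    /**
     * Returns how long the caller must wait (in nanoseconds) before proceeding,
     * or -1 if no permission becomes available within timeoutDuration (the
     * case in which the real RateLimiter throws RequestNotPermitted).
     */
    long reserve(long nowNanos) {
        long currentCycle = nowNanos / periodNanos;
        if (currentCycle > cycle) {
            // Each elapsed period adds limitForPeriod permits, capped at limitForPeriod.
            long refreshed = permits + (currentCycle - cycle) * limitForPeriod;
            permits = (int) Math.min(limitForPeriod, refreshed);
            cycle = currentCycle;
        }
        if (permits > 0) {
            permits--;
            return 0;
        }
        // Number of refreshes needed until a permit exists for this caller.
        int deficit = 1 - permits;
        long periodsToWait = (deficit + limitForPeriod - 1) / limitForPeriod; // ceiling division
        long waitNanos = (cycle + periodsToWait) * periodNanos - nowNanos;
        if (waitNanos > timeoutNanos) {
            return -1; // would have to wait longer than timeoutDuration
        }
        permits--; // reserve a permit from a future period
        return waitNanos;
    }

    public static void main(String[] args) {
        PermitModel model = new PermitModel(1, Duration.ofSeconds(1), Duration.ofSeconds(1));
        System.out.println(model.reserve(0L));           // 0: a permit is available
        System.out.println(model.reserve(100_000_000L)); // 900000000: wait for the next refresh
        System.out.println(model.reserve(200_000_000L)); // -1: wait would exceed the timeout
    }
}
```

With one permit per second and a one-second timeout, the first caller proceeds immediately, the second waits for the next refresh, and the third is rejected because its wait would exceed the timeout.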

Using Resilience4j RateLimiter module

RateLimiterRegistry, RateLimiterConfig, and RateLimiter are the main abstractions in resilience4j-ratelimiter.

RateLimiterRegistry is a factory for creating and managing RateLimiter objects.

RateLimiterConfig encapsulates the limitForPeriod, limitRefreshPeriod, and timeoutDuration configurations. Each RateLimiter object is associated with a RateLimiterConfig.

RateLimiter provides helper methods to create decorators for functional interfaces or lambda expressions that contain remote calls.

Let's see how to use the various features available in the RateLimiter module. Suppose we are building a website for an airline to allow its customers to search and book flights. Our service talks to a remote service encapsulated by the FlightSearchService class.

Basic example

The first step is to create a RateLimiterConfig:

RateLimiterConfig config = RateLimiterConfig.ofDefaults();

This creates a RateLimiterConfig with default values of limitForPeriod (50), limitRefreshPeriod(500ns), and timeoutDuration (5s).

Suppose our contract with the airline service stipulates that we can call their search API with 1 rps. Then we will create RateLimiterConfig like this:

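RateLimiterConfig config = RateLimiterConfig.custom()
  .limitForPeriod(1).limitRefreshPeriod(Duration.ofSeconds(1)).timeoutDuration(Duration.ofSeconds(1)).build();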

If the thread cannot obtain permission within the specified 1 second timeoutDuration, an error occurs.

Then we create a RateLimiter and decorate the searchFlights() call:

RateLimiterRegistry registry = RateLimiterRegistry.of(config);
RateLimiter limiter = registry.rateLimiter("flightSearchService");
// FlightSearchService and SearchRequest creation omitted
Supplier<List<Flight>> flightsSupplier =
  RateLimiter.decorateSupplier(limiter,
    () -> service.searchFlights(request));

Finally, we call the decorated Supplier<List<Flight>> a few times:

for (int i = 0; i < 3; i++)
  System.out.println(flightsSupplier.get());

The timestamp in the sample output shows one request per second:

Searching for flights; current time = 15:29:40 786
[Flight{flightNumber='XY 765', ... }, ... ]
Searching for flights; current time = 15:29:41 791
[Flight{flightNumber='XY 765', ... }, ... ]

If the limit is exceeded, we will receive a RequestNotPermitted exception:

Exception in thread "main" io.github.resilience4j.ratelimiter.RequestNotPermitted: RateLimiter 'flightSearchService' does not permit further calls
  at io.github.resilience4j.ratelimiter.RequestNotPermitted.createRequestNotPermitted(...)
  at io.github.resilience4j.ratelimiter.RateLimiter.waitForPermission(...)
... other lines omitted ...

Decorating methods that throw checked exceptions

Suppose we are calling FlightSearchService.searchFlightsThrowingException(), which can throw a checked Exception. Then we can't use RateLimiter.decorateSupplier(). We would use RateLimiter.decorateCheckedSupplier() instead:

CheckedFunction0<List<Flight>> flights =
  RateLimiter.decorateCheckedSupplier(limiter,
    () -> service.searchFlightsThrowingException(request));

try {
  System.out.println(flights.apply());
} catch (Throwable t) {
  // exception handling
}

RateLimiter.decorateCheckedSupplier() returns a CheckedFunction0, which represents a function without parameters. Note the call to apply() on the CheckedFunction0 object to call the remote operation.

If we don't want to work with Suppliers, RateLimiter provides more helper decorator methods, such as decorateFunction(), decorateCheckedFunction(), decorateRunnable(), decorateCallable(), etc., for use with other language constructs. The decorateChecked* methods are used to decorate methods that throw checked exceptions.

Apply multiple rate limits

Suppose that the airline's flight search has multiple rate limits: 2 rps and 40 rpm. We can apply multiple limits on the client side by creating multiple RateLimiters:

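// timeoutDuration is not shown in the source, so it is left at its default here
RateLimiterConfig rpsConfig = RateLimiterConfig.custom()
  .limitForPeriod(2).limitRefreshPeriod(Duration.ofSeconds(1)).build();

RateLimiterConfig rpmConfig = RateLimiterConfig.custom()
  .limitForPeriod(40).limitRefreshPeriod(Duration.ofMinutes(1)).build();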

RateLimiterRegistry registry = RateLimiterRegistry.of(rpsConfig);
RateLimiter rpsLimiter =
  registry.rateLimiter("flightSearchService_rps", rpsConfig);
RateLimiter rpmLimiter =
  registry.rateLimiter("flightSearchService_rpm", rpmConfig);

Then we decorate the searchFlights() method with both RateLimiters:

Supplier<List<Flight>> rpsLimitedSupplier =
  RateLimiter.decorateSupplier(rpsLimiter,
    () -> service.searchFlights(request));

Supplier<List<Flight>> flightsSupplier
  = RateLimiter.decorateSupplier(rpmLimiter, rpsLimitedSupplier);

The sample output shows 2 requests being made every second, capped at 40 requests per minute:

Searching for flights; current time = 15:13:21 246
Searching for flights; current time = 15:13:21 249
Searching for flights; current time = 15:13:22 212
Searching for flights; current time = 15:13:40 215
Exception in thread "main" io.github.resilience4j.ratelimiter.RequestNotPermitted: RateLimiter 'flightSearchService_rpm' does not permit further calls
  at io.github.resilience4j.ratelimiter.RequestNotPermitted.createRequestNotPermitted(...)
  at io.github.resilience4j.ratelimiter.RateLimiter.waitForPermission(...)

Change limits at run time

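If necessary, we can change the values of limitForPeriod and timeoutDuration at runtime by calling changeLimitForPeriod() and changeTimeoutDuration() on the RateLimiter:

// example values only
limiter.changeLimitForPeriod(2);
limiter.changeTimeoutDuration(Duration.ofSeconds(2));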

This feature is useful if, for example, our rate limits vary with the time of day: a scheduled thread could change these values. The new values will not affect threads that are currently waiting for permissions.

Using RateLimiter and Retry together

Suppose we want to retry when we get a RequestNotPermitted exception, since it is a transient error. We create the RateLimiter and Retry objects as usual. Then we decorate a Supplier with the RateLimiter and wrap the result with Retry:

Supplier<List<Flight>> rateLimitedFlightsSupplier =
  RateLimiter.decorateSupplier(limiter,
    () -> service.searchFlights(request));

Supplier<List<Flight>> retryingFlightsSupplier =
  Retry.decorateSupplier(retry, rateLimitedFlightsSupplier);

The sample output shows the request being retried after a RequestNotPermitted exception:

Searching for flights; current time = 15:29:39 847
Flight search successful
[Flight{flightNumber='XY 765', ... }, ... ]
Searching for flights; current time = 17:10:09 218
[Flight{flightNumber='XY 765', flightDate='07/31/2020', from='NYC', to='LAX'}, ...]
2020-07-27T17:10:09.484: Retry 'rateLimitedFlightSearch', waiting PT1S until attempt '1'. Last attempt failed with exception 'io.github.resilience4j.ratelimiter.RequestNotPermitted: RateLimiter 'flightSearchService' does not permit further calls'.
Searching for flights; current time = 17:10:10 492
2020-07-27T17:10:10.494: Retry 'rateLimitedFlightSearch' recorded a successful retry attempt...
[Flight{flightNumber='XY 765', flightDate='07/31/2020', from='NYC', to='LAX'}, ...]

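The order in which we create the decorators is important. It would not work if we wrapped the Retry with the RateLimiter: the RequestNotPermitted exception would then be thrown by the outer RateLimiter before the Retry is ever invoked, so the Retry would never see it.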
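The effect of the ordering can be sketched with plain JDK stand-ins. Nothing below uses the Resilience4j API: `rateLimited`, `retrying`, and `Rejected` are hypothetical helpers, and the stand-in limiter simply rejects its first call and permits later ones, as if permissions had been refreshed in between.

```java
import java.util.function.Supplier;

/** Plain-JDK stand-ins that mimic the decorators; none of this is Resilience4j API. */
final class DecoratorOrderSketch {
    /** Stand-in for RequestNotPermitted. */
    static final class Rejected extends RuntimeException {}

    /** Stand-in limiter: rejects the first call, permits every later one. */
    static <T> Supplier<T> rateLimited(Supplier<T> call) {
        boolean[] firstCall = {true};
        return () -> {
            if (firstCall[0]) {
                firstCall[0] = false;
                throw new Rejected();
            }
            return call.get();
        };
    }

    /** Stand-in retry: re-invokes the wrapped supplier when it throws Rejected. */
    static <T> Supplier<T> retrying(Supplier<T> call, int maxAttempts) {
        return () -> {
            Rejected last = new Rejected();
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return call.get();
                } catch (Rejected e) {
                    last = e; // remember the failure and try again
                }
            }
            throw last;
        };
    }

    public static void main(String[] args) {
        Supplier<String> search = () -> "flights";

        // Retry outside, limiter inside: the rejection is seen and retried.
        System.out.println(retrying(rateLimited(search), 2).get()); // flights

        // Limiter outside, retry inside: the rejection escapes without any retry.
        try {
            rateLimited(retrying(search, 2)).get();
        } catch (Rejected e) {
            System.out.println("rejected, no retry happened");
        }
    }
}
```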

RateLimiter events

RateLimiter has an EventPublisher that generates events of types RateLimiterOnSuccessEvent and RateLimiterOnFailureEvent when calling a remote operation to indicate whether permission acquisition is successful. We can listen to these events and record them, for example:

RateLimiter limiter = registry.rateLimiter("flightSearchService");
limiter.getEventPublisher().onSuccess(e -> System.out.println(e.toString()));
limiter.getEventPublisher().onFailure(e -> System.out.println(e.toString()));

An example of log output is as follows:

RateLimiterEvent{type=SUCCESSFUL_ACQUIRE, rateLimiterName='flightSearchService', creationTime=2020-07-21T19:14:33.127+05:30}
... other lines omitted ...
RateLimiterEvent{type=FAILED_ACQUIRE, rateLimiterName='flightSearchService', creationTime=2020-07-21T19:14:33.186+05:30}

RateLimiter metrics

Suppose that after implementing client throttling, we find that the response time of the API increases. This is possible - as we can see, if permissions are not available when a thread calls a remote operation, RateLimiter puts the thread in a waiting state.

If our request processing thread often waits for permission, it may mean that our limitForPeriod is too low. Maybe we need to work with our service providers and get additional quotas first.

Monitoring the RateLimiter metric can help us identify such capacity problems and ensure that the values we set on RateLimiterConfig work well.

RateLimiter tracks two metrics: the number of available permissions (resilience4j.ratelimiter.available.permissions) and the number of threads waiting for permissions (resilience4j.ratelimiter.waiting_threads).

First, we create RateLimiterConfig, RateLimiterRegistry, and RateLimiter as usual. Then, we create a MeterRegistry and bind RateLimiterRegistry to it:

MeterRegistry meterRegistry = new SimpleMeterRegistry();
TaggedRateLimiterMetrics.ofRateLimiterRegistry(registry).bindTo(meterRegistry);

After running the rate-limited operation a few times, we display the captured metrics:

Consumer<Meter> meterConsumer = meter -> {
  String desc = meter.getId().getDescription();
  String metricName = meter.getId().getName();
  // pick the VALUE measurement of each meter, defaulting to 0.0
  Double metricValue = StreamSupport.stream(meter.measure().spliterator(), false)
    .filter(m -> m.getStatistic().name().equals("VALUE"))
    .findFirst()
    .map(m -> m.getValue())
    .orElse(0.0);
  System.out.println(desc + " - " + metricName + ": " + metricValue);
};
meterRegistry.forEachMeter(meterConsumer);

Here are some sample outputs:

The number of available permissions - resilience4j.ratelimiter.available.permissions: -6.0
The number of waiting threads - resilience4j.ratelimiter.waiting_threads: 7.0

The negative value of resilience4j.ratelimiter.available.permissions shows the number of permissions that have been reserved for requesting threads. In a real application, we would export the data to a monitoring system periodically and analyze it on a dashboard.

Pitfalls and good practices in implementing client rate limits

Make the rate limiter a singleton

All calls to a given remote service should go through the same RateLimiter instance. For a given remote service, the RateLimiter must be a singleton.

If we don't enforce this, some areas of our codebase may call the remote service directly, bypassing the RateLimiter. To prevent this, the actual call to the remote service should live in a core, internal layer, and other areas should use the rate-limited decorator exposed by that internal layer.

How can we ensure that a new developer understands this intent in the future? Check out Tom's article, which shows one way of solving such problems: organizing the package structure to make the intent clear. It also shows how to enforce this by codifying the intent in ArchUnit tests.

Configuring rate limiters for multiple server instances

Finding the right value for the configuration can be tricky. If we run multiple service instances in a cluster, the value of limitForPeriod must take this into account.

For example, if the upstream service is limited to 100 rps and our service has 4 instances, we will configure 25 rps as the limit for each instance.
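The arithmetic is simple, but it helps to round down when the upstream limit does not divide evenly, so that the instances together never exceed it. A minimal sketch (the helper name is hypothetical):

```java
/** Splits an upstream limit evenly across service instances; the name is illustrative. */
final class PerInstanceLimit {
    /** Rounds down so that all instances together stay at or below the upstream limit. */
    static int perInstanceLimit(int upstreamLimit, int instanceCount) {
        if (instanceCount <= 0) {
            throw new IllegalArgumentException("instanceCount must be positive");
        }
        return upstreamLimit / instanceCount;
    }

    public static void main(String[] args) {
        System.out.println(perInstanceLimit(100, 4)); // 25
        System.out.println(perInstanceLimit(100, 3)); // 33 (3 * 33 = 99, still within 100)
    }
}
```

The result would then be used as limitForPeriod for each instance.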

However, this assumes that the load on each instance is roughly the same. If that is not the case, or if our service itself is elastic and the number of instances can vary, then Resilience4j's RateLimiter may not be a good fit.

In that case, we would need a rate limiter that keeps its data in a distributed cache rather than in memory like the Resilience4j RateLimiter. However, that would affect the response time of our service. Another option is to implement some form of adaptive rate limiting. Resilience4j may add support for it, but it is not clear when that will be available.

Select the correct timeout

For the timeoutDuration configuration value, we should keep in mind the expected response time of the API.

If we set timeoutDuration too high, the response time and throughput will be affected. If it is too low, our error rate may increase.

Since some trial and error may be involved here, it is a good practice to maintain the values we use in RateLimiterConfig (such as timeoutDuration, limitForPeriod and limitRefreshPeriod) as configurations outside our service. Then we can change them without changing the code.
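One way this could look, as a sketch only: the three values are read from a java.util.Properties source and fall back to the library defaults mentioned earlier. The property keys and the `LimiterSettings` class are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.time.Duration;
import java.util.Properties;

/** Reads rate limiter settings from external properties; keys and class name are hypothetical. */
final class LimiterSettings {
    final int limitForPeriod;
    final Duration limitRefreshPeriod;
    final Duration timeoutDuration;

    LimiterSettings(Properties props) {
        // Fallbacks mirror the defaults listed earlier: 50, 500ns, 5s.
        limitForPeriod = Integer.parseInt(
            props.getProperty("flightSearch.limitForPeriod", "50"));
        limitRefreshPeriod = Duration.parse(
            props.getProperty("flightSearch.limitRefreshPeriod", "PT0.0000005S"));
        timeoutDuration = Duration.parse(
            props.getProperty("flightSearch.timeoutDuration", "PT5S"));
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // In a real service this would be a file or a configuration server.
        props.load(new StringReader(
            "flightSearch.limitForPeriod=1\n"
            + "flightSearch.limitRefreshPeriod=PT1S\n"
            + "flightSearch.timeoutDuration=PT1S\n"));

        LimiterSettings settings = new LimiterSettings(props);
        System.out.println(settings.limitForPeriod + " per " + settings.limitRefreshPeriod
            + ", timeout " + settings.timeoutDuration); // 1 per PT1S, timeout PT1S
    }
}
```

The parsed values would then be passed to RateLimiterConfig.custom() when building the configuration.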

Tuning client-side and server-side rate limiters

Implementing client-side rate limiting does not guarantee that we will never get rate limited by our upstream service.

Suppose we have a limit of 2 rps from the upstream service, and we configure limitForPeriod as 2 and limitRefreshPeriod as 1s. If we make two requests in the last few milliseconds of a second, with no other calls before that, the RateLimiter will permit them. If we make two more calls in the first few milliseconds of the next second, the RateLimiter will permit those too, because two new permissions are available. The upstream service, however, may reject these two requests, because servers often implement sliding-window rate limiting.

To stay within the upstream limit, we need to configure the fixed window in the client to be shorter than the sliding window in the service. So if we had configured limitForPeriod as 1 and limitRefreshPeriod as 500ms in the previous example, the burst would be spread out and we would be far less likely to get a rate-limit-exceeded error (a call late in one window followed by calls at the very start of the next two windows could still land within a single sliding second). But then the three requests after the first one would all wait, increasing response times and reducing throughput.
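The boundary effect in the two-requests-per-second scenario can be reproduced with a small simulation. This is a sketch under simplifying assumptions: the client-side limiter here drops calls above the limit instead of making them wait, timestamps are in milliseconds, and all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

/** Simulates fixed client windows against a sliding server window; names are illustrative. */
final class WindowBoundaryDemo {
    /** Client side: at most `limit` calls per fixed window; extra calls are dropped, not delayed. */
    static List<Long> permittedByFixedWindow(long[] sortedTimestamps, int limit, long periodMillis) {
        List<Long> permitted = new ArrayList<>();
        long window = -1;
        int used = 0;
        for (long t : sortedTimestamps) {
            long w = t / periodMillis;
            if (w != window) { // a new fixed window starts: permissions are refreshed
                window = w;
                used = 0;
            }
            if (used < limit) {
                used++;
                permitted.add(t);
            }
        }
        return permitted;
    }

    /** Server side: the largest number of calls that fall inside any sliding window. */
    static int maxInSlidingWindow(List<Long> sortedTimestamps, long windowMillis) {
        int max = 0;
        int start = 0;
        for (int end = 0; end < sortedTimestamps.size(); end++) {
            while (sortedTimestamps.get(end) - sortedTimestamps.get(start) >= windowMillis) {
                start++;
            }
            max = Math.max(max, end - start + 1);
        }
        return max;
    }

    public static void main(String[] args) {
        long[] calls = {990, 995, 1005, 1010}; // two calls on each side of a window boundary

        List<Long> twoPerSecond = permittedByFixedWindow(calls, 2, 1000);
        System.out.println(maxInSlidingWindow(twoPerSecond, 1000)); // 4: exceeds the 2 rps limit

        List<Long> onePerHalfSecond = permittedByFixedWindow(calls, 1, 500);
        System.out.println(maxInSlidingWindow(onePerHalfSecond, 1000)); // 2 for this burst
    }
}
```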


In this article, we learned how to use Resilience4j's RateLimiter module to implement client-side rate limiting. We studied different methods of configuring it through practical examples. We learned some good practices and precautions to remember when implementing rate limiting.

A complete application illustrating these ideas is available in the code on GitHub.

This article is translated from: Implementing Rate Limiting with Resilience4j - Reflectoring

Tags: Java

Posted on Tue, 23 Nov 2021 14:44:14 -0500 by stc7outlaw