Eureka high availability: the client retry mechanism in RetryableEurekaHttpClient

Here are a few questions I asked myself while reading the source code. I list them first, in the hope that readers will go through this article with these questions in mind. Then I give a brief introduction to the EurekaHttpClient hierarchy, and after that I discuss RetryableEurekaHttpClient in detail.

1. How does a Eureka Client register with a Eureka Server cluster? If the Client's serviceUrl is configured with multiple Eureka Server addresses, does the Client register with every one of those Servers?

2. Eureka Server replicates: it copies its instance information to the other Eureka Server nodes. If I register with one Server and it copies my information to the other nodes anyway, isn't it enough to configure just one serviceUrl on the Eureka Client?

3. If multiple serviceUrls are configured on the Eureka Client, as below, which Eureka Server does the Client keep communicating with (registration, renewal heartbeats, etc.)? Is it the first one, or a random one?

defaultZone: http://server3:50220/eureka,http://server1:50100/eureka,http://server2:50300/eureka

RetryableEurekaHttpClient extends EurekaHttpClientDecorator, a decorator for EurekaHttpClient. It is not the HttpClient that actually sends the HTTP request; it delegates the request, indirectly, to AbstractJerseyEurekaHttpClient, as shown in the following class diagram:

EurekaHttpClientDecorator uses the template method pattern: for register, cancel, sendHeartBeat and the other operations, the actual execution is factored out into an abstract execute method, so that each subclass can customize how requests are executed.

//RequestExecutor has no named implementation class; each operation implements it with an anonymous class
public interface RequestExecutor<R> {
    EurekaHttpResponse<R> execute(EurekaHttpClient delegate);
    RequestType getRequestType();
}

//Template method: register, cancel, sendHeartBeat, etc. all go through execute(), and each subclass supplies the concrete implementation
protected abstract <R> EurekaHttpResponse<R> execute(RequestExecutor<R> requestExecutor);

Only the register method is shown below:

@Override
public EurekaHttpResponse<Void> register(final InstanceInfo info) {
    //Anonymous implementation of RequestExecutor, handed to the subclass's execute method
    return execute(new RequestExecutor<Void>() {
        @Override
        public EurekaHttpResponse<Void> execute(EurekaHttpClient delegate) {
            //Passed down the decorator chain, ending at AbstractJerseyEurekaHttpClient
            return delegate.register(info);
        }
        @Override
        public RequestType getRequestType() {
            return RequestType.Register;
        }
    });
}
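To make the template method idea concrete outside of Eureka, here is a minimal, self-contained Java sketch. SimpleHttpClient, SimpleClientDecorator and RecordingDecorator are hypothetical names invented for illustration (they are not Eureka classes), and status codes are plain ints instead of EurekaHttpResponse. The base class fixes how each operation is wrapped; the subclass only decides how execute runs it, which is the slot RetryableEurekaHttpClient fills with its retry loop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for Eureka's types (not the real Eureka API).
enum RequestType { Register, SendHeartBeat }

interface SimpleHttpClient {
    int register(String instanceId);      // returns an HTTP-style status code
    int sendHeartBeat(String instanceId);
}

interface RequestExecutor {
    int execute(SimpleHttpClient delegate);
    RequestType getRequestType();
}

abstract class SimpleClientDecorator implements SimpleHttpClient {
    // The step every concrete decorator customizes (retrying, metrics, sessions, ...).
    protected abstract int execute(RequestExecutor requestExecutor);

    // The fixed part of the template: each operation only wraps its call and hands it to execute().
    @Override
    public int register(final String instanceId) {
        return execute(new RequestExecutor() {
            @Override
            public int execute(SimpleHttpClient delegate) { return delegate.register(instanceId); }

            @Override
            public RequestType getRequestType() { return RequestType.Register; }
        });
    }

    @Override
    public int sendHeartBeat(final String instanceId) {
        return execute(new RequestExecutor() {
            @Override
            public int execute(SimpleHttpClient delegate) { return delegate.sendHeartBeat(instanceId); }

            @Override
            public RequestType getRequestType() { return RequestType.SendHeartBeat; }
        });
    }
}

// A concrete decorator that records which request types pass through it, then forwards them.
class RecordingDecorator extends SimpleClientDecorator {
    final List<RequestType> seen = new ArrayList<>();
    private final SimpleHttpClient target;

    RecordingDecorator(SimpleHttpClient target) { this.target = target; }

    @Override
    protected int execute(RequestExecutor requestExecutor) {
        seen.add(requestExecutor.getRequestType());
        return requestExecutor.execute(target);
    }
}
```

For example, new RecordingDecorator(target).register("app-1") records RequestType.Register and returns whatever target.register returns; a retrying decorator would override the same execute hook with a loop instead.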

This article focuses on RetryableEurekaHttpClient, which is one of the more important decorators. I will write about the others later when I have time.

As the name implies, this class implements a retry mechanism. Let's start with its execute(RequestExecutor<R> requestExecutor) method. It contains a for loop, and the HTTP request is sent inside each iteration. By default the loop runs 3 times (numberOfRetries, which defaults to DEFAULT_NUMBER_OF_RETRIES = 3); note that these are 3 attempts in total, including the first one.

Inside the loop, getHostCandidates() returns all available Eureka Server endpoints; the loop walks through them with endpointIdx++ and sends the HTTP request to each in turn. If an exception such as a timeout occurs during a request, note that the catch block does not rethrow it; it only logs it. The Eureka Server endpoint that failed is then added to the blacklist quarantineSet, and the for loop continues. A response whose status code is not accepted is handled the same way.

Exception handling is an essential part of the retry mechanism. Suppose there were no try/catch here, or the exception were rethrown directly, and three serviceUrls were configured. If a request to server3 failed with an exception, server1 and server2 would never be tried even though they are available, because the exception would escape and the rest of the for loop would not run.

Since numberOfRetries is 3, at most three attempts are made. If all of them fail, a fourth serviceUrl is never tried, even if it is available.
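The difference the catch block makes can be shown with a small, hypothetical Java sketch. CatchMatters, withoutCatch and withCatch are invented names, and send is an assumed stand-in for one HTTP request that returns a status code or throws on a timeout; this is not Eureka's code. With server3 timing out, only the variant that catches the exception ever reaches server1.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration, not Eureka code: `send` stands in for "send the request to this
// serviceUrl" and returns a status code, or throws (for example on a timeout).
class CatchMatters {
    static final int NUMBER_OF_RETRIES = 3;

    // Variant A, no try/catch: an exception from the first server escapes the loop,
    // so the remaining serviceUrls are never tried.
    static String withoutCatch(List<String> urls, Function<String, Integer> send) {
        for (int retry = 0; retry < Math.min(NUMBER_OF_RETRIES, urls.size()); retry++) {
            String url = urls.get(retry);
            if (send.apply(url) == 200) {
                return url;
            }
        }
        throw new IllegalStateException("Retry limit reached");
    }

    // Variant B, the approach RetryableEurekaHttpClient takes: log the exception and move on.
    static String withCatch(List<String> urls, Function<String, Integer> send) {
        for (int retry = 0; retry < Math.min(NUMBER_OF_RETRIES, urls.size()); retry++) {
            String url = urls.get(retry);
            try {
                if (send.apply(url) == 200) {
                    return url;
                }
            } catch (RuntimeException e) {
                System.out.println("Request to " + url + " failed: " + e.getMessage());
            }
        }
        throw new IllegalStateException("Retry limit reached");
    }
}
```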

@Override
protected <R> EurekaHttpResponse<R> execute(RequestExecutor<R> requestExecutor) {
    List<EurekaEndpoint> candidateHosts = null;
    //Index into the candidate Eureka Server list
    int endpointIdx = 0;
    //3 attempts by default, DEFAULT_NUMBER_OF_RETRIES = 3
    for (int retry = 0; retry < numberOfRetries; retry++) {
        //Reuse the client that succeeded last time, if there is one
        EurekaHttpClient currentHttpClient = delegate.get();
        EurekaEndpoint currentEndpoint = null;
        if (currentHttpClient == null) {
            if (candidateHosts == null) {
                //Get the serviceUrl list of candidate Eureka Servers
                candidateHosts = getHostCandidates();
                if (candidateHosts.isEmpty()) {
                    //If this exception occurs, you can be fairly sure that neither serviceUrl nor remoteRegion is configured
                    throw new TransportException("There is no known eureka server; cluster server list is empty");
                }
            }
            if (endpointIdx >= candidateHosts.size()) {
                // This exception is also very common. The loop runs up to three times by default; if you have only one serviceUrl
                // and it is unreachable, this exception is thrown on the second iteration
                throw new TransportException("Cannot execute request on any known server");
            }
            //Take the next serviceUrl
            currentEndpoint = candidateHosts.get(endpointIdx++);
            //Build a new HTTP client for that serviceUrl
            currentHttpClient = clientFactory.newClient(currentEndpoint);
        }
        try {
            //Send the request (register, heartbeat, cancel, statusUpdate, ...) to the serviceUrl
            EurekaHttpResponse<R> response = requestExecutor.execute(currentHttpClient);
            // serverStatusEvaluator decides, per request type (Register, SendHeartBeat, Cancel, GetDelta, etc.), which status codes are acceptable.
            // For example, for Register a 404 is also accepted; once a status code is accepted, no further serviceUrl is tried
            if (serverStatusEvaluator.accept(response.getStatusCode(), requestExecutor.getRequestType())) {
                //Remember this client so that later requests go straight to the same server
                delegate.set(currentHttpClient);
                if (retry > 0) {
                    logger.info("Request execution succeeded on retry #{}", retry);
                }
                return response;
            }
            logger.warn("Request execution failure with status code {}; retrying on another server if available", response.getStatusCode());
        } catch (Exception e) {
            //On a connection timeout or similar exception, only log; the loop then moves on to the next serviceUrl
            logger.warn("Request execution failed with message: {}", e.getMessage());  // just log message as the underlying client should log the stacktrace
        }
        // Connection error or 5xx from the server that must be retried on another server
        delegate.compareAndSet(currentHttpClient, null);
        if (currentEndpoint != null) {
            //The HTTP request failed: blacklist the Eureka Server endpoint that was just tried
            quarantineSet.add(currentEndpoint);
        }
    }
    //After three failed attempts the request is abandoned. If four Eureka addresses are configured and the first three fail, the fourth is never tried even if it is available
    throw new TransportException("Retry limit reached; giving up on completing the request");
}
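The control flow above can be exercised without a Eureka Server using the simplified Java model below. It is a sketch under stated assumptions: endpoints are plain strings, send is a hypothetical predicate that returns true for an accepted status code or throws to simulate a timeout, the quarantine set is left out, and stickyEndpoint stands in for the delegate reference. It keeps the parts discussed above: endpointIdx walking the candidate list, at most three attempts, and reuse of the last successful server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical, simplified model of RetryableEurekaHttpClient.execute(); endpoints are plain strings
// and the quarantine (blacklist) is left out to keep the loop readable.
class RetryLoopSketch {
    static final int NUMBER_OF_RETRIES = 3;          // mirrors DEFAULT_NUMBER_OF_RETRIES
    private final List<String> serviceUrls;
    private final Predicate<String> send;            // true = accepted status; may throw to simulate a timeout
    private String stickyEndpoint;                   // plays the role of the `delegate` reference
    final List<String> attempts = new ArrayList<>(); // every endpoint actually contacted, in order

    RetryLoopSketch(List<String> serviceUrls, Predicate<String> send) {
        this.serviceUrls = serviceUrls;
        this.send = send;
    }

    String execute() {
        List<String> candidates = null;
        int endpointIdx = 0;
        for (int retry = 0; retry < NUMBER_OF_RETRIES; retry++) {
            String current = stickyEndpoint;
            if (current == null) {
                if (candidates == null) {
                    candidates = serviceUrls;         // the real code calls getHostCandidates() here
                    if (candidates.isEmpty()) {
                        throw new IllegalStateException("There is no known eureka server; cluster server list is empty");
                    }
                }
                if (endpointIdx >= candidates.size()) {
                    throw new IllegalStateException("Cannot execute request on any known server");
                }
                current = candidates.get(endpointIdx++);
            }
            attempts.add(current);
            try {
                if (send.test(current)) {
                    stickyEndpoint = current;          // like delegate.set(currentHttpClient)
                    return current;
                }
            } catch (RuntimeException e) {
                // the real code only logs here and falls through to the next endpoint
            }
            stickyEndpoint = null;                     // like delegate.compareAndSet(currentHttpClient, null)
        }
        throw new IllegalStateException("Retry limit reached; giving up on completing the request");
    }
}
```

With defaultZone ordered server3, server1, server2 and only server2 running (the debugging setup described at the end of this article), the first call contacts server3, server1 and then server2, and the second call goes straight to server2. With four URLs where only the fourth works, the call gives up after three attempts.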

From the code above, we can draw several conclusions:

1. When the Eureka Client sends registration, heartbeat and other requests, it tries the nodes in the Eureka Server serviceUrl list one by one. As soon as one request succeeds, it returns the response and does not contact the other nodes. It makes at most three attempts; if all three fail, it throws an exception.

2. With defaultZone configured as shown below, requests are tried in the order server3 -> server1 -> server2.

3. It is recommended to configure multiple URLs in defaultZone, even more than three. Some servers may have been blacklisted by the client; blacklisted servers are filtered out before the loop starts, so they are not requested and do not count toward the retry limit.

4. With the configuration below, as long as server3 stays available, the Client sends heartbeats and other requests only to that server.

5. The defaultZone of different Eureka Clients should not all list the servers in the same order; it is better to shuffle the order. If every Eureka Client used the configuration below, server3 would be under heavy load: it would receive the heartbeats and status changes of all clients, and it would also be responsible for synchronizing that information with the other server nodes in the cluster.

defaultZone: http://server3:50220/eureka,http://server1:50100/eureka,http://server2:50300/eureka
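One way to follow recommendation 5 without hand-editing every client is to generate each client's defaultZone from a shared list. The Java helper below is purely hypothetical (DefaultZoneRotation and rotatedDefaultZone are invented names, not part of Eureka or Spring Cloud), and how clientIndex is assigned, for example from an instance number in your deployment, is an assumption left to you.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not part of Eureka or Spring Cloud): build a per-client defaultZone value
// by rotating the shared server list, so that different clients start with different servers.
class DefaultZoneRotation {
    static String rotatedDefaultZone(List<String> serviceUrls, int clientIndex) {
        if (serviceUrls.isEmpty()) {
            return "";
        }
        List<String> rotated = new ArrayList<>(serviceUrls);
        // A negative distance shifts elements toward the front:
        // client 0 keeps the original order, client 1 starts with the second URL, and so on.
        Collections.rotate(rotated, -Math.floorMod(clientIndex, rotated.size()));
        return String.join(",", rotated);
    }
}
```

Rotation instead of a random shuffle is a deliberate choice in this sketch: each server is the first choice for about the same number of clients, and a given client keeps the same order across restarts.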

getHostCandidates() returns the serviceUrl list, and it contains a blacklist mechanism. When a request to a Eureka Server endpoint fails, that endpoint is put into quarantineSet (the isolation set). The next time getHostCandidates() is called (within the for loop above it runs at most once per execute call), quarantineSet.size() is compared with a threshold: if it is below the threshold, the blacklisted endpoints are filtered out of the candidate hosts; otherwise the blacklist is cleared and no filtering is done.

The purpose of this blacklist mechanism is to raise the probability that Eureka Client requests succeed. Imagine that server3 above goes down for good and there is no convenient way to change the client's defaultZone at runtime: without the blacklist, the client would request server3 every 30 seconds when sending a heartbeat.

private List<EurekaEndpoint> getHostCandidates() {
    //Get all Eureka Server cluster nodes
    List<EurekaEndpoint> candidateHosts = clusterResolver.getClusterEndpoints();
    //Intersect the blacklist with the candidates: keep only blacklisted endpoints that are still in the cluster
    quarantineSet.retainAll(candidateHosts);

    // If enough hosts are bad, we have no choice but start over again
    //The default percentage is 0.66, roughly 2/3.
    //For example, with 3 candidate hosts the threshold is 1 (3 * 0.66 = 1.98, truncated to 1 by the int cast)
    int threshold = (int) (candidateHosts.size() * transportConfig.getRetryableClientQuarantineRefreshPercentage());
    //Prevent the threshold from being too large, in case the percentage is mistakenly set above 1
    if (threshold > candidateHosts.size()) {
        threshold = candidateHosts.size();
    }
    if (quarantineSet.isEmpty()) {
        //Blacklist is empty, no filtering
        // no-op
    } else if (quarantineSet.size() >= threshold) {
        //The blacklist has reached the threshold: clear it and do not filter.
        //The threshold prevents every server in the list from ending up blacklisted,
        //so the blacklist is cleared and all servers are tried again
        logger.debug("Clearing quarantined list of size {}", quarantineSet.size());
        quarantineSet.clear();
    } else {
        //Below the threshold: filter out the blacklisted endpoints
        List<EurekaEndpoint> remainingHosts = new ArrayList<>(candidateHosts.size());
        for (EurekaEndpoint endpoint : candidateHosts) {
            if (!quarantineSet.contains(endpoint)) {
                remainingHosts.add(endpoint);
            }
        }
        candidateHosts = remainingHosts;
    }
    return candidateHosts;
}
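The threshold rule is easiest to see with concrete numbers. The Java sketch below re-implements only the filtering logic of getHostCandidates() on plain strings; QuarantineSketch and hostCandidates are hypothetical names, and the percentage is passed in directly instead of coming from getRetryableClientQuarantineRefreshPercentage().

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical re-implementation of the filtering rule in getHostCandidates(), using plain strings.
class QuarantineSketch {
    final Set<String> quarantineSet = new LinkedHashSet<>();
    private final double refreshPercentage; // default is 0.66 according to the comments above

    QuarantineSketch(double refreshPercentage) { this.refreshPercentage = refreshPercentage; }

    List<String> hostCandidates(List<String> clusterEndpoints) {
        List<String> candidateHosts = clusterEndpoints;
        quarantineSet.retainAll(candidateHosts);              // forget endpoints that left the cluster
        int threshold = (int) (candidateHosts.size() * refreshPercentage);
        if (threshold > candidateHosts.size()) {
            threshold = candidateHosts.size();                // guards against a percentage above 1
        }
        if (quarantineSet.isEmpty()) {
            // blacklist is empty: nothing to filter
        } else if (quarantineSet.size() >= threshold) {
            quarantineSet.clear();                            // too many bad hosts: start over with the full list
        } else {
            List<String> remaining = new ArrayList<>();
            for (String endpoint : candidateHosts) {
                if (!quarantineSet.contains(endpoint)) {
                    remaining.add(endpoint);                  // keep only endpoints that are not quarantined
                }
            }
            candidateHosts = remaining;
        }
        return candidateHosts;
    }
}
```

Working the default 0.66 through the code: with 3 URLs the threshold is (int) (3 * 0.66) = 1, so a single quarantined endpoint already causes the blacklist to be cleared and nothing is filtered; with 4 URLs the threshold is (int) (4 * 0.66) = 2, so one failed endpoint is skipped the next time candidates are computed. This lines up with conclusion 3 above about configuring more than three URLs.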

That is roughly all there is to RetryableEurekaHttpClient's retry mechanism. If you want to debug it yourself, configure defaultZone on the Client side as follows and start only one Eureka Server (server2). Then set a breakpoint in the execute method of RetryableEurekaHttpClient and start the Client in debug mode (at least one of registerWithEureka and fetchRegistry must be set to true).

defaultZone: http://server3:50220/eureka,http://server1:50100/eureka,http://server2:50300/eureka



Posted on Fri, 26 Jun 2020 04:47:11 -0400 by abigbluewhale