Microservice Architecture Foundation (III)

This part continues to extend the project from Foundation (II).

Service degradation, circuit breaking, and rate limiting

Understanding the basic concepts

Service avalanche
When multiple microservices call one another, suppose microservice A calls microservices B and C, which in turn call other microservices. This is the so-called "fan-out". If a microservice on the fan-out chain responds too slowly or becomes unavailable, calls to microservice A tie up more and more system resources until the system crashes. This is the so-called "avalanche effect".
For a high-traffic application, a single failing back-end dependency can saturate all resources on all servers within seconds. Worse than outright failure, it can also increase latency between services, so that queues back up and threads and other system resources run short, which causes further cascading failures across the whole system. Faults and latency therefore need to be isolated and managed so that the failure of a single dependency cannot take down the whole application or system.
In practice, when one instance of a module fails, the module usually keeps receiving traffic, and the faulty module goes on calling other modules. The result is a cascading failure, that is, an avalanche.

  • Service degradation
    When server load rises sharply, some services and pages are deliberately left unprocessed or handled in a simplified way, depending on the business situation and the traffic (for example, by returning a predictable, processable fallback response to the caller). This frees server resources so that core transactions keep running normally or efficiently.
    Usage scenario:
    When the overall load of the microservice architecture exceeds a preset upper threshold, or upcoming traffic is expected to exceed it, we can delay or suspend unimportant or non-urgent services and tasks so that important or basic services keep running normally.
  • Circuit breaking (service fusing)
    The concept comes from the circuit breaker in electrical engineering. When a downstream service responds slowly or fails under excessive load, the upstream service can temporarily cut off its calls to the downstream service to protect the overall availability of the system. This measure of sacrificing a part to preserve the whole is called circuit breaking. Once the circuit is broken, the service degradation method is generally invoked to handle requests, and the call chain is later restored gradually.
  • Rate limiting (current limiting)
    Rate limiting restricts concurrency: only a specified number of requests may reach the back-end servers in a given period. During peak traffic or a sudden surge, the request rate is held within a range the system can handle, so that high traffic does not bring the system down.
  • Hystrix
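To make the rate limiting item in the list above concrete, here is a minimal plain-Java sketch that caps the number of requests processed at the same time and turns the rest away immediately. It is only an illustration of the idea, not how Hystrix implements it; the class name `ConcurrencyLimiter` and the limit of 1 are assumptions made for the example.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical helper: admits at most `maxConcurrent` callers at the same time.
class ConcurrencyLimiter {
    private final Semaphore permits;

    ConcurrencyLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Runs `task` if a permit is free; otherwise returns the `rejected`
    // response at once instead of queueing the caller.
    <T> T execute(Supplier<T> task, Supplier<T> rejected) {
        if (!permits.tryAcquire()) {
            return rejected.get();
        }
        try {
            return task.get();
        } finally {
            permits.release();
        }
    }

    public static void main(String[] args) {
        ConcurrencyLimiter limiter = new ConcurrencyLimiter(1);

        // The outer task holds the only permit, so the nested request is rejected.
        String whileBusy = limiter.execute(() -> {
            String nested = limiter.execute(() -> "processed", () -> "rejected");
            return nested;
        }, () -> "rejected");
        System.out.println("while busy: " + whileBusy);

        // Once the permit has been released, a new request is admitted again.
        String whenIdle = limiter.execute(() -> "processed", () -> "rejected");
        System.out.println("when idle: " + whenIdle);
    }
}
```

Rejecting instead of queueing is a deliberate choice in this sketch: a rejected caller gets a fast answer, while a queued caller would keep holding a thread.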

What is Hystrix?
Hystrix is an open-source library for handling latency and fault tolerance in distributed systems. In a distributed system, calls to dependencies inevitably fail at times, for example through timeouts or exceptions. Hystrix ensures that the failure of one dependency does not bring down the service as a whole, which avoids cascading failures and improves the resilience of the distributed system.
The "circuit breaker" itself is a switching device. When a service unit fails, the circuit breaker's fault monitoring (similar to a blown fuse) returns an expected, processable fallback response to the caller, rather than making it wait for a long time or throwing an exception that the calling method cannot handle. This ensures that the caller's threads are not held unnecessarily for long periods, which prevents faults from spreading through the distributed system and even causing an avalanche.
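The switching behaviour described above can be sketched in a few lines of plain Java. This is a deliberately simplified illustration, not Hystrix's implementation (Hystrix works from rolling-window statistics and is considerably more elaborate). The class name `SimpleCircuitBreaker`, the threshold of 2 consecutive failures, and the 60-second open period are all assumptions made for the example.

```java
import java.util.function.Supplier;

// Hypothetical, simplified circuit breaker: it opens after a number of
// consecutive failures, answers with the fallback while open, and lets a
// trial call through once the open period has elapsed.
class SimpleCircuitBreaker {
    private final int failureThreshold;
    private final long openMillis;
    private int consecutiveFailures = 0;
    private long openedAt = -1; // -1 means the breaker is closed

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    synchronized boolean isOpen() {
        return openedAt >= 0 && System.currentTimeMillis() - openedAt < openMillis;
    }

    // Synchronized for simplicity; this serializes all calls through the breaker.
    synchronized <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (isOpen()) {
            return fallback.get(); // open: do not touch the failing dependency
        }
        try {
            T result = action.get(); // closed, or a trial call after the open period
            consecutiveFailures = 0;
            openedAt = -1;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAt = System.currentTimeMillis(); // open (or re-open) the breaker
            }
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        SimpleCircuitBreaker breaker = new SimpleCircuitBreaker(2, 60_000);
        int[] realCalls = {0};
        Supplier<String> failingCall = () -> {
            realCalls[0]++;
            throw new IllegalStateException("downstream unavailable");
        };
        for (int i = 0; i < 5; i++) {
            System.out.println(breaker.call(failingCall, () -> "fallback response"));
        }
        System.out.println("real calls: " + realCalls[0]);
    }
}
```

In this run the caller receives the fallback response five times, but the failing dependency is actually invoked only twice; the remaining three calls are short-circuited, which is what keeps the caller's threads from piling up.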

What can Hystrix do?

  • Service degradation
  • Circuit breaking
  • Near real-time monitoring
  • Rate limiting and isolation
  • ...

To make learning and testing easier, we first build a basic test sub-module to serve as the framework.

Sub-module (micro-service-8003)

Add the Hystrix dependency:

<!-- Hystrix dependency -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
</dependency>

application.yml:

server:
  port: 8003
spring:
  application:
    name: micro-service-8003
eureka:
  client:
    register-with-eureka: true # Register this service with Eureka
    fetch-registry: true  # Fetch the registry from the Eureka server
    service-url:
      # For a Eureka cluster, list the addresses of all servers
      defaultZone: http://eureka7001.cn:7001/eureka,http://eureka7002.cn:7002/eureka

Main startup class:

package cn.wu;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.eureka.EnableEurekaClient;

@SpringBootApplication
@EnableEurekaClient
public class Application8003 {
    public static void main(String[] args) {
        SpringApplication.run(Application8003.class,args);
    }
}

Service layer:

package cn.wu.service;

import org.springframework.stereotype.Service;

@Service("testServiceBean")
public class TestService {

    public String testNormal(){
        return "The current thread name is:"+Thread.currentThread().getName()
                +",Simulate normal business processes ";
    }

    public String testTimeOut(){
        try {
            Thread.sleep(5000);
        } catch (InterruptedException e) {
            // Restore the interrupt flag instead of swallowing the interruption
            Thread.currentThread().interrupt();
        }
        return "Current thread Name:"+Thread.currentThread().getName()
                +",Simulate complex business processes ";
    }
}

Controller layer:

package cn.wu.controller;

import cn.wu.service.TestService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class TestController {
    private TestService testService;
    @Autowired
    @Qualifier("testServiceBean")
    public void setTestService(TestService testService) {
        this.testService = testService;
    }

    // Simulate normal business processes
    @GetMapping("/test/normal")
    public String testNormal(){
        return testService.testNormal();
    }

    // Simulate complex business processes
    @GetMapping("/test/timeout")
    public String testTimeOut(){
        return testService.testTimeOut();
    }
}

Finally, test the endpoints; the details are omitted here.

To test under high concurrency, use JMeter for a simple load test. If you have questions about JMeter, you can refer to this article

Simulate a load on the order of 400,000 requests (configure this carefully, because the computer may crash):

At this point, the browser shows a loading indicator while it waits for a response

Next, we extend the project with a service consumer sub-module (main-service-8082)

Main dependencies in pom.xml:

<!-- Service registration dependency -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-eureka-client</artifactId>
</dependency>
<!-- OpenFeign dependency -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-openfeign</artifactId>
</dependency>
<!-- Hystrix dependency -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
</dependency>

application.yml (note that Feign's timeout is not configured yet):

server:
  port: 8082
eureka:
  client:
    register-with-eureka: true # Register yourself
    fetch-registry: true
    service-url:
      defaultZone: http://eureka7001.cn:7001/eureka,http://eureka7002.cn:7002/eureka
spring:
  application:
    name: main-service-8082

Main startup class:

package cn.wu;


import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.eureka.EnableEurekaClient;
import org.springframework.cloud.openfeign.EnableFeignClients;

@SpringBootApplication
@EnableEurekaClient // Enable the Eureka client
@EnableFeignClients // Enable Feign clients
public class MainApplication8082 {
    public static void main(String[] args) {
        SpringApplication.run(MainApplication8082.class,args);
    }
}

Service layer interface (Feign client):

package cn.wu.service.feign;

import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.web.bind.annotation.GetMapping;

@FeignClient(value = "micro-service-8003")
public interface TestService {
    @GetMapping("/test/normal")
    String testNormal();
    @GetMapping("/test/timeout")
    String testTimeOut();
}

Controller layer:

package cn.wu.controller;

import cn.wu.service.feign.TestService;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

import javax.annotation.Resource;

@RestController
public class TestController {
    @Resource
    private TestService testService;

    @GetMapping("/test/normal")
    public String testNormal(){
        return testService.testNormal();
    }
    @GetMapping("/test/timeout")
    public String testTimeOut(){
        return testService.testTimeOut();
    }
}

Test the corresponding URLs. The result is:

This result is expected. Because no Feign timeout has been configured, the default timeout applies, and the call to /test/timeout, which takes 5 seconds on the provider, times out

Now run the JMeter stress test again. Remember to change the port in the URL to the service consumer's port (8082):

The load is again on the order of 400,000 requests:

It is best not to try this level of concurrency. The author's machine has 16 GB of memory, and it filled up almost instantly; the machine felt as if it could go down at any moment

The back-end console shows that some request threads have timed out, that is, those users' requests have failed:

Under such high concurrency, two serious problems appear:

  • Timeouts slow the server down
  • The service goes down or the program fails with errors

Solution

Analyze the problem from the two roles, service provider and service consumer:

  • The service provider times out, and the service consumer cannot wait indefinitely. The service must be degraded
  • The service provider is down, and the service consumer cannot wait indefinitely. The service must be degraded
  • The service provider is healthy, but the service consumer itself fails or has stricter requirements (for example, the consumer's maximum waiting time is shorter than the provider's processing time). The service consumer then degrades the call on its own side

First, solve the problem on the service provider side (micro-service-8003)

Modify the service layer:

package cn.wu.service;

import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import com.netflix.hystrix.contrib.javanica.annotation.HystrixProperty;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;



@Slf4j
@Service("testServiceBean")
public class TestService {

    public String testNormal(){
        log.info("Normal business ");
        return "The current thread name is:"+Thread.currentThread().getName()
                +",Simulate normal business processes " ;
    }

    // Set the timeout to 3 seconds. Within the limit the method runs normally;
    // beyond it, Hystrix calls the fallback method testTimeOutHandler instead
    @HystrixCommand(fallbackMethod = "testTimeOutHandler",commandProperties = {
            @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds",value = "3000")
    })
    public String testTimeOut(){
        try {
            Thread.sleep(5000);
        } catch (InterruptedException e) {
            // Hystrix interrupts the worker thread on timeout by default; restore the flag
            Thread.currentThread().interrupt();
        }
        return "Current thread Name:"+Thread.currentThread().getName()
                +",Simulate complex business processes ";
    }
    public String testTimeOutHandler(){
        return "I am a complex business alternative response to service degradation ";
    }
}

Modify the main startup class and add the @EnableCircuitBreaker annotation:

package cn.wu;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.client.circuitbreaker.EnableCircuitBreaker;
import org.springframework.cloud.netflix.eureka.EnableEurekaClient;

@SpringBootApplication
@EnableEurekaClient
@EnableCircuitBreaker
public class Application8003 {
    public static void main(String[] args) {
        SpringApplication.run(Application8003.class,args);
    }
}

Restart the 8003 port service and test:

Brief analysis: the business method takes 5 seconds, but the timeout is set to 3 seconds, so the service is degraded and the fallback response is returned
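The behaviour analysed above (run the business method on a worker thread, give up after a time limit, and answer with a fallback) can be imitated in plain Java. The sketch below is only a conceptual stand-in for what the annotation configures, not Hystrix's implementation; the class and method names are assumptions, and the durations are scaled down from 5 s and 3 s to 500 ms and 300 ms so that it runs quickly.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper that mimics "timeout plus fallback method".
class TimeoutFallbackSketch {

    // Runs `task` on a separate thread. If it does not finish within
    // `timeoutMillis`, or if it throws, the fallback value is returned instead.
    // A pool per call keeps the sketch short; a real system would share a pool.
    static <T> T callWithFallback(Callable<T> task, long timeoutMillis, Supplier<T> fallback) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<T> future = pool.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);       // interrupt the worker that is still busy
            return fallback.get();
        } catch (ExecutionException e) {
            return fallback.get();     // the task itself threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback.get();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        String slow = callWithFallback(() -> {
            Thread.sleep(500);         // "business processing" longer than the limit
            return "business result";
        }, 300, () -> "fallback response");

        String fast = callWithFallback(() -> "business result", 300, () -> "fallback response");

        System.out.println("slow call: " + slow);
        System.out.println("fast call: " + fast);
    }
}
```

The slow call yields the fallback response because its work outlasts the limit, while the fast call returns the business result unchanged.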

The same mechanism also degrades the service when the method throws an exception:

// Set the timeout to 3 seconds. Within the limit the method runs normally;
// on a timeout or an exception, Hystrix calls the fallback method testHandler instead
@HystrixCommand(fallbackMethod = "testHandler",commandProperties = {
        @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds",value = "3000")
})
public String testTimeOut(){
    int temp = 11/0; // Deliberately throws ArithmeticException (division by zero)
    try {
        Thread.sleep(5000);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // Restore the interrupt flag
    }
    return "Current thread Name:"+Thread.currentThread().getName()
            +",Simulate complex business processes ";
}
public String testHandler(){
    return Thread.currentThread().getName()+": I am a complex business alternative response to service degradation ";
}

Restart and test again. The result is:

Next, solve the problem on the service consumer side (main-service-8082)

application.yml:

server:
  port: 8082
eureka:
  client:
    register-with-eureka: true # Register yourself
    fetch-registry: true
    service-url:
      defaultZone: http://eureka7001.cn:7001/eureka,http://eureka7002.cn:7002/eureka
spring:
  application:
    name: main-service-8082
feign:
  hystrix:
    enabled: true # Enable Feign's support for Hystrix

Main startup class:

package cn.wu;


import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.eureka.EnableEurekaClient;
import org.springframework.cloud.netflix.hystrix.EnableHystrix;
import org.springframework.cloud.openfeign.EnableFeignClients;

@SpringBootApplication
@EnableEurekaClient
@EnableFeignClients
@EnableHystrix // Turn on Hystrix
public class MainApplication8082 {
    public static void main(String[] args) {
        SpringApplication.run(MainApplication8082.class,args);
    }
}

Modify the controller layer:

package cn.wu.controller;

import cn.wu.service.feign.TestService;
import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import com.netflix.hystrix.contrib.javanica.annotation.HystrixProperty;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

import javax.annotation.Resource;

@RestController
public class TestController {
    @Resource
    private TestService testService;

    @GetMapping("/test/normal")
    public String testNormal(){
        return testService.testNormal();
    }

    @HystrixCommand(fallbackMethod = "testHandler" , commandProperties = {
            @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds",value = "3000")
    })
    @GetMapping("/test/timeout")
    public String testTimeOut(){
        return testService.testTimeOut();
    }

    public String testHandler(){
        return "Request busy, please try again later ";
    }
}

Result:

If the service consumer's timeout is raised to 6 seconds, the result is as follows:

The approach above has several problems:

  • The business code and the fallback methods live in the same class, which clutters the code
  • Each controller method needs its own fallback method. When several controller methods could share the same fallback, this leads to redundant code

Modify the service consumer's controller layer:

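package cn.wu.controller;

import cn.wu.service.feign.TestService;
import com.netflix.hystrix.contrib.javanica.annotation.DefaultProperties;
import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

import javax.annotation.Resource;

@RestController
@DefaultProperties(defaultFallback = "testHandler") // Class-wide default fallback method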
public class TestController {
    @Resource
    private TestService testService;

    @GetMapping("/test/normal")
    public String testNormal(){
        return testService.testNormal();
    }

//    @HystrixCommand(fallbackMethod = "testHandler" , commandProperties = {
//            @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds",value = "10000")
//    })
    @HystrixCommand // No fallbackMethod specified, so the class-level defaultFallback applies
    @GetMapping("/test/timeout")
    public String testTimeOut(){
        return testService.testTimeOut();
    }

    public String testHandler(){
        return "Request busy, please try again later ";
    }
}

With this configuration, testHandler is the class-wide default fallback: any method annotated with a bare @HystrixCommand falls back to it. A method that needs special handling can still specify its own fallbackMethod, as in the commented-out code above

Further optimization

Implement the interface annotated with @FeignClient:

package cn.wu.service.feign;

import org.springframework.stereotype.Component;

@Component
public class TestServiceFallback implements TestService {
    @Override
    public String testNormal() {
        return "The service is busy, please try again later!";
    }

    @Override
    public String testTimeOut() {
        return "The service is busy, please try again later!";
    }
}

Modify interface:

package cn.wu.service.feign;

import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.stereotype.Component;
import org.springframework.web.bind.annotation.GetMapping;

@Component
@FeignClient(value = "micro-service-8003",fallback = TestServiceFallback.class)
public interface TestService {
    @GetMapping("/test/normal")
    String testNormal();
    @GetMapping("/test/timeout")
    String testTimeOut();
}

The above implements a fallback for each method of the Feign interface, which keeps the degradation logic out of the controller
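The pattern used here, an interface, a normal implementation, and a fallback implementation of the same interface that answers when the normal one fails, can be illustrated without Spring by a JDK dynamic proxy. This is only a rough analogy and does not show how Feign and Hystrix are wired internally; `FallbackProxySketch` and `withFallback` are hypothetical names, and the nested `TestService` is a local stand-in for the article's Feign interface.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

class FallbackProxySketch {

    // Local stand-in for the Feign client interface used in the article.
    interface TestService {
        String testNormal();
        String testTimeOut();
    }

    // Returns a proxy that calls `primary` first and, if that call throws,
    // answers with the same method of `fallback`.
    @SuppressWarnings("unchecked")
    static <T> T withFallback(Class<T> type, T primary, T fallback) {
        return (T) Proxy.newProxyInstance(
                type.getClassLoader(),
                new Class<?>[] {type},
                (proxy, method, methodArgs) -> {
                    try {
                        return method.invoke(primary, methodArgs);
                    } catch (InvocationTargetException e) {
                        // The real call failed: degrade to the fallback implementation.
                        return method.invoke(fallback, methodArgs);
                    }
                });
    }

    public static void main(String[] args) {
        // Simulates a provider whose slow endpoint is unavailable.
        TestService remote = new TestService() {
            @Override public String testNormal() { return "normal result"; }
            @Override public String testTimeOut() {
                throw new IllegalStateException("provider is down");
            }
        };
        TestService fallback = new TestService() {
            @Override public String testNormal() { return "The service is busy, please try again later!"; }
            @Override public String testTimeOut() { return "The service is busy, please try again later!"; }
        };

        TestService client = withFallback(TestService.class, remote, fallback);
        System.out.println(client.testNormal());
        System.out.println(client.testTimeOut());
    }
}
```

The caller only ever sees the `TestService` interface, so the business code stays free of degradation logic, which is the same benefit the fallback class gives the controller above.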

Verify the result when the service provider is down:

Tags: Java Back-end architecture Microservices

Posted on Mon, 08 Nov 2021 12:48:00 -0500 by linfidel