Spring Cloud Learning, Part 4 (Hystrix: Circuit Breaking, Service Degradation, Dashboard Stream Monitoring)

1. Hystrix: Circuit Breaking

1.1. Problems with Distributed Systems

Applications in a complex distributed architecture have dozens of dependencies, each of which will inevitably fail at some point!

1.2. Service avalanche

When microservices call one another, suppose microservice A calls microservice B and microservice C, and B and C in turn call other microservices. This is called "fan-out". If a microservice on the fan-out link responds too slowly or is unavailable, calls to microservice A tie up more and more system resources until the system crashes. This is the so-called avalanche effect.

For high-traffic applications, a single back-end dependency that becomes slow can saturate all resources on all servers within seconds. Worse than outright failures, such dependencies can also increase latency between services, which backs up queues, threads, and other system resources and causes even more cascading failures across the system. All of this points to the need to isolate and manage failures and latency, so that the failure of a single dependency cannot take down the entire application or system.

We need to sacrifice a piece to save the king: give up the failing dependency to protect the rest of the system!

1.3. What is Hystrix?

Hystrix is an open source library for handling latency and fault tolerance in distributed systems. In a distributed system, calls to many dependencies will inevitably fail, for example through timeouts or exceptions. Hystrix ensures that when one dependency fails, it does not bring down the whole service, which avoids cascading failures and improves the resilience of the distributed system.

**"Circuit Breaker"** itself is a switching device. When a service unit fails, the breaker's failure monitoring (similar to a blown fuse) returns an expected, handleable alternative response (fallback) to the caller, instead of making the caller wait for a long time or throwing an exception that the caller cannot handle. This ensures that the caller's threads are not held for a long time unnecessarily, which stops failures from spreading through the distributed system and turning into an avalanche.
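The idea of "return a fallback instead of waiting or throwing" can be sketched in plain Java without Hystrix. The helper below is a hypothetical illustration (the name FallbackCall and its parameters are assumptions, not Hystrix API): it runs the call on a thread pool, waits only up to a time limit, and hands the failure to a fallback function.

```java
import java.util.concurrent.*;
import java.util.function.Function;

/** Hypothetical helper: runs a call with a time limit and substitutes a fallback on failure. */
final class FallbackCall {
    private FallbackCall() {}

    static <T> T execute(Callable<T> primary, Function<Throwable, T> fallback,
                         long timeoutMillis, ExecutorService pool) {
        Future<T> future = pool.submit(primary);
        try {
            // Wait only up to the time limit instead of blocking the caller indefinitely.
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);                  // interrupt the slow call
            return fallback.apply(e);
        } catch (ExecutionException e) {
            return fallback.apply(e.getCause());  // the call itself threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt flag
            return fallback.apply(e);
        }
    }
}
```

The caller always gets an answer within roughly timeoutMillis, whether the dependency is healthy, slow, or throwing.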

1.4. What can Hystrix do?

  • Service degradation
  • Circuit breaking
  • Rate limiting
  • Near-real-time monitoring
  • ...

When everything works, the request flow can look like this:

When one of the many back-end systems becomes slow, it can block the entire user request:

Under high traffic, a single slow back-end dependency can cause all resources on all servers to become saturated within seconds.

Every point in the application that reaches out over the network, or into a client library that might make network requests, is a source of potential failure. Worse than failures, these applications can also see latency between services increase, which backs up queues, threads, and other system resources and leads to more cascading failures across the system.

When you wrap each underlying dependency with Hystrix, the architecture shown in the diagram above changes to resemble the diagram below. Each dependency is isolated from the others, restricted in the resources it can saturate when latency occurs, and covered by fallback logic that decides what response to give when any kind of failure occurs in the dependency:
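To make "restricted in the resources it can saturate" concrete, here is a minimal bulkhead sketch in plain Java. It is an illustration under assumptions (the class name Bulkhead and its methods are invented here); Hystrix itself offers thread-pool and semaphore isolation, and this only shows the semaphore idea: each dependency gets its own permit count, and callers beyond that count receive the fallback immediately.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical bulkhead: caps how many callers may be inside one dependency at once. */
final class Bulkhead {
    private final Semaphore permits;

    Bulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    <T> T call(Supplier<T> dependency, Supplier<T> fallback) {
        // No free permit: this dependency is saturated, so answer from the fallback at once.
        if (!permits.tryAcquire()) {
            return fallback.get();
        }
        try {
            return dependency.get();
        } catch (RuntimeException e) {
            return fallback.get();   // a failure inside the dependency also goes to the fallback
        } finally {
            permits.release();       // always give the permit back
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

Creating one Bulkhead per dependency means a slow dependency can exhaust only its own permits, not the threads needed by the others.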

Official documentation: https://github.com/Netflix/Hystrix/wiki

2. Circuit Breaking

What is circuit breaking?

Circuit breaking is a microservice link protection mechanism that guards against the avalanche effect.

When a microservice on the fan-out link is unavailable or its response time is too long, the service is degraded, **which in turn breaks the call to that node's microservice and quickly returns an error response**. The call link is restored once calls to that microservice are detected to be responding normally again. In the Spring Cloud framework, the circuit-breaking mechanism is implemented through Hystrix. Hystrix monitors the status of calls between microservices and trips the breaker when failed calls reach a threshold: **by default, at least 20 requests within the 10-second rolling window, with 50% or more of them failing**. After a 5-second sleep window it lets a trial request through. The annotation for the circuit-breaking mechanism is @HystrixCommand.

Circuit breaking solves the following problems:

  • When a dependency is unstable, calls to it fail fast.
  • After failing fast, an algorithm can dynamically probe whether the dependency has recovered.
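The two points above (fail fast, then probe for recovery) map onto a small state machine. The following is a hypothetical, single-threaded sketch, not Hystrix's implementation: it opens after a number of consecutive failures (an assumption made for brevity, whereas Hystrix uses an error percentage over a rolling window), fails fast during a sleep window, then lets one trial call through. The class name, parameters, and injected clock are all illustrative.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical, single-threaded sketch of a circuit breaker (not Hystrix's implementation). */
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;    // consecutive failures that open the circuit
    private final long sleepWindowMillis;  // how long to fail fast before probing again
    private final LongSupplier clock;      // injected time source, so no real sleeping is needed
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0L;

    SimpleCircuitBreaker(int failureThreshold, long sleepWindowMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.sleepWindowMillis = sleepWindowMillis;
        this.clock = clock;
    }

    <T> T call(Supplier<T> dependency, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < sleepWindowMillis) {
                return fallback.get();     // fail fast: do not touch the dependency
            }
            state = State.HALF_OPEN;       // sleep window over: allow one trial call
        }
        try {
            T result = dependency.get();
            state = State.CLOSED;          // success closes the circuit again
            consecutiveFailures = 0;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;        // trip (or re-trip) the breaker
                openedAt = clock.getAsLong();
            }
            return fallback.get();
        }
    }

    State state() {
        return state;
    }
}
```

While the circuit is open the dependency is never invoked, which is what frees the caller's threads; the trial call in the half-open state is the "dynamic probe" for recovery.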

2.1. Circuit Breaking Example

Create a new springcloud-provider-dept-hystrix-8001 module and copy the pom.xml, resources, and Java code from springcloud-provider-dept-8001 into it as a starting point, then adjust them.
Import the Hystrix dependency

<!-- Import the Hystrix dependency -->

Adjust the yml configuration file

server:
  port: 8001
# MyBatis configuration
mybatis:
  # pojo package under the springcloud-api module
  type-aliases-package: com.haust.springcloud.pojo
  # Classpath of the mybatis-config.xml core configuration file in this module
  config-location: classpath:mybatis/mybatis-config.xml
  # Classpath of the mapper files in this module
  mapper-locations: classpath:mybatis/mapper/*.xml
# Spring configuration
spring:
  application:
    # Project name
    name: springcloud-provider-dept
  datasource:
    # Druid data source
    type: com.alibaba.druid.pool.DruidDataSource
    driver-class-name: com.mysql.jdbc.Driver
    url: jdbc:mysql://localhost:3306/db01?useUnicode=true&characterEncoding=utf-8
    username: root
    password: root
# Eureka configuration: service registry addresses
eureka:
  client:
    service-url:
      # Registry addresses 7001-7003
      defaultZone: http://eureka7001.com:7001/eureka/,http://eureka7002.com:7002/eureka/,http://eureka7003.com:7003/eureka/
  instance:
    instance-id: springcloud-provider-dept-hystrix-8001 # Changes the default description shown in Eureka
    prefer-ip-address: true # When true, the IP address is shown instead of localhost
# info configuration
info:
  app.name: haust-springcloud # Project name
  company.name: com.haust # Company name

prefer-ip-address: false:

prefer-ip-address: true:

Modify the controller

@RestController
public class DeptController {
    @Autowired
    private DeptService deptService;

    /**
     * Query department information by id.
     * If the query throws an exception, the hystrixGet fallback is used instead.
     * @param id department id
     * @return the matching department
     */
    @HystrixCommand(fallbackMethod = "hystrixGet")
    @RequestMapping("/dept/get/{id}") // Query by id
    public Dept get(@PathVariable("id") Long id) {
        Dept dept = deptService.queryById(id);
        if (dept == null) {
            throw new RuntimeException("id=>" + id + ", the user does not exist or the information cannot be found~");
        }
        return dept;
    }

    /**
     * Fallback for the query by id (used when the call is circuit-broken).
     * @param id department id
     * @return a placeholder department describing the failure
     */
    public Dept hystrixGet(@PathVariable("id") Long id) {
        return new Dept().setDeptno(id)
                .setDname("id=>" + id + ", no corresponding information, null---@Hystrix~")
                .setDb_source("No such database in MySQL");
    }
}

Add the @EnableCircuitBreaker annotation to the main startup class

@SpringBootApplication
@EnableEurekaClient // Eureka client: registers the service with the registry automatically on startup
@EnableDiscoveryClient // Service discovery
@EnableCircuitBreaker // Enable circuit breaker support
public class HystrixDeptProvider_8001 {
    public static void main(String[] args) {
        SpringApplication.run(HystrixDeptProvider_8001.class, args);
    }
}

With the circuit breaker in place, requesting an id that does not exist shows the following data on the front-end page:

With the springcloud-provider-dept-8001 module, which has no circuit breaker, requesting the same address produces the following:

A circuit breaker is therefore needed so that an exception or error in one back-end microservice does not break the whole application or web page.

3. Service Degradation

3.1. What is service degradation?

Service degradation means that when server load rises sharply, some services and pages are deliberately left unhandled, or handled in a simpler way, according to actual business conditions and traffic. This frees server resources so that the core business keeps running normally or efficiently. Put plainly, give as many system resources as possible to the high-priority services.

Resources are limited and requests are not. If services are not degraded during peak concurrency, overall performance will certainly suffer, and in severe cases important services may go down and become unavailable. So during peak periods, some services should be degraded to keep the core functions available. For example, during the Double 11 sale, all non-transaction services were degraded, such as viewing Ant Forest and viewing historical orders.

In which scenarios is service degradation mainly used? When the overall load of the microservice architecture exceeds a preset upper threshold, or when upcoming traffic is expected to exceed it, some unimportant or non-urgent services or tasks can be delayed or suspended so that important or basic services keep working.

Depending on the business, degradation can mean delaying a service: for example, adding points for a user is written to a cache first and executed once the service has stabilized. Or it can mean switching off functionality at a fine granularity, such as turning off related-article recommendations.
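The "delay now, execute after things stabilize" variant can be sketched as follows. This is a hypothetical in-memory illustration (the class DeferredTasks and its methods are invented for this example); a real system would more likely park the work in a durable cache or message queue.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical sketch of "delay" degradation: non-urgent work is parked and replayed later. */
final class DeferredTasks {
    private final Queue<Runnable> parked = new ArrayDeque<>();
    private boolean degraded = false;

    void setDegraded(boolean degraded) {
        this.degraded = degraded;
    }

    /** Runs the task now, or parks it while the system is degraded. */
    void submit(Runnable task) {
        if (degraded) {
            parked.add(task);
        } else {
            task.run();
        }
    }

    /** Replays parked tasks in arrival order once the system is stable; returns how many ran. */
    int drain() {
        int executed = 0;
        Runnable task;
        while ((task = parked.poll()) != null) {
            task.run();
            executed++;
        }
        return executed;
    }
}
```

A points update submitted during the peak is simply queued, and drain() applies it after setDegraded(false).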

As can be seen from the figure above, when traffic to service A surges while B and C receive few requests, B and C temporarily shut down some of their functions and take over part of A's work to relieve the pressure on A. This is called service degradation.

Issues to consider for service degradation

  1. Which services are core services and which are non-core services?
  2. Which services can be degraded, which cannot, and what is the degradation strategy?
  3. Besides service degradation, are there more complex business pass-through scenarios, and what is the strategy for them?
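One way to record the answers to the first two questions is a tier table that a switch consults at request time. The sketch below is purely illustrative: the class DegradationSwitch, the tier numbers, and the service names used with it are assumptions, not part of Hystrix or Spring Cloud.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical degradation switch: more peripheral tiers are turned off first. */
final class DegradationSwitch {
    // service name -> priority tier (1 = core, larger numbers = more peripheral)
    private final Map<String, Integer> tiers = new HashMap<>();
    private int highestEnabledTier = Integer.MAX_VALUE;  // everything enabled by default

    void register(String service, int tier) {
        tiers.put(service, tier);
    }

    /** Keeps only services whose tier is at most maxTier. */
    void degradeTo(int maxTier) {
        highestEnabledTier = maxTier;
    }

    void restore() {
        highestEnabledTier = Integer.MAX_VALUE;
    }

    boolean isEnabled(String service) {
        Integer tier = tiers.get(service);
        if (tier == null) {
            // Unregistered services are treated as peripheral: enabled only when nothing is degraded.
            return highestEnabledTier == Integer.MAX_VALUE;
        }
        return tier <= highestEnabledTier;
    }
}
```

With "order" at tier 1 and "recommendation" at tier 3, calling degradeTo(1) keeps ordering available while recommendations are switched off, and restore() brings everything back.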

Automatic degradation categories

  1. Timeout degradation: configure the timeout and the number and mechanism of timeout retries, and use an asynchronous mechanism to probe for recovery.
  2. Failure-count degradation: mainly for unstable APIs. When the number of failed calls reaches a threshold, degrade automatically, again using an asynchronous mechanism to probe for recovery.
  3. Fault degradation: for example, if the remote service being called is down (network failure, DNS failure, an HTTP service returning an error status code, an RPC service throwing an exception), degrade directly. Fallback options after degradation include default values (e.g., if the inventory service is down, report items as in stock by default), fallback data (e.g., if the advertising service is down, return static pages prepared in advance), and caching (data that was cached earlier).
  4. Rate-limit degradation: during flash sales or rushes on limited goods, the system may crash under excessive traffic, so rate limiting is used to cap access. Once the limit is reached, subsequent requests are degraded. Fallback options include a queue page (send users to a queue page and let them retry later), out of stock (tell users directly that the item is sold out), and an error page (e.g., "the event is too popular, please try again later").
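For fault degradation, the "cached data first, default value otherwise" fallback order can be sketched like this. It is a hypothetical, single-threaded illustration (the class CachedFallback is invented here and uses a plain HashMap, so it is not thread-safe).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical wrapper: serves the last good value, then a default, when the remote call fails. */
final class CachedFallback<K, V> {
    private final Function<K, V> remote;
    private final V defaultValue;
    private final Map<K, V> lastGood = new HashMap<>();

    CachedFallback(Function<K, V> remote, V defaultValue) {
        this.remote = remote;
        this.defaultValue = defaultValue;
    }

    V get(K key) {
        try {
            V fresh = remote.apply(key);
            lastGood.put(key, fresh);   // remember the latest successful answer
            return fresh;
        } catch (RuntimeException e) {
            // Fault degradation: prefer cached data, otherwise fall back to the default value.
            return lastGood.getOrDefault(key, defaultValue);
        }
    }
}
```

If an inventory lookup succeeded earlier, its last value is served while the remote service is down; keys that were never fetched get the default.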

3.2. Example

Create a new degradation configuration class, DeptClientServiceFallBackFactory.java, in the service package of the springcloud-api module

@Component
public class DeptClientServiceFallBackFactory implements FallbackFactory<DeptClientService> {
    @Override
    public DeptClientService create(Throwable cause) {
        return new DeptClientService() {
            public Dept queryById(Long id) {
                return new Dept()
                        .setDname("id=>" + id + " has no corresponding information; the client returned degraded data because the service has been shut down")
                        .setDb_source("no data~");
            }
            public List<Dept> queryAll() {
                return null;
            }
            public Boolean addDept(Dept dept) {
                return false;
            }
        };
    }
}

Specify the degradation configuration class DeptClientServiceFallBackFactory in DeptClientService

@Component // Register with the Spring container
// @FeignClient: microservice client annotation; value specifies the microservice name so that Feign can locate the corresponding service
@FeignClient(value = "SPRINGCLOUD-PROVIDER-DEPT", fallbackFactory = DeptClientServiceFallBackFactory.class) // fallbackFactory specifies the degradation configuration class
public interface DeptClientService {
    @GetMapping("/dept/get/{id}") // Same route as the provider's DeptController.get
    public Dept queryById(@PathVariable("id") Long id);

    // queryAll and addDept also need request-mapping annotations that match the provider's routes
    public List<Dept> queryAll();

    public Boolean addDept(Dept dept);
}

Enable degradation in the springcloud-consumer-dept-feign module:

server:
  port: 80
# Eureka configuration
eureka:
  client:
    register-with-eureka: false # Do not register this service with Eureka
    service-url: # Pick one of the three registries at random
      defaultZone: http://eureka7001.com:7001/eureka/,http://eureka7002.com:7002/eureka/,http://eureka7003.com:7003/eureka/
# Enable degradation: feign.hystrix
feign:
  hystrix:
    enabled: true
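The mechanism behind a fallback factory (build a substitute client from the failure cause, then answer from it) can be imitated with a JDK dynamic proxy. This is only a sketch of the idea, not how Feign is implemented; FallbackProxy and GreetingClient are hypothetical names, and GreetingClient is a stand-in interface rather than the DeptClientService above.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.function.Function;

/** Hypothetical helper showing the idea behind a fallback factory, using only the JDK. */
final class FallbackProxy {
    private FallbackProxy() {}

    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> api, T target, Function<Throwable, T> fallbackFactory) {
        return (T) Proxy.newProxyInstance(api.getClassLoader(), new Class<?>[] {api},
                (proxy, method, args) -> {
                    try {
                        return method.invoke(target, args);
                    } catch (InvocationTargetException e) {
                        // The real call failed: build a fallback for this cause and delegate to it.
                        T fallback = fallbackFactory.apply(e.getCause());
                        return method.invoke(fallback, args);
                    }
                });
    }
}

/** Illustrative client interface (a stand-in, not the article's DeptClientService). */
interface GreetingClient {
    String greet(String name);
}
```

Wrapping a client whose greet method throws yields whatever the factory's substitute returns, and the factory can inspect the cause to decide what that is. A healthy client passes through untouched.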

4. Differences Between Circuit Breaking and Service Degradation

  • Circuit breaking -> server side: a service times out or throws an exception, which trips the breaker, much like a fuse blowing (self-breaking).
  • Service degradation -> client side: considering the load of the website as a whole, once a service has been circuit-broken or shut down it will no longer be called. On the client side we can then prepare a FallbackFactory that returns a default value. This lowers the overall service level, but a degraded service that still responds is better than one that is down.
  • The triggers are different. Circuit breaking is usually caused by the failure of a particular (downstream) service, while degradation is generally decided from the overall load.
  • The level of the managed target is different. Circuit breaking is a framework-level mechanism that every microservice needs (no hierarchy), whereas degradation is usually tiered by business (for example, it typically starts with the most peripheral services).
  • The implementation is different. Degradation is code-invasive (done by the controller or triggered automatically), whereas circuit breaking is generally self-triggered.

Circuit breaking, degradation, rate limiting:

Rate limiting: limit concurrent request access and reject requests once the threshold is exceeded.

Degradation: prioritize services and sacrifice non-core services (make them unavailable) to keep core services stable. It is decided from the overall load.

Circuit breaking: the failure of a downstream dependency trips the breaker so that the system does not crash. The system triggers it and recovers from it automatically.
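The "reject once the threshold is exceeded" rule can be sketched with a fixed-window counter. This is one common variant (it counts requests per time window rather than concurrent calls), written here as a hypothetical example: the class FixedWindowLimiter and its injected clock are assumptions for illustration.

```java
import java.util.function.LongSupplier;

/** Hypothetical fixed-window limiter: at most `limit` requests per window, the rest are rejected. */
final class FixedWindowLimiter {
    private final int limit;
    private final long windowMillis;
    private final LongSupplier clock;  // injected time source, in milliseconds
    private long windowStart;
    private int count;

    FixedWindowLimiter(int limit, long windowMillis, LongSupplier clock) {
        this.limit = limit;
        this.windowMillis = windowMillis;
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    /** Returns true if the request may proceed, false if it should be rejected or degraded. */
    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        if (now - windowStart >= windowMillis) {
            windowStart = now;  // a new window begins: reset the counter
            count = 0;
        }
        if (count < limit) {
            count++;
            return true;
        }
        return false;
    }
}
```

A caller that gets false back can return a queue page or an "out of stock" response, which is where rate limiting hands over to degradation.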

5. Dashboard Stream Monitoring

Create a new springcloud-consumer-hystrix-dashboard module

Add the dependencies

<!-- Hystrix dependency -->
<!-- Dashboard dependency -->
<!-- Entity classes + web -->
<!-- Hot deployment -->

Main Startup Class

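@SpringBootApplication
@EnableHystrixDashboard // Enable the dashboard
public class DeptConsumerDashboard_9001 {
    public static void main(String[] args) {
        SpringApplication.run(DeptConsumerDashboard_9001.class, args);
    }
}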

To enable monitoring, add the following code to the main startup class of the springcloud-provider-dept-hystrix-8001 module

@SpringBootApplication
@EnableEurekaClient // Eureka client: registers the service with the registry automatically on startup
public class DeptProvider_8001 {
    public static void main(String[] args) {
        SpringApplication.run(DeptProvider_8001.class, args);
    }
    // Add a servlet
    @Bean
    public ServletRegistrationBean hystrixMetricsStreamServlet() {
        ServletRegistrationBean registrationBean = new ServletRegistrationBean(new HystrixMetricsStreamServlet());
        // Visiting this path gives the monitoring stream (assumed mapping: the usual Hystrix stream path)
        registrationBean.addUrlMappings("/actuator/hystrix.stream");
        return registrationBean;
    }
}


Enter the monitoring page:

The effect is as follows:


Posted on Sun, 05 Sep 2021 14:28:58 -0400 by wmac