[Android] A high-availability scheme for apps that use multiple service domain names

I am responsible for the company's basic data scanning and collection.
This part of the system is expected to be available essentially 100% of the time.
As a result, we have built quite a few high-availability schemes.

Preparation

Before implementing high availability in the app, we need to prepare the following:

1. Multiple fallback domain names for the core service [one active, several standby]

This is the most basic requirement and it is mandatory!
The domain list must support delivery from the cloud as well as local dynamic switching (blue-green release, gray release, UAT).

2. Multiple CDNs

Each domain name uses a different CDN, so that a failing CDN node cannot take the whole service down (we have had production failures caused by abnormal CDN nodes).

3. Multiple data centers

Each domain name is deployed to data centers in different regions (we have had a production failure in which a trunk line was cut during construction work).

===========================================================================
The core idea is: keep the service alive within the same city, provide disaster recovery in a remote region, and combine both with service detection and dynamic switching on failure.

All of the above is a summary of hard-won lessons from real production failures.
Next, let's briefly go through several high-availability schemes on the app side.

Notes

The call priority of the domain names follows the configuration delivered from the cloud.
The multi-domain configuration delivered from the cloud is cached in memory.
Every network request tries the domain names in this order:
domain[0] --> domain[1] --> domain[2]
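The in-memory cache described above could look roughly like the sketch below. DomainRegistry and its method names are hypothetical and do not come from the original project; the sketch only assumes that the cloud delivers an ordered list of host names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical in-memory cache for the domain list delivered from the cloud. */
final class DomainRegistry {
    // Replaced as a whole on every update, so readers never see a half-updated list.
    private volatile List<String> domains = Collections.emptyList();

    /** Stores a new list; index 0 has the highest priority. */
    void update(List<String> fromCloud) {
        domains = Collections.unmodifiableList(new ArrayList<>(fromCloud));
    }

    /** Returns the domains in the order in which requests should try them. */
    List<String> inPriorityOrder() {
        return domains;
    }
}
```

Swapping an immutable list through a volatile field keeps reads lock-free, which matters because every network request reads the list.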

Dynamic delivery of the multi-domain configuration

This is mandatory and there is not much to say about it, so I will skip it.

CDN node failures & CDN optimization

Start a background service that pings each domain at regular intervals. If a ping fails, switch directly to a standby domain name until the main service recovers.
The utility code is pasted below:

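    // Requires: java.io.BufferedReader, java.io.InputStreamReader,
    //           java.util.ArrayList, java.util.List
    public static PingResult customCMD(String host, String command) {
        PingResult result = new PingResult();
        result.host = host;
        result.ping_time = -1.0;
        try {
            // Make sure exactly one space separates the command from the host
            Process process = Runtime.getRuntime().exec(command.trim() + " " + host);
            List<String> echo = new ArrayList<>();
            // Read the output before waitFor() so a full pipe buffer cannot block the process
            try (BufferedReader br = new BufferedReader(
                    new InputStreamReader(process.getInputStream()))) {
                String line;
                while ((line = br.readLine()) != null) {
                    echo.add(line);
                }
            }
            process.waitFor();
            PingResultParser.parsePingContent(result, echo);
            // Count the probe as successful only if an average round-trip time was parsed
            result.success = result.ping_time >= 0;
        } catch (Exception e) {
            e.printStackTrace();
            result.success = false;
            result.ping_time = -1.0;
        }
        return result;
    }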

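    public static class PingResult {
        public String host;       // domain name or IP that was pinged
        public boolean success;   // whether the probe succeeded
        public double ping_time;  // average round-trip time in ms (set to -1.0 on error)
    }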

    private static class PingResultParser {

        // Log is android.util.Log
        private static final String TAG = "PingResultParser";

        // Sample output of "ping -c 1 <host>":
        //   PING www.a.shifen.com (xxx.xxx.xxx.xx) 56(84) bytes of data.
        //   64 bytes from xxx.xxx.xxx.xx: icmp_seq=1 ttl=52 time=21.5 ms
        //
        //   --- www.a.shifen.com ping statistics ---
        //   1 packets transmitted, 1 received, 0% packet loss, time 0ms
        //   rtt min/avg/max/mdev = 21.585/21.585/21.585/0.000 ms

        // Parses the output of a single-packet ping (exactly six lines, as above).
        static void parse(PingResult result, List<String> echo) {
            if (echo.size() == 6) {
                result.ping_time = getPingTime(echo.get(1));
            }
        }

        // Extracts the "time=" value from a reply line; returns 0 if it is missing or malformed.
        // Searching for the token avoids relying on a fixed column position.
        static double getPingTime(String line) {
            for (String block : line.split(" ")) {
                if (block.startsWith("time=")) {
                    try {
                        return Double.parseDouble(block.substring("time=".length()));
                    } catch (NumberFormatException e) {
                        return 0;
                    }
                }
            }
            return 0;
        }

        // Parses multi-packet ping output: logs every reply and stores the average round-trip time.
        static void parsePingContent(PingResult result, List<String> echo) {
            for (String line : echo) {
                // One reply line per packet
                if (line.contains("icmp_seq=") && line.contains("ttl=") && line.contains("time=")) {
                    Log.d(TAG, "IP:" + result.host + ", time: " + getPingTime(line) + " ms");
                }

                // Summary line: rtt min/avg/max/mdev = 21.585/21.585/21.585/0.000 ms
                if (line.contains("rtt min/avg/max/mdev")) {
                    String timeResult = line.split("=")[1].trim();
                    String[] split = timeResult.split("/");
                    // Take the average time
                    result.ping_time = Double.parseDouble(split[1]);
                    Log.d(TAG, "Domain switch detection: " + result.host + ", avg ping: " + result.ping_time);
                }
            }
        }
    }

Start the service and run the check against each domain name at regular intervals. You can use this command (3 packets, with a 2-second overall deadline):
ping -c 3 -w 2
Then, based on the results, dynamically switch the base URL used by Retrofit.
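One possible way to turn the ping results into a switching decision is sketched below. FastestDomainPicker is a hypothetical helper, and "pick the reachable domain with the lowest average time" is an assumed policy rather than the one used in the original project. The map is expected to iterate in priority order (for example a LinkedHashMap), with a negative time marking a failed probe.

```java
import java.util.Map;

/** Hypothetical helper that chooses a domain from the latest ping results. */
final class FastestDomainPicker {
    /**
     * @param avgPingMs average ping time per domain, iterated in priority order;
     *                  a negative value means the probe failed
     * @return the reachable domain with the lowest time, or null if every probe failed
     */
    static String pick(Map<String, Double> avgPingMs) {
        String best = null;
        double bestTime = Double.MAX_VALUE;
        for (Map.Entry<String, Double> e : avgPingMs.entrySet()) {
            double t = e.getValue();
            // Strict "<" keeps the earlier (higher-priority) domain on a tie
            if (t >= 0 && t < bestTime) {
                best = e.getKey();
                bestTime = t;
            }
        }
        return best;
    }
}
```

A Retrofit instance keeps the base URL it was built with, so the chosen domain is usually applied either by building a new Retrofit instance or by rewriting the host inside an OkHttp interceptor.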

Dynamic domain switching on request failure

Use a custom interceptor: when a request fails, it switches to the next domain name and automatically re-issues the request, until every domain name has been tried.
If you only need a simple retry, get the URL from chain.request(), check whether its host belongs to the multi-domain list and supports dynamic switching, then switch the host and retry.
If this is unclear, I suggest reading the interceptor-related source code in OkHttp.
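The retry loop at the heart of such an interceptor can be sketched in plain Java as follows. DomainFailover and Attempt are hypothetical names used only for illustration. In a real OkHttp interceptor, one attempt would roughly correspond to rebuilding the request with request.url().newBuilder().host(host).build() and passing it to chain.proceed(...).

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical sketch of "try the next domain until one works". */
final class DomainFailover {
    /** Stand-in for one HTTP attempt against a given host. */
    interface Attempt<T> {
        T run(String host) throws IOException;
    }

    /**
     * Tries each host in order and returns the first successful result.
     * Rethrows the last IOException once every host has failed.
     */
    static <T> T call(List<String> hosts, Attempt<T> attempt) throws IOException {
        IOException last = null;
        for (String host : hosts) {
            try {
                return attempt.run(host);
            } catch (IOException e) {
                last = e; // remember the failure and move on to the next domain
            }
        }
        throw last != null ? last : new IOException("no domain names configured");
    }
}
```

The loop only covers failures that surface as an IOException. Treating certain HTTP status codes as failures would need an extra check on the response, and the failed response has to be closed before the next attempt.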

On top of this, I added a custom annotation.
It is mainly used to configure extra retries for specific service interfaces.

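import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
public @interface RetryCount {
    int extraCount();
}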

Note, however, that this requires Retrofit 2.5.0 or later, the version that introduced the Invocation tag on each request.

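// Inside Interceptor.intercept(Chain chain)
Request request = chain.request();
// Retrofit tags each request with the Invocation that produced it
Invocation tag = request.tag(Invocation.class);
Method method = tag != null ? tag.method() : null;
// Read the custom annotation that configures extra retries
RetryCount retryCount = method != null ? method.getAnnotation(RetryCount.class) : null;
// No annotation means no extra retries
int extraCount = retryCount != null ? retryCount.extraCount() : 0;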
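The annotation lookup can be tried outside of Retrofit with plain reflection. In the sketch below, ScanApi and RetryPolicy are hypothetical names, and the annotation is repeated (without the public modifier) only so that the example compiles on its own.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface RetryCount {
    int extraCount();
}

/** Hypothetical service interface; only upload() asks for extra retries. */
interface ScanApi {
    @RetryCount(extraCount = 2)
    void upload();

    void heartbeat();
}

final class RetryPolicy {
    /** Returns the extra retries requested on the method, or 0 without an annotation or method. */
    static int extraRetries(Method method) {
        RetryCount rc = method != null ? method.getAnnotation(RetryCount.class) : null;
        return rc != null ? rc.extraCount() : 0;
    }
}
```

RetentionPolicy.RUNTIME is what makes the annotation visible to getAnnotation at run time; with the default retention the lookup would return null.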

Minimal cloud fallback

This scheme takes a different direction.
Roughly, it works like this:

It is controlled by a feature switch that can be toggled dynamically.
When the main service is found to be abnormal, traffic is actively switched to cloud services such as Alibaba Cloud or Huawei Cloud.
The first priority is that users can keep working normally, so the data is uploaded first.
Then backlog monitoring, webhook robot alerts, and similar methods
are used to consume the backlog and analyze the main service.

End

It has been a long time since I last wrote a blog post about Android.

Tags: Android

Posted on Sat, 06 Nov 2021 17:46:27 -0400 by markjoe