Kubernetes in practice: the Pod health check mechanism

1. Health checks

1.1 Health check overview

Errors are inevitable while an application runs: program exceptions, software bugs, hardware failures, network faults, and so on. Kubernetes provides a Health Check mechanism that automatically restarts a container when an application failure is detected, and removes the Pod from the Service's endpoints, ensuring the high availability of the application. k8s defines three types of Probe:

  • readinessProbe, the readiness check: determines whether the container is ready to accept traffic. If it is ready, the Pod is added to the Service's endpoints; otherwise it is removed
  • livenessProbe, the liveness check: determines whether the application is still working, e.g. not deadlocked or unresponsive, and automatically restarts the container on failure
  • startupProbe, the startup check: intended for slow-starting applications, so that a container that takes a long time to start is not killed by the other probes first
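
The three probe types attach side by side to a single container spec. A minimal sketch, not one of this article's demos: field names follow the Pod spec, and startupProbe needs a Kubernetes version that supports it (1.16 or later):

```yaml
# hypothetical Pod fragment combining all three probe types on one container
spec:
  containers:
  - name: web
    image: nginx:latest
    startupProbe:          # give a slow-starting app up to 30 x 10s to come up
      httpGet:
        path: /index.html
        port: 80
      failureThreshold: 30
      periodSeconds: 10
    livenessProbe:         # restart the container once it stops responding
      httpGet:
        path: /index.html
        port: 80
    readinessProbe:        # gate Service traffic on readiness
      httpGet:
        path: /index.html
        port: 80
```

While the startup probe is still running, the liveness and readiness probes are suspended, which is what protects a slow-starting container from being killed early.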

Each probe supports three health check methods: exec (command line), httpGet, and tcpSocket. exec is the most versatile and fits most scenarios; tcpSocket suits TCP services; httpGet suits web services.

  • exec runs a command or shell inside the container; a return code of 0 means healthy, non-zero means unhealthy
  • httpGet sends an HTTP request to the container and judges health by the HTTP status code
  • tcpSocket attempts to establish a TCP connection to the container; if the connection succeeds, the container is considered healthy

Every check method supports the same set of parameters for controlling probe timing:

  • initialDelaySeconds: delay before the first probe, giving the application time to start so the health check does not fail before startup completes
  • periodSeconds: how often the probe runs; defaults to 10s
  • timeoutSeconds: probe timeout; the probe counts as failed if it does not complete in time
  • successThreshold: how many consecutive successes mark the probe healthy; defaults to 1
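
Put together, the timing parameters sit directly under any probe. An illustrative fragment (values are examples; `failureThreshold`, the companion parameter not listed above, defaults to 3 consecutive failures):

```yaml
livenessProbe:
  exec:
    command: ["ls", "-l", "/tmp/liveness-probe.log"]
  initialDelaySeconds: 5   # wait 5s after container start before the first probe
  periodSeconds: 10        # probe every 10s (the default)
  timeoutSeconds: 1        # a probe taking longer than 1s counts as failed
  successThreshold: 1      # one success marks the probe healthy again (the default)
  failureThreshold: 3      # three failures in a row trigger the restart (the default)
```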

1.2 exec command-line health check

Many applications cannot detect internal faults such as deadlock while running, and restarting the workload on failure is often enough to recover. Kubernetes provides the livenessProbe mechanism for this. Taking exec as the example: the container creates the file /tmp/liveness-probe.log at startup and deletes it after 10s (then sleeps another 20s, for a total lifetime of 30s). The liveness check runs `ls -l /tmp/liveness-probe.log` inside the container and judges health by the command's return code; once the return code is non-zero, kubelet automatically restarts the container.

1. Define a container that creates the file at startup. The health check runs `ls -l /tmp/liveness-probe.log`: while the file exists the return code is 0 and the check passes; after 10s the file is deleted, the return code becomes non-zero, and the check fails

[root@node-1 demo]# cat centos-exec-liveness-probe.yaml
apiVersion: v1
kind: Pod
metadata:
  name: exec-liveness-probe
  annotations:
    kubernetes.io/description: "exec-liveness-probe"
spec:
  containers:
  - name: exec-liveness-probe
    image: centos:latest
    imagePullPolicy: IfNotPresent
    args:    #container start command; total lifetime is 30s
    - /bin/sh
    - -c
    - touch /tmp/liveness-probe.log && sleep 10 && rm -f /tmp/liveness-probe.log && sleep 20
    livenessProbe:
      exec:  #liveness check: health judged by the return code of `ls -l /tmp/liveness-probe.log`
        command:
        - ls
        - -l
        - /tmp/liveness-probe.log
      initialDelaySeconds: 1
      periodSeconds: 5
      timeoutSeconds: 1
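
The exit-code rule this probe relies on can be reproduced in any shell, with a temporary file standing in for /tmp/liveness-probe.log:

```shell
# reproduce the exec probe's exit-code semantics locally
f=$(mktemp)                     # stands in for /tmp/liveness-probe.log
ls -l "$f" > /dev/null
echo "file present: exit=$?"    # exit=0 -> probe would pass

rm -f "$f"                      # simulate the file being deleted after 10s
ls -l "$f" > /dev/null 2>&1
echo "file deleted: exit=$?"    # non-zero exit -> probe would fail
```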

2. Apply the configuration to create the container

[root@node-1 demo]# kubectl apply -f centos-exec-liveness-probe.yaml 
pod/exec-liveness-probe created

3. Check the Pod's event log. The container is healthy for roughly the first 10s after startup; once the file is deleted the liveness check starts failing and triggers a container restart

[root@node-1 demo]# kubectl describe pods exec-liveness-probe | tail
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  28s                default-scheduler  Successfully assigned default/exec-liveness-probe to node-3
  Normal   Pulled     27s                kubelet, node-3    Container image "centos:latest" already present on machine
  Normal   Created    27s                kubelet, node-3    Created container exec-liveness-probe
  Normal   Started    27s                kubelet, node-3    Started container exec-liveness-probe
  #container started
  Warning  Unhealthy  20s (x2 over 25s)  kubelet, node-3    Liveness probe failed: /tmp/liveness-probe.log
ls: cannot access l: No such file or directory #liveness check failing
  Warning  Unhealthy  15s  kubelet, node-3  Liveness probe failed: ls: cannot access l: No such file or directory
ls: cannot access /tmp/liveness-probe.log: No such file or directory
  Normal  Killing  15s  kubelet, node-3  Container exec-liveness-probe failed liveness probe, will be restarted
  #container restarted

4. Check the Pod's RESTARTS count. Because the container keeps failing the liveness check, the restart count keeps increasing

[root@node-1 demo]# kubectl get pods exec-liveness-probe 
NAME                  READY   STATUS    RESTARTS   AGE
exec-liveness-probe   1/1     Running   6          5m19s

1.3 httpGet health check

1. The httpGet probe is mainly used in web scenarios: it sends an HTTP request to the container and judges health from the status code, where a 2xx or 3xx code (below 400) means healthy. Define an nginx application as follows and judge its health by probing http://<container-ip>:<port>/index.html

[root@node-1 demo]# cat nginx-httpGet-liveness-readiness.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx-httpget-livess-readiness-probe 
  annotations:
    kubernetes.io/description: "nginx-httpGet-livess-readiness-probe"
spec:
  containers:
  - name: nginx-httpget-livess-readiness-probe
    image: nginx:latest
    ports:
    - name: http-80-port
      protocol: TCP
      containerPort: 80
    livenessProbe:   #liveness check implemented via httpGet
      httpGet:
        port: 80
        scheme: HTTP
        path: /index.html
      initialDelaySeconds: 3
      periodSeconds: 10
      timeoutSeconds: 3
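
kubelet counts an httpGet probe as successful when the status code is at least 200 and below 400. That rule can be sketched as a one-line helper (the function name is ours, not a Kubernetes API):

```python
def http_probe_ok(status_code: int) -> bool:
    """Mirror kubelet's httpGet success rule: 200 <= code < 400."""
    return 200 <= status_code < 400

# 200 (OK) and 302 (redirect) pass; 404 fails, the same 404 failure
# the probe reports once index.html is deleted from the container.
print(http_probe_ok(200), http_probe_ok(302), http_probe_ok(404))  # True True False
```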

2. Create the pod and check its status

[root@node-1 demo]# kubectl apply -f nginx-httpGet-liveness-readiness.yaml 
pod/nginx-httpget-livess-readiness-probe created
[root@node-1 demo]# kubectl get pods nginx-httpget-livess-readiness-probe 
NAME                                   READY   STATUS    RESTARTS   AGE
nginx-httpget-livess-readiness-probe   1/1     Running   0          6s

3. Simulate a failure by deleting the probed file inside the pod. The HTTP probe then fails and triggers an automatic container restart

//Query the node the pod runs on
[root@node-1 demo]# kubectl get pods nginx-httpget-livess-readiness-probe -o wide 
NAME                                   READY   STATUS    RESTARTS   AGE    IP            NODE     NOMINATED NODE   READINESS GATES
nginx-httpget-livess-readiness-probe   1/1     Running   1          3m9s   10.244.2.19   node-3   <none>           <none>

//Log in to the pod and delete the file
[root@node-1 demo]# kubectl exec -it nginx-httpget-livess-readiness-probe /bin/bash
root@nginx-httpget-livess-readiness-probe:/# ls -l /usr/share/nginx/html/index.html 
-rw-r--r-- 1 root root 612 Sep 24 14:49 /usr/share/nginx/html/index.html
root@nginx-httpget-livess-readiness-probe:/# rm -f /usr/share/nginx/html/index.html 

4. Check the pod list again. RESTARTS has increased by 1, showing the container has been restarted once

[root@node-1 demo]# kubectl get pods nginx-httpget-livess-readiness-probe 
NAME                                   READY   STATUS    RESTARTS   AGE
nginx-httpget-livess-readiness-probe   1/1     Running   1          4m22s

5. Describe the pod and observe the restart: the liveness probe gets a 404 from the container, which triggers the restart.

[root@node-1 demo]# kubectl describe pods nginx-httpget-livess-readiness-probe | tail
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  5m45s                  default-scheduler  Successfully assigned default/nginx-httpget-livess-readiness-probe to node-3
  Normal   Pulling    3m29s (x2 over 5m45s)  kubelet, node-3    Pulling image "nginx:latest"
  Warning  Unhealthy  3m29s (x3 over 3m49s)  kubelet, node-3    Liveness probe failed: HTTP probe failed with statuscode: 404
  Normal   Killing    3m29s                  kubelet, node-3    Container nginx-httpget-livess-readiness-probe failed liveness probe, will be restarted
  Normal   Pulled     3m25s (x2 over 5m41s)  kubelet, node-3    Successfully pulled image "nginx:latest"
  Normal   Created    3m25s (x2 over 5m40s)  kubelet, node-3    Created container nginx-httpget-livess-readiness-probe
  Normal   Started    3m25s (x2 over 5m40s)  kubelet, node-3    Started container nginx-httpget-livess-readiness-probe

1.4 tcpSocket health check

1. The tcpSocket health check suits TCP services. kubelet attempts to establish a TCP connection to the specified container port: if the connection can be established the check passes, otherwise it fails. Using nginx as an example, probe the connectivity of port 80 with the TCP check

[root@node-1 demo]# cat nginx-tcp-liveness.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx-tcp-liveness-probe
  annotations:
    kubernetes.io/description: "nginx-tcp-liveness-probe"
spec:
  containers:
  - name: nginx-tcp-liveness-probe 
    image: nginx:latest
    ports:
    - name: http-80-port
      protocol: TCP
      containerPort: 80
    livenessProbe:  #liveness check via tcpSocket, probing TCP port 80
      tcpSocket:
        port: 80
      initialDelaySeconds: 3
      periodSeconds: 10
      timeoutSeconds: 3
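
What tcpSocket does is simply attempt a connection. The same check can be sketched with Python's standard library; here we probe a listener bound locally so the example is self-contained (host and port are placeholders, not cluster values):

```python
import socket

def tcp_probe_ok(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# bind a local listener on an ephemeral port to play the container's role
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]
print(tcp_probe_ok("127.0.0.1", port))   # True: the port accepts connections
srv.close()
print(tcp_probe_ok("127.0.0.1", port))   # False: nothing is listening any more
```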

2. Apply the configuration to create the container

[root@node-1 demo]# kubectl apply -f nginx-tcp-liveness.yaml 
pod/nginx-tcp-liveness-probe created

[root@node-1 demo]# kubectl get pods nginx-tcp-liveness-probe 
NAME                       READY   STATUS    RESTARTS   AGE
nginx-tcp-liveness-probe   1/1     Running   0          6s

3. Simulate a failure: find the node the pod runs on, log in to the pod, and install the process-viewing tool htop

//Query the node the pod runs on
[root@node-1 demo]# kubectl get pods nginx-tcp-liveness-probe -o wide 
NAME                       READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
nginx-tcp-liveness-probe   1/1     Running   0          99s   10.244.2.20   node-3   <none>           <none>

//Log in to pod
[root@node-1 demo]# kubectl exec -it nginx-httpget-livess-readiness-probe /bin/bash

#run apt-get update, then apt-get install htop
root@nginx-httpget-livess-readiness-probe:/# apt-get update      
Get:1 http://cdn-fastly.deb.debian.org/debian buster InRelease [122 kB]             
Get:2 http://security-cdn.debian.org/debian-security buster/updates InRelease [39.1 kB]     
Get:3 http://cdn-fastly.deb.debian.org/debian buster-updates InRelease [49.3 kB]            
Get:4 http://security-cdn.debian.org/debian-security buster/updates/main amd64 Packages [95.7 kB]
Get:5 http://cdn-fastly.deb.debian.org/debian buster/main amd64 Packages [7899 kB]
Get:6 http://cdn-fastly.deb.debian.org/debian buster-updates/main amd64 Packages [5792 B]
Fetched 8210 kB in 3s (3094 kB/s)
Reading package lists... Done
root@nginx-httpget-livess-readiness-probe:/# apt-get install htop
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Suggested packages:
  lsof strace
The following NEW packages will be installed:
  htop
0 upgraded, 1 newly installed, 0 to remove and 5 not upgraded.
Need to get 92.8 kB of archives.
After this operation, 230 kB of additional disk space will be used.
Get:1 http://cdn-fastly.deb.debian.org/debian buster/main amd64 htop amd64 2.2.0-1+b1 [92.8 kB]
Fetched 92.8 kB in 0s (221 kB/s)
debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package htop.
(Reading database ... 7203 files and directories currently installed.)
Preparing to unpack .../htop_2.2.0-1+b1_amd64.deb ...
Unpacking htop (2.2.0-1+b1) ...
Setting up htop (2.2.0-1+b1) ...

4. Run htop to view processes; the container's main process is PID 1. Kill that process, observe the container state, and watch the RESTARTS count increase

root@nginx-httpget-livess-readiness-probe:/# kill 1
root@nginx-httpget-livess-readiness-probe:/# command terminated with exit code 137

//View pod
[root@node-1 demo]# kubectl get pods nginx-tcp-liveness-probe 
NAME                       READY   STATUS    RESTARTS   AGE
nginx-tcp-liveness-probe   1/1     Running   1          13m

5. Describe the container and confirm that it has a restart record

[root@node-1 demo]# kubectl describe pods nginx-tcp-liveness-probe | tail
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age                From               Message
  ----    ------     ----               ----               -------
  Normal  Scheduled  14m                default-scheduler  Successfully assigned default/nginx-tcp-liveness-probe to node-3
  Normal  Pulling    44s (x2 over 14m)  kubelet, node-3    Pulling image "nginx:latest"
  Normal  Pulled     40s (x2 over 14m)  kubelet, node-3    Successfully pulled image "nginx:latest"
  Normal  Created    40s (x2 over 14m)  kubelet, node-3    Created container nginx-tcp-liveness-probe
  Normal  Started    40s (x2 over 14m)  kubelet, node-3    Started container nginx-tcp-liveness-probe

1.5 readiness health check

The readiness check is used when an application sits behind a Service. It determines whether the application is ready, i.e. whether it can accept forwarded traffic. When the check passes, the pod is added to the Service's endpoints; when it fails, the pod is removed from the endpoints, so that traffic to the Service is not affected.
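
The add-and-remove behavior just described can be modeled as a set of endpoint addresses driven by each pod's readiness (a toy model for intuition, not the real endpoints controller):

```python
def reconcile_endpoints(pods):
    """Toy model: a pod's address is in the endpoints iff its readiness check passes."""
    return {ip for ip, ready in pods.items() if ready}

pods = {"10.244.2.22": False}        # readiness probe failing (404 on /test.html)
print(reconcile_endpoints(pods))     # set(): Endpoints shows <none>

pods["10.244.2.22"] = True           # test.html created, probe passes
print(reconcile_endpoints(pods))     # {'10.244.2.22'}: pod added to endpoints
```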

1. Create a pod that uses the httpGet check method and defines a readiness probe on the path /test.html

[root@node-1 demo]# cat httpget-liveness-readiness-probe.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx-tcp-liveness-probe
  annotations:
    kubernetes.io/description: "nginx-tcp-liveness-probe"
  labels:  #labels must be defined; the Service created later selects on them
    app: nginx
spec:
  containers:
  - name: nginx-tcp-liveness-probe 
    image: nginx:latest
    ports:
    - name: http-80-port
      protocol: TCP
      containerPort: 80
    livenessProbe:  #liveness probe
      httpGet:
        port: 80
        path: /index.html
        scheme: HTTP
      initialDelaySeconds: 3
      periodSeconds: 10
      timeoutSeconds: 3
    readinessProbe:  #readiness probe
      httpGet:
        port: 80
        path: /test.html
        scheme: HTTP
      initialDelaySeconds: 3
      periodSeconds: 10
      timeoutSeconds: 3

2. Define a Service and have it select the pod via the label defined above, app=nginx

[root@node-1 demo]# cat nginx-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx
  name: nginx-service 
spec:
  ports:
  - name: http
    port: 80 
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx 
  type: ClusterIP

3. Apply the configurations

[root@node-1 demo]# kubectl apply -f httpget-liveness-readiness-probe.yaml 
pod/nginx-tcp-liveness-probe created
[root@node-1 demo]# kubectl apply -f nginx-service.yaml 
service/nginx-service created

4. At this point the pod's status is normal, but the readiness check is failing

[root@node-1 ~]# kubectl get pods nginx-httpget-livess-readiness-probe 
NAME                                   READY   STATUS    RESTARTS   AGE
nginx-httpget-livess-readiness-probe   1/1     Running   2          153m

#readiness check failing with a 404 (last event line)
[root@node-1 demo]# kubectl describe pods nginx-tcp-liveness-probe | tail
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  2m6s                default-scheduler  Successfully assigned default/nginx-tcp-liveness-probe to node-3
  Normal   Pulling    2m5s                kubelet, node-3    Pulling image "nginx:latest"
  Normal   Pulled     2m1s                kubelet, node-3    Successfully pulled image "nginx:latest"
  Normal   Created    2m1s                kubelet, node-3    Created container nginx-tcp-liveness-probe
  Normal   Started    2m1s                kubelet, node-3    Started container nginx-tcp-liveness-probe
  Warning  Unhealthy  2s (x12 over 112s)  kubelet, node-3    Readiness probe failed: HTTP probe failed with statuscode: 404

5. Check the Service's endpoints: they are empty. Because the readiness check fails, kubelet considers the pod not ready and does not add it to the endpoints.

[root@node-1 ~]# kubectl describe services nginx-service 
Name:              nginx-service
Namespace:         default
Labels:            app=nginx
Annotations:       kubectl.kubernetes.io/last-applied-configuration:
                     {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"app":"nginx"},"name":"nginx-service","namespace":"default"},"s...
Selector:          app=nginx
Type:              ClusterIP
IP:                10.110.54.40
Port:              http  80/TCP
TargetPort:        80/TCP
Endpoints:         <none> #Endpoints object is empty
Session Affinity:  None
Events:            <none>

#endpoints status
[root@node-1 demo]# kubectl describe endpoints nginx-service 
Name:         nginx-service
Namespace:    default
Labels:       app=nginx
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2019-09-30T14:27:37Z
Subsets:
  Addresses:          <none>
  NotReadyAddresses:  10.244.2.22  #pod is not ready
  Ports:
    Name  Port  Protocol
    ----  ----  --------
    http  80    TCP

Events:  <none>

6. Enter the pod and create the file manually so that the readiness check passes

[root@node-1 ~]# kubectl exec -it nginx-httpget-livess-readiness-probe /bin/bash
root@nginx-httpget-livess-readiness-probe:/# echo "readiness probe demo" >/usr/share/nginx/html/test.html

7. The readiness check now passes; once kubelet detects that the pod is ready, it adds it to the endpoints

//The health check now succeeds
[root@node-1 demo]# curl http://10.244.2.22/test.html

//View endpoints
[root@node-1 demo]# kubectl describe endpoints nginx-service 
Name:         nginx-service
Namespace:    default
Labels:       app=nginx
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2019-09-30T14:33:01Z
Subsets:
  Addresses:          10.244.2.22 #now ready: moved out of NotReadyAddresses into the Addresses list
  NotReadyAddresses:  <none>
  Ports:
    Name  Port  Protocol
    ----  ----  --------
    http  80    TCP

//View service status
[root@node-1 demo]# kubectl describe services nginx-service 
Name:              nginx-service
Namespace:         default
Labels:            app=nginx
Annotations:       kubectl.kubernetes.io/last-applied-configuration:
                     {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"app":"nginx"},"name":"nginx-service","namespace":"default"},"s...
Selector:          app=nginx
Type:              ClusterIP
IP:                10.110.54.40
Port:              http  80/TCP
TargetPort:        80/TCP
Endpoints:         10.244.2.22:80 #Associated with endpoints
Session Affinity:  None
Events:            <none>

8. Similarly, if the readiness check fails again, kubelet automatically removes the pod from the endpoints

//Delete the file so the readiness check fails again
[root@node-1 demo]# kubectl exec -it nginx-tcp-liveness-probe  /bin/bash
root@nginx-tcp-liveness-probe:/# rm -f /usr/share/nginx/html/test.html 

//View pod health check event log
[root@node-1 demo]# kubectl get pods nginx-tcp-liveness-probe 
NAME                       READY   STATUS    RESTARTS   AGE
nginx-tcp-liveness-probe   0/1     Running   0          11m
[root@node-1 demo]# kubectl describe pods nginx-tcp-liveness-probe | tail
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  12m                  default-scheduler  Successfully assigned default/nginx-tcp-liveness-probe to node-3
  Normal   Pulling    12m                  kubelet, node-3    Pulling image "nginx:latest"
  Normal   Pulled     11m                  kubelet, node-3    Successfully pulled image "nginx:latest"
  Normal   Created    11m                  kubelet, node-3    Created container nginx-tcp-liveness-probe
  Normal   Started    11m                  kubelet, node-3    Started container nginx-tcp-liveness-probe
  Warning  Unhealthy  119s (x32 over 11m)  kubelet, node-3    Readiness probe failed: HTTP probe failed with statuscode: 404

//View endpoints
[root@node-1 demo]# kubectl describe endpoints nginx-service 
Name:         nginx-service
Namespace:    default
Labels:       app=nginx
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2019-09-30T14:38:01Z
Subsets:
  Addresses:          <none>
  NotReadyAddresses:  10.244.2.22 #check failing again: address moved back to NotReady
  Ports:
    Name  Port  Protocol
    ----  ----  --------
    http  80    TCP

Events:  <none>

//View the service status, and the endpoints are empty
[root@node-1 demo]# kubectl describe services nginx-service 
Name:              nginx-service
Namespace:         default
Labels:            app=nginx
Annotations:       kubectl.kubernetes.io/last-applied-configuration:
                     {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"app":"nginx"},"name":"nginx-service","namespace":"default"},"s...
Selector:          app=nginx
Type:              ClusterIP
IP:                10.110.54.40
Port:              http  80/TCP
TargetPort:        80/TCP
Endpoints:         #Empty
Session Affinity:  None
Events:            <none>

Wrapping up

This chapter introduced the two main health check probes in Kubernetes, livenessProbe and readinessProbe. livenessProbe performs a liveness check, watching the container's internal running state; readinessProbe performs a readiness check, deciding whether the container can accept traffic, and usually works together with a Service's endpoints: when the pod becomes ready it is added to the endpoints, and when the check fails it is removed, which is how health checking and service discovery fit together. The Probe mechanism offers three check methods for different scenarios: 1. exec, which checks health by running a command or shell; 2. tcpSocket, which probes a port by establishing a TCP connection; 3. httpGet, which probes with an HTTP request. Practice them a few times to master their usage.

Appendix

Configure Liveness, Readiness and Startup Probes: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/

TKE health check setting method: https://cloud.tencent.com/document/product/457/32815


Posted on Fri, 12 Jun 2020 03:25:48 -0400 by nenena