k8s -- container start and exit actions + the List-Watch mechanism + node/Pod affinity and anti-affinity scheduling

1, Start and exit actions

vim demo1.yaml
==========================================================
apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo
spec:
  containers:
  - name: lifecycle-demo-container
    image: soscscs/myapp:v1
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh", "-c", " echo '333333' >> /var/log/nginx/message"]
      preStop:
        exec:
          command: ["/bin/sh", "-c", " echo '222222'  >> /var/log/nginx/message"]
    volumeMounts:
    - name: message-log
      mountPath: /var/log/nginx/
      readOnly: false
  initContainers:
  - name: init-myservice
    image: soscscs/myapp:v1
    command: ["/bin/sh", "-c", "echo '111111'   >> /var/log/nginx/message"]
    volumeMounts:
    - name: message-log
      mountPath: /var/log/nginx/
      readOnly: false
  volumes:
  - name: message-log
    hostPath:
      path: /data/volumes/nginx/log/
      type: DirectoryOrCreate
==========================================================
kubectl apply -f demo1.yaml

kubectl get pods -o wide

On node01, view the log file:
cat /data/volumes/nginx/log/message
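
Assuming the hostPath file did not exist before the Pod was created, the ordering in the file is predictable: the init container writes first, the postStart hook writes when the main container starts, and the preStop hook only fires when the Pod is deleted. A sketch of the expected content:
==========================================================
111111      # written by the init container init-myservice
333333      # appended by the postStart hook of lifecycle-demo-container

# After `kubectl delete pod lifecycle-demo`, one more line is appended:
222222      # appended by the preStop hook
==========================================================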




2, The List-Watch mechanism of k8s


1. A user submits a request through kubectl or another API client to the APIServer to create a Pod object.
2. The APIServer tries to store the Pod object's metadata in etcd. Once the write completes, the APIServer returns a confirmation to the client.
3. When etcd accepts the Create Pod information, it sends a Create event to the APIServer.
4. The Controller Manager has been watching (Watch, over the APIServer's HTTP port 8080) for events from the APIServer, so when the APIServer receives the Create event it pushes it to the Controller Manager.
5. On receiving the Create event, the Controller Manager invokes the Replication Controller to ensure that the required number of replicas is created on the nodes.
6. After the Controller Manager creates the Pod replicas, the APIServer records the Pod details in etcd, for example the number of replicas and the container information.
7. etcd again sends the Pod creation information to the APIServer as an event.
8. The Scheduler is also watching the APIServer and plays a "linking" role in the system: it receives the created Pod events and assigns a node to each Pod; once placement is done, the kubelet process on that node takes over and is responsible for the "second half of the Pod's life cycle". In other words, the Scheduler binds pending Pods to cluster nodes according to its scheduling algorithm and policy.
9. After scheduling, the Scheduler updates the Pod information, which is now richer: besides the number and content of the replicas, it also records which node the Pod is bound to. The updated information is sent to the APIServer, which persists it to etcd.
10. etcd sends an update-success event to the APIServer, and the APIServer starts to reflect the scheduling result of this Pod object.
11. kubelet is a process running on every node. It also listens for Pod update events from the APIServer via List-Watch (over HTTPS port 6443). kubelet calls the container runtime (e.g. Docker) on its node to start the containers and reports the Pod and container status back to the APIServer.
12. The APIServer stores the Pod status information in etcd. After etcd confirms the write, the APIServer sends the confirmation to the relevant kubelet, which receives the event.
Note: If kubectl issues a command to scale up the number of Pod replicas, the whole flow above is triggered again and kubelet adjusts the node's resources according to the latest Pod deployment. Likewise, if the replica count stays the same but the image is upgraded, kubelet automatically pulls the new image and loads it.
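
The same watch stream that these components consume can be observed by hand. A minimal sketch, assuming kubectl proxy is run locally on its default port 8001:
==========================================================
# Terminal 1: proxy authenticated requests to the APIServer
kubectl proxy --port=8001

# Terminal 2: open a watch on Pods in the default namespace;
# ADDED / MODIFIED / DELETED events stream back as objects change
curl -s "http://127.0.0.1:8001/api/v1/namespaces/default/pods?watch=true"

# Or let kubectl perform the list-watch itself:
kubectl get pods --watch
==========================================================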

3, Scheduling process

3.1 Scheduling strategy

1. The Scheduler runs as a separate program. After startup it keeps watching the APIServer for Pods whose spec.nodeName is empty, and creates a binding for each such Pod to indicate which node it should be placed on (see the field-selector example after this list).
2. Scheduling is divided into several steps: first, nodes that do not meet the requirements are filtered out; this is the predicate (filtering) stage. Then the remaining nodes are sorted by score; this is the priorities stage. Finally the node with the highest priority is selected. If any intermediate step returns an error, the error is returned directly.
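
To see what the Scheduler is watching for, you can filter Pods by the spec.nodeName field; a small sketch using field selectors:
==========================================================
# Pods not yet bound to any node (spec.nodeName is still empty)
kubectl get pods --all-namespaces --field-selector=spec.nodeName=

# Pods already bound to node01
kubectl get pods --all-namespaces --field-selector=spec.nodeName=node01 -o wide
==========================================================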

3.2 Common predicate algorithms

1. PodFitsResources: whether the remaining resources on the node are greater than the resources the Pod requests
2. PodFitsHost: if the Pod specifies a nodeName, check whether the node name matches that nodeName
3. PodFitsHostPorts: whether the host ports already in use on the node conflict with the ports the Pod requests
4. PodSelectorMatches: filter out nodes that do not match the labels the Pod specifies
5. NoDiskConflict: the volumes already mounted on the node must not conflict with the volumes the Pod specifies, unless both are read-only
(The example Pod after this list exercises two of these checks.)
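
As an illustration only (the Pod name, resource numbers and host port below are hypothetical), the following Pod exercises PodFitsResources via its resource requests and PodFitsHostPorts via hostPort. A node is filtered out if it cannot spare 500m CPU and 256Mi of memory, or if another Pod on it already uses host port 8080:
==========================================================
apiVersion: v1
kind: Pod
metadata:
  name: predicate-demo          # hypothetical example Pod
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:                 # checked by PodFitsResources
        cpu: "500m"
        memory: "256Mi"
    ports:
    - containerPort: 80
      hostPort: 8080            # checked by PodFitsHostPorts
==========================================================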

3.3 The priorities stage

1. If no node is suitable during the predicate stage, the Pod stays in the Pending state and scheduling keeps retrying until some node meets the conditions. If several nodes pass the predicate stage, the priorities stage continues: the nodes are sorted by priority score.
2. A priority consists of a series of key-value pairs, where the key is the name of the priority item and the value is its weight (how important that item is). Common priority options include:
1) LeastRequestedPriority: the score is computed from the CPU and Memory utilization; the lower the utilization, the higher the score. In other words, this priority favors nodes with a lower ratio of resource usage
2) BalancedResourceAllocation: the closer the CPU and Memory utilization on a node are to each other, the higher the score. It is normally used together with the item above rather than alone. For example, if node01's CPU and Memory utilization is 20%:60% and node02's is 50%:50%, node02 will be preferred during scheduling even though node01's total utilization is lower, because node02's CPU and Memory usage are closer to each other (see the worked sketch after this list)
3) ImageLocalityPriority: favors nodes that already have the Pod's images; the larger the total size of the images already present, the higher the score
3. All priority items are evaluated together with their weights to produce the final result
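
A rough worked sketch of the BalancedResourceAllocation idea from item 2) above (the exact formula differs between scheduler versions; this only shows the "closeness" intuition on a 0-10 scale):
==========================================================
# node01: CPU 20%, Memory 60%  ->  |0.20 - 0.60| = 0.40  ->  score ~ 10 * (1 - 0.40) = 6
# node02: CPU 50%, Memory 50%  ->  |0.50 - 0.50| = 0.00  ->  score ~ 10 * (1 - 0.00) = 10
# node02 scores higher on this item, so it is preferred even though its total utilization is higher
==========================================================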

4, Specify scheduling node

4.1 Specifying nodeName

#pod.spec.nodeName schedules the Pod directly onto the specified node, skipping the Scheduler's scheduling policy entirely; the match is forced
vim demo2.yaml
==========================================================
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      nodeName: node01
      containers:
      - name: myapp
        image: nginx
        ports:
        - containerPort: 80
==========================================================
kubectl apply -f demo2.yaml

kubectl get pods -owide

kubectl describe pod myapp-86c89df7fc-6glj6


4.2 Specifying nodeSelector

pod.spec.nodeSelector: selects nodes through the Kubernetes label-selector mechanism. The scheduler matches the labels according to its scheduling policy and then schedules the Pod onto the target node; the match is a mandatory constraint
//Set the corresponding node labels as gxd=111 and gxd=222 respectively
kubectl label nodes node01 gxd=111
kubectl label nodes node02 gxd=222
//View label
kubectl get nodes --show-labels
//Change to nodeSelector scheduling mode
vim demo3.yaml
==========================================================
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp1
  template:
    metadata:
      labels:
        app: myapp1
    spec:
      nodeSelector:
        gxd: "111"
      containers:
      - name: myapp1
        image: nginx
        ports:
        - containerPort: 80
==========================================================
kubectl apply -f demo3.yaml 

kubectl get pods -o wide

#View the detailed events (they show that the Pod is dispatched by the scheduler before being placed on the node)
kubectl describe pod myapp1-55c6cc597c-8xw9x


//To modify the value of an existing label, add the --overwrite parameter
kubectl label nodes node02 gxd=a --overwrite
kubectl get nodes --show-labels

//To delete a label, append the label key followed by a minus sign at the end of the command:
kubectl label nodes node02 gxd-

//Query nodes by the specified label
kubectl get node -l gxd=111

5, Affinity

5.1 classification

Official documentation: https://kubernetes.io/zh/docs/concepts/scheduling-eviction/assign-pod-node/

1.Node affinity 
pod.spec.nodeAffinity
●preferredDuringSchedulingIgnoredDuringExecution: Soft strategy
●requiredDuringSchedulingIgnoredDuringExecution: Hard strategy
2.Pod Affinity
pod.spec.affinity.podAffinity/podAntiAffinity
●preferredDuringSchedulingIgnoredDuringExecution: Soft strategy
●requiredDuringSchedulingIgnoredDuringExecution: Hard strategy

5.2 Key-value operators

1. In: the label's value is in a given list
2. NotIn: the label's value is not in a given list
3. Gt: the label's value is greater than a given value (node affinity only)
4. Lt: the label's value is less than a given value (node affinity only)
5. Exists: the label exists
6. DoesNotExist: the label does not exist
(A small Gt example follows this list.)
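
A minimal sketch of Gt in a node affinity rule (the label gpu-count is hypothetical; Gt and Lt compare label values as integers and are only available for node affinity):
==========================================================
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: gpu-count      # hypothetical node label
            operator: Gt
            values:
            - "1"               # only nodes whose gpu-count value is greater than 1 qualify
==========================================================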

5.3 Node affinity + hard policy example

vim demo4.yaml
==========================================================
apiVersion: v1
kind: Pod
metadata:
  name: affinity
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-affinity
    image: nginx
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: gxd    #Specifies the label of the node
            operator: NotIn     #Only schedule the Pod to nodes whose gxd label value is not in the values list
            values:
            - "111"
==========================================================
kubectl apply -f demo4.yaml

kubectl get pods -o wide

If the hard policy cannot be satisfied, the Pod will stay in the Pending state.
For example, change the node labels or the values list so that no node satisfies the requirement, re-create the Pod, and check its status.
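
One way to force the Pending state (a sketch, assuming no node in the cluster carries a gxd=333 label) is to switch the expression in demo4.yaml to an In match on a value no node has:
==========================================================
          - key: gxd
            operator: In
            values:
            - "333"     # no node has gxd=333, so the Pod stays Pending
==========================================================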


5.4 Node affinity + soft policy example

vim demo5.yaml
==========================================================
apiVersion: v1
kind: Pod
metadata:
  name: affinity
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-affinity
    image: nginx
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1   #If there are multiple soft policy options, the greater the weight, the higher the priority
        preference:
          matchExpressions:
          - key: gxd
            operator: In
            values:
            - "111"
==========================================================
kubectl apply -f demo5.yaml

kubectl get pods -o wide

5.5 Node affinity + soft policy + hard policy example

vim demo6.yaml
==========================================================
apiVersion: v1
kind: Pod
metadata:
  name: affinity
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-affinity
    image: nginx
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:   #First satisfy the hard policy: exclude nodes labelled kubernetes.io/hostname=node02
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: NotIn
            values:
            - node02
      preferredDuringSchedulingIgnoredDuringExecution:  #Then apply the soft policy: prefer nodes labelled gxd=111
      - weight: 1
        preference:
          matchExpressions:
          - key: gxd
            operator: In
            values:
            - "111"
==========================================================
kubectl apply -f demo6.yaml

kubectl get pods -o wide

6, Pod affinity and anti-affinity

6.1 create a Pod labeled app=myapp01

vim demo7.yaml
==========================================================
apiVersion: v1
kind: Pod
metadata:
  name: myapp01
  labels:
    app: myapp01
spec:
  containers:
  - name: with-node-affinity
    image: nginx
=========================================================
kubectl apply -f demo7.yaml

kubectl get pods --show-labels -o wide

6.2 using Pod affinity scheduling

vim demo8.yaml
==========================================================
apiVersion: v1
kind: Pod
metadata:
  name: myapp02
  labels:
    app: myapp02
spec:
  containers:
  - name: myapp02
    image: nginx
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - myapp01
        topologyKey: kubernetes.io/hostname
==========================================================
kubectl apply -f demo8.yaml

kubectl get pods --show-labels -o wide
==========================================================
#The Pod can only be scheduled onto a node that is in the same topology domain as at least one running Pod carrying a label with key "app" and value "myapp01". (More precisely: the new Pod may run on node N only if node N has a label with key kubernetes.io/hostname and some value V, and at least one node in the cluster with kubernetes.io/hostname=V is already running a Pod labelled app=myapp01.)
#topologyKey is the key of a node label. If two nodes carry this key with the same value, the scheduler treats them as being in the same topology domain and tries to place a balanced number of Pods in each topology domain.
#With kubernetes.io/hostname as the topologyKey, each distinct value is its own topology domain. For example, if Pod1 runs on a node with kubernetes.io/hostname=node01, Pod2 on kubernetes.io/hostname=node02 and Pod3 on kubernetes.io/hostname=node01, then Pod2 is not in the same topology domain as Pod1 and Pod3, while Pod1 and Pod3 are in the same topology domain.
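
To put several nodes into one topology domain, give them the same value for a shared label and use that label as the topologyKey (the label name zone below is only an illustration):
==========================================================
# Label both nodes so they share a topology domain
kubectl label nodes node01 node02 zone=a

# Then, in the podAffinity rule, use:
#   topologyKey: zone
# Any node with zone=a now qualifies, as long as at least one Pod
# labelled app=myapp01 is already running somewhere in that domain.
==========================================================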

6.3 Pod anti-affinity scheduling

vim demo9.yaml
==========================================================
apiVersion: v1
kind: Pod
metadata:
  name: myapp03
  labels:
    app: myapp03
spec:
  containers:
  - name: myapp03
    image: nginx
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - myapp01
          topologyKey: kubernetes.io/hostname
==========================================================
kubectl apply -f demo9.yaml

kubectl get pods --show-labels -o wide
==========================================================
#If a node is in the same topology domain as a Pod carrying a label with key "app" and value "myapp01", the new Pod should not be scheduled onto that node. (With topologyKey kubernetes.io/hostname, this means that if a node is in the same domain as a Pod labelled app=myapp01, the new Pod cannot be scheduled there.)

7, Summary

7.1 Affinity

1. Node affinity (nodeAffinity): schedule onto a node whose labels satisfy the conditions
   Hard policy: the conditions must be met (requiredDuringSchedulingIgnoredDuringExecution)
   Soft policy: try to meet the conditions; it is acceptable if they cannot be met (preferredDuringSchedulingIgnoredDuringExecution)
2. Pod affinity (podAffinity): schedule onto a node that runs Pods whose labels satisfy the conditions
3. Pod anti-affinity (podAntiAffinity): do not schedule onto a node that runs Pods whose labels satisfy the conditions
Scheduling strategy   Match target   Operators                                  Topology domain   Scheduling target
nodeAffinity          node labels    In, NotIn, Exists, DoesNotExist, Gt, Lt    not supported     the specified node
podAffinity           Pod labels     In, NotIn, Exists, DoesNotExist            supported         same topology domain as the specified Pod
podAntiAffinity       Pod labels     In, NotIn, Exists, DoesNotExist            supported         not in the same topology domain as the specified Pod

7.2 Node affinity hard policy configuration

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: KEY_NAME
            operator: In/NotIn/Exists/DoesNotExist/Gt/Lt
            values:
            - KEY_VALUE

7.3 Node affinity soft policy configuration

spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: WEIGHT_VALUE
        preference:
          matchExpressions:
          - key: KEY_NAME
            operator: In/NotIn/Exists/DoesNotExist
            values:
            - KEY_VALUE

7.4 Pod (affinity / anti-affinity) hard policy configuration

spec:
  affinity:
    podAffinity/podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: KEY_NAME
            operator: In/NotIn/Exists/DoesNotExist
            values:
            - KEY_VALUE
        topologyKey: kubernetes.io/hostname

7.5 Pod (affinity / anti-affinity) soft policy configuration

spec:
  affinity:
    podAffinity/podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: WEIGHT_VALUE
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: KEY_NAME
              operator: In/NotIn/Exists/DoesNotExist
              values:
              - KEY_VALUE
          topologyKey: kubernetes.io/hostname
