1, Start and exit actions
vim demo1.yaml
==========================================================
apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo
spec:
  containers:
  - name: lifecycle-demo-container
    image: soscscs/myapp:v1
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh", "-c", "echo '333333' >> /var/log/nginx/message"]
      preStop:
        exec:
          command: ["/bin/sh", "-c", "echo '222222' >> /var/log/nginx/message"]
    volumeMounts:
    - name: message-log
      mountPath: /var/log/nginx/
      readOnly: false
  initContainers:
  - name: init-myservice
    image: soscscs/myapp:v1
    command: ["/bin/sh", "-c", "echo '111111' >> /var/log/nginx/message"]
    volumeMounts:
    - name: message-log
      mountPath: /var/log/nginx/
      readOnly: false
  volumes:
  - name: message-log
    hostPath:
      path: /data/volumes/nginx/log/
      type: DirectoryOrCreate
==========================================================
kubectl apply -f demo1.yaml
kubectl get pods -o wide

//On node01 (the node running the Pod), view the hostPath log file
cat /data/volumes/nginx/log/message
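Besides reading the hostPath file on node01, you can also check the hook output from inside the Pod itself. A minimal sketch using standard kubectl commands (the Pod name comes from the manifest above):

#Read the shared log file from inside the running container; the init container's '111111'
#and the postStart hook's '333333' should both be present
kubectl exec -it lifecycle-demo -- cat /var/log/nginx/message

#Deleting the Pod triggers the preStop hook, which appends '222222' to the hostPath file on the node
kubectl delete pod lifecycle-demo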
2, List watch mechanism of k8s
1. A user submits a request to the APIServer through kubectl or another API client to create a Pod object.
2. The APIServer stores the Pod object's metadata in etcd. Once the write completes, the APIServer returns a confirmation to the client.
3. After etcd accepts the Pod creation, it sends a Create event to the APIServer.
4. The Controller Manager has been watching (Watch, over HTTP port 8080) events in the APIServer, so when the APIServer receives the Create event it forwards it to the Controller Manager.
5. After receiving the Create event, the Controller Manager invokes the Replication Controller to ensure the required number of replicas is created on the Nodes.
6. After the Controller Manager creates the Pod replicas, the APIServer records the Pod details in etcd, such as the number of replicas and the container information.
7. Likewise, etcd sends the Pod creation information to the APIServer as an event.
8. The Scheduler is also watching the APIServer and plays a "linking" role in the system: "linking up" means it receives the created Pod events and assigns a node to each of them; "handing down" means that once placement is done, the kubelet on that Node takes over and is responsible for the "second half of the Pod's life cycle". In other words, the Scheduler binds the Pod to be scheduled to a node in the cluster according to its scheduling algorithm and policy.
9. After scheduling, the Scheduler updates the Pod's information, which is now richer: besides the number of replicas and their content, it also records which Node the Pod is to be deployed on. The updated information goes to the APIServer, which persists it to etcd.
10. etcd sends the successful-update event to the APIServer, and the APIServer begins to reflect the scheduling result of this Pod object.
11. kubelet is a process running on each Node. It also listens for Pod update events from the APIServer via List-Watch (over HTTPS port 6443). The kubelet calls Docker on the current Node to start the container and reports the Pod and container status back to the APIServer.
12. The APIServer stores the Pod status in etcd. After etcd confirms the write, the APIServer sends a confirmation to the relevant kubelet, which receives the event.
Note: if kubectl issues a command to scale the number of Pod replicas, the above process is triggered again and the kubelet adjusts the Node's resources according to the latest Pod deployment. Likewise, if the replica count does not change but the image is upgraded, the kubelet automatically pulls the latest image and loads it. A quick way to observe these watch events from a client is shown below.
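To observe the List-Watch pattern from a client, kubectl can keep a watch open against the APIServer. A minimal sketch using standard flags:

#Stream Pod changes as the APIServer publishes them
kubectl get pods --watch -o wide

#Stream cluster events, e.g. scheduling decisions and container starts reported by the kubelet
kubectl get events --watch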
3, Scheduling process
3.1 scheduling strategy
1. The Scheduler runs as a separate program. After starting, it continuously watches the APIServer, picks up Pods whose spec.nodeName is empty, and creates a binding for each of them to indicate which node the Pod should be placed on.
2. Scheduling happens in several stages: first, nodes that do not meet the conditions are filtered out; this is the predicate stage (predicates). Then the remaining nodes are ranked by priority (priorities). Finally, the node with the highest priority is selected. If any intermediate step fails, the error is returned directly.
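As a quick check on this flow, Pods that the Scheduler has not yet bound have an empty spec.nodeName and stay Pending. A small sketch using standard kubectl field selectors and custom columns:

#List Pods that have not been assigned to any node yet (spec.nodeName is still empty)
kubectl get pods --all-namespaces --field-selector spec.nodeName=

#Show which node each Pod was bound to after scheduling
kubectl get pods -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName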
3.2 common predicate algorithms
1. PodFitsResources: whether the node's remaining resources are greater than the resources requested by the Pod.
2. PodFitsHost: if the Pod specifies a NodeName, whether the node's name matches that NodeName.
3. PodFitsHostPorts: whether the ports already used on the node conflict with the ports requested by the Pod.
4. PodSelectorMatches: filter out nodes that do not match the labels specified by the Pod.
5. NoDiskConflict: the volumes already mounted on the node must not conflict with the volumes specified by the Pod, unless both are read-only.
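For intuition, these predicates mostly read fields of the Pod spec and compare them with the node's current state. A minimal sketch (the name predicate-demo and the values are only illustrative):

==========================================================
apiVersion: v1
kind: Pod
metadata:
  name: predicate-demo            #hypothetical name
spec:
  containers:
  - name: app
    image: nginx
    ports:
    - containerPort: 80
      hostPort: 8080              #PodFitsHostPorts checks this against ports already used on the node
    resources:
      requests:
        cpu: "500m"               #PodFitsResources checks these requests against the node's remaining resources
        memory: "256Mi"
==========================================================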
3.3 priority strategy
1. If no suitable node is found during the predicate stage, the Pod stays in the Pending state and scheduling is retried until some node meets the conditions. After this step, if several nodes meet the conditions, the priorities stage continues: the nodes are ranked by priority score.
2. A priority consists of a series of key-value pairs, where the key is the name of the priority item and the value is its weight (the importance of that item). Common priority options include:
1) LeastRequestedPriority: the weight is derived from the CPU and memory utilization; the lower the utilization, the higher the weight. In other words, this priority favors nodes with a lower resource usage ratio.
2) BalancedResourceAllocation: the closer a node's CPU and memory utilization are to each other, the higher the weight. It is generally used together with the previous one rather than alone. For example, if node01's CPU and memory utilization is 20%:60% and node02's is 50%:50%, then although node01's total utilization is lower, node02's CPU and memory utilization are closer to each other, so node02 is preferred during scheduling.
3) ImageLocalityPriority: favors nodes that already have the image to be used; the larger the total size of the image already present, the higher the weight.
3. All priority items and weights are combined by the algorithm to produce the final result. The commands below show where this resource data can be inspected.
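To see the inputs these priorities work with, you can inspect each node's allocated requests and, if metrics-server is installed, its actual usage. A hedged sketch with standard commands:

#CPU/memory requests already allocated on a node, the data LeastRequestedPriority and
#BalancedResourceAllocation reason about
kubectl describe node node01 | grep -A 8 "Allocated resources"

#Actual usage per node (requires metrics-server)
kubectl top nodes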
4, Specify scheduling node
4.1 specify nodeName
#pod.spec.nodeName schedules the Pod directly onto the specified Node, skipping the Scheduler's scheduling policy; the match is forced
vim demo2.yaml
==========================================================
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: myapp
    spec:
      nodeName: node01
      containers:
      - name: myapp
        image: nginx
        ports:
        - containerPort: 80
==========================================================
kubectl apply -f demo2.yaml
kubectl get pods -o wide
kubectl describe pod myapp-86c89df7fc-6glj6
4.2 specifying nodeSelector
pod.spec.nodeSelector: selects nodes through the Kubernetes label-selector mechanism. The scheduler's policy matches the labels and then schedules the Pod to the target node; this matching rule is a mandatory constraint.

//Set the corresponding node labels to gxd=111 and gxd=222 respectively
kubectl label nodes node01 gxd=111
kubectl label nodes node02 gxd=222

//View the labels
kubectl get nodes --show-labels

//Switch to nodeSelector scheduling
vim demo3.yaml
==========================================================
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myapp1
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: myapp1
    spec:
      nodeSelector:
        gxd: "111"
      containers:
      - name: myapp1
        image: nginx
        ports:
        - containerPort: 80
==========================================================
kubectl apply -f demo3.yaml
kubectl get pods -o wide

#View the detailed events (the events show that the Pod is first dispatched by the scheduler)
kubectl describe pod myapp1-55c6cc597c-8xw9x
//To modify the value of a label, add the --overwrite parameter
kubectl label nodes node02 gxd=a --overwrite
kubectl get nodes --show-labels
//To delete a label, specify the label's key name at the end of the command followed by a minus sign:
kubectl label nodes node02 gxd-
//Query nodes by a specific label
kubectl get node -l gxd=111
5, Affinity
5.1 classification
Official documentation: https://kubernetes.io/zh/docs/concepts/scheduling-eviction/assign-pod-node/

1. Node affinity: pod.spec.nodeAffinity
●preferredDuringSchedulingIgnoredDuringExecution: soft strategy
●requiredDuringSchedulingIgnoredDuringExecution: hard strategy

2. Pod affinity: pod.spec.affinity.podAffinity/podAntiAffinity
●preferredDuringSchedulingIgnoredDuringExecution: soft strategy
●requiredDuringSchedulingIgnoredDuringExecution: hard strategy
5.2 key-value operators
1. In: the label's value is in a given list
2. NotIn: the label's value is not in a given list
3. Gt: the label's value is greater than a given value
4. Lt: the label's value is less than a given value
5. Exists: the label exists
6. DoesNotExist: the label does not exist
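These operators are used inside matchExpressions entries of affinity rules. A minimal sketch (the label keys gxd and disk-type are only examples):

==========================================================
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: gxd                #the label value must be in the list below
          operator: In
          values: ["111", "222"]
        - key: disk-type          #hypothetical label key; Exists/DoesNotExist take no values list
          operator: Exists
==========================================================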
5.3 node affinity + hard policy example
vim demo4.yaml
==========================================================
apiVersion: v1
kind: Pod
metadata:
  name: affinity
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-affinity
    image: nginx
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: gxd             #Specifies the node label key
            operator: NotIn      #The Pod may only be placed on nodes whose gxd label value is not in the values list
            values:
            - "111"
==========================================================
kubectl apply -f demo4.yaml
kubectl get pods -o wide

If the hard policy cannot be satisfied, the Pod stays in the Pending state.
Change the label value to gxd=222 so that no node satisfies the condition, create the Pod again and view its status.
5.4 node affinity + soft policy example
vim demo5.yaml
==========================================================
apiVersion: v1
kind: Pod
metadata:
  name: affinity
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-affinity
    image: nginx
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1                #If there are multiple soft policy options, the greater the weight, the higher the priority
        preference:
          matchExpressions:
          - key: gxd
            operator: In
            values:
            - "111"
==========================================================
kubectl apply -f demo5.yaml
kubectl get pods -o wide
5.5 node affinity + soft policy + hard policy example
vim demo6.yaml
==========================================================
apiVersion: v1
kind: Pod
metadata:
  name: affinity
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-affinity
    image: nginx
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:   #First satisfy the hard policy: exclude nodes with the kubernetes.io/hostname=node02 label
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: NotIn
            values:
            - node02
      preferredDuringSchedulingIgnoredDuringExecution:  #Then satisfy the soft policy: prefer nodes with the gxd=111 label
      - weight: 1
        preference:
          matchExpressions:
          - key: gxd
            operator: In
            values:
            - "111"
==========================================================
kubectl apply -f demo6.yaml
kubectl get pods -o wide
6, Pod affinity and anti-affinity
6.1 create a Pod labeled app=myapp01
vim demo7.yaml
==========================================================
apiVersion: v1
kind: Pod
metadata:
  name: myapp01
  labels:
    app: myapp01
spec:
  containers:
  - name: with-node-affinity
    image: nginx
==========================================================
kubectl apply -f demo7.yaml
kubectl get pods --show-labels -o wide
6.2 using Pod affinity scheduling
vim demo8.yaml
==========================================================
apiVersion: v1
kind: Pod
metadata:
  name: myapp02
  labels:
    app: myapp02
spec:
  containers:
  - name: myapp02
    image: nginx
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - myapp01
        topologyKey: kubernetes.io/hostname
==========================================================
kubectl apply -f demo8.yaml
kubectl get pods --show-labels -o wide
==========================================================
#The Pod can only be scheduled to a node that is in the same topology domain as at least one already-running Pod carrying a label with key "app" and value "myapp01". (More precisely: the Pod is eligible to run on node N if node N has a label with key kubernetes.io/hostname and some value V, and at least one node in the cluster with key kubernetes.io/hostname and value V is already running a Pod labeled app=myapp01.)
#topologyKey is the key of a node label. If two nodes carry this key with the same label value, the scheduler treats them as being in the same topology domain and tries to place a balanced number of Pods in each topology domain.
#With kubernetes.io/hostname, different values correspond to different topology domains. For example, if Pod1 is on the node with kubernetes.io/hostname=node01, Pod2 is on the node with kubernetes.io/hostname=node02, and Pod3 is on the node with kubernetes.io/hostname=node01, then Pod2 is not in the same topology domain as Pod1 and Pod3, while Pod1 and Pod3 are in the same topology domain.
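To make the topology domain larger than a single host, you can give several nodes the same custom label and use that key as the topologyKey. A hedged sketch (the label zone=z1 is only an example):

#Put node01 and node02 into the same hypothetical topology domain
kubectl label nodes node01 node02 zone=z1

#If the podAffinity rule above used topologyKey: zone, both nodes would be treated as one domain,
#so the new Pod could land on either of them as long as one runs a Pod labeled app=myapp01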
6.3 Pod anti-affinity scheduling
vim demo9.yaml
==========================================================
apiVersion: v1
kind: Pod
metadata:
  name: myapp03
  labels:
    app: myapp03
spec:
  containers:
  - name: myapp03
    image: nginx
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - myapp01
          topologyKey: kubernetes.io/hostname
==========================================================
kubectl apply -f demo9.yaml
kubectl get pods --show-labels -o wide
==========================================================
#If a node is in the same topology domain as a Pod labeled with key "app" and value "myapp01", the new Pod should not be scheduled to that node. (With topologyKey kubernetes.io/hostname, this means the Pod is kept off any node that is in the same domain as a Pod labeled app=myapp01.)
7, Summary
7.1 affinity
1. Node affinity (nodeAffinity): schedule onto a node that satisfies the node's label conditions
   Hard strategy (requiredDuringSchedulingIgnoredDuringExecution): the conditions must be met
   Soft strategy (preferredDuringSchedulingIgnoredDuringExecution): try to meet the conditions; it is acceptable if they cannot be met
2. Pod affinity (podAffinity): schedule onto a node that satisfies the label conditions of the referenced Pod
3. Pod anti-affinity (podAntiAffinity): do not schedule onto a node that satisfies the label conditions of the referenced Pod
Scheduling strategy | Match target | Operator | Topology domain support | Scheduling target |
---|---|---|---|---|
nodeAffinity | host | In, NotIn, Exists, DoesNotExist, Gt, Lt | no | the specified host |
podAffinity | Pod | In, NotIn, Exists, DoesNotExist | yes | same topology domain as the specified Pod |
podAntiAffinity | Pod | In, NotIn, Exists, DoesNotExist | yes | not the same topology domain as the specified Pod |
7.2 node hard policy configuration
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: KEY_NAME
            operator: In/NotIn/Exists/DoesNotExist/Gt/Lt
            values:
            - KEY_VALUE
7.3 node soft policy configuration
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: WEIGHT_VALUE
        preference:
          matchExpressions:
          - key: KEY_NAME
            operator: In/NotIn/Exists/DoesNotExist
            values:
            - KEY_VALUE
7.4 pod affinity / anti-affinity hard policy configuration
spec:
  affinity:
    podAffinity/podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: KEY_NAME
            operator: In/NotIn/Exists/DoesNotExist
            values:
            - KEY_VALUE
        topologyKey: kubernetes.io/hostname
7.5 pod affinity / anti-affinity soft policy configuration
spec:
  affinity:
    podAffinity/podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: WEIGHT_VALUE
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: KEY_NAME
              operator: In/NotIn/Exists/DoesNotExist
              values:
              - KEY_VALUE
          topologyKey: kubernetes.io/hostname