Use vk virtual nodes to increase k8s cluster capacity and resilience
Adding vk (virtual-kubelet) virtual nodes to a Kubernetes cluster has been widely adopted by customers. Virtual nodes based on vk can greatly improve the capacity and elasticity of a cluster, creating ECI pods flexibly and dynamically on demand and avoiding the trouble of cluster capacity planning. At present, vk virtual nodes are widely used in the following scenarios:
- Peak-and-valley elasticity for online business: industries such as online education and e-commerce have clear peak-and-valley computing patterns; using vk can significantly reduce the maintenance of a fixed resource pool and lower computing costs.
- Increasing cluster Pod capacity: when a traditional flannel network mode cluster cannot add more nodes because of VPC route table entry limits or vswitch network planning restrictions, virtual nodes avoid these problems and quickly and simply increase cluster Pod capacity.
- Data computing: use vk to host computing workloads such as Spark and Presto to effectively reduce computing costs.
- CI/CD and other Job-type tasks.
Create multiple vk virtual nodes
Deploy virtual nodes by following the ACK product documentation: https://help.aliyun.com/document_detail/118970.html
Generally speaking, if a single Kubernetes cluster needs fewer than 3000 ECI pods, we recommend deploying a single vk node. For scenarios where vk needs to carry more pods, we recommend deploying multiple vk nodes in the cluster to scale vk horizontally. Running multiple vk nodes relieves the pressure on any single vk node and supports a larger ECI pod capacity; for example, 3 vk nodes can support 3 × 3000 ECI pods, and 10 vk nodes can support 10 × 3000 ECI pods.
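Once the vk nodes are deployed (see the StatefulSet below), a quick way to confirm the per-node pod capacity that each vk node advertises to the scheduler is to read its allocatable pod count; a minimal sketch, assuming a vk node named virtual-kubelet-0 already exists and its capacity follows the ECI_QUOTA_POD setting in the manifest below:

```
# Print the number of pods this vk node reports as allocatable (e.g. 3000)
kubectl get node virtual-kubelet-0 -o jsonpath='{.status.allocatable.pods}'
```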
To make scaling vk out simpler, we deploy the vk controller as a StatefulSet, where each vk controller pod manages one vk node. When more vk virtual nodes are needed, simply modify the replicas of the StatefulSet. Configure and deploy the following StatefulSet yaml file (filling in the AK credentials and information such as the vpc/vswitch/security group); the StatefulSet defaults to 1 replica.
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: virtual-kubelet
  name: virtual-kubelet
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: virtual-kubelet
  serviceName: ""
  template:
    metadata:
      labels:
        app: virtual-kubelet
    spec:
      containers:
      - args:
        - --provider
        - alibabacloud
        - --nodename
        - $(VK_INSTANCE)
        env:
        - name: VK_INSTANCE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: KUBELET_PORT
          value: "10250"
        - name: VKUBELET_POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        - name: VKUBELET_TAINT_KEY
          value: "virtual-kubelet.io/provider"
        - name: VKUBELET_TAINT_VALUE
          value: "alibabacloud"
        - name: VKUBELET_TAINT_EFFECT
          value: "NoSchedule"
        - name: ECI_REGION
          value: xxx
        - name: ECI_VPC
          value: vpc-xxx
        - name: ECI_VSWITCH
          value: vsw-xxx
        - name: ECI_SECURITY_GROUP
          value: sg-xxx
        - name: ECI_QUOTA_CPU
          value: "1000000"
        - name: ECI_QUOTA_MEMORY
          value: 6400Ti
        - name: ECI_QUOTA_POD
          value: "3000"
        - name: ECI_ACCESS_KEY
          value: xxx
        - name: ECI_SECRET_KEY
          value: xxx
        - name: ALIYUN_CLUSTERID
          value: xxx
        image: registry.cn-hangzhou.aliyuncs.com/acs/virtual-nodes-eci:v1.0.0.2-aliyun
        imagePullPolicy: Always
        name: ack-virtual-kubelet
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      serviceAccount: ack-virtual-node-controller
      serviceAccountName: ack-virtual-node-controller
```
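After filling in the placeholders, apply the manifest and confirm that the controller pod is running and its vk node has registered. A minimal sketch; the file name used here is just an assumption for illustration:

```
# Deploy the vk controller StatefulSet into kube-system
kubectl apply -f virtual-kubelet-statefulset.yaml

# The controller pod should be Running and a matching vk node should register
kubectl -n kube-system get pod -l app=virtual-kubelet
kubectl get node virtual-kubelet-0
```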
Modify the replicas of the StatefulSet to add more vk nodes.
```
# kubectl -n kube-system scale statefulset virtual-kubelet --replicas=3
statefulset.apps/virtual-kubelet scaled

# kubectl get no
NAME                      STATUS   ROLES    AGE   VERSION
cn-hangzhou.192.168.1.1   Ready    <none>   63d   v1.12.6-aliyun.1
cn-hangzhou.192.168.1.2   Ready    <none>   63d   v1.12.6-aliyun.1
virtual-kubelet-0         Ready    agent    1m    v1.11.2-aliyun-1.0.207
virtual-kubelet-1         Ready    agent    1m    v1.11.2-aliyun-1.0.207
virtual-kubelet-2         Ready    agent    1m    v1.11.2-aliyun-1.0.207

# kubectl -n kube-system get statefulset virtual-kubelet
NAME              READY   AGE
virtual-kubelet   3/3     3m

# kubectl -n kube-system get pod | grep virtual-kubelet
virtual-kubelet-0   1/1   Running   0   15m
virtual-kubelet-1   1/1   Running   0   11m
virtual-kubelet-2   1/1   Running   0
```
When we create multiple nginx pods in the vk namespace, we can see that the pods are scheduled across multiple vk nodes.
```
# kubectl create ns vk
# kubectl label namespace vk virtual-node-affinity-injection=enabled
# kubectl -n vk run nginx --image nginx:alpine --replicas=10
deployment.extensions/nginx scaled
# kubectl -n vk get pod -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP                NODE                NOMINATED NODE   READINESS GATES
nginx-544b559c9b-4vgzx   1/1     Running   0          1m    192.168.165.198   virtual-kubelet-2   <none>           <none>
nginx-544b559c9b-544tm   1/1     Running   0          1m    192.168.125.10    virtual-kubelet-0   <none>           <none>
nginx-544b559c9b-9q7v5   1/1     Running   0          1m    192.168.165.200   virtual-kubelet-1   <none>           <none>
nginx-544b559c9b-llqmq   1/1     Running   0          1m    192.168.165.199   virtual-kubelet-2   <none>           <none>
nginx-544b559c9b-p6c5g   1/1     Running   0          1m    192.168.165.197   virtual-kubelet-0   <none>           <none>
nginx-544b559c9b-q8mpt   1/1     Running   0          1m    192.168.165.196   virtual-kubelet-0   <none>           <none>
nginx-544b559c9b-rf5sq   1/1     Running   0          1m    192.168.125.8     virtual-kubelet-0   <none>           <none>
nginx-544b559c9b-s64kc   1/1     Running   0          1m    192.168.125.11    virtual-kubelet-2   <none>           <none>
nginx-544b559c9b-vfv56   1/1     Running   0          1m    192.168.165.201   virtual-kubelet-1   <none>           <none>
nginx-544b559c9b-wfb2z   1/1     Running   0          1m    192.168.125.9     virtual-kubelet-1   <none>           <none>
```
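Besides labeling a namespace with virtual-node-affinity-injection=enabled, a workload can also target vk nodes explicitly by tolerating the taint configured via the VKUBELET_TAINT_* variables in the StatefulSet above and selecting the vk node label. The snippet below is a minimal sketch; the type: virtual-kubelet label is an assumption based on upstream virtual-kubelet defaults, so verify the actual labels with kubectl get node --show-labels.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-on-vk
  namespace: vk
spec:
  containers:
  - name: nginx
    image: nginx:alpine
  # Select vk nodes; 'type: virtual-kubelet' is the label upstream
  # virtual-kubelet applies to its nodes (verify on your cluster).
  nodeSelector:
    type: virtual-kubelet
  # Tolerate the taint configured via VKUBELET_TAINT_* in the StatefulSet above.
  tolerations:
  - key: virtual-kubelet.io/provider
    operator: Equal
    value: alibabacloud
    effect: NoSchedule
```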
Reduce the number of vk virtual nodes
Because ECI pods on vk are created on demand, vk virtual nodes do not occupy real resources when there are no ECI pods, so in general there is no need to reduce the number of vk nodes. However, if you really want to reduce the number of vk nodes, we recommend the following steps.
Suppose there are five vk nodes in the current cluster, virtual-kubelet-0/.../virtual-kubelet-4. We want to reduce this to one vk node, so we need to delete the four nodes virtual-kubelet-1/.../virtual-kubelet-4.
- First gracefully drain the vk nodes to be deleted: evict the ECI pods on them to other nodes and prevent more pods from being scheduled to them (see the drain-flags sketch after the output below).
```
# kubectl drain virtual-kubelet-1 virtual-kubelet-2 virtual-kubelet-3 virtual-kubelet-4
# kubectl get no
NAME                      STATUS                     ROLES    AGE    VERSION
cn-hangzhou.192.168.1.1   Ready                      <none>   66d    v1.12.6-aliyun.1
cn-hangzhou.192.168.1.2   Ready                      <none>   66d    v1.12.6-aliyun.1
virtual-kubelet-0         Ready                      agent    3d6h   v1.11.2-aliyun-1.0.207
virtual-kubelet-1         Ready,SchedulingDisabled   agent    3d6h   v1.11.2-aliyun-1.0.207
virtual-kubelet-2         Ready,SchedulingDisabled   agent    3d6h   v1.11.2-aliyun-1.0.207
virtual-kubelet-3         Ready,SchedulingDisabled   agent    66m    v1.11.2-aliyun-1.0.207
virtual-kubelet-4         Ready,SchedulingDisabled   agent    66m    v1.11.2-aliyun-1.0.207
```
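If the drain is blocked by DaemonSet-managed pods or pods using emptyDir volumes, kubectl drain provides the usual flags to handle them. A sketch under those assumptions:

```
# Drain multiple vk nodes, skipping DaemonSet pods and discarding emptyDir data
kubectl drain virtual-kubelet-1 virtual-kubelet-2 virtual-kubelet-3 virtual-kubelet-4 \
  --ignore-daemonsets --delete-local-data
```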
The reason the vk nodes must be gracefully drained first is that the ECI pods on a vk node are managed by its vk controller. Deleting a vk controller while ECI pods still exist on its vk node leaves residual ECI pods that the vk controller can no longer manage.
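Before scaling down the controllers, it is therefore worth confirming that no ECI pods remain on the nodes being removed. A minimal sketch using a field selector:

```
# List any pods still bound to a vk node that is about to be removed
kubectl get pod --all-namespaces -o wide --field-selector spec.nodeName=virtual-kubelet-1
```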
- After the vk nodes have been drained, modify the replicas of the virtual-kubelet StatefulSet to reduce it to the number of vk nodes we expect.
```
# kubectl -n kube-system scale statefulset virtual-kubelet --replicas=1
statefulset.apps/virtual-kubelet scaled
# kubectl -n kube-system get pod | grep virtual-kubelet
virtual-kubelet-0   1/1   Running   0   3d6h
```
After waiting a while, we will see that the drained vk nodes become NotReady.
```
# kubectl get no
NAME                      STATUS                        ROLES    AGE    VERSION
cn-hangzhou.192.168.1.1   Ready                         <none>   66d    v1.12.6-aliyun.1
cn-hangzhou.192.168.1.2   Ready                         <none>   66d    v1.12.6-aliyun.1
virtual-kubelet-0         Ready                         agent    3d6h   v1.11.2-aliyun-1.0.207
virtual-kubelet-1         NotReady,SchedulingDisabled   agent    3d6h   v1.11.2-aliyun-1.0.207
virtual-kubelet-2         NotReady,SchedulingDisabled   agent    3d6h   v1.11.2-aliyun-1.0.207
virtual-kubelet-3         NotReady,SchedulingDisabled   agent    70m    v1.11.2-aliyun-1.0.207
virtual-kubelet-4         NotReady,SchedulingDisabled   agent    70m    v1.11.2-aliyun-1.0.207
```
- Manually delete the vk nodes in NotReady state.
```
# kubectl delete no virtual-kubelet-1 virtual-kubelet-2 virtual-kubelet-3 virtual-kubelet-4
node "virtual-kubelet-1" deleted
node "virtual-kubelet-2" deleted
node "virtual-kubelet-3" deleted
node "virtual-kubelet-4" deleted
# kubectl get no
NAME                      STATUS   ROLES    AGE    VERSION
cn-hangzhou.192.168.1.1   Ready    <none>   66d    v1.12.6-aliyun.1
cn-hangzhou.192.168.1.2   Ready    <none>   66d    v1.12.6-aliyun.1
virtual-kubelet-0         Ready    agent    3d6h   v1.11.2-aliyun-1.0.207
```