KubeSphere troubleshooting practice

Following up on an earlier article on KubeSphere troubleshooting, this post records some problems encountered recently while using KubeSphere, in the hope that it is helpful to others.

8. KubeSphere application upload problems

8.1 File upload returns 413

An application deployed in KubeSphere exposes a file upload feature. During testing an exception occurred: files could not be uploaded, and the ingress returned a 413 error. KubeSphere uses the ingress-nginx controller, so a key-value pair can be added to the route's annotations to raise the limit.

Solution: add the custom max body size annotation to the application route (Ingress):

https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#custom-max-body-size
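
For example, a minimal sketch using kubectl annotate (the ingress name, namespace, and size value are illustrative assumptions; the annotation name comes from the ingress-nginx documentation linked above):

# Raise the allowed request body size on the route (Ingress) that fronts the upload API
kubectl -n demo-ns annotate ingress smart-frontend \
  nginx.ingress.kubernetes.io/proxy-body-size="1024m" --overwrite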

8.2 Large file upload returns 504

If the backend takes a long time to respond to a large upload and the gateway returns 504, increase the proxy read timeout with the nginx.ingress.kubernetes.io/proxy-read-timeout annotation.
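
A minimal sketch, again with illustrative resource names and an assumed timeout value in seconds:

# Give the backend more time to respond before the gateway returns 504
kubectl -n demo-ns annotate ingress smart-frontend \
  nginx.ingress.kubernetes.io/proxy-read-timeout="1200" --overwrite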

9. Cross-origin (CORS) issues

KubeSphere uses ingress-nginx, which supports cross-origin requests; CORS can be enabled by adding annotations as described in the following link:

https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#enable-cors
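
A minimal sketch (resource names are illustrative; the allowed origin is an assumption and should match your frontend domain):

# Enable CORS on the route and restrict it to one origin
kubectl -n demo-ns annotate ingress smart-frontend \
  nginx.ingress.kubernetes.io/enable-cors="true" \
  nginx.ingress.kubernetes.io/cors-allow-origin="https://example.com" --overwrite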

In the test environment you can use the hosts file to resolve the domain name locally; the frontend uses nginx to serve static files and reverse-proxy the backend API. For example:

server {
  listen 80;
  server_name localhost;
  # Force redirect to https
  # rewrite ^(.*)$ https://$host$1 permanent;
  location / {
    index      index.html;
    root       /smart-frontend;
    try_files $uri $uri/ /index.html;
    client_body_buffer_size 200m;
    charset utf-8;
  }
  location /api {
      proxy_pass http://smart-backend:8080/api;
      proxy_read_timeout 1200;
      client_max_body_size 1024m;
  }
  gzip on;              # enable gzip
  gzip_vary on;
  gzip_min_length 1k;   # only compress responses larger than 1k; usually no need to change
  gzip_buffers 4 16k;
  gzip_comp_level 6;    # compression level; higher values compress more at higher CPU cost
  gzip_types text/plain application/javascript application/x-javascript text/css application/xml text/javascript application/x-httpd-php image/jpeg image/gif image/png image/x-icon;
}

10. Adding nodes

Later, as business load gradually increased, the cluster ran short of node resources, so a new worker node was added and its data disk was added to the Ceph cluster.

10.1 Adding a Ceph data node

  • System configuration
  • Passwordless SSH key configuration
  • hosts configuration
  • Docker installation and migration to the data disk
  • Enable cgroups
  • Add the Ceph data node

Add node03's data disk to the Ceph cluster (if the existing data storage class has enough capacity, you can skip adding a data node):

[root@node03 docker]# mkfs.xfs /dev/vdd
[root@node03 docker]# mkdir -p /var/local/osd3
[root@node03 docker]# mount /dev/vdd /var/local/osd3/

# Add /dev/vdd to /etc/fstab so the mount persists across reboots
[root@node03 docker]# yum -y install yum-plugin-priorities epel-release

[root@node03 yum.repos.d]# chmod 777 -R /var/local/osd3/
[root@node03 yum.repos.d]# chmod 777 -R /var/local/osd3/*

On the master node, use ceph-deploy to deploy node03:

[root@master ceph]# ceph-deploy install node03
[root@master ceph]# ceph-deploy  gatherkeys master
[root@master ceph]# ceph-deploy osd prepare node03:/var/local/osd3
  • Activate osd
[root@master ceph]# ceph-deploy osd activate node03:/var/local/osd3
  • View state
[root@master ceph]# ceph-deploy osd list master node01 node02 node03
  • Copy key
[root@master ceph]# ceph-deploy admin master node01 node02 node03
  • Set permissions in node03 node
[root@node03 yum.repos.d]# chmod +r /etc/ceph/ceph.client.admin.keyring
  • Set MDS in master
[root@master ceph]# ceph-deploy mds create node01 node02 node03
  • View state
[root@master ceph]# ceph health
Because node03 is a new node, its data needs to be rebalanced and backfilled. Check the cluster status:

[root@master conf]# ceph -s
    cluster 5b9eb8d2-1c12-4f6d-ae9c-85078795794b
     health HEALTH_ERR
            44 pgs backfill_wait
            1 pgs backfilling
            1 pgs inconsistent
            45 pgs stuck unclean
            recovery 1/55692 objects degraded (0.002%)
            recovery 9756/55692 objects misplaced (17.518%)
            2 scrub errors
     monmap e1: 1 mons at {master=172.16.60.2:6789/0}
            election epoch 35, quorum 0 master
     osdmap e2234: 4 osds: 4 up, 4 in; 45 remapped pgs
            flags sortbitwise,require_jewel_osds
      pgmap v5721471: 192 pgs, 2 pools, 104 GB data, 27846 objects
            230 GB used, 1768 GB / 1999 GB avail
            1/55692 objects degraded (0.002%)
            9756/55692 objects misplaced (17.518%)
                 146 active+clean
                  44 active+remapped+wait_backfill
                   1 active+remapped+backfilling
                   1 active+clean+inconsistent
recovery io 50492 kB/s, 13 objects/s
  client io 20315 B/s wr, 0 op/s rd, 5 op/s wr 
  • After the data synchronization triggered by the new node completes, one inconsistent PG remains:
[root@master conf]# ceph -s
    cluster 5b9eb8d2-1c12-4f6d-ae9c-85078795794b
     health HEALTH_ERR
            1 pgs inconsistent
            2 scrub errors
     monmap e1: 1 mons at {master=172.16.60.2:6789/0}
            election epoch 35, quorum 0 master
     osdmap e2324: 4 osds: 4 up, 4 in
            flags sortbitwise,require_jewel_osds
      pgmap v5723479: 192 pgs, 2 pools, 104 GB data, 27848 objects
            229 GB used, 1769 GB / 1999 GB avail
                 191 active+clean
                   1 active+clean+inconsistent
  client io 78305 B/s wr, 0 op/s rd, 18 op/s wr

After repairing the inconsistent PG (for example with ceph pg repair <pg-id>), check the status again:

[root@master conf]# ceph -s
    cluster 5b9eb8d2-1c12-4f6d-ae9c-85078795794b
     health HEALTH_OK
     monmap e1: 1 mons at {master=172.16.60.2:6789/0}
            election epoch 35, quorum 0 master
     osdmap e2324: 4 osds: 4 up, 4 in
            flags sortbitwise,require_jewel_osds
      pgmap v5724320: 192 pgs, 2 pools, 104 GB data, 27848 objects
            229 GB used, 1769 GB / 1999 GB avail
                 192 active+clean
  client io 227 kB/s wr, 0 op/s rd, 7 op/s wr
# Synchronization complete
[root@master conf]# ceph health
HEALTH_OK

10.2 Adding a Kubernetes worker node

KubeSphere provides convenient steps for adding new nodes: https://kubesphere.com.cn/docs/v2.1/zh-CN/installation/add-nodes/

Modify hosts.ini:

[all]
master ansible_connection=local  ip=172.16.60.2
node01  ansible_host=172.16.60.3  ip=172.16.60.3 
node02  ansible_host=172.16.60.4  ip=172.16.60.4
node03  ansible_host=172.16.60.5  ip=172.16.60.5
[kube-master]
master           
[kube-node]
master
node01   
node02
node03

Execute the add-nodes.sh script in the scripts directory. After the expansion script finishes successfully, you can see the cluster node information including the new node: either select Infrastructure from the KubeSphere console menu and open the host management page, or run kubectl get node with the kubectl tool to view the expanded cluster's node details.

[root@master scripts]# ./add-nodes.sh

Verify the result:

[root@master conf]# kubectl get nodes -owide
NAME     STATUS   ROLES         AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION          CONTAINER-RUNTIME
master   Ready    master        136d   v1.15.5   172.16.60.2   <none>        CentOS Linux 7 (Core)   3.10.0-693.el7.x86_64   docker://18.6.2
node01   Ready    node,worker   136d   v1.15.5   172.16.60.3   <none>        CentOS Linux 7 (Core)   3.10.0-693.el7.x86_64   docker://18.6.2
node02   Ready    node,worker   136d   v1.15.5   172.16.60.4   <none>        CentOS Linux 7 (Core)   3.10.0-693.el7.x86_64   docker://18.6.2
node03   Ready    worker        10m    v1.15.5   172.16.60.5   <none>        CentOS Linux 7 (Core)   3.10.0-693.el7.x86_64   docker://19.3.5
[root@master conf]# kubectl label node node03  node-role.kubernetes.io/node=   
node/node03 labeled
[root@master conf]# kubectl get nodes -owide                                
NAME     STATUS   ROLES         AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION          CONTAINER-RUNTIME
master   Ready    master        136d   v1.15.5   172.16.60.2   <none>        CentOS Linux 7 (Core)   3.10.0-693.el7.x86_64   docker://18.6.2
node01   Ready    node,worker   136d   v1.15.5   172.16.60.3   <none>        CentOS Linux 7 (Core)   3.10.0-693.el7.x86_64   docker://18.6.2
node02   Ready    node,worker   136d   v1.15.5   172.16.60.4   <none>        CentOS Linux 7 (Core)   3.10.0-693.el7.x86_64   docker://18.6.2
node03   Ready    node,worker   11m    v1.15.5   172.16.60.5   <none>        CentOS Linux 7 (Core)   3.10.0-693.el7.x86_64   docker://19.3.5
[root@master conf]# 

11. Uneven K8s cluster resource usage

We found that K8s resource usage was not balanced. Earlier deployments specified a nodeSelector, which caused some system services to run on the worker nodes. Memory consumption on node02 became very high, leading to alerts and abnormal restarts in the cluster.

You can check which pods are running on node02:

kubectl get pods -o wide --all-namespaces |grep node02 |awk '{print $1,  $2}'

Schedule some system applications to the master node with a nodeSelector to reduce the memory pressure on node02. First check the node labels:

`kubectl  get nodes --show-labels`

Find the system components running on node02 and add a nodeSelector so that they are rescheduled, as shown below:

      nodeSelector:
        node-role.kubernetes.io/master: master
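
To apply this without editing the YAML by hand, one option is a merge patch; a minimal sketch, assuming the target is a Deployment in kubesphere-system (the deployment name is illustrative, and the label value should match what `kubectl get nodes --show-labels` reports for the master):

# Pin an illustrative system component to the master node via nodeSelector
kubectl -n kubesphere-system patch deployment ks-apiserver --type merge \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/master":"master"}}}}}'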

Check the existing KubeSphere system deployments on node02.

After rescheduling, check whether the memory load on node02 has dropped.

12. KubeSphere DevOps project

About a week after node03 was added, a DevOps project job sat in the queue because the instance that runs the job had not been initialized. Logging into the cluster to check, the base pod scheduled on node03 was stuck pulling the agent image. To speed things up, the base image was saved on an existing worker node with docker save and then loaded onto node03.

[root@master ~]# kubectl describe pods -n kubesphere-devops-system $(kubectl get pods -n kubesphere-devops-system |grep -E "^base" |awk '{print $1}')
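
A minimal sketch of the save-and-load workflow (the image name and tag are assumptions; take the real one from the kubectl describe output above):

# On a worker node that already has the agent image:
docker save -o builder-base.tar kubesphere/builder-base:v2.1.0
scp builder-base.tar node03:/tmp/
# On node03:
docker load -i /tmp/builder-base.tar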

13. KubeSphere application installation

My own KubeSphere cluster is currently at v2.1. After adding a Helm repo in a project, the charts from that repo do not seem to show up in the web interface: when creating a new application in the project and choosing the app store, only a handful of charts appear, the charts from the added Helm source cannot be found, yet helm search finds them on the server. For now, the reply from the community is that v2.0 had a background task to synchronize charts; in v2.1, use the helm command to install charts into the cluster manually.

[root@master common-service]# helm install -n consul --namespace common-service -f consul/values-production.yaml consul/
NAME:   consul
LAST DEPLOYED: Tue Jan 14 17:56:27 2020
NAMESPACE: common-service
STATUS: DEPLOYED

RESOURCES:
==> v1/Pod(related)
NAME      READY  STATUS   RESTARTS  AGE
consul-0  0/2    Pending  0         0s

==> v1/Service
NAME       TYPE       CLUSTER-IP   EXTERNAL-IP  PORT(S)                                                AGE
consul     ClusterIP  None         <none>       8400/TCP,8301/TCP,8301/UDP,8300/TCP,8600/TCP,8600/UDP  1s
consul-ui  ClusterIP  10.233.59.7  <none>       80/TCP                                                 1s
==> v1/StatefulSet
NAME    READY  AGE
consul  0/3    0s

==> v1beta1/PodDisruptionBudget
NAME        MIN AVAILABLE  MAX UNAVAILABLE  ALLOWED DISRUPTIONS  AGE
consul-pdb  1              N/A              0                    1s

NOTES:
  ** Please be patient while the chart is being deployed **

  Consul can be accessed within the cluster on port 8300 at consul.common-service.svc.cluster.local

In order to access to the Consul Web UI:

    kubectl port-forward --namespace common-service svc/consul-ui 80:80
    echo "Consul URL: http://127.0.0.1:80"

Please take into account that you need to wait until a cluster leader is elected before using the Consul Web UI.

In order to check the status of the cluster you can run the following command:

    kubectl exec -it consul-0 -- consul members

Furthermore, to know which Consul node is the cluster leader run this other command:

    kubectl exec -it consul-0 -- consul operator raf

For specific questions, please refer to the post: https://kubesphere.com.cn/forum/d/669-kubesphere

I have organized my own k8s learning notes; if you are getting started, they may help you learn and exchange ideas quickly: https://github.com/redhatxl/awesome-kubernetes-notes
Support KubeSphere, a domestic container management platform, and contribute to the community.
