Kubeflow series: Solve one-click installation of kubeflow based on Ali cloud mirror in China

Environmental preparation

kubeflow is very demanding for the environment. See the official requirements:
at least one worker node with a minimum of:

  • 4 CPU
  • 50 GB storage
  • 12 GB memory

Of course, you can install if you don't, but you will have resource problems later on because this is the full package.

A kubernetes cluster is already installed, and here I'm using a cluster installed by rancher.

sudo docker run -d --restart=unless-stopped -p 80:80 -p 443:443 rancher/rancher

Here I choose version 1.14 of k8s, which is compatible between kubeflow and k8s. Official description Here, my kubeflow is version 0.6.

You can also create Aliyun kubernetes directly (remember to choose version 1.14):

If you want to install it directly, you can adjust it to One-click installation of kubeflow

kustomize

Download the kustomize file

The official tutorials are kfclt Installed, kfclt is essentially installed using kustomize, so here I download the kustomize file directly and install it by modifying the mirror.

Official kustomize file Download Address

git clone https://github.com/kubeflow/manifests
cd manifests
git checkout v0.6-branch
cd <target>/base
kubectl kustomize . | tee <output file>

There are many files, which can be exported separately by script or generated by kfctl generate all -V using the kfctl command:

kustomize/
├── ambassador.yaml
├── api-service.yaml
├── argo.yaml
├── centraldashboard.yaml
├── jupyter-web-app.yaml
├── katib.yaml
├── metacontroller.yaml
├── minio.yaml
├── mysql.yaml
├── notebook-controller.yaml
├── persistent-agent.yaml
├── pipelines-runner.yaml
├── pipelines-ui.yaml
├── pipelines-viewer.yaml
├── pytorch-operator.yaml
├── scheduledworkflow.yaml
├── tensorboard.yaml
└── tf-job-operator.yaml

ambassador Micro Service Gateway
argo for task workflow organization
Dashboard Kanban page for central dashboard kubeflow
tf-job-operator Deep Learning Framework Engine, a CRD built on tensorflow. Resource type kind is TFJob
katib superparametric server

Process of using machine learning kit

Modify the kustomize file

Modify the kustomize image

Modify Mirror:

grc_image = [
"gcr.io/kubeflow-images-public/ingress-setup:latest",
"gcr.io/kubeflow-images-public/admission-webhook:v20190520-v0-139-gcee39dbc-dirty-0d8f4c",
"gcr.io/kubeflow-images-public/kubernetes-sigs/application:1.0-beta",
"gcr.io/kubeflow-images-public/centraldashboard:v20190823-v0.6.0-rc.0-69-gcb7dab59",
"gcr.io/kubeflow-images-public/jupyter-web-app:9419d4d",
"gcr.io/kubeflow-images-public/katib/v1alpha2/katib-controller:v0.6.0-rc.0",
"gcr.io/kubeflow-images-public/katib/v1alpha2/katib-manager:v0.6.0-rc.0",
"gcr.io/kubeflow-images-public/katib/v1alpha2/katib-manager-rest:v0.6.0-rc.0",
"gcr.io/kubeflow-images-public/katib/v1alpha2/suggestion-bayesianoptimization:v0.6.0-rc.0",
"gcr.io/kubeflow-images-public/katib/v1alpha2/suggestion-grid:v0.6.0-rc.0",
"gcr.io/kubeflow-images-public/katib/v1alpha2/suggestion-hyperband:v0.6.0-rc.0",
"gcr.io/kubeflow-images-public/katib/v1alpha2/suggestion-nasrl:v0.6.0-rc.0",
"gcr.io/kubeflow-images-public/katib/v1alpha2/suggestion-random:v0.6.0-rc.0",
"gcr.io/kubeflow-images-public/katib/v1alpha2/katib-ui:v0.6.0-rc.0",
"gcr.io/kubeflow-images-public/metadata:v0.1.8",
"gcr.io/kubeflow-images-public/metadata-frontend:v0.1.8",
"gcr.io/ml-pipeline/api-server:0.1.23",
"gcr.io/ml-pipeline/persistenceagent:0.1.23",
"gcr.io/ml-pipeline/scheduledworkflow:0.1.23",
"gcr.io/ml-pipeline/frontend:0.1.23",
"gcr.io/ml-pipeline/viewer-crd-controller:0.1.23",
"gcr.io/kubeflow-images-public/notebook-controller:v20190603-v0-175-geeca4530-e3b0c4",
"gcr.io/kubeflow-images-public/profile-controller:v20190619-v0-219-gbd3daa8c-dirty-1ced0e",
"gcr.io/kubeflow-images-public/kfam:v20190612-v0-170-ga06cdb79-dirty-a33ee4",
"gcr.io/kubeflow-images-public/pytorch-operator:v1.0.0-rc.0",
"gcr.io/google_containers/spartakus-amd64:v1.1.0",
"gcr.io/kubeflow-images-public/tf_operator:v0.6.0.rc0",
"gcr.io/arrikto/kubeflow/oidc-authservice:v0.2"
]

doc_image = [
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.ingress-setup:latest",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.admission-webhook:v20190520-v0-139-gcee39dbc-dirty-0d8f4c",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.kubernetes-sigs.application:1.0-beta",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.centraldashboard:v20190823-v0.6.0-rc.0-69-gcb7dab59",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.jupyter-web-app:9419d4d",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.katib-controller:v0.6.0-rc.0",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.katib-manager:v0.6.0-rc.0",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.katib-manager-rest:v0.6.0-rc.0",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.suggestion-bayesianoptimization:v0.6.0-rc.0",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.suggestion-grid:v0.6.0-rc.0",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.suggestion-hyperband:v0.6.0-rc.0",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.suggestion-nasrl:v0.6.0-rc.0",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.suggestion-random:v0.6.0-rc.0",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.katib-ui:v0.6.0-rc.0",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.metadata:v0.1.8",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.metadata-frontend:v0.1.8",
"registry.cn-shenzhen.aliyuncs.com/shikanon/ml-pipeline.api-server:0.1.23",
"registry.cn-shenzhen.aliyuncs.com/shikanon/ml-pipeline.persistenceagent:0.1.23",
"registry.cn-shenzhen.aliyuncs.com/shikanon/ml-pipeline.scheduledworkflow:0.1.23",
"registry.cn-shenzhen.aliyuncs.com/shikanon/ml-pipeline.frontend:0.1.23",
"registry.cn-shenzhen.aliyuncs.com/shikanon/ml-pipeline.viewer-crd-controller:0.1.23",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.notebook-controller:v20190603-v0-175-geeca4530-e3b0c4",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.profile-controller:v20190619-v0-219-gbd3daa8c-dirty-1ced0e",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.kfam:v20190612-v0-170-ga06cdb79-dirty-a33ee4",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.pytorch-operator:v1.0.0-rc.0",
"registry.cn-shenzhen.aliyuncs.com/shikanon/google_containers.spartakus-amd64:v1.1.0",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.tf_operator:v0.6.0.rc0",
"registry.cn-shenzhen.aliyuncs.com/shikanon/arrikto.kubeflow.oidc-authservice:v0.2"
]

Modify PVC to use dynamic storage

Modify pvc storage using local-path-provisioner Dynamic allocation of PV.

Install local-path-provisioner:

kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml

If you want to use it directly in kubeflow, you also need to change the StorageClass to the default storage:

...
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-path
  annotations: #Add as Default StorageClass
    storageclass.beta.kubernetes.io/is-default-class: "true"
provisioner: rancher.io/local-path
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
...

When finished, you can try building a PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-path-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi

Note: If you do not set the default storageclass, you need to bind PVC with storageClassName: local-path

One-click Installation

Here I made a one-click launch of the National Endoscopic kubeflow project:
https://github.com/shikanon/kubeflow-manifests

Tags: Operation & Maintenance Kubernetes jupyter git github

Posted on Sat, 04 Jan 2020 14:53:15 -0500 by er0x