Environment preparation
Kubeflow is very demanding on its environment. The official requirements are:
at least one worker node with a minimum of:
- 4 CPU
- 50 GB storage
- 12 GB memory
You can still install on a smaller node, but because this is the full package you will run into resource problems later on.
This guide assumes a Kubernetes cluster is already installed; here I'm using a cluster installed with Rancher:
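As a quick sanity check before installing, the minimums above can be compared against a node's capacity. The sketch below is only an illustration: the NodeResources type and the shortfalls function are names made up for this example, and the node figures have to be supplied by hand.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeResources:
    cpu: int         # CPU cores
    storage_gb: int  # disk space in GB
    memory_gb: int   # RAM in GB


# Minimums quoted above from the official requirements.
KUBEFLOW_MINIMUM = NodeResources(cpu=4, storage_gb=50, memory_gb=12)


def shortfalls(node: NodeResources,
               minimum: NodeResources = KUBEFLOW_MINIMUM) -> List[str]:
    """Return the names of the resources where the node is below the minimum."""
    return [
        name
        for name in ("cpu", "storage_gb", "memory_gb")
        if getattr(node, name) < getattr(minimum, name)
    ]


if __name__ == "__main__":
    # A node with enough CPU but too little disk and memory.
    print(shortfalls(NodeResources(cpu=4, storage_gb=40, memory_gb=8)))
```

An empty list means the node meets all three minimums.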
sudo docker run -d --restart=unless-stopped -p 80:80 -p 443:443 rancher/rancher
Here I chose Kubernetes 1.14, a version compatible with the Kubeflow release I use, 0.6 (see the official compatibility description).
You can also create a Kubernetes cluster directly on Aliyun (remember to choose version 1.14).
If you just want to install directly, skip ahead to One-click Installation.
kustomize
Download the kustomize files
The official tutorial installs with kfctl, but kfctl essentially installs using kustomize, so here I download the kustomize files directly and install them after switching the images to a mirror.
Official kustomize file download address:
git clone https://github.com/kubeflow/manifests
cd manifests
git checkout v0.6-branch
cd <target>/base
kubectl kustomize . | tee <output file>
There are many files. They can be exported one by one with a script, or generated by running kfctl generate all -V:
kustomize/
├── ambassador.yaml
├── api-service.yaml
├── argo.yaml
├── centraldashboard.yaml
├── jupyter-web-app.yaml
├── katib.yaml
├── metacontroller.yaml
├── minio.yaml
├── mysql.yaml
├── notebook-controller.yaml
├── persistent-agent.yaml
├── pipelines-runner.yaml
├── pipelines-ui.yaml
├── pipelines-viewer.yaml
├── pytorch-operator.yaml
├── scheduledworkflow.yaml
├── tensorboard.yaml
└── tf-job-operator.yaml
ambassador: microservice gateway
argo: task workflow orchestration
centraldashboard: Kubeflow's central dashboard page
tf-job-operator: deep learning framework engine, a CRD built on TensorFlow; its resource kind is TFJob
katib: hyperparameter tuning server
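The per-component export mentioned above could be scripted roughly as follows. This is a minimal sketch, not the author's script: plan_exports and run_exports are names invented here, and the target directories in the commented usage are hypothetical, so check them against the actual layout of your manifests checkout. It only relies on kubectl kustomize <dir> printing the rendered YAML to standard output.

```python
import subprocess
from pathlib import Path
from typing import Dict, List, Tuple

ExportPlan = List[Tuple[List[str], Path]]


def plan_exports(manifests_root: str, targets: Dict[str, str],
                 out_dir: str) -> ExportPlan:
    """Pair each `kubectl kustomize` command with the file its output goes to.

    `targets` maps an output name (e.g. "argo") to a kustomize directory
    relative to `manifests_root`.
    """
    plan = []
    for name, rel_dir in sorted(targets.items()):
        command = ["kubectl", "kustomize", str(Path(manifests_root) / rel_dir)]
        plan.append((command, Path(out_dir) / f"{name}.yaml"))
    return plan


def run_exports(plan: ExportPlan) -> None:
    """Run each command and save its stdout (requires kubectl on PATH)."""
    for command, out_file in plan:
        out_file.parent.mkdir(parents=True, exist_ok=True)
        result = subprocess.run(command, capture_output=True, text=True, check=True)
        out_file.write_text(result.stdout)


# Example usage (hypothetical target directories):
#   targets = {"argo": "argo/base", "minio": "minio/base"}
#   run_exports(plan_exports("manifests", targets, "kustomize"))
```

Keeping the planning step separate from the execution step makes it easy to print the commands first and inspect them before anything is run.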
Workflow for using the machine learning toolkit
Modify the kustomize file
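Modify the kustomize images
Replace the gcr.io images with their mirrors. grc_image lists the original images and doc_image the corresponding mirror images.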
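The two lists that follow pair each gcr.io image with its mirror, and the mirror names appear to follow one pattern: drop the gcr.io/ prefix, turn the remaining slashes into dots, and prepend the Aliyun registry path. A minimal sketch of that mapping, assuming the pattern holds for every image (to_mirror and rewrite_manifest are illustrative names, not part of any tool):

```python
from typing import Iterable

MIRROR_PREFIX = "registry.cn-shenzhen.aliyuncs.com/shikanon/"


def to_mirror(image: str, prefix: str = MIRROR_PREFIX) -> str:
    """Map gcr.io/<a>/<b>/<name>:<tag> to <prefix><a>.<b>.<name>:<tag>."""
    if not image.startswith("gcr.io/"):
        return image  # leave images from other registries untouched
    repository, sep, tag = image[len("gcr.io/"):].partition(":")
    return prefix + repository.replace("/", ".") + sep + tag


def rewrite_manifest(text: str, images: Iterable[str]) -> str:
    """Replace every listed gcr.io image reference in a manifest's text."""
    # Longest names first, so a shorter name can never clobber a longer one.
    for image in sorted(images, key=len, reverse=True):
        text = text.replace(image, to_mirror(image))
    return text


if __name__ == "__main__":
    print(to_mirror("gcr.io/ml-pipeline/frontend:0.1.23"))
```

This plain text replacement assumes the full image reference, tag included, appears literally in the exported YAML; manifests that set the name and tag separately would need different handling.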
grc_image = [
    "gcr.io/kubeflow-images-public/ingress-setup:latest",
    "gcr.io/kubeflow-images-public/admission-webhook:v20190520-v0-139-gcee39dbc-dirty-0d8f4c",
    "gcr.io/kubeflow-images-public/kubernetes-sigs/application:1.0-beta",
    "gcr.io/kubeflow-images-public/centraldashboard:v20190823-v0.6.0-rc.0-69-gcb7dab59",
    "gcr.io/kubeflow-images-public/jupyter-web-app:9419d4d",
    "gcr.io/kubeflow-images-public/katib/v1alpha2/katib-controller:v0.6.0-rc.0",
    "gcr.io/kubeflow-images-public/katib/v1alpha2/katib-manager:v0.6.0-rc.0",
    "gcr.io/kubeflow-images-public/katib/v1alpha2/katib-manager-rest:v0.6.0-rc.0",
    "gcr.io/kubeflow-images-public/katib/v1alpha2/suggestion-bayesianoptimization:v0.6.0-rc.0",
    "gcr.io/kubeflow-images-public/katib/v1alpha2/suggestion-grid:v0.6.0-rc.0",
    "gcr.io/kubeflow-images-public/katib/v1alpha2/suggestion-hyperband:v0.6.0-rc.0",
    "gcr.io/kubeflow-images-public/katib/v1alpha2/suggestion-nasrl:v0.6.0-rc.0",
    "gcr.io/kubeflow-images-public/katib/v1alpha2/suggestion-random:v0.6.0-rc.0",
    "gcr.io/kubeflow-images-public/katib/v1alpha2/katib-ui:v0.6.0-rc.0",
    "gcr.io/kubeflow-images-public/metadata:v0.1.8",
    "gcr.io/kubeflow-images-public/metadata-frontend:v0.1.8",
    "gcr.io/ml-pipeline/api-server:0.1.23",
    "gcr.io/ml-pipeline/persistenceagent:0.1.23",
    "gcr.io/ml-pipeline/scheduledworkflow:0.1.23",
    "gcr.io/ml-pipeline/frontend:0.1.23",
    "gcr.io/ml-pipeline/viewer-crd-controller:0.1.23",
    "gcr.io/kubeflow-images-public/notebook-controller:v20190603-v0-175-geeca4530-e3b0c4",
    "gcr.io/kubeflow-images-public/profile-controller:v20190619-v0-219-gbd3daa8c-dirty-1ced0e",
    "gcr.io/kubeflow-images-public/kfam:v20190612-v0-170-ga06cdb79-dirty-a33ee4",
    "gcr.io/kubeflow-images-public/pytorch-operator:v1.0.0-rc.0",
    "gcr.io/google_containers/spartakus-amd64:v1.1.0",
    "gcr.io/kubeflow-images-public/tf_operator:v0.6.0.rc0",
    "gcr.io/arrikto/kubeflow/oidc-authservice:v0.2"
]
doc_image = [
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.ingress-setup:latest",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.admission-webhook:v20190520-v0-139-gcee39dbc-dirty-0d8f4c",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.kubernetes-sigs.application:1.0-beta",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.centraldashboard:v20190823-v0.6.0-rc.0-69-gcb7dab59",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.jupyter-web-app:9419d4d",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.katib-controller:v0.6.0-rc.0",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.katib-manager:v0.6.0-rc.0",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.katib-manager-rest:v0.6.0-rc.0",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.suggestion-bayesianoptimization:v0.6.0-rc.0",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.suggestion-grid:v0.6.0-rc.0",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.suggestion-hyperband:v0.6.0-rc.0",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.suggestion-nasrl:v0.6.0-rc.0",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.suggestion-random:v0.6.0-rc.0",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.katib-ui:v0.6.0-rc.0",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.metadata:v0.1.8",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.metadata-frontend:v0.1.8",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/ml-pipeline.api-server:0.1.23",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/ml-pipeline.persistenceagent:0.1.23",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/ml-pipeline.scheduledworkflow:0.1.23",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/ml-pipeline.frontend:0.1.23",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/ml-pipeline.viewer-crd-controller:0.1.23",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.notebook-controller:v20190603-v0-175-geeca4530-e3b0c4",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.profile-controller:v20190619-v0-219-gbd3daa8c-dirty-1ced0e",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.kfam:v20190612-v0-170-ga06cdb79-dirty-a33ee4",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.pytorch-operator:v1.0.0-rc.0",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/google_containers.spartakus-amd64:v1.1.0",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.tf_operator:v0.6.0.rc0",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/arrikto.kubeflow.oidc-authservice:v0.2"
]
Modify PVC to use dynamic storage
Modify the PVC storage to use local-path-provisioner, which allocates PVs dynamically.
Install local-path-provisioner:
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml
If you want Kubeflow to use it directly, you also need to make this StorageClass the default one:
...
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-path
  annotations: # Add as Default StorageClass
    storageclass.beta.kubernetes.io/is-default-class: "true"
provisioner: rancher.io/local-path
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
...
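Instead of editing the downloaded YAML, the same annotation can be applied to the already installed StorageClass with kubectl patch. The sketch below only builds the command; the function name is made up for this example, and the annotation key defaults to the one used in the manifest above.

```python
import json
from typing import List

# Annotation key taken from the StorageClass manifest above.
DEFAULT_CLASS_ANNOTATION = "storageclass.beta.kubernetes.io/is-default-class"


def default_storageclass_patch_command(
        name: str = "local-path",
        annotation: str = DEFAULT_CLASS_ANNOTATION) -> List[str]:
    """Build a `kubectl patch` command that marks a StorageClass as default."""
    patch = {"metadata": {"annotations": {annotation: "true"}}}
    return ["kubectl", "patch", "storageclass", name, "-p", json.dumps(patch)]


if __name__ == "__main__":
    # Print the arguments; pass the list to subprocess.run(...) to execute it.
    print(default_storageclass_patch_command())
```

Running kubectl get storageclass afterwards should show the class marked as the default.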
When finished, you can try creating a PVC:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-path-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
Note: If you do not set a default StorageClass, the PVC must specify storageClassName: local-path in order to bind.
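To make the note concrete, here is a small sketch that builds the same PVC with an explicit storageClassName. It emits JSON rather than YAML only to stay within the Python standard library; pvc_manifest is a name invented for this example.

```python
import json
from typing import Optional


def pvc_manifest(name: str, size: str, storage_class: Optional[str] = None,
                 namespace: str = "default") -> dict:
    """Build a PersistentVolumeClaim; set storage_class when no default exists."""
    spec = {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": size}},
    }
    if storage_class is not None:
        # Binds the claim to a specific StorageClass instead of the default.
        spec["storageClassName"] = storage_class
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


if __name__ == "__main__":
    print(json.dumps(pvc_manifest("local-path-pvc", "2Gi", "local-path"), indent=2))
```

Saving the printed JSON to a file such as pvc.json (a hypothetical name) and running kubectl apply -f pvc.json should create the claim, since kubectl accepts JSON as well as YAML.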
One-click Installation
Here I made a project for one-click installation of Kubeflow that uses mirror images hosted in China:
https://github.com/shikanon/kubeflow-manifests