Fixed typos and some clarification on playbook doc (kubeflow#381)
* fixed typos and some clarification

* fix typos

* fix

* formatting

* fixed typo

* added some comments on testing pv

* fix a typo
zabbasi authored and k8s-ci-robot committed May 6, 2019
1 parent 8955643 commit c9c8f77
Showing 2 changed files with 36 additions and 6 deletions.
40 changes: 35 additions & 5 deletions playbook.md
@@ -22,6 +22,13 @@ This is a playbook for build cops to help deal with problems with the CI infrast
* So if there are resources older than roughly two hours, that indicates a problem with garbage collection

1. To access the k8s resources, get the cluster credentials and set the default namespace to `kubeflow-test-infra`:

```
gcloud container clusters get-credentials kubeflow-testing --zone $ZONE --project kubeflow-ci
kubectl config set-context $(kubectl config current-context) --namespace=kubeflow-test-infra
```

1. Check if the cron job to GC resources is running in the test cluster

```
@@ -84,7 +91,7 @@ This is a playbook for build cops to help deal with problems with the CI infrast
1. Delete the nfs deployment
```
- gcloud --project=kubeflow-ci deployment-manager delete kubeflow-ci-nfs
+ gcloud --project=kubeflow-ci deployment-manager deployments delete kubeflow-ci-nfs
```
1. Delete the PV and PVC
@@ -102,7 +109,7 @@ This is a playbook for build cops to help deal with problems with the CI infrast
* Dump the yaml
```
- kubectl get pvc nfs-external -o yaml > /tmp/nfs-external.yaml
+ kubectl get pvc nfs-external -o yaml > /tmp/nfs-external.yaml
```
* Delete the finalizer `kubernetes.io/pvc-protection` in nfs-external.yaml
@@ -121,8 +128,18 @@ This is a playbook for build cops to help deal with problems with the CI infrast
```
kubectl apply -f /tmp/nfs-external.yaml
```
* Alternatively, you can use `kubectl edit` to remove the finalizers.
* Similarly, make sure you remove the finalizers from the PV (i.e., `gcfs`).
1. If PV/PVC deletion still stalls, manually delete all pods in `kubeflow-test-infra`:
```
kubectl delete pods --all --namespace=kubeflow-test-infra
```
- 1. Rereate the NFS share
+ 1. Recreate the NFS share
```
cd test-infra/gcp_configs
@@ -139,7 +156,7 @@ This is a playbook for build cops to help deal with problems with the CI infrast
```
cd test-infra/ks_app
- ks param set --env=kubeflow-ci nfs-external nfsServer 10.240.92.210
+ ks param set --env=kubeflow-ci nfs-external nfsServer <NFS-IP-address>
```
1. Recreate the PV and PVC
@@ -148,4 +165,17 @@ This is a playbook for build cops to help deal with problems with the CI infrast
ks apply kubeflow-ci -c nfs-external
```
- 1. Make sure the `debug-worker-0` pod is able to successfully mount the PV
+ 1. Make sure the `debug-worker-0` pod is able to successfully mount the PV
* If you already deleted the pod `debug-worker-0`, make sure it has restarted and is healthy. Otherwise, if it is stuck in the `Terminating` state, force delete it as follows:
```
kubectl delete pods debug-worker-0 --grace-period=0 --force
```
* Connect to `debug-worker-0` and check that it has mounted the PV:
```
kubectl exec -it debug-worker-0 -- /bin/bash
ls /secret
```
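As an alternative to dumping, editing, and re-applying the YAML, the finalizer and force-delete steps above can be wrapped in a small shell sketch. It assumes the current `kubectl` context already uses the `kubeflow-test-infra` namespace; the helper names `clear_finalizers` and `force_delete_pod`, and the `KUBECTL` override used for dry runs, are hypothetical and not part of the test-infra repo. A JSON merge patch that sets `metadata.finalizers` to `null` removes the field, which should let a pending deletion complete.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for unsticking PV/PVC deletion in kubeflow-test-infra.
# Assumes the current kubectl context already uses that namespace.
set -euo pipefail

# Set KUBECTL=echo to print the commands instead of running them.
KUBECTL="${KUBECTL:-kubectl}"

# clear_finalizers KIND NAME
# A JSON merge patch with "finalizers": null deletes the field, so an
# object stuck in deletion (e.g. held by kubernetes.io/pvc-protection)
# can be removed.
clear_finalizers() {
  local kind="$1" name="$2"
  "$KUBECTL" patch "$kind" "$name" --type=merge \
    -p '{"metadata":{"finalizers":null}}'
}

# force_delete_pod NAME
# Same as the playbook's force delete for a pod stuck in Terminating.
force_delete_pod() {
  "$KUBECTL" delete pod "$1" --grace-period=0 --force
}
```

For example, run `clear_finalizers pvc nfs-external` and then `clear_finalizers pv gcfs`; set `KUBECTL=echo` first to print the commands without touching the cluster. The `pvc-protection` finalizer exists to block deletion while a pod still uses the claim, so only clear it once nothing depends on the volume.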
@@ -11,7 +11,7 @@ params {
exposeUi: true,
},
"nfs-external"+: {
- nfsServer: '10.240.92.210',
+ nfsServer: '10.10.224.162',
},
},
}
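The playbook's garbage-collection check (resources older than roughly two hours suggest GC is not keeping up) can also be scripted. The sketch below defines a hypothetical helper, `older_than`, that is not part of test-infra: it reads `NAME TIMESTAMP` lines and prints the names whose timestamp is more than the given number of hours in the past. It assumes GNU `date` (for `date -d`) and RFC 3339 timestamps such as `.metadata.creationTimestamp`; `NOW_EPOCH` is an optional override for reproducible runs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flag resources older than a cutoff, as a quick check
# that garbage collection is keeping up. Requires GNU date for `date -d`.
set -euo pipefail

# older_than HOURS
# stdin: one "NAME TIMESTAMP" pair per line (RFC 3339, e.g. 2019-05-06T09:00:00Z)
# stdout: the names created more than HOURS hours before now.
# NOW_EPOCH (seconds since the epoch) may be set to pin "now" for testing.
older_than() {
  local hours="$1"
  local now="${NOW_EPOCH:-$(date +%s)}"
  local cutoff=$(( now - hours * 3600 ))
  local name ts created
  while read -r name ts; do
    # Skip blank lines.
    if [ -z "$name" ]; then
      continue
    fi
    created="$(date -d "$ts" +%s)"
    if [ "$created" -lt "$cutoff" ]; then
      echo "$name"
    fi
  done
}
```

For example, to list pods in the current namespace that are more than two hours old: `kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.metadata.creationTimestamp}{"\n"}{end}' | older_than 2`.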