Update ci tests for mnist example #684

Merged 1 commit on Dec 7, 2019
130 changes: 8 additions & 122 deletions mnist/README.md
@@ -6,9 +6,10 @@
- [Prerequisites](#prerequisites)
- [Deploy Kubeflow](#deploy-kubeflow)
- [Local Setup](#local-setup)
- [GCP Setup](#gcp-setup)
- [Modifying existing examples](#modifying-existing-examples)
- [Prepare model](#prepare-model)
- [Build and push model image.](#build-and-push-model-image)
- [(Optional) Build and push model image.](#optional-build-and-push-model-image)
- [Preparing your Kubernetes Cluster](#preparing-your-kubernetes-cluster)
- [Training your model](#training-your-model)
- [Local storage](#local-storage)
@@ -53,6 +54,9 @@ You also need the following command line tools:

**Note:** kustomize [v2.0.3](https://github.com/kubernetes-sigs/kustomize/releases/tag/v2.0.3) is recommended because of a [problem](https://github.com/kubernetes-sigs/kustomize/issues/1295) in kustomize v2.1.0.

### GCP Setup

If you are using GCP, you need to enable [Workload Identity](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity) before running the steps below.
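With Workload Identity, a Kubernetes service account (KSA) can act as a Google service account (GSA) once two pieces are in place: the GSA grants `roles/iam.workloadIdentityUser` to a member of the form `serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]`, and the KSA carries the annotation `iam.gke.io/gcp-service-account`. The minimal sketch below only builds those two values. The project ID and GSA email are hypothetical, and pairing them with the `default-editor` KSA in the `kubeflow` namespace is an assumption based on the `serviceAccount: default-editor` lines in the deployment changes of this PR.

```python
def workload_identity_member(project_id, namespace, ksa_name):
    """IAM member string to grant roles/iam.workloadIdentityUser on the GSA."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"


def annotated_ksa(namespace, ksa_name, gsa_email):
    """Kubernetes ServiceAccount manifest (as a dict) annotated for Workload Identity."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": ksa_name,
            "namespace": namespace,
            # Tells GKE which Google service account this KSA impersonates.
            "annotations": {"iam.gke.io/gcp-service-account": gsa_email},
        },
    }


# Hypothetical project and GSA, for illustration only.
print(workload_identity_member("my-project", "kubeflow", "default-editor"))
print(annotated_ksa("kubeflow", "default-editor",
                    "my-kubeflow-user@my-project.iam.gserviceaccount.com"))
```

In practice the annotation and binding are applied with `kubectl` and `gcloud`; the Python here is only a way to make the expected shapes explicit.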

## Modifying existing examples

@@ -70,9 +74,9 @@ Basically, we must:

The resulting model is [model.py](model.py).

### Build and push model image.
### (Optional) Build and push model image.

With our code ready, we will now build/push the docker image.
With our code ready, we will now build and push the Docker image. Alternatively, you can skip building and pushing and use the existing image `gcr.io/kubeflow-ci/mnist/model:latest`.

```
DOCKER_URL=docker.io/reponame/mytfmodel:tag # Put your docker registry here
@@ -200,7 +204,7 @@ kustomize edit add configmap mnist-map-training --from-literal=name=mnist-train-
Optionally, if you want to use your own training image, configure it as shown below.

```
kustomize edit set image training-image=$DOCKER_URL:$TAG
kustomize edit set image training-image=$DOCKER_URL
```

Next, we configure the job to run in distributed mode by setting the number of parameter servers and workers. `numPs` is the number of parameter servers (PS) and `numWorkers` is the number of workers.
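A TFJob passes each replica its view of the cluster through the `TF_CONFIG` environment variable, which lists the chief, PS, and worker addresses plus the replica's own task. The sketch below builds a `TF_CONFIG`-style dict from `numPs` and `numWorkers` to show how those two numbers shape the cluster. The job name `mnist-train-dist`, the single chief, and the `<job>-<type>-<index>:2222` host naming are illustrative assumptions, not values from this README.

```python
import json


def build_tf_config(job_name, num_ps, num_workers, task_type, task_index, port=2222):
    """Return a TF_CONFIG-style dict for one chief, num_ps parameter servers
    and num_workers workers, as seen by the replica (task_type, task_index).

    The "<job>-<type>-<index>:<port>" host names are an assumption.
    """
    def hosts(replica_type, count):
        # One host:port entry per replica of the given type.
        return [f"{job_name}-{replica_type}-{i}:{port}" for i in range(count)]

    cluster = {"chief": hosts("chief", 1)}
    if num_ps > 0:
        cluster["ps"] = hosts("ps", num_ps)
    if num_workers > 0:
        cluster["worker"] = hosts("worker", num_workers)
    return {"cluster": cluster, "task": {"type": task_type, "index": task_index}}


# Hypothetical settings numPs=1, numWorkers=2, viewed from worker 0.
example = build_tf_config("mnist-train-dist", num_ps=1, num_workers=2,
                          task_type="worker", task_index=0)
print(json.dumps(example, indent=2))
```

Every replica sees the same `cluster` section; only `task` differs, which is how each process learns its own role.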
@@ -225,94 +229,6 @@ kustomize edit add configmap mnist-map-training --from-literal=modelDir=gs://${B
kustomize edit add configmap mnist-map-training --from-literal=exportDir=gs://${BUCKET}/${MODEL_PATH}/export
```

To write to GCS, we need to supply the TFJob with GCP credentials. We do
this by telling our training code to use a [Google service account](https://cloud.google.com/docs/authentication/production#obtaining_and_providing_service_account_credentials_manually).

If you followed the [getting started guide for GKE](https://www.kubeflow.org/docs/started/getting-started-gke/),
then a number of steps have already been performed for you:

1. We created a Google service account named `${DEPLOYMENT}-user`.

   * You can run the following command to list all service accounts in your project:

     ```
     gcloud --project=${PROJECT} iam service-accounts list
     ```
2. We stored the private key for this account in a K8s secret named `user-gcp-sa`.
   * To see the secrets in your cluster:
     ```
     kubectl get secrets
     ```
3. We granted this service account permission to read/write GCS buckets in this project.
   * To see the IAM policy, run:
     ```
     gcloud projects get-iam-policy ${PROJECT} --format=yaml
     ```
   * The output should look like the following:
     ```
     bindings:
     ...
     - members:
       - serviceAccount:${DEPLOYMENT}-user@${PROJECT}.iam.gserviceaccount.com
       ...
       role: roles/storage.admin
     ...
     etag: BwV_BqSmSCY=
     version: 1
     ```
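Rather than reading the policy by eye, you could check the binding programmatically. The sketch below assumes the policy was fetched with `--format=json` instead of `--format=yaml` and parsed with `json.load`; the helper `has_role` and the sample member names are hypothetical.

```python
def has_role(policy, member, role):
    """Return True if `member` is listed in any binding for `role`.

    `policy` is assumed to be the parsed JSON output of
    `gcloud projects get-iam-policy ${PROJECT} --format=json`.
    """
    return any(
        binding.get("role") == role and member in binding.get("members", [])
        for binding in policy.get("bindings", [])
    )


# Hypothetical policy for illustration only.
sample_policy = {
    "bindings": [
        {
            "role": "roles/storage.admin",
            "members": ["serviceAccount:my-kubeflow-user@my-project.iam.gserviceaccount.com"],
        }
    ],
    "version": 1,
}
print(has_role(sample_policy,
               "serviceAccount:my-kubeflow-user@my-project.iam.gserviceaccount.com",
               "roles/storage.admin"))
```

Matching on the exact member string (including the `serviceAccount:` prefix) mirrors how bindings appear in the policy output.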
To use this service account, we perform the following steps:
1. Mount the secret `user-gcp-sa` into the pod and configure the mount path of the secret.
   ```
   kustomize edit add configmap mnist-map-training --from-literal=secretName=user-gcp-sa
   kustomize edit add configmap mnist-map-training --from-literal=secretMountPath=/var/secrets
   ```
   * Note: ensure your environment is pointed at the same `kubeflow` namespace as the `user-gcp-sa` secret.
2. Next, set the environment variable `GOOGLE_APPLICATION_CREDENTIALS` so that our code knows where to find the service account key.
   ```
   kustomize edit add configmap mnist-map-training --from-literal=GOOGLE_APPLICATION_CREDENTIALS=/var/secrets/user-gcp-sa.json
   ```
   * If we look at the spec for our job, we can see that the environment variable `GOOGLE_APPLICATION_CREDENTIALS` is set.
     ```
     kustomize build .
     ```
     ```
     apiVersion: kubeflow.org/v1beta2
     kind: TFJob
     metadata:
       ...
     spec:
       tfReplicaSpecs:
         Chief:
           replicas: 1
           template:
             spec:
               containers:
               - command:
                 ...
                 env:
                 ...
                 - name: GOOGLE_APPLICATION_CREDENTIALS
                   value: /var/secrets/user-gcp-sa.json
                 ...
       ...
     ...
     ```
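The point of `GOOGLE_APPLICATION_CREDENTIALS` is that the training code can find the mounted key without hard-coding a path. The stdlib sketch below is a simplified illustration of that lookup, not the Google client libraries' actual Application Default Credentials logic; the function name is hypothetical. It reads the variable, checks that the mounted file exists, and returns the key's `client_email`.

```python
import json
import os


def load_service_account_email(env=os.environ):
    """Simplified illustration: find the key file named by
    GOOGLE_APPLICATION_CREDENTIALS and return the account's client_email.
    """
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        # Usually means the secret was not mounted at the expected path.
        raise FileNotFoundError(f"key file not found: {path}")
    with open(path) as f:
        key = json.load(f)
    return key["client_email"]
```

Inside the pod configured above, the variable would point at `/var/secrets/user-gcp-sa.json`, so a missing-file error is a quick signal that the `secretName`/`secretMountPath` settings do not match.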

You can now submit the job

@@ -519,36 +435,6 @@ Assuming you followed the directions above if you used GCS you can use the follo
LOGDIR=gs://${BUCKET}/${MODEL_PATH}
```
You need to give TensorBoard GCP credentials so that it can access the GCS bucket containing the model.
1. Mount the secret `user-gcp-sa` into the pod and configure the mount path of the secret.
   ```
   kustomize edit add configmap mnist-map-monitoring --from-literal=secretName=user-gcp-sa
   kustomize edit add configmap mnist-map-monitoring --from-literal=secretMountPath=/var/secrets
   ```
   * Setting these parameters causes a volumeMount and a volume to be added to the TensorBoard deployment.
2. Next, set the environment variable `GOOGLE_APPLICATION_CREDENTIALS` so that TensorBoard knows
   where to find the service account key.
   ```
   kustomize edit add configmap mnist-map-monitoring --from-literal=GOOGLE_APPLICATION_CREDENTIALS=/var/secrets/user-gcp-sa.json
   ```
   * If we look at the spec for the TensorBoard deployment, we can see that the environment variable `GOOGLE_APPLICATION_CREDENTIALS` is set.
     ```
     kustomize build .
     ```
     ```
     ...
     env:
     ...
     - name: GOOGLE_APPLICATION_CREDENTIALS
       value: /var/secrets/user-gcp-sa.json
     ```
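The volume and volumeMount come from the `deployment_patch.yaml` that the GCS kustomization applies through `patchesJson6902`, i.e. as an RFC 6902 JSON Patch. That file's contents are not shown in this diff, so the patch below is a hypothetical reconstruction. The stdlib helper `apply_add_ops` (also hypothetical) supports only the `add` operation, enough to show how such a patch injects a secret volume and mount into a minimal Deployment.

```python
import copy


def apply_add_ops(doc, ops):
    """Apply RFC 6902 "add" operations to a nested dict/list document.

    Only "add" is supported; a final "-" appends to an array.
    Returns a new document and leaves `doc` unchanged.
    """
    result = copy.deepcopy(doc)
    for op in ops:
        if op["op"] != "add":
            raise ValueError(f"unsupported op: {op['op']}")
        # JSON Pointer: decode "~1" to "/" first, then "~0" to "~".
        parts = [p.replace("~1", "/").replace("~0", "~")
                 for p in op["path"].split("/")[1:]]
        parent = result
        for part in parts[:-1]:
            parent = parent[int(part)] if isinstance(parent, list) else parent[part]
        last = parts[-1]
        if isinstance(parent, list):
            if last == "-":
                parent.append(op["value"])
            else:
                parent.insert(int(last), op["value"])
        else:
            parent[last] = op["value"]
    return result


# Hypothetical patch: mount secret "user-gcp-sa" at /var/secrets.
PATCH = [
    {"op": "add", "path": "/spec/template/spec/volumes",
     "value": [{"name": "user-gcp-sa", "secret": {"secretName": "user-gcp-sa"}}]},
    {"op": "add", "path": "/spec/template/spec/containers/0/volumeMounts",
     "value": [{"name": "user-gcp-sa", "mountPath": "/var/secrets"}]},
]
DEPLOYMENT = {"spec": {"template": {"spec": {"containers": [{"name": "tensorboard"}]}}}}
print(apply_add_ops(DEPLOYMENT, PATCH))
```

Kubernetes exposes each key of a mounted secret as a file, which is why the credentials path ends in `/var/secrets/user-gcp-sa.json`.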
#### Using S3
Enter the `monitoring/S3` directory from the `mnist` application directory.
17 changes: 0 additions & 17 deletions mnist/monitoring/GCS/deployment_patch.yaml

This file was deleted.

31 changes: 0 additions & 31 deletions mnist/monitoring/GCS/kustomization.yaml
@@ -6,34 +6,3 @@ bases:

configurations:
- params.yaml

vars:
- fieldref:
    fieldPath: data.GOOGLE_APPLICATION_CREDENTIALS
  name: GOOGLE_APPLICATION_CREDENTIALS
  objref:
    apiVersion: v1
    kind: ConfigMap
    name: mnist-map-monitoring
- fieldref:
    fieldPath: data.secretName
  name: secretName
  objref:
    apiVersion: v1
    kind: ConfigMap
    name: mnist-map-monitoring
- fieldref:
    fieldPath: data.secretMountPath
  name: secretMountPath
  objref:
    apiVersion: v1
    kind: ConfigMap
    name: mnist-map-monitoring

patchesJson6902:
- path: deployment_patch.yaml
  target:
    group: apps
    kind: Deployment
    name: tensorboard-tb
    version: v1beta1
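Each `vars` entry above reads a field (`fieldref.fieldPath`) from a referenced object (`objref`) and makes its value available as `$(name)` elsewhere in the build. The sketch below imitates that for ConfigMap `data.<key>` paths only. Real kustomize also restricts substitution to the fields listed in its configurations (here `params.yaml`, whose contents are not shown), which this sketch ignores; both helpers are hypothetical, and leaving unknown `$(...)` references unchanged is a choice made for this sketch.

```python
import re

# Matches references such as $(secretMountPath).
VAR_REF = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)")


def resolve_vars(var_decls, configmaps):
    """Resolve var declarations shaped like the kustomization `vars` list.

    configmaps maps a ConfigMap name to its `data` dict.
    Only `data.<key>` field paths are handled.
    """
    values = {}
    for decl in var_decls:
        field_path = decl["fieldref"]["fieldPath"]
        prefix, _, key = field_path.partition(".")
        if prefix != "data":
            raise ValueError(f"unsupported fieldPath: {field_path}")
        values[decl["name"]] = configmaps[decl["objref"]["name"]][key]
    return values


def substitute(text, values):
    """Replace each $(NAME) with its value; unknown names are left unchanged."""
    return VAR_REF.sub(lambda m: values.get(m.group(1), m.group(0)), text)


decls = [{"name": "secretMountPath",
          "objref": {"apiVersion": "v1", "kind": "ConfigMap", "name": "mnist-map-monitoring"},
          "fieldref": {"fieldPath": "data.secretMountPath"}}]
data = {"mnist-map-monitoring": {"secretMountPath": "/var/secrets"}}
print(substitute("mountPath: $(secretMountPath)", resolve_vars(decls, data)))
```

This is the mechanism that let the deleted patch read `secretName`, `secretMountPath`, and `GOOGLE_APPLICATION_CREDENTIALS` from the `mnist-map-monitoring` ConfigMap.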
5 changes: 0 additions & 5 deletions mnist/monitoring/GCS/params.yaml

This file was deleted.

3 changes: 3 additions & 0 deletions mnist/monitoring/base/deployment.yaml
@@ -7,12 +7,15 @@ spec:
  replicas: 1
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "false"
      labels:
        app: tensorboard
        tb-job: tensorboard
      name: tensorboard
      namespace: kubeflow
    spec:
      serviceAccount: default-editor
      containers:
      - command:
        - /usr/local/bin/tensorboard
17 changes: 0 additions & 17 deletions mnist/serving/GCS/deployment_patch.yaml

This file was deleted.

8 changes: 0 additions & 8 deletions mnist/serving/GCS/kustomization.yaml
@@ -3,11 +3,3 @@ kind: Kustomization

bases:
- ../base

patchesJson6902:
- path: deployment_patch.yaml
  target:
    group: extensions
    kind: Deployment
    name: $(svcName)
    version: v1beta1
3 changes: 3 additions & 0 deletions mnist/serving/base/deployment.yaml
@@ -8,10 +8,13 @@ metadata:
spec:
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "false"
      labels:
        app: mnist
        version: v1
    spec:
      serviceAccount: default-editor
      containers:
      - args:
        - --port=9000