Update docs for connecting to Kubeflow Pipelines from the same cluster in multi-user mode (#2905)

* Update docs for connecting to Kubeflow Pipelines from the same cluster

* Update content/en/docs/components/pipelines/sdk/connect-api.md

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Improve in-cluster access to KFP API documentation

* Update documentation

Co-authored-by: Bart <bartlomiej.grasza@gmail.com>
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
3 people authored Oct 11, 2021
1 parent fad6acc commit 4c74471
Showing 2 changed files with 136 additions and 15 deletions.
18 changes: 4 additions & 14 deletions content/en/docs/components/pipelines/multi-user.md
@@ -167,19 +167,9 @@ without access control:
* Artifacts, Executions, and other metadata entities in [Machine Learning Metadata (MLMD)](https://www.tensorflow.org/tfx/guide/mlmd).
* [Minio artifact storage](https://min.io/) which contains pipeline runs' input/output artifacts.

## In-cluster API request authentication

Refer to [Connect to Kubeflow Pipelines from the same cluster](/docs/components/pipelines/sdk/connect-api/#connect-to-kubeflow-pipelines-from-the-same-cluster) for details.

Alternatively, in-cluster workloads like Jupyter notebooks or cron tasks can also access the Kubeflow Pipelines API through the public endpoint. This option is platform specific and explained in
[Connect to Kubeflow Pipelines from outside your cluster](/docs/components/pipelines/sdk/connect-api/#connect-to-kubeflow-pipelines-from-outside-your-cluster).
133 changes: 132 additions & 1 deletion content/en/docs/components/pipelines/sdk/connect-api.md
@@ -53,7 +53,7 @@ because it requires authentication. Refer to distribution specific documentation

### Connect to Kubeflow Pipelines from the same cluster

#### Non-multi-user mode

As mentioned above, the Kubeflow Pipelines API Kubernetes service is `ml-pipeline-ui`.

@@ -85,6 +85,137 @@

```python
client = kfp.Client(host=f'http://ml-pipeline-ui.{namespace}:80')
print(client.list_experiments())
```

#### Multi-User mode

Note: the technical details of multi-user mode are covered in the [How Multi-User mode in-cluster authentication works](#how-multi-user-mode-in-cluster-authentication-works) section below.

Choose your use-case from one of the options below:

* **Access Kubeflow Pipelines from Jupyter notebook**

To **access Kubeflow Pipelines from a Jupyter notebook**, an additional per-namespace (profile) manifest is required:

```yaml
apiVersion: kubeflow.org/v1alpha1
kind: PodDefault
metadata:
  name: access-ml-pipeline
  namespace: "<YOUR_USER_PROFILE_NAMESPACE>"
spec:
  desc: Allow access to Kubeflow Pipelines
  selector:
    matchLabels:
      access-ml-pipeline: "true"
  volumes:
    - name: volume-kf-pipeline-token
      projected:
        sources:
          - serviceAccountToken:
              path: token
              expirationSeconds: 7200
              audience: pipelines.kubeflow.org
  volumeMounts:
    - mountPath: /var/run/secrets/kubeflow/pipelines
      name: volume-kf-pipeline-token
      readOnly: true
  env:
    - name: KF_PIPELINES_SA_TOKEN_PATH
      value: /var/run/secrets/kubeflow/pipelines/token
```
After the manifest is applied, newly created Jupyter notebooks contain an additional option in the **configurations** section.
Read more about **configurations** in the [Jupyter notebook server](/docs/components/notebooks/setup/#create-a-jupyter-notebook-server-and-add-a-notebook) guide.
Note that `kfp.Client` expects the token either in the `KF_PIPELINES_SA_TOKEN_PATH` environment variable or
mounted at `/var/run/secrets/kubeflow/pipelines/token`. Do not change these values in the manifest.
Similarly, the `audience` field should not be modified. No additional setup is required to refresh tokens.

Remember that this setup has to be repeated for each namespace (profile) that should have access to the Kubeflow Pipelines API from within Jupyter notebooks.
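As a quick way to confirm the token is visible from inside a notebook, the lookup the SDK performs can be sketched in a few lines of standard-library Python. This is a minimal sketch, not the KFP SDK's actual code; the helper names are illustrative:

```python
import os

# Default mount path used by the PodDefault manifest above.
DEFAULT_TOKEN_PATH = "/var/run/secrets/kubeflow/pipelines/token"

def resolve_token_path():
    # The KF_PIPELINES_SA_TOKEN_PATH environment variable takes precedence;
    # otherwise fall back to the well-known mount path.
    return os.environ.get("KF_PIPELINES_SA_TOKEN_PATH", DEFAULT_TOKEN_PATH)

def read_token():
    # Re-read on every call: projected tokens are rotated by the kubelet
    # well before the 7200s expiration set in the manifest.
    with open(resolve_token_path()) as f:
        return f.read().strip()
```

Inside a notebook created with the configuration enabled, `kfp.Client()` performs an equivalent lookup automatically, so no explicit token handling is needed.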

* **Access Kubeflow Pipelines from within any Pod**

In this case, the configuration is similar to the Jupyter notebook case described above.
The Pod manifest has to be extended with a projected `serviceAccountToken` volume, mounted either at the path set in the
`KF_PIPELINES_SA_TOKEN_PATH` environment variable or at the default `/var/run/secrets/kubeflow/pipelines/token`.

The manifest below shows an example Pod with the token mounted at `/var/run/secrets/kubeflow/pipelines/token`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: access-kfp-example
  namespace: my-namespace
spec:
  containers:
    - image: my-image:latest
      name: access-kfp-example
      volumeMounts:
        - mountPath: /var/run/secrets/kubeflow/pipelines
          name: volume-kf-pipeline-token
          readOnly: true
  volumes:
    - name: volume-kf-pipeline-token
      projected:
        sources:
          - serviceAccountToken:
              path: token
              expirationSeconds: 7200
              audience: pipelines.kubeflow.org
```
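When Pod specs are generated programmatically rather than written by hand, the projected volume and its mount can be kept in one place. A minimal standard-library sketch of the same spec as Python dicts (the function and constant names are illustrative, not a KFP API):

```python
def kfp_token_volume(expiration_seconds=7200):
    # Projected ServiceAccountToken volume matching the YAML manifest above.
    # The audience must match what the Kubeflow Pipelines API server expects.
    return {
        "name": "volume-kf-pipeline-token",
        "projected": {
            "sources": [
                {
                    "serviceAccountToken": {
                        "path": "token",
                        "expirationSeconds": expiration_seconds,
                        "audience": "pipelines.kubeflow.org",
                    }
                }
            ]
        },
    }

# Corresponding volumeMount entry for any container that needs the token.
KFP_TOKEN_VOLUME_MOUNT = {
    "mountPath": "/var/run/secrets/kubeflow/pipelines",
    "name": "volume-kf-pipeline-token",
    "readOnly": True,
}
```

These dicts can be merged into a Pod spec before serializing it to YAML or submitting it with a Kubernetes client.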

##### Managing cross-namespaces access to Kubeflow Pipelines API

As already mentioned, access to the Kubeflow Pipelines API requires a per-namespace setup.
Alternatively, you can configure the access in a single namespace and allow other
namespaces to access the Kubeflow Pipelines API through it.

Note: the examples below assume that `namespace-1` is the namespace (profile) that will be granted access to the Kubeflow Pipelines API
through the `namespace-2` namespace. `namespace-2` should already be configured to access the Kubeflow Pipelines API.

Cross-namespace access can be achieved in two ways:

* **With additional RBAC settings.**

With this option, only `namespace-2` needs to have the `PodDefault` manifest configured.

Access is granted by giving `namespace-1`'s `ServiceAccount/default-editor` the `ClusterRole/kubeflow-edit` in `namespace-2`:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kubeflow-edit-namespace-1
  namespace: namespace-2
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kubeflow-edit
subjects:
  - kind: ServiceAccount
    name: default-editor
    namespace: namespace-1
```
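If many namespaces need this grant, the RoleBinding can be templated instead of copied by hand. A standard-library sketch that emits the manifest above for any source/target pair (the function name is illustrative):

```python
def kubeflow_edit_binding(src_namespace, dst_namespace):
    # Grants src_namespace's default-editor ServiceAccount the
    # kubeflow-edit ClusterRole inside dst_namespace.
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {
            "name": f"kubeflow-edit-{src_namespace}",
            "namespace": dst_namespace,
        },
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": "kubeflow-edit",
        },
        "subjects": [
            {
                "kind": "ServiceAccount",
                "name": "default-editor",
                "namespace": src_namespace,
            }
        ],
    }
```

Serializing the returned dict with a YAML library and piping it to `kubectl apply -f -` reproduces the manifest above.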
* **By sharing access to the other profile.**

In this scenario, access is granted by `namespace-2` adding `namespace-1` as a
[contributor](https://www.kubeflow.org/docs/components/multi-tenancy/getting-started/#managing-contributors-through-the-kubeflow-ui).
Specifically, the owner of `namespace-2` opens the "Manage Contributors" page in the Kubeflow UI and adds the email address
associated with `namespace-1` to the "Contributors to your namespace" field.
##### How Multi-User mode in-cluster authentication works

Authentication uses ServiceAccountToken
[projection](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#service-account-token-volume-projection).
Simply put, the token is first injected into a Pod (for example, a Jupyter notebook Pod).
The Kubeflow Pipelines SDK then uses this token to authenticate against the Kubeflow Pipelines API.
It is important to understand that the `serviceAccountToken` method respects Kubeflow Pipelines RBAC
and does not allow access beyond what the ServiceAccount running the notebook Pod has.
More details about `PodDefault` can be found [here](https://github.com/kubeflow/kubeflow/blob/master/components/admission-webhook/README.md).
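Conceptually, the client side of this flow amounts to reading the projected token and presenting it as a bearer credential, which the API server then validates (audience included) against the Kubernetes TokenReview API. A hedged standard-library sketch of that client-side step, not the SDK's actual implementation:

```python
import os

def build_auth_header(token_path=None):
    # Read the projected ServiceAccount token and wrap it in the
    # Authorization header format the API server expects.
    path = token_path or os.environ.get(
        "KF_PIPELINES_SA_TOKEN_PATH",
        "/var/run/secrets/kubeflow/pipelines/token",
    )
    with open(path) as f:
        return {"Authorization": f"Bearer {f.read().strip()}"}
```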
## Configure SDK client by environment variables

It's usually beneficial to configure the Kubeflow Pipelines SDK client using Kubeflow Pipelines environment variables,
