Signed-off-by: Yi Chen <github@chenyicn.net>
Showing 20 changed files with 24,290 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
# Patterns to ignore when building packages. | ||
# This supports shell glob matching, relative path matching, and | ||
# negation (prefixed with !). Only one pattern per line. | ||
.DS_Store | ||
# Common VCS dirs | ||
.git/ | ||
.gitignore | ||
.bzr/ | ||
.bzrignore | ||
.hg/ | ||
.hgignore | ||
.svn/ | ||
# Common backup files | ||
*.swp | ||
*.bak | ||
*.tmp | ||
*.orig | ||
*~ | ||
# Various IDEs | ||
.project | ||
.idea/ | ||
*.tmproj | ||
.vscode/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
# | ||
# Copyright 2024 The Kubeflow authors. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# https://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
# | ||
|
||
apiVersion: v2 | ||
|
||
name: trainer | ||
|
||
description: A Helm chart for deploying Kubeflow trainer on Kubernetes. | ||
|
||
version: 2.0.0 | ||
|
||
appVersion: 2.0.0 | ||
|
||
type: application | ||
|
||
keywords: | ||
- kubeflow trainer | ||
|
||
home: https://github.com/kubeflow/trainer | ||
|
||
maintainers: | ||
- name: ChenYi015 | ||
email: github@chenyicn.net | ||
url: https://github.com/ChenYi015 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,105 @@ | ||
# trainer | ||
|
||
Version: 2.0.0, Type: application, AppVersion: 2.0.0 | ||
|
||
A Helm chart for deploying Kubeflow trainer on Kubernetes. | ||
|
||
**Homepage:** <https://github.com/kubeflow/trainer> | ||
|
||
## Introduction | ||
|
||
This chart bootstraps a [Kubeflow Trainer](https://github.com/kubeflow/trainer) deployment using the [Helm](https://helm.sh) package manager. | ||
|
||
## Prerequisites | ||
|
||
- Helm >= 3 | ||
- Kubernetes >= 1.20 | ||
|
||
## Usage | ||
|
||
### Add Helm Repo | ||
|
||
```bash | ||
helm repo add trainer https://kubeflow.github.io/trainer | ||
|
||
helm repo update | ||
``` | ||
|
||
See [helm repo](https://helm.sh/docs/helm/helm_repo) for command documentation. | ||
|
||
### Install the chart | ||
|
||
```bash | ||
helm install [RELEASE_NAME] trainer/trainer | ||
``` | ||
|
||
For example, if you want to create a release with name `trainer` in the `kubeflow-system` namespace: | ||
|
||
```shell | ||
helm install trainer trainer/trainer \ | ||
--namespace kubeflow-system \ | ||
--create-namespace | ||
``` | ||
|
||
Note that by passing the `--create-namespace` flag to the `helm install` command, `helm` will create the release namespace if it does not exist. | ||
|
||
See [helm install](https://helm.sh/docs/helm/helm_install) for command documentation. | ||
|
||
### Upgrade the chart | ||
|
||
```shell | ||
helm upgrade [RELEASE_NAME] trainer/trainer [flags] | ||
``` | ||
|
||
See [helm upgrade](https://helm.sh/docs/helm/helm_upgrade) for command documentation. | ||
|
||
### Uninstall the chart | ||
|
||
```shell | ||
helm uninstall [RELEASE_NAME] | ||
``` | ||
|
||
This removes all the Kubernetes resources associated with the chart and deletes the release, except for the `crds`, which must be removed manually. | ||
|
||
See [helm uninstall](https://helm.sh/docs/helm/helm_uninstall) for command documentation. | ||
|
||
## Values | ||
|
||
| Key | Type | Default | Description | | ||
|-----|------|---------|-------------| | ||
| nameOverride | string | `""` | String to partially override release name. | | ||
| fullnameOverride | string | `""` | String to fully override release name. | | ||
| commonLabels | object | `{}` | Common labels to add to the resources. | | ||
| image.registry | string | `"docker.io"` | Image registry. | | ||
| image.repository | string | `"kubeflow/trainer-controller-controller"` | Image repository. | | ||
| image.tag | string | If not set, the chart appVersion will be used. | Image tag. | | ||
| image.pullPolicy | string | `"IfNotPresent"` | Image pull policy. | | ||
| image.pullSecrets | list | `[]` | Image pull secrets for private image registry. | | ||
| controller.replicas | int | `1` | Number of replicas of controller. | | ||
| controller.labels | object | `{}` | Extra labels for controller pods. | | ||
| controller.annotations | object | `{}` | Extra annotations for controller pods. | | ||
| controller.volumes | list | `[]` | Volumes for controller pods. | | ||
| controller.nodeSelector | object | `{}` | Node selector for controller pods. | | ||
| controller.affinity | object | `{}` | Affinity for controller pods. | | ||
| controller.tolerations | list | `[]` | List of node taints to tolerate for controller pods. | | ||
| controller.env | list | `[]` | Environment variables for controller containers. | | ||
| controller.envFrom | list | `[]` | Environment variable sources for controller containers. | | ||
| controller.volumeMounts | list | `[]` | Volume mounts for controller containers. | | ||
| controller.resources | object | `{}` | Pod resource requests and limits for controller containers. | | ||
| controller.securityContext | object | `{}` | Security context for controller containers. | | ||
| controller.serviceAccount.create | bool | `true` | Specifies whether to create a service account for the controller. | | ||
| controller.serviceAccount.name | string | `""` | Optional name for the controller service account. | | ||
| controller.serviceAccount.annotations | object | `{}` | Extra annotations for the controller service account. | | ||
| controller.serviceAccount.automountServiceAccountToken | bool | `true` | Auto-mount service account token to the controller pods. | | ||
| webhook.enable | bool | `true` | Specifies whether to enable webhook. | | ||
| webhook.failurePolicy | string | `"Fail"` | Specifies how unrecognized errors are handled. Available options are `Ignore` or `Fail`. | | ||
| runtime.preTraining.torchDistributed.enable | bool | `true` | | | ||
| runtime.preTraining.torchDistributed.image.registry | string | `"docker.io"` | | | ||
| runtime.preTraining.torchDistributed.image.repository | string | `"pytorch/pytorch"` | | | ||
| runtime.preTraining.torchDistributed.image.tag | string | `"2.5.0-cuda12.4-cudnn9-runtime"` | | | ||
|
||
## Maintainers | ||
|
||
| Name | Email | Url | | ||
| ---- | ------ | --- | | ||
| ChenYi015 | <github@chenyicn.net> | <https://github.com/ChenYi015> | |
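
The Values table in the README above lists the keys the chart accepts. As a rough illustration only, a values override file might look like the sketch below. The file name `my-values.yaml` and every concrete setting (resource sizes, node selector, failure policy choice) are assumptions made for the example, not recommendations from the chart; only the key names are taken from the table.

```yaml
# my-values.yaml (hypothetical file name; values are illustrative assumptions)

controller:
  # Pod resource requests and limits for controller containers.
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      memory: 256Mi
  # Node selector for controller pods; kubernetes.io/os is a well-known node label.
  nodeSelector:
    kubernetes.io/os: linux

webhook:
  # The table lists `Ignore` and `Fail` as the available options; the default is `Fail`.
  failurePolicy: Ignore
```

Assuming the repo has been added as shown in the README, such a file could be passed at install time with `helm install trainer trainer/trainer --namespace kubeflow-system --create-namespace -f my-values.yaml`.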
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,70 @@ | ||
{{ template "chart.header" . }} | ||
|
||
{{ template "chart.deprecationWarning" . }} | ||
|
||
{{ template "chart.badgesSection" . }} | ||
|
||
{{ template "chart.description" . }} | ||
|
||
{{ template "chart.homepageLine" . }} | ||
|
||
## Introduction | ||
|
||
This chart bootstraps a [Kubeflow Trainer]({{template "chart.homepage" . }}) deployment using the [Helm](https://helm.sh) package manager. | ||
|
||
## Prerequisites | ||
|
||
- Helm >= 3 | ||
- Kubernetes >= 1.20 | ||
|
||
## Usage | ||
|
||
### Add Helm Repo | ||
|
||
```bash | ||
helm repo add trainer https://kubeflow.github.io/trainer | ||
|
||
helm repo update | ||
``` | ||
|
||
See [helm repo](https://helm.sh/docs/helm/helm_repo) for command documentation. | ||
|
||
### Install the chart | ||
|
||
```bash | ||
helm install [RELEASE_NAME] trainer/trainer | ||
``` | ||
|
||
For example, if you want to create a release with name `trainer` in the `kubeflow-system` namespace: | ||
|
||
```shell | ||
helm install trainer trainer/trainer \ | ||
--namespace kubeflow-system \ | ||
--create-namespace | ||
``` | ||
|
||
Note that by passing the `--create-namespace` flag to the `helm install` command, `helm` will create the release namespace if it does not exist. | ||
|
||
See [helm install](https://helm.sh/docs/helm/helm_install) for command documentation. | ||
|
||
### Upgrade the chart | ||
|
||
```shell | ||
helm upgrade [RELEASE_NAME] trainer/trainer [flags] | ||
``` | ||
|
||
See [helm upgrade](https://helm.sh/docs/helm/helm_upgrade) for command documentation. | ||
|
||
### Uninstall the chart | ||
|
||
```shell | ||
helm uninstall [RELEASE_NAME] | ||
``` | ||
|
||
This removes all the Kubernetes resources associated with the chart and deletes the release, except for the `crds`, which must be removed manually. | ||
|
||
See [helm uninstall](https://helm.sh/docs/helm/helm_uninstall) for command documentation. | ||
|
||
{{ template "chart.valuesSection" . }} | ||
|
||
{{ template "chart.maintainersSection" . }} |