Commit
mlflow-builder: fix OOM failures during build with bigger images
If the k8s node running the MLFlow builder step is short on memory, the step fails when it has to build larger images. For example, building the trainer image for the keras CIFAR10 codeset example resulted in an OOM failure on a node with only 8 GB of memory available.

This is a known kaniko issue [1]. A fix is available [2] in more recent kaniko versions (>=1.7.0): disabling compressed caching via the `--compressed-caching` command line argument.

This commit adds a workflow input parameter that is mapped to this new command line argument. To avoid OOM errors with bigger images, the user may set it in the workflow like so:

```
- name: builder
  image: ghcr.io/stefannica/mlflow-builder:latest
  inputs:
    - name: mlflow-codeset
      codeset:
        name: '{{ inputs.mlflow-codeset }}'
        path: /project
    - name: compressed_caching
      # Disable compressed caching to avoid running into OOM errors
      # on cluster nodes with lower memory
      value: false
```

[1] GoogleContainerTools/kaniko#909
[2] GoogleContainerTools/kaniko#1722
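
As a rough illustration of the mapping described above, the following sketch shows one way a boolean workflow input could be turned into the kaniko argument. It is not the builder's actual code: the helper names `parse_bool` and `kaniko_args` are hypothetical, and the sketch assumes the input reaches the builder as a string such as "true" or "false".

```python
def parse_bool(value):
    """Interpret a workflow input string as a boolean (hypothetical helper)."""
    return str(value).strip().lower() in ("1", "true", "yes")


def kaniko_args(context, destination, compressed_caching=True):
    """Build an argument list for the kaniko executor (hypothetical helper)."""
    args = [
        "/kaniko/executor",
        f"--context={context}",
        f"--destination={destination}",
    ]
    # kaniko enables compressed caching by default, so the flag is only
    # passed when the workflow input asks to disable it.
    if not compressed_caching:
        args.append("--compressed-caching=false")
    return args


if __name__ == "__main__":
    # Assumed example values; the registry name is made up for illustration.
    enabled = parse_bool("false")
    print(" ".join(kaniko_args("/project", "registry.example/trainer:latest", enabled)))
```

Leaving the flag out when the input is true keeps the command compatible with kaniko versions older than 1.7.0, which do not know the argument.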