Improve multi_images.py- use core image and configure sandbox.config (flyteorg#750)

* Configure sandbox.config

Removed Dockerfile.prediction and sandbox.config present in `containerization` folder.
Specify ``core`` image in container_image
Signed-off-by: SmritiSatyanV <smriti@union.ai>

* Added recommended way of specifying docker image

Signed-off-by: SmritiSatyanV <smriti@union.ai>

* Changed default to fqn

Changed default to fqn in format of 'container_image'
Updated sandbox.config
Signed-off-by: SmritiSatyanV <smriti@union.ai>

* Updated sandbox.config

Signed-off-by: SmritiSatyanV <smriti@union.ai>
SmritiSatyanV authored Jun 8, 2022
1 parent 8941267 commit 2defe95
Showing 6 changed files with 37 additions and 91 deletions.
41 changes: 0 additions & 41 deletions cookbook/core/containerization/Dockerfile.prediction

This file was deleted.

76 changes: 31 additions & 45 deletions cookbook/core/containerization/multi_images.py
@@ -5,45 +5,17 @@
----------------------------------------------
When working locally, it is recommended to install all requirements of your project locally (maybe in a single virtual environment). It gets complicated when you want to deploy your code to a remote
environment. This is because most tasks in Flyte (function tasks) are deployed using a Docker Container.
environment since most tasks in Flyte (function tasks) are deployed using a Docker Container.
A Docker container allows you to create an expected environment for your tasks. It is also possible to build a single container image with all your dependencies, but sometimes this is complicated and impractical.
Here are the reasons why it is complicated and not recommended:
#. All the dependencies in one container increase the size of the container image.
#. Some task executions like Spark, SageMaker-based Training, and deep learning use GPUs that need specific runtime configurations. For example,
- Spark needs a Java Virtual Machine installation and Spark entrypoints to be set
- NVIDIA drivers and corresponding libraries need to be installed to use GPUs for deep learning. However, these are not required for a CPU
- SageMaker expects the ENTRYPOINT to be designed to accept its parameters
#. Building a single image may increase the build time for the image itself.
.. note::
Flyte (Service) by default does not require a workflow to be bound to a single container image. Flytekit offers a simple interface to easily alter the images that should be associated with every task, yet keeping the local execution simple for the user.
For every :py:class:`flytekit.PythonFunctionTask` type task or simply a task that is decorated with the ``@task`` decorator, users can supply rules of how the container image should be bound. By default, flytekit binds one container image, i.e., the ``default`` image to all tasks.
For every :py:class:`flytekit.PythonFunctionTask` type task or simply a task decorated with the ``@task`` decorator, users can supply rules of how the container image should be bound. By default, flytekit binds one container image, i.e., the ``default`` image to all tasks.
To alter the image, use the ``container_image`` parameter available in the :py:func:`flytekit.task` decorator. Any one of the following is acceptable:
#. Image reference is specified, but the version is derived from the default image version ``container_image="docker.io/redis:{{.image.default.version}},``
#. Both the FQN and the version are derived from the default image ``container_image="{{.image.default.fqn}}:spark-{{.image.default.version}},``
The images themselves are parameterizable in the config in the following format:
``{{.image.<name>.<attribute>}}``
- ``name`` refers to the name of the image in the image configuration. The name ``default`` is a reserved keyword and will automatically apply to the default image name for this repository.
- ``fqn`` refers to the fully qualified name of the image. For example, it includes the repository and domain url of the image. Example: ``docker.io/my_repo/xyz``.
- ``version`` refers to the tag of the image. For example: `latest`, or `python-3.8` etc. If the `container_image` is not specified then the default configured image for the project is used.
.. note::
The default image (name + version) is always ``{{.image.default.fqn}}:{{.image.default.version}}``
#. Image reference is specified, but the version is derived from the default image version ``container_image="docker.io/redis:{{.image.default.version}}"``
#. Both the FQN and the version are derived from the default image ``container_image="{{.image.default.fqn}}:spark-{{.image.default.version}}"``
.. warning::
To be able to use the image, push a container image that matches the new name described.
To use the image, push a container image that matches the new name described.
If you wish to build and push your Docker image to GHCR, follow `this <https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry>`_.
If you wish to build and push your Docker image to Dockerhub through your account, follow the below steps:
@@ -59,7 +31,7 @@
.. code-block::
docker login
4. It will prompt you to enter the username and the password.
4. It prompts you to enter the username and the password.
5. Push the Docker image to Dockerhub:
.. code-block::
@@ -76,9 +48,9 @@
.. tip::
Sometimes, ``docker login`` may not be successful. In such a case, execute ``docker logout`` and ``docker login``.
Sometimes, ``docker login`` may not be successful. In such cases, execute ``docker logout`` and ``docker login``.
Let us understand how multiple images can be used within a single workflow.
Let's dive into the example.
"""
# %%
# Import the necessary dependencies.
@@ -102,7 +74,7 @@
# %%
# Define a task that fetches data and splits the data into train and test sets.
@task(
container_image="ghcr.io/flyteorg/flytecookbook:core-with-sklearn-baa17ccf39aa667c5950bd713a4366ce7d5fccaf7f85e6be8c07fe4b522f92c3"
container_image="{{.image.trainer.fqn }}:{{.image.trainer.version}}"
)
def svm_trainer() -> split_data:
fish_data = pd.read_csv(dataset_url)
@@ -122,15 +94,25 @@ def svm_trainer() -> split_data:

# %%
# .. note ::
#
# To use your own Docker image, replace the value of `container_image` with the fully qualified name that identifies where the image has been pushed.
# One pattern has been specified in the task itself, i.e., specifying the Docker image URI. The recommended usage is:
# The recommended usage (specified in the example) is:
#
# ``container_image= "{{.image.default.fqn}}:{{.image.default.version}}"``
#
# #. ``image`` refers to the name of the image in the image configuration. The name ``default`` is a reserved keyword and automatically applies to the default image name for this repository.
# #. ``fqn`` refers to the fully qualified name of the image, which includes the domain and repository of the image. Example: ``docker.io/my_repo/xyz``.
# #. ``version`` refers to the tag of the image, for example ``latest`` or ``python-3.8``. If ``container_image`` is not specified, the default image configured for the project is used.
#
# The images themselves are parameterizable in the config file in the following format:
#
# ``container_image="{{.image.default.fqn}}:multi-images-preprocess-{{.image.default.version}}"``
# ``{{.image.<name>.<attribute>}}``


# %%
# Define another task that trains the model on the data and computes the accuracy score.
@task(
container_image="ghcr.io/flyteorg/flytecookbook:multi-image-predict-98b125fd57d20594026941c2ebe7ef662e5acb7d6423660a65f493ca2d9aa267"
container_image="{{.image.predictor.fqn }}:{{.image.predictor.version}}"
)
def svm_predictor(
X_train: pd.DataFrame,
@@ -144,7 +126,6 @@ def svm_predictor(
accuracy_score = float(model.score(X_test, y_test.values.ravel()))
return accuracy_score


# %%
# Define a workflow.
@workflow
@@ -158,10 +139,15 @@ def my_workflow() -> float:
)
return svm_accuracy


if __name__ == "__main__":
print(f"Running my_workflow(), accuracy : { my_workflow() }")
print(f"Running my_workflow(), accuracy: {my_workflow()}")

# %%
# .. note::
# Notice that the two task annotators have two different `container_image`s specified.
# Configuring sandbox.config
# ==========================
#
# The container images referenced in the tasks above are specified in the sandbox.config file. Provide a name for every Docker image and reference it in ``container_image``. In this example, both tasks use the ``core`` image for illustration purposes.
#
# sandbox.config
# ^^^^^^^^^^^^^^
# .. literalinclude:: ../../../../core/sandbox.config
3 changes: 0 additions & 3 deletions cookbook/core/containerization/sandbox.config

This file was deleted.

2 changes: 1 addition & 1 deletion cookbook/core/requirements.txt
@@ -202,4 +202,4 @@ wrapt==1.14.0
# deprecated
# flytekit
zipp==3.8.0
# via importlib-metadata
# via importlib-metadata
4 changes: 4 additions & 0 deletions cookbook/core/sandbox.config
@@ -1,2 +1,6 @@
[sdk]
workflow_packages=core

[images]
trainer = ghcr.io/flyteorg/flytecookbook:core-latest
predictor = ghcr.io/flyteorg/flytecookbook:core-latest
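
To make the link between the ``[images]`` section and the ``container_image`` templates concrete, here is a minimal sketch of how a template such as ``{{.image.trainer.fqn }}:{{.image.trainer.version}}`` could be expanded from this config. It is an illustration only, not flytekit's actual resolution logic: the helpers ``split_reference``, ``load_images`` and ``render`` are hypothetical names, and the sketch assumes that everything after the last ``:`` in the final path segment is the version.

```python
import configparser
import re

# Mirrors the contents of cookbook/core/sandbox.config after this commit.
SANDBOX_CONFIG = """
[sdk]
workflow_packages=core

[images]
trainer = ghcr.io/flyteorg/flytecookbook:core-latest
predictor = ghcr.io/flyteorg/flytecookbook:core-latest
"""

# Matches {{.image.<name>.<attribute>}}, tolerating spaces inside the braces.
TEMPLATE = re.compile(r"\{\{\s*\.image\.(\w+)\.(fqn|version)\s*\}\}")


def split_reference(reference: str) -> dict:
    """Split 'registry/repo:tag' into its fqn and version (hypothetical helper)."""
    # Only look for the tag separator in the last path segment, so a
    # registry port such as 'localhost:5000' is not mistaken for a tag.
    head, _, last = reference.rpartition("/")
    name, sep, tag = last.partition(":")
    fqn = f"{head}/{name}" if head else name
    return {"fqn": fqn, "version": tag if sep else ""}


def load_images(config_text: str) -> dict:
    """Read the [images] section into {name: {'fqn': ..., 'version': ...}}."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return {name: split_reference(ref) for name, ref in parser["images"].items()}


def render(template: str, images: dict) -> str:
    """Replace every {{.image.<name>.<attribute>}} placeholder in the template."""
    return TEMPLATE.sub(lambda m: images[m.group(1)][m.group(2)], template)


images = load_images(SANDBOX_CONFIG)
resolved = render("{{.image.trainer.fqn }}:{{.image.trainer.version}}", images)
print(resolved)
```

Because both ``trainer`` and ``predictor`` point at the same ``core`` image here, both task templates expand to the same reference; pointing one of the names at a different image in the config would change the image for that task without touching the Python code.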
Original file line number Diff line number Diff line change
@@ -99,7 +99,7 @@
# - the predicted attribute
#
# In practice, we'd want to do a little data exploration first to get a sense of the distribution of variables.
# A useful resource for this is the `Kaggle <https://www.kaggle.com/datasets/cherngs/heart-disease-cleveland-uci>`__ version of this dataset,
# A useful resource for this is the `Kaggle <https://www.kaggle.com/datasets/johnsmith88/heart-disease-dataset>`__ version of this dataset,
# which has been slightly preprocessed to be model-ready.
#
# .. Note::