Commit

Using pytorch cpu only version and fixing broken build for aws examples (flyteorg#179)

* Used pytorch cpu image

Signed-off-by: Prafulla Mahindrakar <prafulla.mahindrakar@gmail.com>

* Added documentation and using the .in instead of generated file

Signed-off-by: Prafulla Mahindrakar <prafulla.mahindrakar@gmail.com>

* Updated kfpytorch aswell

Signed-off-by: Prafulla Mahindrakar <prafulla.mahindrakar@gmail.com>

* Removed the comments in requirements.in file

Signed-off-by: Prafulla Mahindrakar <prafulla.mahindrakar@gmail.com>
pmahindrakar-oss authored May 6, 2021
1 parent 35fe9db commit 557dab4
Showing 8 changed files with 126 additions and 128 deletions.
8 changes: 7 additions & 1 deletion cookbook/integrations/aws/sagemaker_pytorch/Dockerfile
@@ -1,6 +1,6 @@
 # We use devel because plugins_sagemaker-training needs gcc to build
 # TODO get rid of plugins_sagemaker-training
-FROM pytorch/pytorch:1.7.0-cuda11.0-cudnn8-devel
+FROM python:3.8-slim-buster
 LABEL org.opencontainers.image.source https://github.com/flyteorg/flytesnacks

 WORKDIR /root
@@ -11,6 +11,12 @@ ENV PYTHONPATH /root
 # Install the AWS cli separately to prevent issues with boto being written over
 RUN pip install awscli

+# Install gcc , g++ and make
+RUN echo 'deb http://deb.debian.org/debian testing main' >> /etc/apt/sources.list \
+&& apt-get update && apt-get install --no-install-recommends -y gcc g++
+RUN echo 'installing make' \
+&& apt-get install make
+
 ENV VENV /opt/venv
 # Virtual environment
 RUN python3 -m venv ${VENV}
40 changes: 25 additions & 15 deletions cookbook/integrations/aws/sagemaker_pytorch/README.rst
@@ -15,45 +15,55 @@ To use the flytekit aws sagemaker plugin simply run the following:

 Creating a dockerfile for Sagemaker custom training [Required]
 --------------------------------------------------------------
-The dockerfile for Sagemaker custom training is similar to any regular dockerfile, except for the difference in using the Nvidia cuda base.
+
+The dockerfile for Sagemaker custom training is similar to any regular dockerfile, except for the difference in using the Nvidia cuda base to use GPU's

 .. note::

 If using CPU for training then special dockerfile is NOT REQUIRED. If GPU or TPUs are required then, the dockerfile differs only in the driver setup. The following dockerfile is enabled for GPU accelerated training using CUDA
+The checked in version of docker file uses python:3.8-slim-buster for faster CI but you can use the Dockerfile pasted below which uses cuda base.
+Additionally the requirements.in uses the cpu version of pytorch. Remove the + cpu for torch and torchvision in requirements.in and make all requirements as shown below
+
+.. prompt:: bash
+
+make -C integrations/aws/sagemaker_pytorch requirements
+

 .. code-block:: docker
 :emphasize-lines: 23-24
 :linenos:
 # We use devel because plugins_sagemaker-training needs gcc to build
 # TODO get rid of plugins_sagemaker-training
 FROM pytorch/pytorch:1.7.0-cuda11.0-cudnn8-devel
 LABEL org.opencontainers.image.source https://github.com/flyteorg/flytesnacks
 WORKDIR /root
 ENV LANG C.UTF-8
 ENV LC_ALL C.UTF-8
 ENV PYTHONPATH /root
 # Install the AWS cli separately to prevent issues with boto being written over
 RUN pip install awscli
 ENV VENV /opt/venv
 # Virtual environment
 RUN python3 -m venv ${VENV}
 ENV PATH="${VENV}/bin:$PATH"
 # Install Python dependencies
-COPY aws/sagemaker_pytorch/requirements.txt /root/.
+COPY sagemaker_pytorch/requirements.txt /root/.
 RUN pip install -r /root/requirements.txt
 # Setup Sagemaker entrypoints
 ENV SAGEMAKER_PROGRAM /opt/venv/bin/flytekit_sagemaker_runner.py
 # Copy the makefile targets to expose on the container. This makes it easier to register.
 COPY in_container.mk /root/Makefile
-COPY aws/sagemaker_pytorch/sandbox.config /root
+COPY sagemaker_pytorch/sandbox.config /root
 # Copy the actual code
-COPY aws/sagemaker_pytorch/ /root/sagemaker_pytorch
+COPY sagemaker_pytorch/ /root/sagemaker_pytorch
 # This tag is supplied by the build script and will be used to determine the version
 # when registering tasks, workflows, and launch plans
 ARG tag
-ENV FLYTE_INTERNAL_IMAGE $tag
+ENV FLYTE_INTERNAL_IMAGE $tag
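
The README note in this hunk asks for the `+cpu` suffix to be removed from the torch and torchvision pins in requirements.in before the requirements are regenerated. A minimal Python sketch of that text edit follows. The function name `strip_cpu_suffix` is hypothetical and not part of the repository, and it only touches lines whose last token is a `torch==` or `torchvision==` pin ending in `+cpu`. The README does not say whether the `--find-links` option should also be dropped for a GPU build, so the sketch leaves it alone.

```python
def strip_cpu_suffix(requirements_text):
    """Drop the '+cpu' local version label from torch/torchvision pins.

    Hypothetical helper: it rewrites text only and does not run pip or make.
    """
    result = []
    for line in requirements_text.splitlines():
        stripped = line.rstrip()
        tokens = stripped.split()
        last = tokens[-1] if tokens else ""
        # Only pins such as 'torch==1.8.1+cpu' are rewritten; other lines pass through.
        if last.startswith(("torch==", "torchvision==")) and last.endswith("+cpu"):
            stripped = stripped[: -len("+cpu")]
        result.append(stripped)
    return "\n".join(result) + "\n"


sample = (
    "flytekitplugins-awssagemaker>=0.16.0\n"
    "--find-links https://download.pytorch.org/whl/torch_stable.html torch==1.8.1+cpu\n"
    "tensorboardX\n"
)
print(strip_cpu_suffix(sample))
```

After requirements.in has been rewritten this way, the `make -C integrations/aws/sagemaker_pytorch requirements` command quoted in the README regenerates requirements.txt.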
6 changes: 3 additions & 3 deletions cookbook/integrations/aws/sagemaker_pytorch/requirements.in
@@ -1,5 +1,5 @@
--r ../../../../common/requirements-common.in
+-r ../../../common/requirements-common.in
 flytekitplugins-awssagemaker>=0.16.0
-torch
-torchvision
+--find-links https://download.pytorch.org/whl/torch_stable.html torch==1.8.1+cpu
+--find-links https://download.pytorch.org/whl/torch_stable.html torchvision==0.9.1+cpu
 tensorboardX
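
In the regenerated requirements.txt of this commit, the torch and torchvision pins disappear entirely. That suggests a requirement placed after `--find-links <url>` on the same line was not picked up when the file was compiled. This reading is an inference from the diff, not something the commit states. pip's requirements file format does accept `--find-links` as an option on a line of its own, so one possible layout keeps the option on its own line and each pin on a separate line. The sketch below performs that rewrite on a list of lines; `split_find_links` is a hypothetical helper, not part of the repository.

```python
def split_find_links(lines):
    """Move each '--find-links URL' option onto its own line, once per URL.

    Hypothetical helper: any requirement that trailed the option on the same
    line becomes a separate requirement line.
    """
    options = []
    requirements = []
    for line in lines:
        tokens = line.split()
        if len(tokens) >= 2 and tokens[0] == "--find-links":
            option = f"--find-links {tokens[1]}"
            if option not in options:
                options.append(option)
            # Everything after the URL is treated as a requirement specifier.
            requirements.extend(tokens[2:])
        else:
            requirements.append(line)
    return options + requirements


original = [
    "flytekitplugins-awssagemaker>=0.16.0",
    "--find-links https://download.pytorch.org/whl/torch_stable.html torch==1.8.1+cpu",
    "--find-links https://download.pytorch.org/whl/torch_stable.html torchvision==0.9.1+cpu",
    "tensorboardX",
]
print("\n".join(split_find_links(original)))
```

Whether the compiled requirements.txt then lists torch and torchvision again would need to be confirmed by rerunning the `requirements` make target.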
85 changes: 35 additions & 50 deletions cookbook/integrations/aws/sagemaker_pytorch/requirements.txt
@@ -4,13 +4,15 @@
 #
 # /Library/Developer/CommandLineTools/usr/bin/make requirements.txt
 #
+--find-links https://download.pytorch.org/whl/torch_stable.html
+
 attrs==20.3.0
     # via scantree
 bcrypt==3.2.0
     # via paramiko
-boto3==1.17.35
+boto3==1.17.67
     # via sagemaker-training
-botocore==1.20.35
+botocore==1.20.67
     # via
     #   boto3
     #   s3transfer
@@ -25,87 +27,79 @@ chardet==4.0.0
     # via requests
 click==7.1.2
     # via flytekit
-croniter==1.0.8
+croniter==1.0.12
     # via flytekit
-cryptography==3.4.6
+cryptography==3.4.7
     # via paramiko
 cycler==0.10.0
     # via matplotlib
-dataclasses-json==0.5.2
+dataclasses-json==0.5.3
     # via flytekit
-decorator==4.4.2
+decorator==5.0.7
     # via retry
 deprecated==1.2.12
     # via flytekit
 dirhash==0.2.1
     # via flytekit
 docker-image-py==0.1.10
     # via flytekit
-flyteidl==0.18.25
+flyteidl==0.18.41
     # via flytekit
-flytekit==0.17.0b0
+flytekit==0.18.0
     # via
-    #   -r ../../common/requirements-common.in
+    #   -r ../../../common/requirements-common.in
     #   flytekitplugins-awssagemaker
-flytekitplugins-awssagemaker==0.16.0
+flytekitplugins-awssagemaker==0.18.0
     # via -r requirements.in
-future==0.18.2
-    # via croniter
 gevent==21.1.2
     # via sagemaker-training
 greenlet==1.0.0
     # via gevent
-grpcio==1.36.1
+grpcio==1.37.1
     # via flytekit
 idna==2.10
     # via requests
-importlib-metadata==3.7.3
+importlib-metadata==4.0.1
     # via keyring
 inotify_simple==1.2.1
     # via sagemaker-training
 jmespath==0.10.0
     # via
     #   boto3
     #   botocore
-keyring==23.0.0
+keyring==23.0.1
     # via flytekit
 kiwisolver==1.3.1
     # via matplotlib
 marshmallow-enum==1.5.1
     # via dataclasses-json
-marshmallow==3.10.0
+marshmallow==3.11.1
     # via
     #   dataclasses-json
     #   marshmallow-enum
-matplotlib==3.3.4
-    # via -r ../../common/requirements-common.in
+matplotlib==3.4.1
+    # via -r ../../../common/requirements-common.in
 mypy-extensions==0.4.3
     # via typing-inspect
 natsort==7.1.1
-    # via
-    #   croniter
-    #   flytekit
-numpy==1.20.1
+    # via flytekit
+numpy==1.20.2
     # via
     #   matplotlib
     #   pandas
     #   pyarrow
     #   sagemaker-training
     #   scipy
     #   tensorboardx
-    #   torch
-    #   torchvision
-pandas==1.2.3
+pandas==1.2.4
     # via flytekit
 paramiko==2.7.2
     # via sagemaker-training
 pathspec==0.8.1
     # via scantree
-pillow==8.1.2
-    # via
-    #   matplotlib
-    #   torchvision
-protobuf==3.15.6
+pillow==8.2.0
+    # via matplotlib
+protobuf==3.15.8
     # via
     #   flyteidl
     #   flytekit
@@ -136,27 +130,27 @@ pytz==2018.4
     # via
     #   flytekit
     #   pandas
-regex==2021.3.17
+regex==2021.4.4
     # via docker-image-py
 requests==2.25.1
     # via
     #   flytekit
     #   responses
-responses==0.13.1
+responses==0.13.3
     # via flytekit
 retry==0.9.2
     # via flytekit
 retrying==1.3.3
     # via sagemaker-training
-s3transfer==0.3.6
+s3transfer==0.4.2
     # via boto3
-sagemaker-training==3.7.3
+sagemaker-training==3.9.2
     # via flytekitplugins-awssagemaker
 scantree==0.0.1
     # via dirhash
-scipy==1.6.1
+scipy==1.6.3
     # via sagemaker-training
-six==1.15.0
+six==1.16.0
     # via
     #   bcrypt
     #   cycler
@@ -169,25 +163,16 @@ six==1.15.0
     #   retrying
     #   sagemaker-training
     #   scantree
-    #   tensorboardx
 sortedcontainers==2.3.0
     # via flytekit
 statsd==3.3.0
     # via flytekit
 stringcase==1.2.0
     # via dataclasses-json
-tensorboardx==2.1
-    # via -r requirements.in
-torch==1.8.0
-    # via
-    #   -r requirements.in
-    #   torchvision
-torchvision==0.9.0
+tensorboardx==2.2
     # via -r requirements.in
-typing-extensions==3.7.4.3
-    # via
-    #   torch
-    #   typing-inspect
+typing-extensions==3.10.0.0
+    # via typing-inspect
 typing-inspect==0.6.0
     # via dataclasses-json
 urllib3==1.25.11
@@ -200,7 +185,7 @@ werkzeug==1.0.1
     # via sagemaker-training
 wheel==0.36.2
     # via
-    #   -r ../../common/requirements-common.in
+    #   -r ../../../common/requirements-common.in
     #   flytekit
 wrapt==1.12.1
     # via
@@ -210,7 +195,7 @@ zipp==3.4.1
     # via importlib-metadata
 zope.event==4.5.0
     # via gevent
-zope.interface==5.3.0
+zope.interface==5.4.0
     # via gevent

 # The following packages are considered to be unsafe in a requirements file:
8 changes: 7 additions & 1 deletion cookbook/integrations/kubernetes/kfpytorch/Dockerfile
@@ -1,4 +1,4 @@
-FROM pytorch/pytorch:1.7.0-cuda11.0-cudnn8-runtime
+FROM python:3.8-slim-buster
 LABEL org.opencontainers.image.source https://github.com/flyteorg/flytesnacks

 WORKDIR /root
@@ -12,6 +12,12 @@ RUN apt-get update && apt-get install -y make build-essential libssl-dev curl
 # Install the AWS cli separately to prevent issues with boto being written over
 RUN pip install awscli

+# Install gcc , g++ and make
+RUN echo 'deb http://deb.debian.org/debian testing main' >> /etc/apt/sources.list \
+&& apt-get update && apt-get install --no-install-recommends -y gcc g++
+RUN echo 'installing make' \
+&& apt-get install make
+
 ENV VENV /opt/venv
 # Virtual environment
 RUN python3 -m venv ${VENV}
30 changes: 18 additions & 12 deletions cookbook/integrations/kubernetes/kfpytorch/README.rst
@@ -18,41 +18,47 @@ How to build your Dockerfile for Pytorch on K8s
 .. note::

 If using CPU for training then special dockerfile is NOT REQUIRED. If GPU or TPUs are required then, the dockerfile differs only in the driver setup. The following dockerfile is enabled for GPU accelerated training using CUDA
+The checked in version of docker file uses python:3.8-slim-buster for faster CI but you can use the Dockerfile pasted below which uses cuda base.
+Additionally the requirements.in uses the cpu version of pytorch. Remove the + cpu for torch and torchvision in requirements.in and make all requirements as shown below
+
+.. prompt:: bash
+
+make -C integrations/kubernetes/kfpytorch requirements

 .. code-block:: docker
 :emphasize-lines: 1
 :linenos:
-FROM pytorch/pytorch:1.7.0-cuda11.0-cudnn8-runtime
+FROM pytorch/pytorch:1.7.0-cuda11.0-cudnn8-runtime=
 LABEL org.opencontainers.image.source https://github.com/flyteorg/flytesnacks
 WORKDIR /root
 ENV LANG C.UTF-8
 ENV LC_ALL C.UTF-8
 ENV PYTHONPATH /root
 # Install basics
 RUN apt-get update && apt-get install -y make build-essential libssl-dev curl
 # Install the AWS cli separately to prevent issues with boto being written over
 RUN pip install awscli
 ENV VENV /opt/venv
 # Virtual environment
 RUN python3 -m venv ${VENV}
 ENV PATH="${VENV}/bin:$PATH"
 # Install Python dependencies
-COPY kubernetes/kfpytorch/requirements.txt /root
+COPY kfpytorch/requirements.txt /root
 RUN pip install -r /root/requirements.txt
 # Copy the makefile targets to expose on the container. This makes it easier to register.
 COPY in_container.mk /root/Makefile
-COPY kubernetes/kfpytorch/sandbox.config /root
+COPY kfpytorch/sandbox.config /root
 # Copy the actual code
-COPY kubernetes/kfpytorch/ /root/kfpytorch/
+COPY kfpytorch/ /root/kfpytorch/
 # This tag is supplied by the build script and will be used to determine the version
 # when registering tasks, workflows, and launch plans
 ARG tag
6 changes: 3 additions & 3 deletions cookbook/integrations/kubernetes/kfpytorch/requirements.in
@@ -1,5 +1,5 @@
--r ../../../../common/requirements-common.in
+-r ../../../common/requirements-common.in
 flytekitplugins-kfpytorch>=0.16.0
 tensorboardX
-torch
-torchvision
+--find-links https://download.pytorch.org/whl/torch_stable.html torch==1.8.1+cpu
+--find-links https://download.pytorch.org/whl/torch_stable.html torchvision==0.9.1+cpu
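
Because the checked-in images now target CPU-only wheels while the READMEs still show the CUDA base image, it can help to confirm which PyTorch build actually ended up in a container. The following is a small sketch, assuming it is run with the Python interpreter inside the built image. `describe_torch_build` is a hypothetical helper, while `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()` are existing PyTorch attributes and functions. The helper returns a plain message when torch is not installed at all.

```python
import importlib.util


def describe_torch_build():
    """Return a one-line summary of the installed PyTorch build, if any."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch

    # torch.version.cuda is None when the wheel was built without CUDA,
    # which is the case for CPU-only wheels such as 1.8.1+cpu.
    cuda_build = torch.version.cuda or "none (no CUDA in this build)"
    return (
        f"torch {torch.__version__}, CUDA toolkit: {cuda_build}, "
        f"GPU available: {torch.cuda.is_available()}"
    )


print(describe_torch_build())
```

A "torch is not installed" result inside the image would indicate that the pins in requirements.in never reached the compiled requirements.txt.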