Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature][tensorflow] Add support for DIY and SageMaker containers #1278

Merged
merged 36 commits into from
Sep 17, 2021

Conversation

arjkesh
Copy link
Contributor

@arjkesh arjkesh commented Aug 13, 2021

GitHub Issue #, if available:

Note: If merging this PR should also close the associated Issue, please also add that Issue # to the Linked Issues section on the right.

Description

  • POC DIY Deep Learning Containers for TF 2.5
  • Create new markers to distinguish tests
  • Update build code to allow for the DIY --> SM dependency

Tests run

  • DIY images are tested on ec2/ecs/eks, on any test that isn't marked sagemaker_only
  • SM images are tested on sagemaker remote and local, and any test that is marked sagemaker or sagemaker_only
  • Existing images (that are not split into DIY/SageMaker) will run business as usual

DLC image/dockerfile

  • TF 2.5 Training/Inference images

Additional context

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@arjkesh arjkesh marked this pull request as ready for review August 13, 2021 18:51
@arjkesh arjkesh requested review from a team as code owners August 13, 2021 18:51
@arjkesh arjkesh marked this pull request as draft August 13, 2021 18:51
@arjkesh arjkesh changed the title POC DIY/SM builds [poc][tensorflow] Split TF 2.5 into DIY and SageMaker containers Aug 13, 2021
@arjkesh arjkesh added the improvement PRs for improvements or minor additions to pre-existing code label Aug 18, 2021
@arjkesh arjkesh marked this pull request as ready for review August 18, 2021 23:20
@arjkesh arjkesh added diy_container PR is related to a DIY container sagemaker_container PR is related to a sagemaker container labels Aug 20, 2021

RUN HOME_DIR=/root \
&& curl -o ${HOME_DIR}/oss_compliance.zip https://aws-dlinfra-utilities.s3.amazonaws.com/oss_compliance.zip \
&& unzip ${HOME_DIR}/oss_compliance.zip -d ${HOME_DIR}/ \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should delete the existing oss licenses coming from base image. Either delete or replace them entirely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide more context on how this would be done?

Copy link
Contributor

@tejaschumbalkar tejaschumbalkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets add sagemaker and sagemaker_only markers on Pytest Marker Checklist.

src/image.py Outdated
return bool(self.info.get('base_image_uri'))

@property
def is_test_enabled(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can we add a docstring for this function as well?

@@ -424,7 +435,7 @@ def test_cuda_paths(gpu):
python_version = re.search(r"(py\d+)", image).group(1)
short_python_version = None
image_tag = re.search(
r":(\d+(\.\d+){2}(-transformers\d+(\.\d+){2})?-(cpu|gpu|neuron)-(py\d+)(-cu\d+)-(ubuntu\d+\.\d+)(-example)?)",
r":(\d+(\.\d+){2}(-transformers\d+(\.\d+){2})?-(cpu|gpu|neuron)-(py\d+)(-cu\d+)-(ubuntu\d+\.\d+)(-example|-diy|-sagemaker)?)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: regex string can be reformatted

@@ -35,6 +31,7 @@ def test_awscli(mxnet_inference):
test_utils.run_cmd_on_container(container_name, ctx, "aws --version")


@pytest.mark.usefixtures("sagemaker")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test should only be ran on SM images. The packages are SM utility packages added soley for SM.
Let revamp the test to reflect the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure - if you look below, I made sure that non sm* utility packages are installed in DIY - but if this only needs to be a requirement for sagemaker, then we can remove DIY and make this sm only

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the below packages are SM utility packages and should exist only on SM DLC image.

"bokeh", "imageio", "plotly", "seaborn", "shap", "pandas", "cv2", "sagemaker"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think since these are open source packages not explicitly marked as sagemaker, we can also have these in the DIY image (minus the "sagemaker") dependency. Do you see any reason for moving them out of the DIY?

@arjkesh arjkesh changed the title [feature][tensorflow] Split TF 2.5 into DIY and SageMaker containers [feature][tensorflow] Add support for DIY and SageMaker containers Sep 16, 2021
@arjkesh arjkesh merged commit ad2489e into aws:master Sep 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
diy_container PR is related to a DIY container feature New feature added sagemaker_container PR is related to a sagemaker container Size:XL Determines the size of the PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants