Place AMD GPU tests in a separate workflow (correct branch) #26105
Conversation
The documentation is not available anymore as the PR was closed or merged.
LGTM, waiting for feedback from the transformers folks ✌🏻
cc @ydshieh
LGTM - thanks for adding!
@ydshieh is off for a week. I think it's OK to merge, and we can make any necessary updates once he's back and has reviewed this PR.
* Add a Dockerfile for PyTorch + ROCm based on the official AMD released artifact
* Add a new artifact single-amdgpu testing on main
* Attempt to test the workflow without merging.
* Changed BERT to check if things are triggered
* Meet the dependencies graph on workflow
* Revert BERT changes
* Add check_runners_amdgpu to correctly mount and check availability
* Rename setup to setup_gpu for CUDA and add setup_amdgpu for AMD
* Fix all the needs.setup -> needs.setup_[gpu|amdgpu] dependencies
* Fix setup dependency graph to use check_runner_amdgpu
* Let's do the runner status check only on the AMDGPU target
* Update Dockerfile.amd to put ourselves in / rather than /var/lib
* Restore the whole setup for CUDA too.
* Let's re-disable them
* Change BERT to trigger tests
* Restore BERT
* Add torchaudio with rocm 5.6 to AMD Dockerfile (#26050): fix dockerfile (Co-authored-by: Felix Marty <felix@hf.co>)
* Place AMD GPU tests in a separate workflow (correct branch) (#26105): AMDGPU CI lives in another workflow
* Fix invalid job name in dependencies.
* Remove tests multi-amdgpu for now.
* Use single-amdgpu
* Use --net=host for now.
* Remove host networking.
* Removed duplicated check_runners_amdgpu step
* Let's tag machine-types with mi210 for now.
* Machine type should be only mi210
* Remove unnecessary push.branches item
* Apply review suggestions, moving from `x-amdgpu` to `x-gpu` and introducing `amd-gpu` and `miXXX` labels.
* Remove amdgpu from step names.
* finalize
* delete

Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
Co-authored-by: Felix Marty <felix@hf.co>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
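Taken together, these commits introduce a ROCm Docker image and a dedicated scheduled workflow that targets self-hosted AMD runners by label, with a runner-availability check gating the setup and test jobs. Below is a minimal sketch of what such a workflow could look like, assuming the `amd-gpu`, `single-amdgpu`, and `mi210` labels and the check_runners → setup → test-job chain described in the commit list above; the image name, device mounts, schedule, and test commands are illustrative assumptions, not the actual contents of the workflow file added by this PR.

```yaml
# Sketch of a separate AMD GPU scheduled-CI workflow.
# Runner labels (amd-gpu, single-amdgpu, mi210) follow the PR description;
# the image name, cron schedule, and commands below are assumptions.
name: Self-hosted runner (AMD scheduled CI)

on:
  workflow_dispatch:
  schedule:
    - cron: "17 2 * * *"   # illustrative nightly schedule

jobs:
  check_runners:
    name: Check AMD runner availability
    # Target self-hosted runners via labels instead of a dedicated
    # x-amdgpu machine type.
    runs-on: [self-hosted, amd-gpu, single-amdgpu, mi210]
    container:
      image: huggingface/transformers-pytorch-amd-gpu   # hypothetical image name
      # Mount the ROCm devices into the container; the PR also experimented
      # with --net=host here.
      options: --device /dev/kfd --device /dev/dri
    steps:
      - name: Check the GPU is visible inside the container
        run: rocm-smi

  setup_amdgpu:
    name: Collect the test folders to run
    needs: check_runners
    runs-on: [self-hosted, amd-gpu, single-amdgpu, mi210]
    container:
      image: huggingface/transformers-pytorch-amd-gpu
      options: --device /dev/kfd --device /dev/dri
    outputs:
      matrix: ${{ steps.set-matrix.outputs.matrix }}
    steps:
      - uses: actions/checkout@v3
      - id: set-matrix
        # Emit the list of model test folders as a JSON matrix.
        run: |
          echo "matrix=$(python3 -c 'import json, os; print(json.dumps(sorted(os.listdir("tests/models"))))')" >> "$GITHUB_OUTPUT"

  run_tests_single_amdgpu:
    name: Model tests (single AMD GPU)
    needs: setup_amdgpu
    strategy:
      fail-fast: false
      matrix:
        folder: ${{ fromJson(needs.setup_amdgpu.outputs.matrix) }}
    runs-on: [self-hosted, amd-gpu, single-amdgpu, mi210]
    container:
      image: huggingface/transformers-pytorch-amd-gpu
      options: --device /dev/kfd --device /dev/dri
    steps:
      - uses: actions/checkout@v3
      - run: python3 -m pip install -e .[testing]
      - run: python3 -m pytest -v tests/models/${{ matrix.folder }}
```

Targeting runners through generic `amd-gpu` and `miXXX` labels rather than a dedicated `x-amdgpu` machine type keeps the job names aligned with the existing `x-gpu` convention and lets new AMD hardware be added by label alone, which matches the review suggestion applied near the end of the commit list.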
As suggested in #26007 (comment) and discussed offline. cc @mfuntowicz
Note: this PR merges into a branch, not main.