Integrate AMD GPU in CI/CD environment #26007

Merged on Sep 20, 2023 (31 commits)
48d3efb
Add a Dockerfile for PyTorch + ROCm based on official AMD released ar…
mfuntowicz Sep 6, 2023
c1acac0
Add a new artifact single-amdgpu testing on main
mfuntowicz Sep 6, 2023
70dbee0
Attempt to test the workflow without merging.
mfuntowicz Sep 6, 2023
96639bb
Changed BERT to check if things are triggered
mfuntowicz Sep 6, 2023
8f3e698
Meet the dependencies graph on workflow
mfuntowicz Sep 6, 2023
4cd3871
Revert BERT changes
mfuntowicz Sep 6, 2023
cc62d3d
Add check_runners_amdgpu to correctly mount and check availability
mfuntowicz Sep 7, 2023
a4a639c
Rename setup to setup_gpu for CUDA and add setup_amdgpu for AMD
mfuntowicz Sep 7, 2023
2485383
Fix all the needs.setup -> needs.setup_[gpu|amdgpu] dependencies
mfuntowicz Sep 7, 2023
f99374d
Fix setup dependency graph to use check_runner_amdgpu
mfuntowicz Sep 7, 2023
c045c1e
Let's do the runner status check only on AMDGPU target
mfuntowicz Sep 7, 2023
d3c3e72
Update the Dockerfile.amd to put ourselves in / rather than /var/lib
mfuntowicz Sep 7, 2023
d237227
Restore the whole setup for CUDA too.
mfuntowicz Sep 7, 2023
1a0e302
Let's redisable them
mfuntowicz Sep 7, 2023
ad48aeb
Change BERT to trigger tests
mfuntowicz Sep 7, 2023
0107f55
Restore BERT
mfuntowicz Sep 7, 2023
4a2efa4
Add torchaudio with rocm 5.6 to AMD Dockerfile (#26050)
fxmarty Sep 8, 2023
cd106b4
Place AMD GPU tests in a separate workflow (correct branch) (#26105)
fxmarty Sep 12, 2023
933b00f
Fix invalid job name is dependencies.
mfuntowicz Sep 14, 2023
7c1edd9
Remove tests multi-amdgpu for now.
mfuntowicz Sep 14, 2023
4c35979
Use single-amdgpu
mfuntowicz Sep 14, 2023
8dcc3b4
Use --net=host for now.
mfuntowicz Sep 14, 2023
17e07f5
Remote host networking.
mfuntowicz Sep 15, 2023
d76455d
Removed duplicated check_runners_amdgpu step
mfuntowicz Sep 15, 2023
f58b7ae
Let's tag machine-types with mi210 for now.
mfuntowicz Sep 15, 2023
422110e
Machine type should be only mi210
mfuntowicz Sep 15, 2023
6b860be
Remove unnecessary push.branches item
mfuntowicz Sep 15, 2023
a8690cd
Apply review suggestions moving from `x-amdgpu` to `x-gpu` introducin…
mfuntowicz Sep 18, 2023
ec4787f
Remove amdgpu from step names.
mfuntowicz Sep 18, 2023
047dd96
finalize
ydshieh Sep 20, 2023
d1fb120
delete
ydshieh Sep 20, 2023
Fix invalid job name is dependencies.
mfuntowicz committed Sep 14, 2023
commit 933b00f18901082ba3826e5eccf28699e242dbe1
8 changes: 4 additions & 4 deletions .github/workflows/self-push-amd.yml
@@ -269,10 +269,10 @@ jobs:
 check_runner_status,
 check_runners,
 setup_amdgpu,
-run_tests_single_gpu,
-run_tests_multi_gpu,
-run_tests_torch_cuda_extensions_single_gpu,
-run_tests_torch_cuda_extensions_multi_gpu
+run_tests_single_amdgpu,
+run_tests_multi_amdgpu,
+# run_tests_torch_cuda_extensions_single_gpu,
+# run_tests_torch_cuda_extensions_multi_gpu
 ]
 steps:
 - name: Preliminary job status
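
The hunk above swaps dependency names that no longer match any job in the AMD workflow for the renamed `*_amdgpu` jobs. A minimal sketch of how that kind of mismatch could be caught before pushing is shown below. It assumes the workflow has already been loaded into a plain dict (for example with a YAML parser); the helper `find_unknown_needs`, the job name `report_results`, and the trimmed job table are hypothetical and only illustrate the check, they are not taken from the repository.

```python
from typing import Dict, List


def find_unknown_needs(jobs: Dict[str, dict]) -> Dict[str, List[str]]:
    """Map each job to the entries in its `needs` that name no defined job."""
    problems: Dict[str, List[str]] = {}
    for name, spec in jobs.items():
        needs = spec.get("needs", [])
        # `needs` may be a single string or a list of job ids.
        if isinstance(needs, str):
            needs = [needs]
        missing = [dep for dep in needs if dep not in jobs]
        if missing:
            problems[name] = missing
    return problems


# Hypothetical, trimmed-down job table mirroring the state before this commit:
# the test jobs were renamed to *_amdgpu, but the final job still lists the
# old *_gpu names in its dependencies.
jobs_before = {
    "check_runner_status": {},
    "check_runners": {"needs": "check_runner_status"},
    "setup_amdgpu": {"needs": "check_runners"},
    "run_tests_single_amdgpu": {"needs": "setup_amdgpu"},
    "run_tests_multi_amdgpu": {"needs": "setup_amdgpu"},
    "report_results": {  # hypothetical job name
        "needs": [
            "check_runner_status",
            "check_runners",
            "setup_amdgpu",
            "run_tests_single_gpu",
            "run_tests_multi_gpu",
        ]
    },
}

print(find_unknown_needs(jobs_before))
# {'report_results': ['run_tests_single_gpu', 'run_tests_multi_gpu']}
```

With the names changed as in the diff, the same call returns an empty dict, since every entry in `needs` then refers to a defined job.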