Use AMD CI workflow defined in hf-workflows #35058

Merged
merged 15 commits into from
Jan 17, 2025

Conversation

ivarflakstad
Member

@ivarflakstad ivarflakstad commented Dec 3, 2024

What does this PR do?

Uses the external workflows defined in hf-workflows for self-hosted AMD runners. This removes a possible attack vector on self-hosted CI runners on GitHub.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ydshieh
Collaborator

ydshieh commented Dec 5, 2024

LGTM. But is it still in draft mode?

Also, does everything run smoothly with this new hf-workflows now? Do you need a merge to test?

@ivarflakstad ivarflakstad marked this pull request as ready for review December 5, 2024 17:08
@ivarflakstad
Member Author

> LGTM. But is it still in draft mode?
>
> Also, does everything run smoothly with this new hf-workflows now? Do you need a merge to test?

Yes, I kept it in draft until the PR in hf-workflows was merged so that I could test 👍

Collaborator

@ydshieh ydshieh left a comment


OK. Thanks.

However, I would suggest using the push event to test that everything works before merging. By "works" I mean that the call succeeds and we know the resources can be accessed, not that the jobs or reports themselves are correct.

(There were line-ending issues: the file used CRLF, which was my mistake, and it's good to change it to LF here.)

@@ -920,6 +922,8 @@ def prepare_reports(title, header, reports, to_truncate=True):

 if __name__ == "__main__":
     SLACK_REPORT_CHANNEL_ID = os.environ["SLACK_REPORT_CHANNEL"]
+    REPORT_REPO_ID = os.environ.get("REPORT_REPO_ID", "hf-internal-testing/transformers_daily_ci")
Collaborator


Hmm, IIRC, REPORT_REPO_ID will always exist because you define it in slack-report.yml, although it could be an empty string when the run is not from AMD (i.e. for Nvidia CI). So we need to change the condition here.

(just like what you did one line below for UPLOAD_REPORT_SUMMARY)

Member Author


Right, have to check for empty string as well.
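
The empty-string case raised above can be handled with a small fallback helper. This is a minimal sketch, not the change that landed in this PR: `resolve_report_repo_id` and `DEFAULT_REPORT_REPO_ID` are hypothetical names, and the default repo id is the one shown in the diff.

```python
import os

# Default taken from the diff above; the constant name is hypothetical.
DEFAULT_REPORT_REPO_ID = "hf-internal-testing/transformers_daily_ci"


def resolve_report_repo_id(environ=None, default=DEFAULT_REPORT_REPO_ID):
    """Return REPORT_REPO_ID, falling back when it is unset or empty.

    Hypothetical helper for illustration only.
    """
    environ = os.environ if environ is None else environ
    # environ.get(key, default) only covers the "unset" case: a variable that
    # is defined but empty would still yield "". The `or` covers both cases.
    return environ.get("REPORT_REPO_ID") or default
```

Passing a plain dict as `environ` makes the behaviour easy to check without touching the real process environment.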


@@ -533,11 +535,11 @@ def payload(self) -> str:
 commit_info = api.upload_file(
     path_or_fileobj=file_path,
     path_in_repo=f"{datetime.datetime.today().strftime('%Y-%m-%d')}/ci_results_{job_name}/new_model_failures.txt",
-    repo_id="hf-internal-testing/transformers_daily_ci",
+    repo_id=self.repo_id,
Collaborator


The new_model_failures output depends on prev_ci_artifacts, which is produced by get_last_daily_ci_reports (called toward the end of this file).

It is hard-coded to fetch the previous Nvidia scheduled CI run.

If AMD also wants to have these files uploaded, we have to adjust the definition of get_last_daily_ci_reports.

Member Author


Great. I'll add support for differentiating them.


@ivarflakstad ivarflakstad requested a review from ydshieh January 15, 2025 11:08
@@ -1220,7 +1227,8 @@ def prepare_reports(title, header, reports, to_truncate=True):
         os.makedirs(os.path.join(os.getcwd(), f"ci_results_{job_name}"))
 
     target_workflow = "huggingface/transformers/.github/workflows/self-scheduled-caller.yml@refs/heads/main"
-    is_scheduled_ci_run = os.environ.get("CI_WORKFLOW_REF") == target_workflow
+    amd_target_workflow = "huggingface/transformers/.github/workflows/self-scheduled-amd-caller.yml@refs/heads/main"
+    is_scheduled_ci_run = os.environ.get("CI_WORKFLOW_REF") in [target_workflow, amd_target_workflow]
Collaborator


I am a bit uncertain here:

CI_WORKFLOW_REF: ${{ github.workflow_ref }}

When it comes to AMD CI (called from hf-workflows), what would github.workflow_ref be?

Collaborator


Oh, my bad.

The triggered workflow run would be .github/workflows/self-scheduled-amd-caller.yml; the hf-workflows ones are the workflows being called.
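
The conclusion above, that `CI_WORKFLOW_REF` carries the ref of the caller workflow in this repository rather than the reusable workflow in hf-workflows, can be sketched as a standalone check. This is an illustrative sketch only: the two workflow refs come from the diff, while the constant names and the function wrapper are hypothetical and not part of the PR.

```python
import os

# Workflow refs copied from the diff above; the constant names are hypothetical.
NVIDIA_CALLER = "huggingface/transformers/.github/workflows/self-scheduled-caller.yml@refs/heads/main"
AMD_CALLER = "huggingface/transformers/.github/workflows/self-scheduled-amd-caller.yml@refs/heads/main"
SCHEDULED_CALLERS = frozenset({NVIDIA_CALLER, AMD_CALLER})


def is_scheduled_ci_run(workflow_ref=None):
    """Return True when the ref matches one of the scheduled caller workflows.

    When no ref is passed, it is read from the CI_WORKFLOW_REF environment
    variable, which the workflow sets from `github.workflow_ref`.
    """
    if workflow_ref is None:
        workflow_ref = os.environ.get("CI_WORKFLOW_REF")
    # An unset variable yields None, which is never in the set.
    return workflow_ref in SCHEDULED_CALLERS
```

Under this reading, a run triggered through self-scheduled-amd-caller.yml matches `AMD_CALLER` even though its jobs are defined in hf-workflows.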

@ivarflakstad ivarflakstad merged commit 5fa3534 into main Jan 17, 2025
11 checks passed
@ivarflakstad ivarflakstad deleted the secure-amd-ci branch January 17, 2025 19:52
@ivarflakstad ivarflakstad restored the secure-amd-ci branch January 17, 2025 19:56
bursteratom pushed a commit to bursteratom/transformers that referenced this pull request Jan 31, 2025
* Use AMD CI workflow defined in hf-workflows