Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reuse the cache created for latest main on PRs/branches if setup.py is not modified #25445

Merged
merged 3 commits into from
Aug 11, 2023

Conversation

ydshieh
Copy link
Collaborator

@ydshieh ydshieh commented Aug 10, 2023

What does this PR do?

For a PR or a branch, if setup.py is not modified (compare to the common ancestor with the main), let's use cache that is created for the setup.py from the latest commit on the main branch.

latest means the latest one on main at the moment where a run is triggered.

Motivation:

With this PR, we expect the storage usage for cache could be reduced dramatically 🤞 .

The artifact in this job run shows generated_config.txt has explicit checksum used v0.7-pipelines_torch-main-pip-9RXs1YQ8L2beP4cdAfRDkWX0VRTtWaQodDVKzvyJwPI= rather than {{ checksum "setup.py" }}

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Aug 10, 2023

The documentation is not available anymore as the PR was closed or merged.

@ydshieh ydshieh force-pushed the rework_cache_again branch from 631f5b2 to 98772c4 Compare August 10, 2023 18:05
@@ -884,6 +884,8 @@ class BertModel(BertPreTrainedModel):
`add_cross_attention` set to `True`; an `encoder_hidden_states` is then expected as an input to the forward pass.
"""

a = 3
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove this before merge

@ydshieh ydshieh changed the title [WIP] Rework cache again Rework cache again Aug 10, 2023
@ydshieh ydshieh marked this pull request as ready for review August 10, 2023 21:50
@ydshieh ydshieh changed the title Rework cache again Reuse the cache created for main on PRs/branches if setup.py not modified Aug 10, 2023
@ydshieh ydshieh changed the title Reuse the cache created for main on PRs/branches if setup.py not modified Reuse the cache created for latest main on PRs/branches if setup.py not modified Aug 10, 2023
@ydshieh ydshieh changed the title Reuse the cache created for latest main on PRs/branches if setup.py not modified Reuse the cache created for latest main on PRs/branches if setup.py is not modified Aug 10, 2023
@ydshieh ydshieh requested a review from sgugger August 10, 2023 22:16
Copy link
Collaborator

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! Let's try it for a bit and see if it impacts any workflow negatively.

@ydshieh ydshieh force-pushed the rework_cache_again branch from 47334f8 to c56762e Compare August 11, 2023 08:10
- store_artifacts:
path: ~/transformers/tests_fetched_summary.txt
path: test_preparation/tests_fetched_summary.txt
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need this in create_circleci_config.py to check if tests is in the list

Comment on lines +581 to +585
summary_file = os.path.join(folder, "tests_fetched_summary.txt")
if os.path.exists(summary_file):
with open(summary_file) as f:
tests_fetched_summary = f.read()
setup_file_modifiled = "### TEST TO RUN ###\n- tests\n" in tests_fetched_summary
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to check the summary file instead of filtered_test_list as it is already changed to a list in filter_tests method

    if test_files == ["tests"]:
        test_files = [os.path.join("tests", f) for f in os.listdir("tests") if f not in ["__init__.py"] + filters]

@ydshieh
Copy link
Collaborator Author

ydshieh commented Aug 11, 2023

@sgugger FYI: I need to change 2 more places to make it work correctly for all cases. See my last 2 review comments.

Copy link
Collaborator

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me!

@ydshieh ydshieh merged commit 1d75768 into main Aug 11, 2023
@ydshieh ydshieh deleted the rework_cache_again branch August 11, 2023 12:40
@ydshieh
Copy link
Collaborator Author

ydshieh commented Aug 11, 2023

Hi @sgugger

I am afraid I have to revert this PR until we do something to enabling sharing cache (see below at the end). From this section

PRs from the same fork repo share a cache (this includes, as previously stated, that PRs in the main repo share a cache with main).
Two PRs in different fork repos have different caches.

The cache is never shared between PRs/branches from different forks. This might explains the question I posted about why the same cache key could be found sometimes but not other times.

The PR description of #24886 is partially valid as lhoestq created a branch in transformers rather than a PR from his own forked repo.

The way to share cache is

Enabling the sharing of environment variables allows cache sharing between the original repo and all forked builds.

But we need to be careful if we have sensitive env. variables or not.

ydshieh added a commit that referenced this pull request Aug 11, 2023
…`setup.py` is not modified (#25445)"

This reverts commit 1d75768.
ydshieh added a commit that referenced this pull request Aug 11, 2023
…25466)

Revert "Reuse the cache created for latest `main` on PRs/branches if `setup.py` is not modified (#25445)"

This reverts commit 1d75768.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants