
support aborting on cache miss #828

Merged: 9 commits merged into main from abort-on-cache-miss on Feb 12, 2025
Conversation

matthiasdiener (Collaborator) commented on Feb 28, 2024

What do you think, @inducer? This would not only make debugging cache misses easier, it could also be used to automate determinism and more general caching tests (by setting LOOPY_ABORT_ON_CACHE_MISS to something true-ish and rerunning the tests).

TODO:

  • add for other cache loads?

Please squash
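The mechanism proposed here (consult an environment variable on a cache miss and raise instead of regenerating) could be sketched roughly as follows. This is a hypothetical illustration, not loopy's actual code: the function and exception names and the true-ish parsing are assumptions.

```python
import os


class CacheMissError(RuntimeError):
    """Raised on a cache miss when aborting has been requested."""


def fetch_or_generate(cache, key, generate):
    """Return cache[key]; on a miss, either abort or generate and store.

    Hypothetical sketch: any LOOPY_ABORT_ON_CACHE_MISS value other than
    "", "0", "no", or "false" is treated as true-ish here.
    """
    try:
        return cache[key]
    except KeyError:
        if os.environ.get("LOOPY_ABORT_ON_CACHE_MISS", "").lower() \
                not in ("", "0", "no", "false"):
            raise CacheMissError(f"cache miss for key {key!r}")
        value = generate()
        cache[key] = value
        return value
```

With the variable unset, a miss falls through to code generation as usual; with it set, the second run of a test suite on a warm cache fails loudly on any unexpected miss.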

@matthiasdiener matthiasdiener self-assigned this Feb 28, 2024
inducer (Owner) commented on Feb 29, 2024

I'm not wildly opposed, but I feel that it's also easy to just hack a 1/0 into that code path when needed. And for automated determinism tests, there are probably simpler approaches than to ask the codegen cache.

Also, "abort on cache miss" sounds like it would include all caches, but this applies to only one of many caches in loopy.

matthiasdiener (Collaborator, Author) commented on Feb 4, 2025

@inducer If possible, I would like to revisit a way to make cache misses automatically testable.

I'm not wildly opposed, but I feel that it's also easy to just hack a 1/0 into that code path when needed. And for automated determinism tests, there are probably simpler approaches than to ask the codegen cache.

Do you have a suggestion on another approach? I thought of perhaps adding a way for persistent_dict itself to abort on a miss, but I'm not sure whether that would be better.
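The persistent_dict variant mentioned above might look something like the following wrapper. This is a hedged sketch of the idea only; the class, method, exception, and environment-variable names are illustrative assumptions and do not reflect pytools' actual persistent_dict API.

```python
import os


class NoSuchEntryError(KeyError):
    """Raised on an ordinary (non-aborting) cache miss."""


class AbortingDict:
    """Hypothetical persistent-dict-like store that can abort on a miss."""

    def __init__(self, identifier):
        self.identifier = identifier
        self._store = {}  # a real persistent dict would use disk storage

    def store(self, key, value):
        self._store[key] = value

    def fetch(self, key):
        try:
            return self._store[key]
        except KeyError:
            # If aborting is requested, fail hard instead of signaling
            # a recoverable miss to the caller.
            if os.environ.get("ABORT_ON_PDICT_MISS"):
                raise RuntimeError(
                    f"abort: miss in {self.identifier!r} for key {key!r}")
            raise NoSuchEntryError(key)
```

Putting the check inside the dict would make every cache backed by it testable at once, rather than instrumenting each caching call site separately.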

inducer (Owner) commented on Feb 4, 2025

Do you have a suggestion on another approach?

Um, "hack a 1/0 into that code path when needed." 😁

matthiasdiener (Collaborator, Author) commented:
Do you have a suggestion on another approach?

Um, "hack a 1/0 into that code path when needed." 😁

That doesn't work for automated testing ;-)

inducer (Owner) commented on Feb 4, 2025

Ah, fair. 🤷 Let's do this then?

inducer (Owner) commented on Feb 4, 2025

Maybe add a comment to explain that the env var is for automated tests?

@@ -111,6 +111,7 @@ jobs:
  . ./ci-support-v0
  build_py_project_in_conda_env
  ( test_py_project )
+ export LOOPY_ABORT_ON_CACHE_MISS=1
  ( test_py_project )
matthiasdiener (Collaborator, Author) commented on Feb 4, 2025:
@inducer do you remember what the purpose of this pytest_twice CI check was/is? Is it just about seeing the time difference between the first pytest run and the second run?

inducer (Owner):

No, it's supposed to test that things still work with data pulled out of a cache, since the first round of tests runs on a cold cache by default.

matthiasdiener (Collaborator, Author):

Is it expected that running pytest like this will lead to cache misses on the second run (demonstrated by the CI in this PR)? I've tried to go back to earlier loopy versions, and it seems like these cache misses are not caused by recent changes.

inducer (Owner):

Huh, I would not expect there to be cache misses on the second go.

matthiasdiener (Collaborator, Author):

I suspect code like this (from test_c_execution.py) is responsible for the cache misses:

import numpy as np

def __get_kernel(order="C"):
    indices = ["i", "j", "k"]
    # np.random is unseeded here, so the sizes differ between runs,
    # which changes the generated code and defeats the cache
    sizes = tuple(np.random.randint(1, 11, size=len(indices)))
    # ...

Would you prefer me to make these sizes constant across runs, or should we just ignore the cache miss?

inducer (Owner):

Setting a seed (and converting to the new-style numpy RNG interface) seems like the way to go.
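The seeded, new-style RNG fix suggested above could look roughly like this. The helper name and the seed value are illustrative assumptions, not the actual change in the PR:

```python
import numpy as np


def get_sizes(n_indices=3, seed=17):
    # A seeded new-style Generator (np.random.default_rng) produces the
    # same "random" sizes on every run, so the generated kernel, and
    # hence its cache key, is identical across test invocations.
    # (the seed value 17 is an arbitrary illustrative choice)
    rng = np.random.default_rng(seed)
    return tuple(int(s) for s in rng.integers(1, 11, size=n_indices))
```

Unlike the legacy `np.random.randint` call, this keeps the randomness local to a Generator object instead of mutating global state, which also makes tests independent of execution order.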

matthiasdiener (Collaborator, Author):

Considering the multitude of issues encountered, it is probably not realistic to enable this in CI in the near future. I'll disable the export for now.

@matthiasdiener matthiasdiener marked this pull request as ready for review February 7, 2025 00:04
@inducer inducer merged commit a46bd53 into main Feb 12, 2025
18 checks passed
@inducer inducer deleted the abort-on-cache-miss branch February 12, 2025 18:55