add cache_info #43

Merged: 3 commits merged into JuliaCollections:master from eph/cache-info on Dec 9, 2023

Conversation

ericphanson
Member

closes #28

I think this should have minimal performance impact, since we simply increment an integer on hits/misses. We are already inside the lock in each case, so we don't need to acquire it separately.

Some choices:

  • I reset the stats on empty!, like how functools clears both entries and stats on cache_clear().
  • I only count hits/misses on get and get!, since those are the only viable methods to use for caching. E.g. setindex! doesn't care about the existing value, so it doesn't make sense to count it as a hit/miss. And getindex errors if the value is not present, so that shouldn't count as a cache miss (rather, it is exceptional behavior). This aligns with the README, which only describes get and get! as fulfilling a caching interface.
  • I use a show method similar to functools's, since it clearly communicates how to access the entries. I switched the order slightly compared to functools (currentsize before maxsize, because it makes more sense to me).
  • cache_info provides an immutable snapshot rather than a reference to the cache. This way there isn't a need for locking on access to the cache info fields (see the sketch after this list).
  • I use "properties" rather than "fields" in the docstring/documentation so if needed we can change the fields while exposing getproperty overloads for api compatibility. I don't forsee that we would need this though.

To check that this is correct with respect to this API, verify that every branch of get and get! increments either a hit or a miss, and does so within the lock. (I believe I've done so.)
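
As an illustration of that check, and reusing the hypothetical ToyLRU fields from the sketch above (again, not the package's actual method), a get! satisfying the invariant could look like this:

# Every return path increments exactly one of hits/misses, inside the lock.
# Eviction logic is omitted; this only sketches the counting invariant.
function Base.get!(default::Base.Callable, lru::ToyLRU, key)
    lock(lru.lock) do
        if haskey(lru.dict, key)
            lru.hits += 1      # hit: the value is already cached
            return lru.dict[key]
        else
            lru.misses += 1    # miss: compute, store, and return the value
            value = default()
            lru.dict[key] = value
            return value
        end
    end
end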

@ericphanson
Member Author

BTW, functools can't really help us decide what should count as a hit/miss (just get/get! as I've done here, or also other methods), because functools only supports a @memoize-style function-wrapping approach. So the only API for accessing the cache with functools is calling the wrapped function, which is essentially get!.

@ericphanson ericphanson requested a review from Jutho November 21, 2023 11:32
@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison: base (0d2635d) 77.06% vs. head (534f84e) 78.45%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master      #43      +/-   ##
==========================================
+ Coverage   77.06%   78.45%   +1.39%     
==========================================
  Files           2        2              
  Lines         279      297      +18     
==========================================
+ Hits          215      233      +18     
  Misses         64       64              


@ericphanson ericphanson mentioned this pull request Nov 22, 2023
@ericphanson
Member Author

ericphanson commented Dec 2, 2023

I think the case that get/get! doesn't cover is when someone writes out the definition of those by hand, e.g.:

function weird_cache_user(cache, x)
    if haskey(cache, x)
        return cache[x]   # conceptually a cache hit
    else
        cache[x] = f(x)   # conceptually a cache miss; the assigned value is returned
    end
end

Here, we are clearly using cache as a cache, and the outcome of this function "should" be a hit if x was in the cache and a miss otherwise. To actually implement that, one could either count a hit/miss from haskey, or a hit from getindex and a miss from setindex!. The latter seems problematic, as one could imagine setindex! being used to update a cached value, or to force a value into the cache, which shouldn't really be a miss.

I think having haskey count as a hit/miss could make sense, but it is also a bit unintuitive for haskey to mutate state. One could imagine using haskey just to check cache status, in a way where one would not want it to affect the hit/miss counts.

Therefore, I think having get/get! be the only functions that update hits/misses still makes sense, since those seem like the only times we reliably know whether it was actually a cache hit/miss versus something else.
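
For comparison, the counting-friendly version of the earlier snippet is just get! with a callable default; f is the same hypothetical function as in weird_cache_user, and the wrapper name below is made up for illustration:

# Equivalent logic via get!, the form that does update the hit/miss counters.
counting_cache_user(cache, x) = get!(() -> f(x), cache, x)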

That's really the only design question I had open, and I've managed to convince myself it is right 😅. So then there's the question of "do we want this feature", and from #28 it sounds like the answer is yes. And there's the question "is it implemented correctly"; I checked again, and indeed every hit/miss is inside the lock, and every return path of get and get! increments a hit or a miss.

Therefore, I think we should reverse the inertia here: if there's no objection within a week, I'll merge and tag this. (If there is a review before then, much appreciated!)

@Jutho
Collaborator

Jutho commented Dec 9, 2023

I believe this can be merged and the new version can be tagged. Thanks for all this nice work @ericphanson.

@ericphanson ericphanson merged commit 3125353 into JuliaCollections:master Dec 9, 2023
12 checks passed
@ericphanson ericphanson deleted the eph/cache-info branch December 9, 2023 21:20
Successfully merging this pull request may close these issues: CacheInfo reporting.