
Proposal: support insert key with expire time? #231

Closed
Millione opened this issue Feb 17, 2023 · 17 comments · Fixed by #248

Labels: enhancement (New feature or request)

@Millione

Moka currently has a single global expiration time for all keys. Can we insert a key with its own expiration time so that some keys expire differently?

@tatsuya6502
Member

No. It is not supported yet. It is on our roadmap, and I will schedule it for v0.12.0.

@tatsuya6502 tatsuya6502 added the enhancement New feature or request label Feb 17, 2023
@tatsuya6502 tatsuya6502 added this to the v0.12.0 milestone Feb 17, 2023
@tatsuya6502
Member

@Millione

I forgot to mention it, but until per-key TTL is implemented, you can work around it by using entry_by_ref(&key).or_insert_with_if(init, replace_if) (doc); a sketch follows the list of differences below.

#123

I'm building a cache whose entries have their own standalone TTL. However, moka only supports a global TTL and TTI (time to idle). So I'm here to propose a new method Cache::get_with_if(&self, key: &K, fallback: Fut, cond: Fn(&V) -> bool) -> V

The function works like Cache::get_with(&self, key: &K, init: Fut) -> V, but when the key does not exist or cond(&V) is true, the init closure will be called to initialize the value.

Note that Cache::get_with_if has been renamed to or_insert_with_if under the Entry API.

You may want to check an actual usage by another user: #227 (comment)

The differences between the real per-key TTL and this workaround are:

  • The real per-key TTL lets an expired entry be evicted as soon as possible (when the periodic housekeeping task is executed).
  • The workaround never removes an expired entry; it only replaces the entry with a new value when its key is accessed via the entry API's or_insert_with_if.
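
For reference, a minimal sketch of this workaround with the sync cache. The Expiring value type, get_or_refresh, and load_from_origin are illustrative names rather than moka API; only entry_by_ref and or_insert_with_if come from the Entry API.

```rust
use std::time::{Duration, Instant};

use moka::sync::Cache;

// Hypothetical value type: the payload plus its own deadline, so that
// `replace_if` can tell whether the cached entry is stale.
#[derive(Clone)]
struct Expiring {
    payload: String,
    expires_at: Instant,
}

fn get_or_refresh(
    cache: &Cache<String, Expiring>,
    key: &str,
    per_key_ttl: Duration,
) -> Expiring {
    cache
        .entry_by_ref(key)
        .or_insert_with_if(
            || Expiring {
                // `load_from_origin` stands in for however the value is produced.
                payload: load_from_origin(key),
                expires_at: Instant::now() + per_key_ttl,
            },
            // Re-run the init closure when the stored entry has passed its deadline.
            |cached| cached.expires_at <= Instant::now(),
        )
        .into_value()
}

fn load_from_origin(key: &str) -> String {
    format!("fresh value for {key}")
}
```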

@Millione
Author

Got it, thanks. Maybe I can try to implement the real per-key TTL when I'm available.

@Millione
Author

@tatsuya6502 I have found that the workarounds are not suitable for my scenario. When the cache is full, new keys cannot be inserted because expired keys cannot be cleared in time. Does it take a lot of work to implement real per-key TTL? It is desperately needed!

@tatsuya6502
Member

tatsuya6502 commented Mar 25, 2023

@Millione

When the cache is full, new keys cannot be inserted because expired keys cannot be cleared in time.

That is right.

Another workaround is to use the invalidate_entries_if method (doc). It scans all entries in the cache in the background and removes those for which the predicate returns true. You could call it periodically (e.g. every 5 minutes) to remove expired entries.
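
A rough sketch of that approach with the sync cache. The Expiring value type, the sweeper thread, and the 5-minute interval are illustrative; note that support_invalidation_closures must be enabled on the builder for invalidate_entries_if to be usable.

```rust
use std::time::{Duration, Instant};

use moka::sync::Cache;

// Hypothetical value type carrying its own deadline.
#[derive(Clone)]
struct Expiring {
    payload: String,
    expires_at: Instant,
}

fn build_cache() -> Cache<String, Expiring> {
    // Invalidation closures must be enabled at build time, otherwise
    // `invalidate_entries_if` returns an error.
    Cache::builder()
        .max_capacity(10_000)
        .support_invalidation_closures()
        .build()
}

fn spawn_sweeper(cache: Cache<String, Expiring>) {
    // Cloning a Cache is cheap (it is an Arc-like handle), so the handle
    // moved into the thread shares storage with the original.
    std::thread::spawn(move || loop {
        std::thread::sleep(Duration::from_secs(300));
        // Scans the entries in the background and drops the ones whose
        // own deadline has already passed.
        cache
            .invalidate_entries_if(|_key, value| value.expires_at <= Instant::now())
            .expect("invalidation closures were enabled at build time");
    });
}
```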

Does it take a lot of work to implement real per-key TTL? It is desperately needed!

I would say it will take a week or two if I can work on it 2 hours a day after work.

  1. Implement the per-key TTL:
    • a. Implement a new data structure called hierarchical timing wheels (pdf) to manage the per-key TTLs (e.g. Caffeine's TimerWheel.java; a simplified sketch follows this list).
    • b. Add test cases for the new data structure.
    • c. Update the sync and future caches to use the new data structure for:
      • cache write operations such as insert and invalidate.
      • finding and evicting expired entries.
    • d. Add test cases for the per-key TTL.
  2. Write the documentation for the per-key TTL with sample code.
  3. Update the benchmarking tool and run it.
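
To illustrate step 1-a, here is a deliberately simplified, single-level timer wheel. It is only a sketch of the general idea; moka's actual implementation is hierarchical (several wheels with progressively coarser ticks), modeled after Caffeine's TimerWheel.java.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// A single-level timer wheel: a ring of slots, each covering one `tick`
/// of time and holding the keys whose deadlines fall into that slot.
struct TimerWheel<K> {
    slots: Vec<VecDeque<(K, Instant)>>,
    tick: Duration,        // time span covered by one slot
    current: usize,        // slot index corresponding to "now"
    last_advance: Instant, // time of the last tick
}

impl<K> TimerWheel<K> {
    fn new(slot_count: usize, tick: Duration) -> Self {
        Self {
            slots: (0..slot_count).map(|_| VecDeque::new()).collect(),
            tick,
            current: 0,
            last_advance: Instant::now(),
        }
    }

    /// Place a key into the slot that covers its deadline.
    fn schedule(&mut self, key: K, deadline: Instant) {
        let ticks = (deadline
            .saturating_duration_since(self.last_advance)
            .as_nanos()
            / self.tick.as_nanos()) as usize;
        // Schedule at least one tick ahead so the entry is seen on the next
        // advance; entries that wrap around are re-checked when their slot
        // comes up again.
        let slot = (self.current + ticks.max(1)) % self.slots.len();
        self.slots[slot].push_back((key, deadline));
    }

    /// Advance the wheel to `now`, returning keys whose deadlines have passed.
    fn advance(&mut self, now: Instant) -> Vec<K> {
        let mut expired = Vec::new();
        while now.saturating_duration_since(self.last_advance) >= self.tick {
            self.last_advance += self.tick;
            self.current = (self.current + 1) % self.slots.len();
            let slot = &mut self.slots[self.current];
            // Entries that only wrapped around and are not yet due go back in.
            for _ in 0..slot.len() {
                let (key, deadline) = slot.pop_front().unwrap();
                if deadline <= now {
                    expired.push(key);
                } else {
                    slot.push_back((key, deadline));
                }
            }
        }
        expired
    }
}
```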

I have been a bit busy recently and could not do anything for Moka in the past few weeks. I am hoping to get back to it soon, but I still have other things to do in Moka before I can start working on this.

I will let you know when I make some progress, e.g. when I finish 1-a to 1-d so you can start to play with it.

@tatsuya6502 tatsuya6502 modified the milestones: v0.12.0, v0.11.0 Mar 25, 2023
@Swatinem
Contributor

I’m very excited to see progress here. Is there anything I might be able to help with?

We are using the workaround right now: we keep an Instant as part of the cache item and check it in the replace_if callback.

One very interesting thing about this is that the cache performance / hit rate is slowly decreasing over time, which is clearly visible in our cache metrics:
[graph: cache hit rate (light blue) vs. cache miss rate (medium blue) over several days]
Here, the light blue is cache hits vs. medium blue cache misses. The dent is a fresh deployment, at which point we have a good cache hit rate; it then gets worse over the next day or so. No idea why it is degrading so slowly, though. We have a hard 1-hour TTL on every entry, so I'm surprised this effect takes a day or so to manifest.

@tatsuya6502
Member

@Swatinem

One very interesting thing about this is that the cache performance / hit rate is slowly decreasing over time, which is clearly visible in our cache metrics:

Does your workload have a high insertion rate all the time (e.g. more than 1,500 insertions/sec)? And does the cache have a max capacity configured? If so, the eviction rate (for evicting expired entries) may not keep up with the insertion rate, and the cache will eventually become full even though it contains expired entries. Once the cache is full, the hit rate can drop if the admission policy is not suitable for your workload.

You could check whether this is the case by tracking entry_count (if a weigher is not set) or weighted_size (if a weigher is set) in the cache metrics.
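
For example, something like this could sample those numbers periodically (a sketch; metrics_gauge stands in for whatever metrics facility is in use, and the reported values may lag slightly because cache maintenance runs in batches):

```rust
use std::time::Duration;

use moka::sync::Cache;

// Periodically report the cache size so it can be graphed next to the
// hit/miss metrics.
fn spawn_size_reporter(cache: Cache<String, Vec<u8>>) {
    std::thread::spawn(move || loop {
        std::thread::sleep(Duration::from_secs(60));
        metrics_gauge("cache.entry_count", cache.entry_count() as f64);
        metrics_gauge("cache.weighted_size", cache.weighted_size() as f64);
    });
}

// Placeholder for a real metrics client.
fn metrics_gauge(name: &str, value: f64) {
    println!("{name} = {value}");
}
```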

As of Moka v0.10.0, the eviction rate is hard-coded, and it may be too low. I may make it larger and/or configurable.

In the future, I think the cache should be able to calculate the eviction rate based on the insertion rate and the cache capacity.

@Swatinem
Contributor

I don't have metrics on entry_count or weighted_size. We get about ~200 reads/sec and, depending on the hit rate, around ~50-150 misses (i.e. writes) per second. Considering the capacity is only around ~500, I would assume it is under a lot of pressure.

The capacity is intentionally conservative, as we are storing open mmaps, which both keep a file descriptor open and, depending on the highly variable file size (a few kilobytes to gigabytes), might contribute a lot to virtual memory usage. It should be possible to make that adaptive based on the actual size of the mmapped files, which I believe would give us a little better control over the capacity and vmem usage.

@tatsuya6502
Member

tatsuya6502 commented Mar 29, 2023

We get about ~200 reads/sec and, depending on the hit rate, around ~50-150 misses (i.e. writes) per second. Considering the capacity is only around ~500, I would assume it is under a lot of pressure.

I see. So your cache will be full in a few seconds, and a slow eviction rate for expired entries will not explain why the performance / hit rate is slowly decreasing over time.

Other aspects that can change the hit rate are:

  • Write Rate
    • A higher write rate will increase the hit rate, because it makes the cache temporarily hold more entries than the max capacity.
      • A Moka cache applies writes to the cache policy in batches, every 300 ms or when the batch size reaches 512. Until a batch is applied, the cache will hold more entries than the max capacity.
      • e.g. If the max capacity is 500 and the write rate is 150/sec, the cache will hold ~550 entries for 300 ms and then evict ~50 entries (10% over capacity).
      • If the write rate is 50/sec, the cache will hold ~517 entries for 300 ms and then evict ~17 entries (3% over capacity).
  • Read Access Pattern
    • If the read pattern favors popular keys, the hit rate will be higher because of the TinyLFU policy.
    • If the read pattern favors recently inserted (but unpopular) keys, the hit rate will be lower.
      • This will be improved when the Window-TinyLFU policy is implemented.
    • It is quite possible that the access pattern changes over time.

With the current information, it will be hard to identify the root cause. It would be very helpful if we could get access trace data from the production system. The trace data can be a CSV file, and each line should have the following information:

  • Duration (milliseconds) since start.
  • Key, or hash of the key.
  • Per-key TTL.
  • If you use a weigher, the weighted size of the entry.

Once we have such a trace, we can replay it to reproduce the issue and investigate the root cause.

@tatsuya6502
Member

FYI, I started working on this feature a few days ago: #248

I am implementing hierarchical timing wheels now.

  • a. Implement a new data structure called hierarchical timing wheels (pdf) to manage the per-key TTLs (e.g. Caffeine's TimerWheel.java).

@tatsuya6502 tatsuya6502 mentioned this issue Mar 31, 2023
@phantomhker

phantomhker commented Mar 31, 2023

I’m very excited to see progress here. Is there anything I might be able to help with?

We are using the workaround right now: we keep an Instant as part of the cache item and check it in the replace_if callback.

One very interesting thing about this is that the cache performance / hit rate is slowly decreasing over time, which is clearly visible in our cache metrics. Here, the light blue is cache hits vs. medium blue cache misses. The dent is a fresh deployment, at which point we have a good cache hit rate; it then gets worse over the next day or so. No idea why it is degrading so slowly, though. We have a hard 1-hour TTL on every entry, so I'm surprised this effect takes a day or so to manifest.

Have you solved the problem? I had a similar problem recently; the difference is that the hit rate in my case dropped faster, within about half an hour after reaching the max capacity.

@Swatinem
Contributor

No. It is not that critical for us, and we are very happy with how the cache is performing in general. I would rather wait for proper per-entry TTL support instead of putting more effort into workarounds.

@tatsuya6502 tatsuya6502 self-assigned this Mar 31, 2023
@tatsuya6502 tatsuya6502 moved this to In Progress in Moka — Roadmap Mar 31, 2023
@tatsuya6502
Member

tatsuya6502 commented Mar 31, 2023

@phantomhker @Swatinem @Millione

Just FYI, after finishing this feature (per-entry TTL), I will implement the Window-TinyLFU admission/eviction policy, which should improve the cache hit rate for some kinds of workloads.

Other aspects that can change the hit rate are:

  • Write Rate
    • ...
  • Read Access Pattern
    • If the read pattern favors popular keys, the hit rate will be higher because of the TinyLFU policy.
    • If the read pattern favors recently inserted (but unpopular) keys, the hit rate will be lower.
      • This will be improved when the Window-TinyLFU policy is implemented.
    • It is quite possible that the access pattern changes over time.

I created #249 to track the development of Window-TinyLFU. I also updated the roadmap.

@tatsuya6502
Member

The dent is a fresh deployment, at which point we have a good cache hit rate; it then gets worse over the next day or so.

Maybe you are doing a blue/green deployment and traffic is slowly shifted to the fresh instance over the course of a day? (e.g. AWS CodeDeploy (doc)) Having more traffic could decrease the hit rate, as more cache keys will be used by different requests. It has the same effect as reducing the cache capacity.

@tatsuya6502
Member

FYI, I started working on this feature a few days ago: ...

OK. It is not quite ready for a release yet, but you can start playing with it. See: #248 (comment)

@github-project-automation github-project-automation bot moved this from In Progress to Done in Moka — Roadmap Apr 23, 2023
@tatsuya6502
Member

Just FYI, I just published moka v0.11.0 to crates.io.
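
With this release, per-entry expiration is configured through an implementation of the Expiry trait passed to the cache builder's expire_after method. A minimal sketch of that usage (the Payload type and PerEntryTtl struct are illustrative; see the v0.11 documentation for the exact trait details):

```rust
use std::time::{Duration, Instant};

use moka::{sync::Cache, Expiry};

// Hypothetical value type that carries its own TTL.
#[derive(Clone)]
struct Payload {
    data: String,
    ttl: Duration,
}

// An Expiry that reads the TTL out of the value when the entry is created;
// the other trait methods keep their default behavior.
struct PerEntryTtl;

impl Expiry<String, Payload> for PerEntryTtl {
    fn expire_after_create(
        &self,
        _key: &String,
        value: &Payload,
        _created_at: Instant,
    ) -> Option<Duration> {
        Some(value.ttl)
    }
}

fn main() {
    let cache: Cache<String, Payload> = Cache::builder()
        .max_capacity(10_000)
        .expire_after(PerEntryTtl)
        .build();

    // Each entry now expires on its own schedule.
    cache.insert(
        "short-lived".to_string(),
        Payload { data: "a".into(), ttl: Duration::from_secs(5) },
    );
    cache.insert(
        "long-lived".to_string(),
        Payload { data: "b".into(), ttl: Duration::from_secs(3600) },
    );
}
```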

@tatsuya6502
Member

Hi @phantomhker,

Have you solved the problem? I had a similar problem recently; the difference is that the hit rate in my case dropped faster, within about half an hour after reaching the max capacity.

FYI, @Swatinem reported that the problem they were observing was fixed by #348 in v0.12.2.

You might be affected by the same issue, so please try the latest version (v0.12.4) if you have a chance.
