
Cache Changes to Improve Overall PushPull/MongoDB Performance #948

Open
krapie opened this issue Aug 2, 2024 · 2 comments
Labels
enhancement 🌟 New feature or request

Comments

krapie (Member) commented Aug 2, 2024

Description:

Currently, the PushPull RPC always queries MongoDB, which adds overhead to every request. To reduce this overhead and improve the response time of PushPull, we can consider introducing a caching mechanism for the Changes or Snapshot data used in PushPull operations. This data has high locality due to the nature of CRDT use cases and is immutable once persisted, which makes it well suited to caching.

Why:

Caching Changes or Snapshot data would reduce the query load on MongoDB and shorten PushPull response times, improving overall system efficiency and the user experience of PushPull-based workflows.
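Since Changes are immutable once persisted, a cached entry can never go stale, which makes even a very simple cache safe. A minimal sketch of what such a cache could look like (all names here are hypothetical illustrations, not Yorkie's actual types):

```go
package main

import (
	"fmt"
	"sync"
)

// changeKey identifies an immutable Change by document ID and server sequence.
// Because a persisted Change never mutates, a cached entry never goes stale.
type changeKey struct {
	docID     string
	serverSeq int64
}

// ChangeCache is a hypothetical thread-safe cache for serialized Changes.
type ChangeCache struct {
	mu      sync.RWMutex
	entries map[changeKey][]byte
}

func NewChangeCache() *ChangeCache {
	return &ChangeCache{entries: make(map[changeKey][]byte)}
}

// Get returns the cached payload for (docID, seq), if present.
func (c *ChangeCache) Get(docID string, seq int64) ([]byte, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.entries[changeKey{docID, seq}]
	return v, ok
}

// Put stores a payload; immutability means we never need to invalidate it.
func (c *ChangeCache) Put(docID string, seq int64, payload []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[changeKey{docID, seq}] = payload
}

func main() {
	cache := NewChangeCache()
	cache.Put("doc-1", 42, []byte("serialized change"))
	if v, ok := cache.Get("doc-1", 42); ok {
		fmt.Printf("hit: %s\n", v)
	}
}
```

This deliberately omits eviction; a real implementation would need a size bound, which the discussion below gets into.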

@krapie krapie changed the title Cache Changes in Server in-memory for PushPull Cache Changes to Improve Overall PushPull/MongoDB Performance Aug 2, 2024
@krapie krapie added the enhancement 🌟 New feature or request label Aug 2, 2024
binary-ho (Contributor) commented Aug 12, 2024

@krapie Kevin, I'm interested in this issue, so I'd like to discuss it with you. Do you have any ideas about caching strategies?

  1. What is the primary goal?

    1. Reducing user response time
    2. Reducing the load on MongoDB
  2. What exactly are you caching?

    • Changes and Snapshots?
  3. When does caching occur?

    • I think caching could be keyed to WatchDoc (evicting a document's cached entries once it is no longer being watched).
    • However, to prevent immediate deletion even if the gRPC connection is temporarily lost, I believe a basic TTL (Time-to-Live) would be necessary.
    • Similar to GC, unnecessary change caches could be removed using min_synced_seq.
    • If the cache size exceeds its limit, we would need a more thoughtful expiration strategy. However, I believe a strategy that favors documents frequently used by a large number of users would be beneficial.
    • I think the real-time performance of the cache is crucial. As far as I know, locking is based on the API, so if caching occurs after the first WatchDoc request, subsequent requests should receive the cached results, which should prevent any issues.
  4. What strategies are you considering? If the goal is to reduce user response time, a local cache might be better.

    • Global Cache:

      • This could reduce the load on MongoDB or the time spent retrieving data from the server where the data is stored.
      • However, the communication time between servers would likely remain similar.
      • Storing all Changes and Snapshots in a single server's memory could consume a lot of memory.
      • If the main bottleneck is communication time, caching might be ineffective.
      • Nevertheless, this could be easier to implement and might be suitable if the primary goal is to reduce the load on the MongoDB application.
    • Local Cache:

      • With no external IO, user response time could be significantly faster.
      • Data is stored in the cache and periodically pushed to the database. (In this design, some changes could be lost if the server unexpectedly shuts down.)
    • If based on WatchDoc, a Thundering Herd problem could occur when servers are added or go down, leading to split-brain scenarios. (However, this may not be a critical issue depending on the number of users.)
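The GC-style idea in point 3 — dropping cached change entries at or below min_synced_seq, the sequence that garbage collection already uses to know what every client has seen — could be sketched like this (hypothetical names, not Yorkie's actual API):

```go
package main

import "fmt"

// docChangeCache holds cached changes for one document, indexed by server seq.
type docChangeCache struct {
	changes map[int64][]byte
}

// PruneBelow removes entries whose server seq is at or below minSyncedSeq.
// Mirroring how GC uses min_synced_seq, every client has already synced past
// these changes, so no PushPull request will ask for them again.
func (c *docChangeCache) PruneBelow(minSyncedSeq int64) int {
	removed := 0
	for seq := range c.changes {
		if seq <= minSyncedSeq {
			delete(c.changes, seq)
			removed++
		}
	}
	return removed
}

func main() {
	c := &docChangeCache{changes: map[int64][]byte{
		1: []byte("a"), 2: []byte("b"), 3: []byte("c"),
	}}
	n := c.PruneBelow(2) // entries 1 and 2 are safe to drop
	fmt.Println(n, len(c.changes)) // 2 1
}
```

This kind of pruning bounds the cache per document, but a global size limit (LRU or similar) would still be needed on top of it, as point 3 notes.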

krapie (Member, Author) commented Aug 12, 2024

@binary-ho Well, this issue is just a conceptual thought that I have, and I think we need to discuss the necessity and benefits of this feature. But it is a fun issue to discuss, so starting with a PoC might be a good approach.

For your questions:

  1. What is the primary goal?

Reducing the load of MongoDB is the primary goal, but I'm expecting reduced response time as well.

  2. What exactly are you caching?

Primarily Changes, but we need to check the MongoDB query patterns first.

  3. When does caching occur?

As for the caching strategy, we still need to brainstorm; I do not have concrete ideas yet.

  4. What strategies are you considering? If the goal is to reduce user response time, a local cache might be better.

I'm considering local in-memory caching, because we do not need global caching in cluster mode, which does not share a document's workload across servers.
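A local cache along these lines would most naturally follow the classic cache-aside pattern on the PushPull read path: check the in-process cache first, and only fall through to MongoDB on a miss. A rough sketch under hypothetical names (FindChange stands in for whatever MongoDB query the real code performs):

```go
package main

import "fmt"

// ChangeStore abstracts the MongoDB lookup the PushPull read path performs.
type ChangeStore interface {
	FindChange(docID string, seq int64) ([]byte, error)
}

// CachedStore wraps a ChangeStore with a local in-memory cache-aside layer.
type CachedStore struct {
	backend ChangeStore
	cache   map[string][]byte
}

func NewCachedStore(backend ChangeStore) *CachedStore {
	return &CachedStore{backend: backend, cache: make(map[string][]byte)}
}

func (s *CachedStore) FindChange(docID string, seq int64) ([]byte, error) {
	key := fmt.Sprintf("%s:%d", docID, seq)
	if v, ok := s.cache[key]; ok { // hit: skip MongoDB entirely
		return v, nil
	}
	v, err := s.backend.FindChange(docID, seq) // miss: fall through to MongoDB
	if err != nil {
		return nil, err
	}
	s.cache[key] = v // safe to keep: changes are immutable
	return v, nil
}

// fakeStore simulates MongoDB and counts how many queries reach it.
type fakeStore struct{ queries int }

func (f *fakeStore) FindChange(docID string, seq int64) ([]byte, error) {
	f.queries++
	return []byte("change"), nil
}

func main() {
	db := &fakeStore{}
	s := NewCachedStore(db)
	s.FindChange("doc-1", 1)
	s.FindChange("doc-1", 1) // second call is served from the local cache
	fmt.Println(db.queries)  // 1
}
```

Because cluster mode routes each document to a single server, a local cache needs no cross-server invalidation, and the immutability of Changes removes staleness concerns entirely; the remaining design work is the eviction policy discussed above.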

Projects
Status: Backlog
Development

No branches or pull requests

2 participants