Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Undocumented and non-configurable memoization feature on apollo-datasource-rest #6603

Closed
Jaxolotl opened this issue Jun 22, 2022 · 4 comments · Fixed by #6650 or apollographql/datasource-rest#3

Comments

@Jaxolotl
Copy link

Package name and version

apollo-datasource-rest V3.6.1

Expected behavior

  • I'm able to control whether to enable memoization or not
  • If I opt for not using memoization then all my GET requests are executed
  • I expect to find the behavior documented somewhere (on the readme file and/or on the apollo server "datasources" documentation
  • I am able to specifically decide how to cache the requests (e.g headers, redis, etc)

Actual behavior

  • All GET requests will be memoized no matter you want it or not
  • I'm unable to opt out from memoization
  • No documentation is available, you have to dig into the source code to find it or accidentally find a clue on a Khalil Stemmler's blog
  • The package is adding a hidden non configurable caching layer bypassing any other standard caching mechanism

Description of the case

The datasource will keep a memoized version of all the GET requests until a POST request is made using the same URI or you programmatically delete the entry key (this.memoizedResults.delete(<MY_KEY>)) or clear all the entries (this.memoizedResults.clear()).

Not being documented and not having and opt-out configuration for the feature will cause a data inconsistency problem whenever you only make get requests to a system you have just read-only access (if another system updates the data you will never get the new data)

temporary workaround

You can always do the following to workaround the problem on your datasource derived class but is not a solution

export abstract class MyExtendedRestDataSource extends RESTDataSource {
  constructor() {
    super();
  }

  override willSendRequest(request: RequestOptions) {
    this.memoizedResults.clear(); // brute-force clearing the entries before every request
  }
}

Steps to repro

The package code is simple enough and self explanatory to be able to repro with a unit test.

Extra notes

I saw a couple tickets that might be related to this problem but they were described as generic caching issues without providing a detailed explanation of the problem. Hope you find it useful

apollographql/datasource-rest#72
apollographql/datasource-rest#46

@smyrick
Copy link
Member

smyrick commented Jun 22, 2022

Thanks for debugging and finding the issue @Jaxolotl! I have shared with the Apollo Server team too, but I will share my ideas of how we could solve this with a new minor release. Let me know what you think

Create a new config/class property

For backwards compatibility, we could add a new property called cacheGetRequests, similar to how baseURL works today, which defaults to true as this is the current behavior. Then in the code block below you pointed out, we could update the code to be:

if (request.method === 'GET' && cacheGetRequests !== false) {
    let promise = this.memoizedResults.get(cacheKey);
    // ....
}

This will hopefully work still even if some one has created their own implementation of RESTDataSource with a property also called cacheGetRequests and set to some object or other value.

That means then to disable memoized results you could do this in the constructor.

class PersonalizationAPI extends RESTDataSource {
  constructor() {
    super();
    this.cacheGetRequests = false;
  }
}

Add another hook point

Today we have a few hook points: willSendRequest, didReceiveResponse, and didEncounterError. We can update to add a new one maybe called shouldCacheRequest which has a default implementation of the current logic that you could then override

protected shouldCacheRequest(request: RequestOptions): boolean {
  return request.method === 'GET';
}

//....
  if (shouldCacheRequest()) {
      let promise = this.memoizedResults.get(cacheKey);
     // ....
  }

@Jaxolotl
Copy link
Author

Jaxolotl commented Jun 23, 2022

Hi @smyrick thank you for the quick reply!
Here my 2 cents:

for the hook points:

I'm not really sure how opening the cache implementation for extension/overriding would impact on the project. Personally I wouldn't go in that direction and cannot imagine a scenario where caching a non query verbs ( command-like verb like PUT, POST, DELETE) would benefit from caching

for the config value:

That sounds less intrusive, although I'd suggest something like this:

    if (cacheGetRequests !== false) {
        return performRequest();
    }

    if (request.method === 'GET') {
      let promise = this.memoizedResults.get(cacheKey);
      if (promise) return promise;

      promise = performRequest();
      this.memoizedResults.set(cacheKey, promise);
      return promise;
    } else {
      this.memoizedResults.delete(cacheKey);
      return performRequest();
    }

That way we do an earlier return with a simpler condition easy to test and avoiding dealing with the if/else statements. If we add it to the if (request.method === 'GET' && cacheGetRequests !== false) { then we should deal with the else deleting a non existing key from the cache all the time

extra thoughts

I think the feature has a very valid purpose and it would be great to have a better control over it. Switching it on and off might not be a good approach, what if we add a time value, e.g. if 2 GET requests with the same signature have less than 500ms difference then return the memoized promise, otherwise make the request and update the memoized entry, that way you can switched on/off AND you can switched on and define a max life span.

// having
this.cacheGetRequests = true; // defaulting to true
this.cacheMaxLifeSpan = 500; // defaulting to 500 ms to avoid multiple douplicated hits during for example a nested query but not catching separated operation calls

Any thoughts?

@smyrick
Copy link
Member

smyrick commented Jun 28, 2022

I like the idea of adding more control to the caching features but I also want to make sure we don't change the existing behavior of the library. Even if we agree it needs to be updated, having the results potential flushed from the cache would be a breaking change in logic (and require a major version) and someone today might be using this as is.

I totally missed this before but there already is an open in the cache set to pass in a ttl option which just compares to Date.now(). So if we made that open to pass in we could customize that:

Default

// Class props
this.cacheEnabled = true;
this.cacheTTL = null;

// Logic
if (this.cacheEnabled) {
  let promise = this.memoizedResults.get(cacheKey);
  if (promise) return promise;

  if (request.method === 'GET') {
    promise = performRequest();
    this.memoizedResults.set(cacheKey, promise, { ttl: cacheTTL });
    return promise;
  } else {
    this.memoizedResults.delete(cacheKey);
    return performRequest();
  }
} else {
  return performRequest();
}

trevor-scheer pushed a commit that referenced this issue Aug 19, 2022
By default, RESTDataSource caches all outgoing GET requests in a separate
memoized cache from the regular response cache. It makes the assumption that all
responses from HTTP GET calls are cacheable by their URL.

If a request is made with the same cache key (URL by default) but with an HTTP
method other than GET, the cached request is then cleared.

This change adds a new class property requestCacheEnabled (which defaults to
true to match current behavior) which allows users to disable the cache. You might
want to do this if your API is not actually cacheable or your data changes over time.

Separately it documents this feature exists and adds more info about how to set a
TTL for the entire HTTP cache.

Fixes #6603
@smyrick
Copy link
Member

smyrick commented Aug 22, 2022

@Jaxolotl This is now released in apollo-datasource-rest 3.7

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 19, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
2 participants