Add the query planner options to the query plan cache key #5100

Merged · 23 commits · May 14, 2024
23 changes: 23 additions & 0 deletions .changesets/fix_geal_cache_key_qp_options.md
@@ -0,0 +1,23 @@
### Prevent query plan cache collision when planning options change ([Issue #5093](https://github.com/apollographql/router/issues/5093))

> [!IMPORTANT]
> If you have enabled [Distributed query plan caching](https://www.apollographql.com/docs/router/configuration/distributed-caching/#distributed-query-plan-caching), this release changes the hashing algorithm used for the cache keys. As a result, expect additional cache regeneration cost when updating between these versions while the new hashing algorithm takes effect.

When query planning takes place, a number of options affect the generated query plans, including:
* `defer_support`
* `generate_query_fragments`
* `experimental_reuse_query_fragments`
* `experimental_type_conditioned_fetching`
* `experimental_query_planner_mode`

If distributed query plan caching is also enabled, then changing any of these will result in different query plans being generated and entering the cache.

This could cause issues in the following scenarios:
1. The Router configuration changes and a query plan that is incompatible with the new configuration is loaded from the cache.
2. Routers with differing configurations share the same cache, causing them to store and load incompatible query plans.

Now a hash of the entire query planner configuration is included in the cache key to prevent this from happening (see the sketch below).

By [@Geal](https://github.com/Geal) in https://github.com/apollographql/router/pull/5100
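To illustrate the fix, here is a minimal sketch of how folding the planner options into the SHA-256 cache key prevents collisions. The `PlannerOptions` type is a hypothetical stand-in; the real key also covers the schema, operation, authorization metadata, and plan options:

```rust
use serde::Serialize;
use sha2::{Digest, Sha256};

// Hypothetical, simplified stand-in for the router's planner options.
#[derive(Serialize)]
struct PlannerOptions {
    defer_support: bool,
    generate_query_fragments: bool,
}

// The key changes whenever the query *or* the options change, so two
// differently-configured routers can never collide on one cache entry.
fn cache_key(query: &str, options: &PlannerOptions) -> String {
    let mut hasher = Sha256::new();
    hasher.update(query.as_bytes());
    hasher.update(serde_json::to_vec(options).expect("serialization should not fail"));
    hex::encode(hasher.finalize())
}

fn main() {
    let defer_on = PlannerOptions { defer_support: true, generate_query_fragments: false };
    let defer_off = PlannerOptions { defer_support: false, generate_query_fragments: false };
    // Same query, different options: the keys must differ.
    assert_ne!(cache_key("{ me { id } }", &defer_on), cache_key("{ me { id } }", &defer_off));
}
```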
2 changes: 1 addition & 1 deletion .circleci/config.yml
@@ -492,7 +492,7 @@ commands:
environment:
# Use the settings from the "ci" profile in nextest configuration.
NEXTEST_PROFILE: ci
command: xtask test --workspace --locked
command: xtask test --workspace --locked --features ci
- run:
name: Delete large files from cache
command: |
3 changes: 3 additions & 0 deletions apollo-router/Cargo.toml
@@ -57,6 +57,9 @@ docs_rs = ["router-bridge/docs_rs"]
# and not yet ready for production use.
telemetry_next = []

# Set when CI builds take place. Allows us to disable some tests when CI is running on certain platforms.
ci = []

[package.metadata.docs.rs]
features = ["docs_rs"]

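For context, a cargo feature like this is typically consumed in test code with `cfg` attributes. The example below is hypothetical and not part of this diff:

```rust
#[cfg(test)]
mod tests {
    // Hypothetical test that is compiled out when building with `--features ci`,
    // e.g. because it is flaky on certain CI platforms.
    #[cfg(not(feature = "ci"))]
    #[test]
    fn depends_on_local_network() {
        assert_eq!(2 + 2, 4);
    }
}
```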
21 changes: 21 additions & 0 deletions apollo-router/src/configuration/mod.rs
@@ -414,6 +414,27 @@ impl Configuration {
.build()
).build())
}

pub(crate) fn js_query_planner_config(&self) -> router_bridge::planner::QueryPlannerConfig {
router_bridge::planner::QueryPlannerConfig {
reuse_query_fragments: self.supergraph.reuse_query_fragments,
generate_query_fragments: Some(self.supergraph.generate_query_fragments),
incremental_delivery: Some(router_bridge::planner::IncrementalDeliverySupport {
enable_defer: Some(self.supergraph.defer_support),
}),
graphql_validation: false,
debug: Some(router_bridge::planner::QueryPlannerDebugConfig {
bypass_planner_for_single_subgraph: None,
max_evaluated_plans: self
.supergraph
.query_planning
.experimental_plans_limit
.or(Some(10000)),
paths_limit: self.supergraph.query_planning.experimental_paths_limit,
}),
type_conditioned_fetching: self.experimental_type_conditioned_fetching,
}
}
}

impl Default for Configuration {
25 changes: 1 addition & 24 deletions apollo-router/src/query_planner/bridge_query_planner.rs
@@ -15,12 +15,9 @@ use futures::future::BoxFuture;
use opentelemetry_api::metrics::MeterProvider as _;
use opentelemetry_api::metrics::ObservableGauge;
use opentelemetry_api::KeyValue;
use router_bridge::planner::IncrementalDeliverySupport;
use router_bridge::planner::PlanOptions;
use router_bridge::planner::PlanSuccess;
use router_bridge::planner::Planner;
use router_bridge::planner::QueryPlannerConfig;
use router_bridge::planner::QueryPlannerDebugConfig;
use router_bridge::planner::UsageReporting;
use serde::Deserialize;
use serde_json_bytes::Map;
@@ -159,27 +156,7 @@ impl PlannerMode {
configuration: &Configuration,
old_planner: Option<Arc<Planner<QueryPlanResult>>>,
) -> Result<Arc<Planner<QueryPlanResult>>, ServiceBuildError> {
let query_planner_configuration = QueryPlannerConfig {
reuse_query_fragments: configuration.supergraph.reuse_query_fragments,
generate_query_fragments: Some(configuration.supergraph.generate_query_fragments),
incremental_delivery: Some(IncrementalDeliverySupport {
enable_defer: Some(configuration.supergraph.defer_support),
}),
graphql_validation: false,
debug: Some(QueryPlannerDebugConfig {
bypass_planner_for_single_subgraph: None,
max_evaluated_plans: configuration
.supergraph
.query_planning
.experimental_plans_limit
.or(Some(10000)),
paths_limit: configuration
.supergraph
.query_planning
.experimental_paths_limit,
}),
type_conditioned_fetching: configuration.experimental_type_conditioned_fetching,
};
let query_planner_configuration = configuration.js_query_planner_config();
let planner = match old_planner {
None => Planner::new(sdl.to_owned(), query_planner_configuration).await?,
Some(old_planner) => {
58 changes: 55 additions & 3 deletions apollo-router/src/query_planner/caching_query_planner.rs
@@ -12,7 +12,9 @@ use rand::seq::SliceRandom;
use rand::thread_rng;
use router_bridge::planner::PlanOptions;
use router_bridge::planner::Planner;
use router_bridge::planner::QueryPlannerConfig;
use router_bridge::planner::UsageReporting;
use serde::Serialize;
use sha2::Digest;
use sha2::Sha256;
use tower::BoxError;
@@ -50,6 +52,15 @@ pub(crate) type Plugins = IndexMap<String, Box<dyn QueryPlannerPlugin>>;
pub(crate) type InMemoryCachePlanner =
InMemoryCache<CachingQueryKey, Result<QueryPlannerContent, Arc<QueryPlannerError>>>;

#[derive(Debug, Clone, Hash, PartialEq, Eq, Serialize)]
pub(crate) enum ConfigMode {
//FIXME: add the Rust planner structure once it is hashable and serializable;
// for now, use the JS config as it is expected to be identical to the Rust one
Rust(Arc<QueryPlannerConfig>),
Both(Arc<QueryPlannerConfig>),
Js(Arc<QueryPlannerConfig>),
}
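Note that all three variants currently wrap the same JS `QueryPlannerConfig`, yet they still yield distinct cache keys: serde's default externally tagged representation includes the variant name in the serialized output. A minimal sketch (with simplified payloads) of that behavior:

```rust
use serde::Serialize;

// Simplified payloads; the real enum wraps Arc<QueryPlannerConfig>.
#[derive(Serialize)]
enum ConfigMode {
    Rust(u32),
    Js(u32),
}

fn main() {
    // The variant name is part of the serialization, so identical payloads
    // under different variants hash to different cache keys.
    assert_eq!(serde_json::to_string(&ConfigMode::Rust(1)).unwrap(), r#"{"Rust":1}"#);
    assert_eq!(serde_json::to_string(&ConfigMode::Js(1)).unwrap(), r#"{"Js":1}"#);
}
```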

/// A query planner wrapper that caches results.
///
/// The query planner performs LRU caching.
@@ -62,6 +73,8 @@ pub(crate) struct CachingQueryPlanner<T: Clone> {
schema: Arc<Schema>,
plugins: Arc<Plugins>,
enable_authorization_directives: bool,
config_mode: ConfigMode,
introspection: bool,
}

impl<T: Clone + 'static> CachingQueryPlanner<T>
@@ -90,12 +103,26 @@ where

let enable_authorization_directives =
AuthorizationPlugin::enable_directives(configuration, &schema).unwrap_or(false);

let config_mode = match configuration.experimental_query_planner_mode {
crate::configuration::QueryPlannerMode::New => {
ConfigMode::Rust(Arc::new(configuration.js_query_planner_config()))
}
crate::configuration::QueryPlannerMode::Legacy => {
ConfigMode::Js(Arc::new(configuration.js_query_planner_config()))
}
crate::configuration::QueryPlannerMode::Both => {
ConfigMode::Both(Arc::new(configuration.js_query_planner_config()))
}
};
Ok(Self {
cache,
delegate,
schema,
plugins: Arc::new(plugins),
enable_authorization_directives,
config_mode,
introspection: configuration.supergraph.introspection,
})
}

@@ -141,7 +168,9 @@ where
hash,
metadata,
plan_options,
..
config_mode: _,
sdl: _,
introspection: _,
},
_,
)| WarmUpCachingQueryKey {
@@ -150,6 +179,8 @@
hash: Some(hash.clone()),
metadata: metadata.clone(),
plan_options: plan_options.clone(),
config_mode: self.config_mode.clone(),
introspection: self.introspection,
},
)
.take(count)
@@ -181,6 +212,8 @@ where
hash: None,
metadata: CacheKeyMetadata::default(),
plan_options: PlanOptions::default(),
config_mode: self.config_mode.clone(),
introspection: self.introspection,
});
}
}
@@ -195,6 +228,8 @@ where
hash,
metadata,
plan_options,
config_mode: _,
introspection: _,
} in all_cache_keys
{
let context = Context::new();
@@ -210,6 +245,8 @@ where
sdl: Arc::clone(&self.schema.raw_sdl),
metadata,
plan_options,
config_mode: self.config_mode.clone(),
introspection: self.introspection,
};

if experimental_reuse_query_plans {
@@ -391,6 +428,8 @@ where
sdl: Arc::clone(&self.schema.raw_sdl),
metadata,
plan_options,
config_mode: self.config_mode.clone(),
introspection: self.introspection,
};

let context = request.context.clone();
@@ -530,8 +569,13 @@ pub(crate) struct CachingQueryKey {
pub(crate) hash: Arc<QueryHash>,
pub(crate) metadata: CacheKeyMetadata,
pub(crate) plan_options: PlanOptions,
pub(crate) config_mode: ConfigMode,
pub(crate) introspection: bool,
}

// Update this key every time the cache key or the query plan format has to change.
// When changed, it MUST BE CALLED OUT PROMINENTLY IN THE CHANGELOG.
const CACHE_KEY_VERSION: usize = 0;
const FEDERATION_VERSION: &str = std::env!("FEDERATION_VERSION");

impl std::fmt::Display for CachingQueryKey {
@@ -545,23 +589,29 @@ impl std::fmt::Display for CachingQueryKey {
hasher.update(
&serde_json::to_vec(&self.plan_options).expect("serialization should not fail"),
);
hasher
.update(&serde_json::to_vec(&self.config_mode).expect("serialization should not fail"));
hasher.update(&serde_json::to_vec(&self.sdl).expect("serialization should not fail"));
hasher.update([self.introspection as u8]);
> **Review thread**
>
> **Contributor Author:** Why is introspection added here but not in the `Hash` implementation below?
>
> **Contributor:** Nice catch: 43ccd9d. I am wondering, though, why we are using a manual implementation of `Hash` rather than deriving it? There seems to be nothing special about this structure that would warrant a custom impl.
>
> **Contributor Author:** I'd have to dig it up, but I think there was something in the key that could not implement `Hash` in the past.
>
> **Member:** Once we figure that out, is there somewhere we could add a comment to make that more intuitive?
>
> **Contributor:** If we don't need a custom implementation now, let's remove it and use derive; that way we don't have to worry about it in the future. If we do eventually need a custom version, we would need to add a comment and possibly fuzz test.
let metadata = hex::encode(hasher.finalize());

write!(
f,
"plan:{}:{}:{}:{}",
FEDERATION_VERSION, self.hash, operation, metadata,
"plan:{}:{}:{}:{}:{}",
CACHE_KEY_VERSION, FEDERATION_VERSION, self.hash, operation, metadata,
> **Review thread**
>
> **Member:** It's unrelated to this particular PR, but it took me a while to understand that `operation` is a hash of `self.operation`, which is actually an operation name.
>
> **Member:** Also, `self.hash` is a hash of the query and the relevant parts of the schema, taking into account only the operation matching `operation_name`; at the same time, `self.hash` doesn't include a hash of `operation_name` itself.
>
> **Member:** And `self.metadata` is actually authorization metadata. P.S. Same as the previous comment, this is unrelated to this PR; just notes to help others understand what is happening in this code.
>
> **Contributor:** I'd like to see this refactored so that the keys become self-describing.
>
> **Member:** When/where does this change get itemized/made?
>
> **Contributor:** Unrelated to this PR: why are `operation` and `metadata` hashed separately instead of combined in the same hasher?
>
> **Contributor Author:** That was a way to see how many variants there can be in the cache for a single query.
)
}
}
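Putting the pieces together, the key produced by this `Display` impl has the shape sketched below; the version and hash values are illustrative, not taken from a real deployment:

```rust
fn main() {
    // Hypothetical, truncated values shown only to illustrate the key shape:
    // plan:<cache key version>:<federation version>:<query hash>:<operation hash>:<options/config hash>
    let (version, federation, hash, operation, metadata) =
        (0, "2.7.2", "5ab4...", "d1f3...", "9c0e...");
    let key = format!("plan:{version}:{federation}:{hash}:{operation}:{metadata}");
    assert_eq!(key, "plan:0:2.7.2:5ab4...:d1f3...:9c0e...");
}
```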

impl Hash for CachingQueryKey {
fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
self.sdl.hash(state);
self.hash.0.hash(state);
self.operation.hash(state);
self.metadata.hash(state);
self.plan_options.hash(state);
self.config_mode.hash(state);
self.introspection.hash(state);
}
}
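As the review thread above suggests, the manual impl now hashes every field, so a derived implementation would behave the same, provided each field's own `Hash` impl covers what the manual version hashes (for example, a derived `Hash` on `QueryHash` hashes its inner bytes, matching the manual `self.hash.0.hash(state)`). A self-contained sketch with illustrative field types:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Illustrative stand-in for the real query hash type.
#[derive(Hash)]
struct QueryHash(Vec<u8>);

// Deriving Hash hashes every field in declaration order.
#[derive(Hash)]
struct CachingQueryKey {
    operation: Option<String>,
    hash: QueryHash,
    introspection: bool,
}

fn main() {
    let key = CachingQueryKey {
        operation: Some("MyQuery".into()),
        hash: QueryHash(vec![1, 2, 3]),
        introspection: false,
    };
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    println!("{:x}", hasher.finish());
}
```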

@@ -572,6 +622,8 @@ pub(crate) struct WarmUpCachingQueryKey {
pub(crate) hash: Option<Arc<QueryHash>>,
pub(crate) metadata: CacheKeyMetadata,
pub(crate) plan_options: PlanOptions,
pub(crate) config_mode: ConfigMode,
pub(crate) introspection: bool,
}

#[cfg(test)]