Add $APOLLO_ROUTER_COMPUTE_THREADS environment variable #6746
Router creates a thread pool dedicated to "compute jobs", separate from Tokio threads. By default this pool has as many threads as available CPUs. This can now be changed by setting this new variable to an integer value. This feature is intentionally undocumented and should be considered "experimental". Also rename the pre-existing (also undocumented) $APOLLO_ROUTER_NUM_CORES to $APOLLO_ROUTER_IO_THREADS to better reflect what it does. Drive-by unrelated change: make `cargo run` default to the main router executable.
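The sizing rule described above can be sketched with the standard library alone. This is a minimal illustration, not the router's actual code: the helper name `pool_size` is hypothetical, and an override that does not parse as an integer is assumed to fall back to the default.

```rust
use std::thread;

/// Hypothetical helper: pick the pool size from an optional override string,
/// falling back to `available` when the override is absent or not an integer.
fn pool_size(override_value: Option<&str>, available: usize) -> usize {
    override_value
        .and_then(|value| value.parse::<usize>().ok())
        .unwrap_or(available)
}

fn main() {
    // Default: as many threads as available CPUs.
    let available = thread::available_parallelism()
        .expect("available_parallelism() failed")
        .get();
    // Override: the environment variable introduced by this PR.
    let from_env = std::env::var("APOLLO_ROUTER_COMPUTE_THREADS").ok();
    let size = pool_size(from_env.as_deref(), available);
    println!("compute thread pool size: {size}");
}
```

Keeping the decision in a pure function makes it testable without touching the process environment.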
apollo-router/src/compute_job.rs
Outdated
let size = std::thread::available_parallelism()
    .expect("available_parallelism() failed")
    .get();
tracing::debug!("Compute thread pool size: available_parallelism() = {size}");
This log can also help make sure cgroup limits affect available_parallelism() like we expect when benchmarking.
tracing::info!(size, type="compute", source="calculated", "thread pool");
Formatting values into log message strings makes them harder to consume or work with from JSON logs.
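To make the JSON point concrete, here is a std-only sketch of what a log consumer might see in each case. The JSON layout and both function names are assumptions for illustration. This is not the output of the router's formatter or of the `tracing` crate, and the field values are assumed to need no JSON escaping.

```rust
/// Event with the value interpolated into the message string:
/// a consumer has to parse the message text to recover the number.
fn formatted_event(size: usize) -> String {
    format!(
        r#"{{"level":"INFO","message":"Compute thread pool size: available_parallelism() = {size}"}}"#
    )
}

/// Event with the value carried as separate fields:
/// a consumer can read `size`, `type`, and `source` by key.
fn structured_event(size: usize, kind: &str, source: &str) -> String {
    format!(
        r#"{{"level":"INFO","message":"thread pool","size":{size},"type":"{kind}","source":"{source}"}}"#
    )
}

fn main() {
    println!("{}", formatted_event(4));
    println!("{}", structured_event(4, "compute", "calculated"));
}
```

With the structured form, filtering on `type` or aggregating on `size` needs no string parsing.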
apollo-router/src/compute_job.rs
Outdated
    .ok()
    .and_then(|value| value.parse::<usize>().ok())
{
    tracing::debug!("Compute thread pool size: $APOLLO_ROUTER_COMPUTE_THREADS = {size}");
Better format for JSON parseability:
tracing::info!(size, type="compute", source="env", "thread pool");
I'd make it info since it will not be flooding the logs and it seems like useful information
Done!
apollo-router/src/compute_job.rs
Outdated
let size = std::thread::available_parallelism()
    .expect("available_parallelism() failed")
    .get();
tracing::debug!("Compute thread pool size: available_parallelism() = {size}");
tracing::info!(size, type="compute", source="calculated", "thread pool");
// This environment variable is intentionally undocumented.
// See also APOLLO_ROUTER_COMPUTE_THREADS in apollo-router/src/compute_job.rs
if let Some(nb) = std::env::var("APOLLO_ROUTER_IO_THREADS")
    .ok()
    .and_then(|value| value.parse::<usize>().ok())
{
    builder.worker_threads(nb);
Worth logging these details as well:
tracing::info!(size=nb, type="io", source="env", "thread pool");
Can tracing be used before we create a Tokio runtime? (If not, I can move things around to log after we’re in async context)
When I added this log it was captured in the snapshot for test pq_layer_freeform_graphql_with_safelist_log_unknown_true, which made the outcome of the test dependent on the number of available CPUs. Rather than spend time figuring out how to make that test filter out that specific log, I’ve removed the log for now.
apollo-router/src/executable.rs
Outdated
@@ -677,6 +681,11 @@ impl Executable {
    }
};
let threads = tokio::runtime::Handle::current().metrics().num_workers();
tracing::info!(threads, type="io", "thread pool");
This is done here after we’ve initialized enough of the logging infrastructure. Unlike for compute threads there is no source here. The source could be:
- $APOLLO_ROUTER_IO_THREADS, if specified and apollo_router::executable::main is used
- Anything, if a custom binary defines its own main creating a Tokio runtime
- $TOKIO_WORKER_THREADS if specified and nothing set Tokio’s worker_threads()
- available_parallelism() is the eventual default
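The precedence in the list above can be sketched as a small pure function. This is an illustration under stated assumptions, not router or Tokio code: the `Source` enum and `io_threads` function are hypothetical, the custom-binary case is not modeled, and values that fail to parse are simply skipped, which may not match how Tokio itself treats an invalid $TOKIO_WORKER_THREADS.

```rust
/// Hypothetical label for where the IO thread count came from.
#[derive(Debug, PartialEq)]
enum Source {
    RouterEnv,
    TokioEnv,
    AvailableParallelism,
}

/// Parse an optional string as a thread count, ignoring unparsable values.
fn parse(value: Option<&str>) -> Option<usize> {
    value.and_then(|v| v.parse::<usize>().ok())
}

/// Resolve the IO thread count in the order given in the list above.
fn io_threads(
    router_env: Option<&str>,
    tokio_env: Option<&str>,
    available: usize,
) -> (usize, Source) {
    if let Some(n) = parse(router_env) {
        (n, Source::RouterEnv)
    } else if let Some(n) = parse(tokio_env) {
        (n, Source::TokioEnv)
    } else {
        (available, Source::AvailableParallelism)
    }
}

fn main() {
    let router = std::env::var("APOLLO_ROUTER_IO_THREADS").ok();
    let tokio = std::env::var("TOKIO_WORKER_THREADS").ok();
    let available = std::thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    let (threads, source) = io_threads(router.as_deref(), tokio.as_deref(), available);
    println!("io thread pool: threads={threads} source={source:?}");
}
```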
apollo-router/src/compute_job.rs
Outdated
    .expect("available_parallelism() failed")
    .get()
// This environment variable is intentionally undocumented.
// See also APOLLO_ROUTER_IO_THREADS in apollo-router/src/executable.rs
// See also APOLLO_ROUTER_IO_THREADS in apollo-router/src/executable.rs
apollo-router/src/compute_job.rs
Outdated
}
type Job = Box<dyn FnOnce() + Send + 'static>;
-fn queue() -> &'static AgeingPriorityQueue<Job> {
+pub(crate) fn queue() -> &'static AgeingPriorityQueue<Job> {
Does this need to be pub(crate) now?
Good points!
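For readers unfamiliar with the pattern, an accessor like `queue()` that returns a `&'static` reference is commonly backed by a lazily initialized global, which fits the PR description's note that the thread pool is created lazily on first use. Below is a minimal std-only sketch of that pattern. `SimpleQueue` and `INIT_COUNT` are hypothetical stand-ins (the router's `AgeingPriorityQueue` is not reproduced here), and the use of `OnceLock` is an assumption, not necessarily what the router does.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Mutex, OnceLock};

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical stand-in for the router's queue type.
struct SimpleQueue {
    jobs: Mutex<VecDeque<Job>>,
}

/// Counts how many times the queue is constructed (demonstration only).
static INIT_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Returns the global queue, constructing it on the first call only.
fn queue() -> &'static SimpleQueue {
    static QUEUE: OnceLock<SimpleQueue> = OnceLock::new();
    QUEUE.get_or_init(|| {
        INIT_COUNT.fetch_add(1, Ordering::SeqCst);
        SimpleQueue {
            jobs: Mutex::new(VecDeque::new()),
        }
    })
}

fn main() {
    // First use constructs the queue; later calls reuse it.
    queue()
        .jobs
        .lock()
        .unwrap()
        .push_back(Box::new(|| println!("job ran")));
    let job = queue().jobs.lock().unwrap().pop_front();
    if let Some(job) = job {
        job(); // run the job outside the lock
    }
    println!("initialized {} time(s)", INIT_COUNT.load(Ordering::SeqCst));
}
```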
apollo-router/src/compute_job.rs
Outdated
    tracing::info!(threads, type="compute", source="env", "thread pool");
    threads
} else {
    let threads = std::thread::available_parallelism()
        .expect("available_parallelism() failed")
        .get();
    tracing::info!(threads, type="compute", source="available_parallelism", "thread pool");
    threads
Should we emit a metric for the number of threads available and the number of threads set by the env var?
(cherry picked from commit 768f6d4)
Router creates a thread pool dedicated to "compute jobs" to avoid blocking Tokio threads. By default this pool has as many threads as available CPUs. This can now be changed by setting the $APOLLO_ROUTER_COMPUTE_THREADS environment variable to an integer value. This feature is intentionally undocumented and should be considered "experimental".
Also rename the pre-existing (also undocumented) $APOLLO_ROUTER_NUM_CORES to $APOLLO_ROUTER_IO_THREADS to better reflect what it does.
Drive-by unrelated change: make cargo run default to the main router executable.
Manual testing instructions: in a terminal, run:
After seeing a log line with GraphQL endpoint exposed at http://127.0.0.1:4000/, in another terminal, run:
(This step is needed because the thread pool is created lazily on first use.) The first terminal should now have a log line with Compute thread pool size: $APOLLO_ROUTER_COMPUTE_THREADS = 4.
Checklist
Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.
$APOLLO_ROUTER_NUM_CORES no longer works
Exceptions
Note any exceptions here
Notes
Footnotes
It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.
Configuration is an important part of many changes. Where applicable please try to document configuration examples.
Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.