Traffic shaping for connectors #6737

Merged
merged 32 commits into from
Mar 10, 2025

@andrewmcgivery andrewmcgivery commented Feb 6, 2025

Example config:

traffic_shaping:
  connector:
    sources:
      connector-graph.random_person_api:
        global_rate_limit:
          capacity: 20
          interval: 1s
        experimental_http2: http2only
        timeout: 1s
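
For readers unfamiliar with the capacity/interval pair, here is a minimal sketch of what "capacity: 20, interval: 1s" is meant to express: at most 20 requests admitted per 1 second window. This is an illustration only, not the router's implementation; the type FixedWindowLimiter and its methods are hypothetical names made up for this example.

```rust
use std::time::{Duration, Instant};

/// Hypothetical fixed-window limiter, for illustration only.
struct FixedWindowLimiter {
    capacity: u64,
    interval: Duration,
    window_start: Instant,
    used: u64,
}

impl FixedWindowLimiter {
    fn new(capacity: u64, interval: Duration, now: Instant) -> Self {
        Self { capacity, interval, window_start: now, used: 0 }
    }

    /// Returns true if a request arriving at `now` is admitted.
    fn try_acquire(&mut self, now: Instant) -> bool {
        // Start a fresh window once the configured interval has elapsed.
        if now.duration_since(self.window_start) >= self.interval {
            self.window_start = now;
            self.used = 0;
        }
        if self.used < self.capacity {
            self.used += 1;
            true
        } else {
            false
        }
    }
}
```

The clock is passed in explicitly so the behavior can be checked without sleeping.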

Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

  • Changes are compatible [1]
  • Documentation [2] completed
  • Performance impact assessed and acceptable
  • Tests added and passing [3]
    • Unit Tests
    • Integration Tests
    • Manual Tests

Exceptions

Note any exceptions here

Notes

Footnotes

  1. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.

  2. Configuration is an important part of many changes. Where applicable please try to document configuration examples.

  3. Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

@svc-apollo-docs
Collaborator

svc-apollo-docs commented Feb 6, 2025

✅ Docs preview has no changes

The preview was not built because there were no changes.

Build ID: 872f6b6d11aa1f1fd4e68499

@router-perf

router-perf bot commented Feb 6, 2025

CI performance tests

  • connectors-const - Connectors stress test that runs with a constant number of users
  • const - Basic stress test that runs with a constant number of users
  • demand-control-instrumented - A copy of the step test, but with demand control monitoring and metrics enabled
  • demand-control-uninstrumented - A copy of the step test, but with demand control monitoring enabled
  • enhanced-signature - Enhanced signature enabled
  • events - Stress test for events with a lot of users and deduplication ENABLED
  • events_big_cap_high_rate - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity
  • events_big_cap_high_rate_callback - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity using callback mode
  • events_callback - Stress test for events with a lot of users and deduplication ENABLED in callback mode
  • events_without_dedup - Stress test for events with a lot of users and deduplication DISABLED
  • events_without_dedup_callback - Stress test for events with a lot of users and deduplication DISABLED using callback mode
  • extended-reference-mode - Extended reference mode enabled
  • large-request - Stress test with a 1 MB request payload
  • no-tracing - Basic stress test, no tracing
  • reload - Reload test over a long period of time at a constant rate of users
  • step-jemalloc-tuning - Clone of the basic stress test for jemalloc tuning
  • step-local-metrics - Field stats that are generated from the router rather than FTV1
  • step-with-prometheus - A copy of the step test with the Prometheus metrics exporter enabled
  • step - Basic stress test that steps up the number of users over time
  • xlarge-request - Stress test with 10 MB request payload
  • xxlarge-request - Stress test with 100 MB request payload

@andrewmcgivery andrewmcgivery marked this pull request as ready for review February 28, 2025 00:58
@andrewmcgivery andrewmcgivery requested review from a team as code owners February 28, 2025 00:58
"description": "#/definitions/Compression",
"nullable": true
},
"dns_resolution_strategy": {
Contributor

am i correct in reading that dns_resolution_strategy and experimental_http2 aren't tested?

Contributor Author

I did not explicitly write tests for them because the existing traffic shaping also doesn't.

My interpretation as to why is because while these settings are in traffic shaping, the actual logic/functionality for them is used elsewhere in other parts of the codebase.

Contributor

were there things you had to test this way (using this new mock service and the tests in plugins/traffic_shaping/mod.rs) that warrant two ways of testing? i'm wondering if we can kill this in favor of tests/integration/traffic_shaping.rs?

Contributor Author

I'd be fine with doing just the integration tests.

I did both ways mostly to match what was already being done in the existing tests.... but I'm not married to it. I tend to monkey-see-monkey-do when touching existing code. :D

@@ -231,6 +231,16 @@ impl Connector {
.map_err(|_| internal_error!("Failed to create key for connector {}", self.id.label)),
}
}

/// Create an identifier for this connector that can be used for configuration and service identification
Contributor

can you document the behavior when source is None and why?

Contributor Author

Added some more comments!

@@ -1608,6 +1607,14 @@ snapshot_kind: text
"description": "#/definitions/RouterShaping",
"nullable": true
},
"sources": {
Contributor

I mentioned this in a doc comment but you might have missed it - we need an additional connector: layer here, above sources.

.sources
.get(&source_name)
.map(|connector_config| connector_config.clone().into());
let final_config = Self::merge_config(all_config, source_config.as_ref());
Contributor

Although ConnectorShaping always sets deduplicate_query to None, this merge will cause it to be set if the all config has it set. See here

Contributor

did this get addressed?

Contributor Author

Yep! This was a problem because the "all" config used to be the subgraph config, which could contain deduplicate_query. Now the merge is between two connector configs, so there is nothing to inherit.
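
To make the concern in this thread concrete, here is a rough sketch of a field-by-field merge in which a value set on the source wins and anything left unset falls back to the "all" config. The Shaping struct and this merge_config signature are simplified, hypothetical stand-ins and not the router's actual types; the point is only that a field the source never sets still ends up populated whenever the "all" side carries it.

```rust
use std::time::Duration;

/// Hypothetical, simplified shaping config (not the router's real type).
#[derive(Debug, Clone, Default, PartialEq)]
struct Shaping {
    timeout: Option<Duration>,
    deduplicate_query: Option<bool>,
}

/// Merge two optional configs: source values win, unset fields inherit from `all`.
fn merge_config(all: Option<&Shaping>, source: Option<&Shaping>) -> Option<Shaping> {
    match (all, source) {
        (None, None) => None,
        // Only one side is present, so it is used as-is.
        (Some(only), None) | (None, Some(only)) => Some(only.clone()),
        (Some(all), Some(source)) => Some(Shaping {
            timeout: source.timeout.or(all.timeout),
            // Even if the source never sets this, an `all` value leaks through.
            deduplicate_query: source.deduplicate_query.or(all.deduplicate_query),
        }),
    }
}
```

With both inputs being connector configs, as described above, neither side can carry deduplicate_query, so the fallback has nothing to inherit.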

@@ -423,6 +424,31 @@ pub(crate) async fn create_http_services(
let http_service_factory = HttpClientServiceFactory::new(http_service, plugins.clone());
http_services.insert(name.clone(), http_service_factory);
}

// Also create client service factories for connector sources
Contributor

We'll have already redundantly created an HttpClientServiceFactory above for the expanded connector subgraph, and that one will no longer be used by the ConnectorRequestService. We should find a way to not create that.

Contributor Author

Would creating a list of "connector subgraph names" prior to that loop and doing a comparison of the subgraph name versus the connector subgraph name suffice?

Basically... if we have a subgraph name that matches a connector subgraph name... don't add it.

Contributor Author

Implemented what I stated above!
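
For anyone following along, the idea proposed in this thread can be sketched roughly as below: collect the connector subgraph names into a set first, then skip any subgraph whose name appears in it when building HTTP client factories. The function name and the string-only inputs are hypothetical simplifications; the real loop in create_http_services works with the router's own types.

```rust
use std::collections::HashSet;

/// Hypothetical helper: subgraph names that still need their own HTTP client factory.
fn subgraphs_needing_http_service<'a>(
    subgraph_names: &'a [String],
    connector_subgraph_names: &HashSet<String>,
) -> Vec<&'a str> {
    subgraph_names
        .iter()
        // Skip expanded connector subgraphs; their factories are created per source.
        .filter(|name| !connector_subgraph_names.contains(name.as_str()))
        .map(String::as_str)
        .collect()
}
```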

@@ -103,7 +103,7 @@ pub(crate) fn make_request(
transport.body.as_ref().map(|body| SelectionData {
source: body.to_string(),
transformed: body.to_string(), // no transformation so this is the same
- result: json_body,
+ result: json_body.clone(),
Contributor

Why add this clone? It seems unnecessary.

Contributor Author

Got left in from a previous iteration of this PR.. removed!

),
)
.boxed()
pub(crate) fn create(&self, source_name: String) -> BoxService {
Contributor

The CONNECT_REQUEST_SPAN_NAME span used to be created here - where does that happen now? I'm not finding it. It should likely move to the telemetry plugin connect_request_service method via span_factory like it is for subgraph service.

Contributor Author

Ah crap, definitely accidentally nuked it. I'll look at adding it back!

Contributor Author

Added it back as suggested!

@@ -49,7 +49,7 @@ pub(crate) enum HandleResponseError {

// --- RAW RESPONSE ------------------------------------------------------------

- enum RawResponse {
+ pub(crate) enum RawResponse {
Contributor

Why pub(crate)?

Contributor Author

Got left in from a previous iteration of this PR.. removed!

@andrewmcgivery andrewmcgivery requested review from a team as code owners March 6, 2025 22:23
@@ -940,6 +953,17 @@ impl PluggableSupergraphServiceBuilder {
.and_then(|plugin| (*plugin.1).as_any().downcast_ref::<Subscription>())
.map(|p| p.config.clone());

let connector_sources: HashSet<String> = schema
.connectors
Contributor

should we just make a getter on the Connectors struct for this? no need to compute it more than once

Contributor Author

Just to clarify, this is to reduce some repetition since this exact code block is done in a couple of places?

Contributor

yeah, both the set of subgraphs and set of source config keys can be calculated once at schema load time.
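
One possible shape for that getter, as a sketch only: compute the set of source config keys once when the Connectors value is built, and hand out a reference afterwards. The struct fields here are hypothetical simplifications, the "subgraph.source" key format is assumed from the example config in the PR description, and skipping connectors without a source is an assumption of this sketch rather than the PR's documented behavior.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified connector entry (not the router's real type).
struct Connector {
    subgraph_name: String,
    source_name: Option<String>,
}

/// Hypothetical container that caches derived data at construction time.
struct Connectors {
    connectors: Vec<Connector>,
    source_config_keys: HashSet<String>,
}

impl Connectors {
    fn new(connectors: Vec<Connector>) -> Self {
        // Computed once here, e.g. at schema load, instead of at every call site.
        // Assumption: keys look like "subgraph.source"; sourceless connectors are skipped.
        let source_config_keys = connectors
            .iter()
            .filter_map(|c| {
                c.source_name
                    .as_ref()
                    .map(|source| format!("{}.{}", c.subgraph_name, source))
            })
            .collect();
        Self { connectors, source_config_keys }
    }

    /// Cheap getter: callers borrow the precomputed set.
    fn source_config_keys(&self) -> &HashSet<String> {
        &self.source_config_keys
    }
}
```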

@@ -203,7 +203,7 @@ impl RawResponse {
}

// --- MAPPED RESPONSE ---------------------------------------------------------
- #[derive(Debug)]
+ #[derive(Debug, Clone)]
Contributor

remind me why this is clone? maybe even add a comment for future readers?

Contributor Author

I must have accidentally left it in from a previous iteration of code 😓 removed!

@andrewmcgivery andrewmcgivery merged commit 7b79bea into dev Mar 10, 2025
15 checks passed
@andrewmcgivery andrewmcgivery deleted the feature/connectortrafficshaping branch March 10, 2025 20:26
4 participants