Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Datadog exporter resource behavior and mapping #5386

Merged
merged 30 commits into from
Jul 3, 2024
Merged
Show file tree
Hide file tree
Changes from 12 commits
Commits
Show all changes
30 commits
Select commit Hold shift + click to select a range
e760c2e
add mapping for the router span
bnjjj Jun 10, 2024
27f1d96
Merge branch 'dev' of github.com:apollographql/router into bnjjj/fix_…
bnjjj Jun 25, 2024
1d43fd5
changelog
bnjjj Jun 25, 2024
e343f6b
Tmp
Jun 27, 2024
98de286
Add datadog resource config
Jun 27, 2024
ce332ef
lint
Jun 27, 2024
a45966b
Update json schema for yaml
Jun 28, 2024
869dc03
Enable tests for telemetry on all platforms.
Jun 28, 2024
5ab2310
Lint
Jun 28, 2024
b7e76a4
Switch to jaeger collector so that things work on osx
Jun 28, 2024
cd4c8af
Make sure to explicitly test overriding
Jun 28, 2024
17c7680
Update ddagent test docker image
Jun 28, 2024
b59a89d
Fix and add test for decimal trace id.
Jun 28, 2024
fd68371
Disable datadog tests for non-graphos
Jun 28, 2024
7045ad5
Make it so that original span name is always available.
Jun 28, 2024
f04cefa
Make is so that the user doesn't have to fix the span names for resou…
Jun 28, 2024
00539c6
`resource_mappings` -> `resource_mapping`
Jun 28, 2024
f8f92f5
rustdoc missing subgraph name
Jul 1, 2024
a2d5233
rustdoc missing query_planning mapping
Jul 1, 2024
f943b8f
Add option for fixed span names
Jul 1, 2024
8eb2639
Reduce visibility of fields
Jul 1, 2024
3ee148c
Use serde_derive_default
Jul 1, 2024
ddef1a5
Add docs, make enable_span_mapping and fixed_span_names the default f…
Jul 1, 2024
082a7d0
Update integration tests for new defaults.
Jul 1, 2024
704da6d
Update apollo-router/src/plugins/telemetry/tracing/datadog.rs
BrynCooke Jul 2, 2024
7d752fe
Add env unified tag
Jul 2, 2024
a34d972
Add changelog
Jul 2, 2024
fbda2bf
Factor out a bunch of consts, these were duplicated across the codebase
Jul 2, 2024
2adecb5
Added tests for and fixed override of span names
Jul 2, 2024
3c40231
Merge dev
Jul 2, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions .changesets/fix_bnjjj_fix_5282.md
BrynCooke marked this conversation as resolved.
Show resolved Hide resolved
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
### Add mapping for the router span on Datadog ([Issue #5282](https://github.com/apollographql/router/issues/5282))

Add a new span mapping for datadog for the router span as in `spec_compliant` mode we don't provide the `request` span anymore.
After this change the `router` span name will be mapped to `http.router` attribute name.

By [@bnjjj](https://github.com/bnjjj) in https://github.com/apollographql/router/pull/5386
1 change: 1 addition & 0 deletions .circleci/config.yml
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@ executors:
- image: cimg/redis:7.2.4
- image: jaegertracing/all-in-one:1.54.0
- image: openzipkin/zipkin:2.23.2
- image: ghcr.io/datadog/dd-apm-test-agent/ddapm-test-agent:v1.17.0
resource_class: xlarge
environment:
CARGO_BUILD_JOBS: 4
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1430,6 +1430,14 @@ expression: "&schema"
"endpoint": {
"$ref": "#/definitions/UriEndpoint",
"description": "#/definitions/UriEndpoint"
},
"resource_mappings": {
"additionalProperties": {
"type": "string"
},
"default": {},
"description": "Custom mapping to be used as the resource field in spans, defaults to: router -> http.route supergraph -> graphql.operation.name subgraph_request -> subgraph.name http_request -> http.route",
"type": "object"
}
},
"required": [
Expand Down Expand Up @@ -6795,6 +6803,13 @@ expression: "&schema"
"decimal"
],
"type": "string"
},
{
"description": "Datadog",
"enum": [
"datadog"
],
"type": "string"
}
]
},
Expand Down
3 changes: 3 additions & 0 deletions apollo-router/src/plugins/telemetry/config.rs
Original file line number Diff line number Diff line change
Expand Up @@ -243,6 +243,9 @@ pub(crate) enum TraceIdFormat {
///
/// (e.g. Trace ID 16 -> 16)
Decimal,

/// Datadog
Datadog,
}

/// Apollo usage report signature normalization algorithm
Expand Down
13 changes: 8 additions & 5 deletions apollo-router/src/plugins/telemetry/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,7 @@ use opentelemetry::trace::TraceState;
use opentelemetry::trace::TracerProvider;
use opentelemetry::Key;
use opentelemetry::KeyValue;
use opentelemetry_api::trace::TraceId;
use opentelemetry_semantic_conventions::trace::HTTP_REQUEST_METHOD;
use parking_lot::Mutex;
use rand::Rng;
Expand Down Expand Up @@ -100,6 +101,8 @@ use crate::plugins::telemetry::config::TracingCommon;
use crate::plugins::telemetry::config_new::cost::add_cost_attributes;
use crate::plugins::telemetry::config_new::graphql::GraphQLInstruments;
use crate::plugins::telemetry::config_new::instruments::SupergraphInstruments;
use crate::plugins::telemetry::config_new::trace_id;
use crate::plugins::telemetry::config_new::DatadogId;
use crate::plugins::telemetry::dynamic_attribute::SpanDynAttribute;
use crate::plugins::telemetry::fmt_layer::create_fmt_layer;
use crate::plugins::telemetry::metrics::apollo::histogram::ListLengthHistogram;
Expand Down Expand Up @@ -134,7 +137,6 @@ use crate::services::SubgraphResponse;
use crate::services::SupergraphRequest;
use crate::services::SupergraphResponse;
use crate::spec::operation_limits::OperationLimits;
use crate::tracer::TraceId;
use crate::Context;
use crate::ListenAddr;

Expand Down Expand Up @@ -556,17 +558,18 @@ impl Plugin for Telemetry {
});

// Append the trace ID with the right format, based on the config
let format_id = |trace: TraceId| {
let format_id = |trace_id: TraceId| {
let id = match config.exporters.tracing.response_trace_id.format {
TraceIdFormat::Hexadecimal => format!("{:032x}", trace.to_u128()),
TraceIdFormat::Decimal => format!("{}", trace.to_u128()),
TraceIdFormat::Hexadecimal => format!("{:032x}", trace_id),
TraceIdFormat::Decimal => format!("{}", trace_id),
TraceIdFormat::Datadog => trace_id.to_datadog()
};

HeaderValue::from_str(&id).ok()
};
if let (Some(header_name), Some(trace_id)) = (
expose_trace_id_header,
TraceId::current().and_then(format_id),
trace_id().and_then(format_id),
bnjjj marked this conversation as resolved.
Show resolved Hide resolved
) {
resp.response.headers_mut().append(header_name, trace_id);
}
Expand Down
2 changes: 1 addition & 1 deletion apollo-router/src/plugins/telemetry/span_factory.rs
Original file line number Diff line number Diff line change
Expand Up @@ -97,7 +97,7 @@ impl SpanMode {
}
SpanMode::SpecCompliant => {
info_span!(ROUTER_SPAN_NAME,
// Needed for apollo_telemetry
// Needed for apollo_telemetry and datadog span mapping
"http.route" = %request.uri(),
"http.request.method" = %request.method(),
"otel.name" = ::tracing::field::Empty,
Expand Down
76 changes: 47 additions & 29 deletions apollo-router/src/plugins/telemetry/tracing/datadog.rs
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,6 @@
use std::collections::HashMap;

use http::Uri;
use lazy_static::lazy_static;
use opentelemetry::sdk;
use opentelemetry::sdk::trace::BatchSpanProcessor;
use opentelemetry::sdk::trace::Builder;
Expand All @@ -23,19 +22,22 @@ use crate::plugins::telemetry::tracing::BatchProcessorConfig;
use crate::plugins::telemetry::tracing::SpanProcessorExt;
use crate::plugins::telemetry::tracing::TracingConfigurator;

lazy_static! {
static ref SPAN_RESOURCE_NAME_ATTRIBUTE_MAPPING: HashMap<&'static str, &'static str> = {
let mut map = HashMap::new();
map.insert("request", "http.route");
map.insert("supergraph", "graphql.operation.name");
map.insert("query_planning", "graphql.operation.name");
map.insert("subgraph", "subgraph.name");
map.insert("subgraph_request", "graphql.operation.name");
map
};
static ref DEFAULT_ENDPOINT: Uri = Uri::from_static("http://127.0.0.1:8126");
fn default_resource_mappings() -> HashMap<String, String> {
let mut map = HashMap::new();
BrynCooke marked this conversation as resolved.
Show resolved Hide resolved
map.insert("request", "http.route");
BrynCooke marked this conversation as resolved.
Show resolved Hide resolved
map.insert("router", "http.route");
map.insert("supergraph", "graphql.operation.name");
map.insert("query_planning", "graphql.operation.name");
map.insert("subgraph", "subgraph.name");
map.insert("subgraph_request", "graphql.operation.name");
map.insert("http_request", "http.route");
map.iter()
.map(|(k, v)| (k.to_string(), v.to_string()))
.collect()
}

const DEFAULT_ENDPOINT: &str = "http://127.0.0.1:8126";

#[derive(Debug, Clone, Deserialize, JsonSchema, Default)]
#[serde(deny_unknown_fields)]
pub(crate) struct Config {
Expand All @@ -53,6 +55,14 @@ pub(crate) struct Config {
/// Enable datadog span mapping for span name and resource name.
#[serde(default)]
pub(crate) enable_span_mapping: bool,

/// Custom mapping to be used as the resource field in spans, defaults to:
/// router -> http.route
/// supergraph -> graphql.operation.name
/// subgraph_request -> subgraph.name
/// http_request -> http.route
#[serde(default)]
resource_mappings: HashMap<String, String>,
}

impl TracingConfigurator for Config {
Expand All @@ -67,25 +77,33 @@ impl TracingConfigurator for Config {
_spans_config: &Spans,
) -> Result<Builder, BoxError> {
tracing::info!("Configuring Datadog tracing: {}", self.batch_processor);
let enable_span_mapping = self.enable_span_mapping.then_some(true);
let common: sdk::trace::Config = trace.into();

// Precompute representation otel Keys for the mappings so that we don't do heap allocation for each span
let resource_mappings = self.enable_span_mapping.then(|| {
let mut resource_mappings = default_resource_mappings();
resource_mappings.extend(self.resource_mappings.clone());
resource_mappings
.iter()
.map(|(k, v)| (k.clone(), opentelemetry::Key::from(v.clone())))
.collect::<HashMap<String, Key>>()
});

let exporter = opentelemetry_datadog::new_pipeline()
.with(&self.endpoint.to_uri(&DEFAULT_ENDPOINT), |builder, e| {
builder.with_agent_endpoint(e.to_string().trim_end_matches('/'))
})
.with(&enable_span_mapping, |builder, _e| {
builder
.with_name_mapping(|span, _model_config| span.name.as_ref())
.with_resource_mapping(|span, _model_config| {
SPAN_RESOURCE_NAME_ATTRIBUTE_MAPPING
.get(span.name.as_ref())
.and_then(|key| span.attributes.get(&Key::from_static_str(key)))
.and_then(|value| match value {
Value::String(value) => Some(value.as_str()),
_ => None,
})
.unwrap_or(span.name.as_ref())
})
.with(
&self.endpoint.to_uri(&Uri::from_static(DEFAULT_ENDPOINT)),
|builder, e| builder.with_agent_endpoint(e.to_string().trim_end_matches('/')),
)
.with(&resource_mappings, |builder, resource_mappings| {
let resource_mappings = resource_mappings.clone();
builder.with_resource_mapping(move |span, _model_config| {
if let Some(mapping) = resource_mappings.get(span.name.as_ref()) {
if let Some(Value::String(value)) = span.attributes.get(mapping) {
return value.as_str();
}
}
return span.name.as_ref();
})
})
.with(
&common.resource.get(SERVICE_NAME),
Expand Down
37 changes: 20 additions & 17 deletions apollo-router/tests/common.rs
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,7 @@ use opentelemetry::sdk::trace::TracerProvider;
use opentelemetry::sdk::Resource;
use opentelemetry::testing::trace::NoopSpanExporter;
use opentelemetry::trace::TraceContextExt;
use opentelemetry_api::trace::TraceId;
use opentelemetry_api::trace::TracerProvider as OtherTracerProvider;
use opentelemetry_api::Context;
use opentelemetry_api::KeyValue;
Expand Down Expand Up @@ -81,7 +82,6 @@ pub struct IntegrationTest {
_subgraphs: wiremock::MockServer,
telemetry: Telemetry,

// Don't remove these, there is a weak reference to the tracer provider from a tracer and if the provider is dropped then no export will happen.
pub _tracer_provider_client: TracerProvider,
pub _tracer_provider_subgraph: TracerProvider,
subscriber_client: Dispatch,
Expand Down Expand Up @@ -175,6 +175,7 @@ impl Telemetry {
.with_span_processor(
BatchSpanProcessor::builder(
opentelemetry_datadog::new_pipeline()
.with_service_name(service_name)
.build_exporter()
.expect("datadog pipeline failed"),
opentelemetry::runtime::Tokio,
Expand Down Expand Up @@ -488,7 +489,7 @@ impl IntegrationTest {
#[allow(dead_code)]
pub fn execute_default_query(
&self,
) -> impl std::future::Future<Output = (String, reqwest::Response)> {
) -> impl std::future::Future<Output = (TraceId, reqwest::Response)> {
self.execute_query_internal(
&json!({"query":"query {topProducts{name}}","variables":{}}),
None,
Expand All @@ -499,36 +500,36 @@ impl IntegrationTest {
pub fn execute_query(
&self,
query: &Value,
) -> impl std::future::Future<Output = (String, reqwest::Response)> {
) -> impl std::future::Future<Output = (TraceId, reqwest::Response)> {
self.execute_query_internal(query, None)
}

#[allow(dead_code)]
pub fn execute_bad_query(
&self,
) -> impl std::future::Future<Output = (String, reqwest::Response)> {
) -> impl std::future::Future<Output = (TraceId, reqwest::Response)> {
self.execute_query_internal(&json!({"garbage":{}}), None)
}

#[allow(dead_code)]
pub fn execute_huge_query(
&self,
) -> impl std::future::Future<Output = (String, reqwest::Response)> {
) -> impl std::future::Future<Output = (TraceId, reqwest::Response)> {
self.execute_query_internal(&json!({"query":"query {topProducts{name, name, name, name, name, name, name, name, name, name}}","variables":{}}), None)
}

#[allow(dead_code)]
pub fn execute_bad_content_type(
&self,
) -> impl std::future::Future<Output = (String, reqwest::Response)> {
) -> impl std::future::Future<Output = (TraceId, reqwest::Response)> {
self.execute_query_internal(&json!({"garbage":{}}), Some("garbage"))
}

fn execute_query_internal(
&self,
query: &Value,
content_type: Option<&'static str>,
) -> impl std::future::Future<Output = (String, reqwest::Response)> {
) -> impl std::future::Future<Output = (TraceId, reqwest::Response)> {
assert!(
self.router.is_some(),
"router was not started, call `router.start().await; router.assert_started().await`"
Expand All @@ -540,7 +541,7 @@ impl IntegrationTest {

async move {
let span = info_span!("client_request");
let span_id = span.context().span().span_context().trace_id().to_string();
let span_id = span.context().span().span_context().trace_id();

async move {
let client = reqwest::Client::new();
Expand Down Expand Up @@ -577,7 +578,7 @@ impl IntegrationTest {
pub fn execute_untraced_query(
&self,
query: &Value,
) -> impl std::future::Future<Output = (String, reqwest::Response)> {
) -> impl std::future::Future<Output = (TraceId, reqwest::Response)> {
assert!(
self.router.is_some(),
"router was not started, call `router.start().await; router.assert_started().await`"
Expand All @@ -600,14 +601,16 @@ impl IntegrationTest {
request.headers_mut().remove(ACCEPT);
match client.execute(request).await {
Ok(response) => (
response
.headers()
.get("apollo-custom-trace-id")
.cloned()
.unwrap_or(HeaderValue::from_static("no-trace-id"))
.to_str()
.unwrap_or_default()
.to_string(),
TraceId::from_hex(
response
.headers()
.get("apollo-custom-trace-id")
.cloned()
.unwrap_or(HeaderValue::from_static("no-trace-id"))
.to_str()
.unwrap_or_default(),
)
.unwrap_or(TraceId::INVALID),
response,
),
Err(err) => {
Expand Down
Loading
Loading