-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[OTE-1506] Loadbalancer exporter: Add resource_keys routing for the traces #14208
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Really neat way to do it, and it seems refactoring has already addressed a few issues we previously had e.g. batching of traces.
Would note here possible large cardinality in metrics
if err == nil {
_ = stats.RecordWithTags(
ctx,
[]tag.Mutator{tag.Upsert(endpointTagKey, endpoints[exp]), successTrueMutator},
mBackendLatency.M(duration.Milliseconds()))
} else {
_ = stats.RecordWithTags(
ctx,
[]tag.Mutator{tag.Upsert(endpointTagKey, endpoints[exp]), successFalseMutator},
mBackendLatency.M(duration.Milliseconds()))
}
Since this creates a metric for each endpoint, on each pod, but I think we already do this anyway (so might be worth checking existing cardinality and if it's significant).
It shouldn't be considered as a high cardinality, I guess, cause Datadog calculates cardinality on moving 1h windows, not absolute cardinality. So, worst case, it will be like |
} | ||
return ids, nil | ||
if !missingResourceKey { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we replace missingResourceKey
and resourceKeyFound
this with len(ids) > 0
? We should return ids map if it is not empty, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an assumption that there should be an identifier per resource in this function (Although in reality, we will have only a single resource per call - something Strange about the upstream implementation that should be addressed, since I want to merge it to upstream, I don't want to address this assumption!)0
@@ -149,15 +155,25 @@ func routingIdentifiersFromTraces(td ptrace.Traces, key routingKey) (map[string] | |||
return nil, errors.New("empty spans") | |||
} | |||
|
|||
if key == svcRouting { | |||
if e.routingKey == resourceKeysRouting { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@foadnh does this change mean that we no longer have routing by service? It looks like you've switched from service routing to resource key routing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By service is now a particular case of by resource keys: https://github.com/Canva/opentelemetry-collector-contrib/pull/14208/files#diff-6c6c610a2bedfe21f69e2d58c1f06296bcb5bcaca44d9338f89b698737ec648dR55-R56
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM now that I understand why it's been implemented the way it has :)
Ticket
Context
Resource-based load balancing is required for Canva reliability metrics.
Currently, we maintain a copy of the load balancer in the otel-platform repo. This copy is based on the old version (
v0.79.0
) and never updated with the upstream.There are improvements and new features in the upstream version that potentially can resolve our public gateway migration issues. See this thread.
In this PR, we add resource_keys based load balancing to the opentelemetry-collector-contrib fork.
Notes
routing_key: resource
) in this PR, which conflicts with the new versions' features.v0.102.0
and not the upstream master (Preparing forv0.112.0
). We need to rebase with the upstream master to prevent conflicts.