Add batchprocessor support for client metadata #7325
Conversation
Codecov Report
Patch coverage:
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #7325      +/-   ##
==========================================
- Coverage   91.10%   91.07%    -0.03%
==========================================
  Files         295      295
  Lines       14373    14475      +102
==========================================
+ Hits        13094    13183       +89
- Misses       1011     1021       +10
- Partials      268      271        +3
☔ View full report in Codecov by Sentry.
Question: do you need to batch by metadata, or do you need the batch to preserve the metadata? Are both needed, or just the preservation?
I need both for the OTel-Arrow compression bridge (see open-telemetry/community#1332 and open-telemetry/oteps#171), because I want the use of a bridge to be transparent with regard to the selected metadata.
@bogdandrutu would you like a boolean setting to control whether the batchprocessor's output context sets the metadata values? I don't have any applications in mind for batching without propagating the metadata.
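For readers following the thread, here is a minimal sketch of the preservation idea under discussion, assuming the go.opentelemetry.io/collector/client package: the processor reads the configured keys from the incoming client.Info and re-injects the same values into the context used for export. The helper name exportCtxForMetadata is illustrative, not an identifier from this PR.

```go
package sketch

import (
	"context"

	"go.opentelemetry.io/collector/client"
)

// exportCtxForMetadata copies the configured metadata keys from the incoming
// request context into a fresh context, so exporters downstream of the
// batcher still see the selected client metadata. Illustrative only.
func exportCtxForMetadata(ctx context.Context, keys []string) context.Context {
	info := client.FromContext(ctx)
	md := map[string][]string{}
	for _, k := range keys {
		if vs := info.Metadata.Get(k); len(vs) > 0 {
			md[k] = vs
		}
	}
	return client.NewContext(context.Background(), client.Info{
		Metadata: client.NewMetadata(md),
	})
}
```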
Broadly this looks good to me. I had a question for my own edification around the behavior when the key isn't present on Metadata.

It's also worth noting that Metadata itself is marked as Experimental (see opentelemetry-collector/client/client.go, lines 111 to 113 at 0a89f57):

```go
// Metadata is the request metadata from the client connecting to this connector.
// Experimental: *NOTE* this structure is subject to change or removal in the future.
Metadata Metadata
```

Should this feature be marked Experimental as a result? Or should Metadata have its Experimental distinction removed?
Just a couple of non-blocking comments on my end, PTAL.
@bogdandrutu @dmitryax @astencel-sumo @ericmustin please review and approve if your comments have been addressed
lgtm
```go
}

func (bp *batchProcessor) currentMetadataCardinality() int {
	bp.lock.Lock()
```
How about a read lock here?
This lock will be contended by writers on lines 294-295 and will only be read occasionally, so I don't think that would be an improvement. I've commented on how maintaining a map of attribute sets while avoiding lock contention and allocations, and also supporting recycling, is a tricky problem, one also faced by a metrics SDK.
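To make the trade-off concrete, here is a rough sketch (assumed types, not the PR's exact code) of a batcher map guarded by a single sync.Mutex. Because every incoming request takes the same lock to find or create its batcher, taking only a read lock for the occasional cardinality query would not meaningfully reduce contention.

```go
package sketch

import "sync"

// multiBatcher is an illustrative stand-in for the processor's internal
// state: one sub-batcher per distinct metadata value set.
type multiBatcher struct {
	lock     sync.Mutex
	batchers map[string]*batcher // keyed by a canonical encoding of the metadata values
}

// batcher holds the pending batch for one metadata set (details elided).
type batcher struct{}

// currentMetadataCardinality reports how many distinct metadata sets have
// been observed. It runs rarely (e.g. when recording metrics), while the
// same lock is taken on every request, so a plain Mutex is used.
func (mb *multiBatcher) currentMetadataCardinality() int {
	mb.lock.Lock()
	defer mb.lock.Unlock()
	return len(mb.batchers)
}
```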
processor/batchprocessor/factory.go
Outdated
```go
	defaultSendBatchSize            = uint32(8192)
	defaultTimeout                  = 200 * time.Millisecond
	defaultMetadataCardinalityLimit = 100
```
I can already see users asking for guidance on setting this value. Is there a reason why this is 100 and not 1000? I have a feeling that 100 is too low, but I don't know the impact of having 1000 as the default. Have you played with the settings, and can you write a couple of sentences on picking the right value?
I am not sure what kind of user will think 100 is too low. In the scenario that Lightstep cares about, the number of metadata combinations is small: it's the number of projects per customer. The customer might have multiple projects in use, but they'll know how many that is. I wrote:

```go
// defaultMetadataCardinalityLimit should be set to the number
// of metadata configurations the user expects to submit to
// the collector.
```
My view needs to be adjusted to the reality of common users of this, but when I think about metadata combinations, I think of tenant-id and cluster-name, for instance. In that case, 5 clusters and 20 tenants would already be 100 potential combinations. Going back to what I had originally in mind: is there guidance we can provide on the costs of 1000 vs. 100? Is it mostly about memory? There might also be a processing/scheduling penalty, but I believe this will be mostly memory, right?
I will process all the feedback above today. Thanks, reviewers!
@jpkrohling I want to draw a connection between your remarks:
Here's why I think it's OK to go with a simple approach for now and defer complex discussions about how expensive the process is and how quickly it can recycle memory when batchers go idle. I expect this batching process to be performed on the customer side of an OTel collector pipeline. The customer will be responsible for setting this number to at most the number of distinct metadata keys that they are using themselves, for it is they who benefit from batching before sending data across an expensive network connection. In this case, the customer will have as many distinct metadata sets as they have teams, or possibly environments*teams. This should be a fixed number for the customer.

This is not a feature meant to help a vendor install a batching process in their own pipeline, where the number of distinct attribute sets might be determined by the number of customers or expected to be large and variable.
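On the cost question raised above: the dominant cost should indeed be memory, since each distinct metadata set keeps its own pending batch (and timer) alive. Below is a hedged sketch of how such a limit could be enforced, building on the multiBatcher sketch earlier in this thread; the identifiers are illustrative, not necessarily the PR's.

```go
package sketch

import "errors"

// errTooManyMetadataSets is an illustrative error for when the configured
// cardinality limit would be exceeded.
var errTooManyMetadataSets = errors.New("too many distinct metadata combinations; see metadata_cardinality_limit")

// getOrCreateBatcher returns the batcher for one metadata set, refusing to
// grow the map beyond the limit so a misbehaving client cannot cause
// unbounded memory growth.
func (mb *multiBatcher) getOrCreateBatcher(key string, limit int) (*batcher, error) {
	mb.lock.Lock()
	defer mb.lock.Unlock()
	if b, ok := mb.batchers[key]; ok {
		return b, nil
	}
	if len(mb.batchers) >= limit {
		return nil, errTooManyMetadataSets
	}
	b := &batcher{}
	mb.batchers[key] = b
	return b, nil
}
```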
LGTM; perhaps the processor's README could be improved further, but this is not blocking.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Description: Add support for batching by metadata keys. We are aware that this has been frequently requested.
This work was prioritized on our team because it is a prerequisite for the OTel-Arrow compression bridge; see open-telemetry/community#1332 and open-telemetry/oteps#171.
Link to tracking Issue: #4544
Testing: One new test was added.
Documentation: The README was updated.
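For readers who want to see how these settings surface in configuration, the new options map roughly onto a config struct like the following. This is a sketch based on the discussion above; field and tag names may differ slightly from the merged code.

```go
package sketch

import "time"

// Config sketches the batch processor settings relevant to this PR.
type Config struct {
	Timeout       time.Duration `mapstructure:"timeout"`
	SendBatchSize uint32        `mapstructure:"send_batch_size"`

	// MetadataKeys lists the client metadata keys to batch by; one batcher
	// is kept per distinct combination of values, and those values are
	// propagated on the outgoing context.
	MetadataKeys []string `mapstructure:"metadata_keys"`

	// MetadataCardinalityLimit caps the number of distinct metadata value
	// combinations tracked at once (default 100 in this PR).
	MetadataCardinalityLimit uint32 `mapstructure:"metadata_cardinality_limit"`
}
```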