
Add batchprocessor support for client metadata #7325

Closed

Conversation

@jmacd (Contributor) commented on Mar 6, 2023:

Description: Add support for batching by metadata keys. We are aware that this has been frequently requested.

This work was prioritized on our team because it is a prerequisite for the OTel-Arrow compression bridge; see open-telemetry/community#1332 and open-telemetry/oteps#171.

Link to tracking Issue: #4544

Testing: One new test was added.

Documentation: The README was updated.
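To make the description concrete, here is a minimal sketch of the routing idea, assuming the configuration supplies a list of metadata keys: each incoming request is assigned to the batcher for its combination of values for those keys. This is an illustrative sketch only, not the PR's code; in the actual processor the values come from the client metadata attached to the request context, and the names batcherKey and requestMetadata are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// batcherKey builds a canonical identity for one combination of metadata
// values, iterating the configured keys in a fixed order so that equal
// combinations always map to the same batcher.
func batcherKey(metadataKeys []string, requestMetadata map[string][]string) string {
	parts := make([]string, 0, len(metadataKeys))
	for _, k := range metadataKeys {
		// An absent key contributes an empty value rather than an error.
		parts = append(parts, k+"="+strings.Join(requestMetadata[k], ";"))
	}
	return strings.Join(parts, ",")
}

func main() {
	keys := []string{"tenant-id", "cluster-name"}
	a := batcherKey(keys, map[string][]string{"tenant-id": {"acme"}, "cluster-name": {"eu-1"}})
	b := batcherKey(keys, map[string][]string{"cluster-name": {"eu-1"}, "tenant-id": {"acme"}})
	fmt.Println(a == b) // true: same combination, same batcher
	fmt.Println(a)      // tenant-id=acme,cluster-name=eu-1
}
```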

@jmacd requested review from a team and @jpkrohling on March 6, 2023, 19:50.
codecov bot commented on Mar 6, 2023:

Codecov Report

Patch coverage: 88.05% and project coverage change: -0.03 ⚠️

Comparison is base (5a55ff8) 91.10% compared to head (9ab2157) 91.07%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7325      +/-   ##
==========================================
- Coverage   91.10%   91.07%   -0.03%     
==========================================
  Files         295      295              
  Lines       14373    14475     +102     
==========================================
+ Hits        13094    13183      +89     
- Misses       1011     1021      +10     
- Partials      268      271       +3     
Impacted Files Coverage Δ
processor/batchprocessor/metrics.go 85.21% <80.00%> (-1.05%) ⬇️
processor/batchprocessor/batch_processor.go 89.70% <88.05%> (-0.78%) ⬇️
processor/batchprocessor/config.go 100.00% <100.00%> (ø)
processor/batchprocessor/factory.go 100.00% <100.00%> (ø)


@bogdandrutu (Member) left a comment:

Question: Do you need to batch by metadata, or do you need the batch to preserve the metadata? Are both needed, or just the preservation?

(Resolved review thread: processor/batchprocessor/batch_processor.go)
@jmacd (Contributor Author) commented on Mar 13, 2023:

> Question: Do you need to batch by metadata, or do you need the batch to preserve the metadata? Are both needed, or just the preservation?

I need both for the OTel-Arrow compression bridge (see open-telemetry/community#1332 and open-telemetry/oteps#171), because I want the use of the bridge to be transparent with regard to the selected metadata.

@jmacd (Contributor Author) commented on Mar 13, 2023:

@bogdandrutu would you like a boolean setting to control whether the batchprocessor's output context sets the metadata values? I don't have any applications in mind for batching without propagating the metadata.

@ericmustin (Contributor) left a comment:

Broadly this looks good to me. I had a question, for my own edification, about the behavior when the key isn't present in the Metadata.

It's also worth noting that Metadata itself is marked as Experimental (see:

// Metadata is the request metadata from the client connecting to this connector.
// Experimental: *NOTE* this structure is subject to change or removal in the future.
Metadata Metadata

). Would it be appropriate to also mark this configuration option as Experimental as a result? Or should Metadata have its Experimental designation removed?
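For what it's worth, with the client.Metadata API quoted above (go.opentelemetry.io/collector/client, as of this PR's timeframe), Get returns an empty result for a key that isn't present, so an absent key contributes no values rather than an error; whether the configuration option should inherit the Experimental marking is a separate question. A small illustrative check, not part of the PR:

```go
package main

import (
	"context"
	"fmt"

	"go.opentelemetry.io/collector/client"
)

func main() {
	// Build a request context carrying client metadata with a single key.
	md := client.NewMetadata(map[string][]string{"tenant-id": {"acme"}})
	ctx := client.NewContext(context.Background(), client.Info{Metadata: md})
	info := client.FromContext(ctx)

	fmt.Println(info.Metadata.Get("tenant-id"))    // [acme]
	fmt.Println(info.Metadata.Get("cluster-name")) // []  (no values for an absent key)
}
```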

@codeboten (Contributor) left a comment:

Just a couple of non-blocking comments on my end, PTAL.

(Resolved review threads: processor/batchprocessor/README.md, processor/batchprocessor/factory.go)
@codeboten (Contributor) commented:
@bogdandrutu @dmitryax @astencel-sumo @ericmustin please review and approve if your comments have been addressed

@ericmustin (Contributor) left a comment:

lgtm

(Resolved review threads: processor/batchprocessor/README.md, and four on processor/batchprocessor/batch_processor.go)
func (bp *batchProcessor) currentMetadataCardinality() int {
	bp.lock.Lock()
A reviewer (Member) commented:

How about a read lock here?

@jmacd (Contributor Author) replied:

This lock will be contended by writers on lines 294-295 and will only be read occasionally, so I don't think that would be an improvement. I've added a comment about how maintaining a map of attribute sets while avoiding lock contention and allocations, and also supporting recycling, is a tricky problem, one also faced by a metrics SDK.
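For context, the read-lock variant being suggested would look roughly like the sketch below; this is a standalone illustration with hypothetical names (metadataBatchers, batchers), not the PR's code. As noted above, with writers dominating the lock, an RWMutex gains little over a plain Mutex here.

```go
package main

import (
	"fmt"
	"sync"
)

// metadataBatchers stands in for the processor's map of per-metadata batchers.
type metadataBatchers struct {
	lock     sync.RWMutex
	batchers map[string]struct{}
}

// cardinality takes only the read lock, the alternative suggested in review.
func (m *metadataBatchers) cardinality() int {
	m.lock.RLock()
	defer m.lock.RUnlock()
	return len(m.batchers)
}

func main() {
	m := &metadataBatchers{batchers: map[string]struct{}{"tenant-id=acme": {}}}
	fmt.Println(m.cardinality()) // 1
}
```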

(Resolved review threads: processor/batchprocessor/batch_processor.go, processor/batchprocessor/batch_processor_test.go, processor/batchprocessor/config.go)
	defaultSendBatchSize = uint32(8192)
	defaultTimeout = 200 * time.Millisecond
	defaultMetadataCardinalityLimit = 100
@jpkrohling (Member) commented:
I can see users asking already for guidance on setting this value. Is there a reason why this is 100 and not 1000? I have a feeling that 100 is too low, but I don't know the impact of having 1000 as the default. Have you played with the settings, and can you write a couple of sentences on picking the right value?

@jmacd (Contributor Author) replied:
I am not sure what kind of user will think 100 is too low. In the scenario that Lightstep cares about, the number of metadata combinations is small: it is the number of projects per customer. The customer might have multiple projects in use, but they'll know how many that is. I wrote:

	// defaultMetadataCardinalityLimit should be set to the number
	// of metadata configurations the user expects to submit to
	// the collector.

@jpkrohling (Member) replied:
My view needs to be adjusted to the reality of common users of this, but when I think about metadata combinations, I think of tenant-id and cluster-name, for instance. In that case, 5 clusters and 20 tenants would already be 100 potential combinations. Going back to what I had originally in mind: is there guidance we can provide on the costs of 1000 vs. 100? Is it mostly about memory? There might also be a processing/scheduling penalty, but I believe this will be mostly memory, right?
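As background for the cost question: each distinct metadata combination retains its own batcher state (at minimum a pending batch and timer), so the limit primarily bounds memory, with some per-batcher scheduling overhead. The sketch below shows the kind of guard such a cardinality limit implies; the names and the behavior at the limit (reject vs. fall back to a shared batcher) are illustrative assumptions, not necessarily what this PR implements.

```go
package main

import (
	"errors"
	"fmt"
)

var errTooManyCombinations = errors.New("too many distinct metadata combinations")

// batcher is a stand-in for the per-combination state (pending batch, timer, ...).
type batcher struct{ key string }

type router struct {
	limit    int
	batchers map[string]*batcher
}

// get returns the batcher for a metadata combination, creating one until the
// configured cardinality limit is reached; beyond that, new combinations error.
func (r *router) get(key string) (*batcher, error) {
	if b, ok := r.batchers[key]; ok {
		return b, nil
	}
	if len(r.batchers) >= r.limit {
		return nil, errTooManyCombinations
	}
	b := &batcher{key: key}
	r.batchers[key] = b
	return b, nil
}

func main() {
	r := &router{limit: 2, batchers: map[string]*batcher{}}
	for _, key := range []string{"tenant=a", "tenant=b", "tenant=c"} {
		_, err := r.get(key)
		fmt.Println(key, err) // the third combination exceeds the limit
	}
}
```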

@jmacd (Contributor Author) commented on Apr 5, 2023:

I will process all the feedback above today. Thanks, reviewers!

@jmacd (Contributor Author) commented on Apr 5, 2023:

@jpkrohling I want to draw a connection between your remarks:

Here's why I think it's OK to go with a simple approach for now and defer complex discussions about how expensive the process is and how quickly it can recycle memory when batchers go idle. I expect this batching process to be performed on the customer side of an OTel collector pipeline. The customer will be responsible for setting this number to at most the number of distinct metadata keys that they are using themselves, for it is they who benefit from batching before sending data across an expensive network connection. In this case, the customer will have as many distinct metadata sets as they have teams, or possibly environments*teams. This should be a fixed number for the customer.

This is not a feature meant to help a vendor install a batching process in their own pipeline, where the number of distinct attribute sets might be determined by the number of customers or expected to be large and variable.

@jpkrohling (Member) left a comment:

LGTM, perhaps a specific readme doc can be improved, but this is not blocking.


github-actions bot commented:
This PR was marked stale due to lack of activity. It will be closed in 14 days.

@jmacd (Contributor Author) commented on Apr 26, 2023:

I have re-opened this PR as #7578. I had accidentally force-pushed the branch, and it was easier to create a new PR, sorry. It contains identical code to what this PR had through 9ab2157, followed by a merge with main, conflict resolution, and then e7a7919.
