rfc(feature): SDK Spans Aggregator #126

philipphofmann · 2023-11-28T08:01:22Z

This RFC aims to find a strategy for batching spans together to avoid sending one envelope per span.

Rendered RFC

markushi

Nice one, I left a few comments!

text/0126-batch-span-ingestion.md

Co-authored-by: Markus Hintersteiner <markus.hintersteiner@sentry.io>

AbhiPrasad · 2023-12-07T13:35:13Z

One high level thing is if you had the chance to look at https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/batchprocessor/README.md?

Instead of span count, does it make sense to use byte size as a weight like we are doing for metrics aggregation? Should this be an implemented on the client or the transport?

philipphofmann · 2023-12-15T08:57:49Z

@AbhiPrasad, thanks for the link to OTel. I should also look for existing concepts in OTel for every feature we build. The Span Buffer is similar to the Batch Processor of OTel. I added the option to set the timeout to 0 to this RFC. Do you think we should align our Span Buffer more with the OTel Batch Processor? If I'm correct, we must send span children in the same envelope as their parents. So that's a requirement the OTel Batch Processor doesn't specify. Therefore, I think having our version of Span Buffer specific to spans is okay. Furthermore, I don't think we have needs for send_batch_max_size, metadata_keys, and metadata_cardinality_limit. Or am I missing something?

Instead of span count, does it make sense to use byte size as a weight like we are doing for metrics aggregation?

How do you count the byte sizes of the objects? Do you serialize them and keep track of the byte size? I think that's a great idea because you don't care about how many spans you keep in memory, but more about how much memory they use.

Should this be an implemented on the client or the transport?

As the Span Buffer must only accept spans after sampling and any other processing that could strip out data and keeping track of the byte size is easier after serializing, I would vote for the transport. If you agree, I'm going to update the RFC.

philipphofmann · 2024-01-03T11:24:58Z

@AbhiPrasad, please have a look at #126 (comment).

AbhiPrasad · 2024-01-10T14:42:37Z

Do you think we should align our Span Buffer more with the OTel Batch Processor? If I'm correct, we must send span children in the same envelope as their parents. So that's a requirement the OTel Batch Processor doesn't specify. Therefore, I think having our version of Span Buffer specific to spans is okay

We don't necessarily have the requirement that span children are sent in the same envelope as their parents, otherwise we run into the same issue with transactions (buffering a bunch of stuff in memory).

I do think having our own specific spec for a span buffer is makes sense though, and that send_batch_max_size, metadata_keys, and metadata_cardinality_limit seem unnecessary for our use case atm.

How do you count the byte sizes of the objects? Do you serialize them and keep track of the byte size? I think that's a great idea because you don't care about how many spans you keep in memory, but more about how much memory they use.

so I was actually wrong about this, we don't keep the exact byte size, but instead an approximate weight based on type of metric/what it contains. We can try a similar heuristic as well, just weight by different properties on the span.

As the Span Buffer must only accept spans after sampling and any other processing that could strip out data and keeping track of the byte size is easier after serializing, I would vote for the transport. If you agree, I'm going to update the RFC.

Transport sounds like the correct decision to me!

philipphofmann · 2024-01-12T08:00:33Z

We don't necessarily have the requirement that span children are sent in the same envelope as their parents,

I thought that was a requirement. @phacops, can you confirm that span children don't need to be sent in the same envelope as their parents?

we don't keep the exact byte size, but instead an approximate weight based on type of metric/what it contains. We can try a similar heuristic as well, just weight by different properties on the span.

Okay, so we could put weights on different properties and sum them. For example, a span with only simple properties has a weight of 10. Then, each tag again adds 1 weight, each data object 5, etc. I see that Python has that https://github.com/getsentry/sentry-python/blob/61a4621336fd81b69e5259af599e5779d78dce3f/sentry_sdk/metrics.py#L114-L118. I will come up with a spec.

Update 1
I decided to use the serialized dictionary to calculate the weights because we need to serialize the spans. Also, I think the client is better suited for the span buffer, as the transport is only aware of envelopes, and we already have the MetricsAggregator in the client on JS.

I think we could rename the SpanBuffer to SpansAggregator to align with metrics. Do you agree with @AbhiPrasad?

Update 2
As the SDKs need to serialize the spans and don't need to change them in the span buffer, we could also serialize them and count the serialized bytes. That would work on Cocoa, but I'm unsure about other platforms. Would that work on JS @AbhiPrasad?

AbhiPrasad · 2024-01-16T22:03:23Z

I think we could rename the SpanBuffer to SpansAggregator to align with metrics

That matches my mental model, let's do it!

As the SDKs need to serialize the spans and don't need to change them in the span buffer, we could also serialize them and count the serialized bytes

Hmm I think this varies based on platform, not everyone can serialize, maybe we make this requirement flexible depending on sdk? So SDK can decide on weighting methodology?

markushi

Nice improvements, looks good to me!

philipphofmann · 2024-01-17T14:06:08Z

Hmm I think this varies based on platform, not everyone can serialize, maybe we make this requirement flexible depending on sdk? So SDK can decide on weighting methodology?

@AbhiPrasad, this already is in the RFC:

The span aggregator uses a combination of timeout and span size, which is the serialized span size in bytes. If counting the serialized span byte size is not possible on a platform, it can implement the alternative called weight, which is an approximation of how much bytes a span allocates. A section below explains weight in more detail.

rfc(feature): Batch Span Ingestion

4f3ab15

philipphofmann force-pushed the rfc/batch-span-ingestion branch from 0416442 to 4f3ab15 Compare November 28, 2023 08:01

first draft version

ffa6650

philipphofmann requested a review from markushi November 28, 2023 08:42

markushi reviewed Nov 28, 2023

View reviewed changes

text/0126-batch-span-ingestion.md Outdated Show resolved Hide resolved

text/0126-batch-span-ingestion.md Outdated Show resolved Hide resolved

philipphofmann and others added 6 commits November 29, 2023 16:53

Update text/0126-batch-span-ingestion.md

2ffb1a2

Co-authored-by: Markus Hintersteiner <markus.hintersteiner@sentry.io>

Update text/0126-batch-span-ingestion.md

c61495c

Co-authored-by: Markus Hintersteiner <markus.hintersteiner@sentry.io>

clarify finished spans

8b4b3e6

fixes

05f668c

add span children scenarios

916cb56

more fixes for child spans

c0aa686

philipphofmann requested a review from markushi November 30, 2023 12:45

ready for review

3f278a5

philipphofmann marked this pull request as ready for review December 6, 2023 11:34

minor update

874baaf

philipphofmann changed the title ~~rfc(feature): Batch Span Ingestion~~ rfc(feature): SDK Span Buffer Dec 6, 2023

Rename to SDK Span Buffer

a42f075

mention OTel and use case timeout to 0

537acc6

philipphofmann added 3 commits January 12, 2024 10:24

Add weights

c791c66

Add to use buffer in client

e34e514

add byte size

2851381

markushi approved these changes Jan 17, 2024

View reviewed changes

rename buffer to aggregator

a0fbe94

philipphofmann changed the title ~~rfc(feature): SDK Span Buffer~~ rfc(feature): SDK Spans Aggregator Jan 17, 2024

AbhiPrasad approved these changes Jan 17, 2024

View reviewed changes

philipphofmann merged commit 8b6597e into main Jan 19, 2024
2 checks passed

philipphofmann deleted the rfc/batch-span-ingestion branch January 19, 2024 08:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rfc(feature): SDK Spans Aggregator #126

rfc(feature): SDK Spans Aggregator #126

philipphofmann commented Nov 28, 2023 •

edited

Loading

markushi left a comment

AbhiPrasad commented Dec 7, 2023

philipphofmann commented Dec 15, 2023

philipphofmann commented Jan 3, 2024

AbhiPrasad commented Jan 10, 2024

philipphofmann commented Jan 12, 2024 •

edited

Loading

AbhiPrasad commented Jan 16, 2024

markushi left a comment

philipphofmann commented Jan 17, 2024 •

edited

Loading

rfc(feature): SDK Spans Aggregator #126

rfc(feature): SDK Spans Aggregator #126

Conversation

philipphofmann commented Nov 28, 2023 • edited Loading

markushi left a comment

Choose a reason for hiding this comment

AbhiPrasad commented Dec 7, 2023

philipphofmann commented Dec 15, 2023

philipphofmann commented Jan 3, 2024

AbhiPrasad commented Jan 10, 2024

philipphofmann commented Jan 12, 2024 • edited Loading

AbhiPrasad commented Jan 16, 2024

markushi left a comment

Choose a reason for hiding this comment

philipphofmann commented Jan 17, 2024 • edited Loading

philipphofmann commented Nov 28, 2023 •

edited

Loading

philipphofmann commented Jan 12, 2024 •

edited

Loading

philipphofmann commented Jan 17, 2024 •

edited

Loading