-
-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rfc(feature): SDK Spans Aggregator #126
Conversation
0416442
to
4f3ab15
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice one, I left a few comments!
Co-authored-by: Markus Hintersteiner <markus.hintersteiner@sentry.io>
Co-authored-by: Markus Hintersteiner <markus.hintersteiner@sentry.io>
One high level thing is if you had the chance to look at https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/batchprocessor/README.md? Instead of span count, does it make sense to use byte size as a weight like we are doing for metrics aggregation? Should this be an implemented on the client or the transport? |
@AbhiPrasad, thanks for the link to OTel. I should also look for existing concepts in OTel for every feature we build. The Span Buffer is similar to the Batch Processor of OTel. I added the option to set the timeout to 0 to this RFC. Do you think we should align our Span Buffer more with the OTel Batch Processor? If I'm correct, we must send span children in the same envelope as their parents. So that's a requirement the OTel Batch Processor doesn't specify. Therefore, I think having our version of Span Buffer specific to spans is okay. Furthermore, I don't think we have needs for
How do you count the byte sizes of the objects? Do you serialize them and keep track of the byte size? I think that's a great idea because you don't care about how many spans you keep in memory, but more about how much memory they use.
As the Span Buffer must only accept spans after sampling and any other processing that could strip out data and keeping track of the byte size is easier after serializing, I would vote for the transport. If you agree, I'm going to update the RFC. |
@AbhiPrasad, please have a look at #126 (comment). |
We don't necessarily have the requirement that span children are sent in the same envelope as their parents, otherwise we run into the same issue with transactions (buffering a bunch of stuff in memory). I do think having our own specific spec for a span buffer is makes sense though, and that
so I was actually wrong about this, we don't keep the exact byte size, but instead an approximate weight based on type of metric/what it contains. We can try a similar heuristic as well, just weight by different properties on the span.
Transport sounds like the correct decision to me! |
I thought that was a requirement. @phacops, can you confirm that span children don't need to be sent in the same envelope as their parents?
Okay, so we could put weights on different properties and sum them. For example, a span with only simple properties has a weight of 10. Then, each tag again adds 1 weight, each data object 5, etc. I see that Python has that https://github.com/getsentry/sentry-python/blob/61a4621336fd81b69e5259af599e5779d78dce3f/sentry_sdk/metrics.py#L114-L118. I will come up with a spec. Update 1 I think we could rename the SpanBuffer to Update 2 |
That matches my mental model, let's do it!
Hmm I think this varies based on platform, not everyone can serialize, maybe we make this requirement flexible depending on sdk? So SDK can decide on weighting methodology? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice improvements, looks good to me!
@AbhiPrasad, this already is in the RFC:
|
This RFC aims to find a strategy for batching spans together to avoid sending one envelope per span.
Rendered RFC