Add throughput performance tests for OTLP exporter #1491
Conversation
Looks like
Thanks for catching the pypy issue @NathanielRN, looks like it was addressed and a new release went out an hour ago. Re-running the jobs.
Overall looks good. I'd consider re-designing the batch exporter benchmark slightly, depending on which columns of pytest-benchmark you want to be accurate.
    )
    span.end()

benchmark(create_spans_to_be_exported)
I'm assuming this uses pytest-benchmark. In that case, I would note that the table it outputs will show you the average cost per create_spans_to_be_exported call.
Here, one particular call will be much more expensive than the rest, since the batch export thread will activate and consume a lot of CPU to process all the queued spans.
Overall it should show up in the average measurement; just watch out for the other columns like min/max, which will be very misleading.
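The mean-vs-min/max point can be illustrated with a toy cost model (the numbers are purely illustrative, not real measurements): most calls are cheap, but every Nth call pays a large flush-like cost, so min and max describe two very different behaviours while the mean still reflects the amortized per-call cost.

```python
# Toy model of why min/max are misleading when one call per batch is
# expensive: most calls are cheap, but every 100th call pays a large
# flush-like cost. (Arbitrary time units, illustrative only.)

def per_call_costs(n_calls, cheap=1, expensive=500, every=100):
    """Simulated cost of each benchmarked call."""
    return [expensive if i % every == every - 1 else cheap
            for i in range(n_calls)]

costs = per_call_costs(1000)
mean_cost = sum(costs) / len(costs)

# The mean reflects the amortized (true) per-span cost, but min looks
# like the flush never happens and max looks like every call pays for
# a full flush.
print(f"mean={mean_cost:.2f} min={min(costs)} max={max(costs)}")
# → mean=5.99 min=1 max=500
```

With 10 expensive calls out of 1000, the mean (5.99) sits near the true amortized cost, while min (1) and max (500) each tell a misleading story on their own.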
Yes, it is using pytest-benchmark! That's really insightful, thanks, I didn't know! Sounds good; our performance tests graph uses the "mean" measurement to display the "# of iterations" anyway.
def test_batch_span_processor(benchmark):
    tracer = get_tracer_with_processor(BatchExportSpanProcessor)

    def create_spans_to_be_exported():
If you want to be complete, you can run force_flush.
It'll require some calculation, but you may be better off rewriting this benchmark to run a combination of 10k create_spans_to_be_exported calls followed by a force_flush. That ensures a consistent benchmark, rather than letting the batch span processor run whenever it decides to, which could differ from run to run.
But it looks like max_queue_size on the batch span processor (default 2048) will force flushes anyway.
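A minimal sketch of that suggested design, using a toy queue rather than the real SDK (the class below is a stand-in, not BatchExportSpanProcessor): each benchmark round creates a fixed number of spans and ends with an explicit force_flush, while a full queue mimics the max_queue_size behaviour with the default of 2048.

```python
# Toy model (not the real SDK) of a deterministic batch benchmark round:
# a fixed number of span creations followed by an explicit final flush,
# so every round does the same amount of export work.

class ToyBatchProcessor:
    def __init__(self, max_queue_size=2048):
        self.max_queue_size = max_queue_size
        self.queue = []
        self.exported = 0
        self.flushes = 0

    def on_end(self, span):
        self.queue.append(span)
        if len(self.queue) >= self.max_queue_size:
            self.force_flush()  # full queue forces a flush

    def force_flush(self):
        self.exported += len(self.queue)
        self.flushes += 1
        self.queue.clear()

def run_round(processor, n_spans=10_000):
    """One deterministic benchmark round: n spans, then a final flush."""
    for i in range(n_spans):
        processor.on_end(f"span-{i}")
    processor.force_flush()

p = ToyBatchProcessor()
run_round(p)
# 4 flushes are forced by the full queue (4 * 2048 = 8192 spans) and the
# final force_flush exports the remaining 1808, for 10000 total.
print(p.exported, p.flushes)  # → 10000 5
```

The point of the final force_flush is that the per-round work is identical every time, instead of depending on where the queue happens to be when the benchmark timer stops.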
Thanks for your super helpful comments as always :)
I thought a lot about what you said, and it makes sense to me, but I also think it makes sense to keep these BatchExportSpanProcessor tests as-is, with only 1 span created each time.
I noticed that when I changed it to "create 2048 spans and then export", you could no longer compare the processors, because the batch processor could only run 5 times (though it exported 2048 * 5 = 10,240 spans) while the simple processor ran 134 times. Further, I reasoned that if the 2048 * 4th span finished right before pytest-benchmark's 1-second limit, it would run the function again and add another 2048 spans to the benchmark, when really the test should have been over by that point (and it is for the SimpleExportSpanProcessor).
It's true that the "time-to-export" for the BatchExportSpanProcessor won't be consistent, but when I ran it locally it never got so inconsistent that it produced crazy results. Over my trials, I got the following # of spans completed in 1 second:
Trial 1: 8036
Trial 2: 7282
Trial 3: 7965
Trial 4: 7642
Trial 5: 7871
I'm not sure when the processor decides to export, but I think what's great about benchmarking a single span creation is that it lets us see how much better it is than the SimpleExportSpanProcessor over the same trials:
Trial 1: 138
Trial 2: 134
Trial 3: 134
Trial 4: 129
Trial 5: 139
This means the BatchExportSpanProcessor never regresses so badly in its "time-to-export" decisions that it gets as slow as the SimpleExportSpanProcessor, which exports on every span.
Once we have time to include self-hosted runners I would expect its decisions to get more consistent, but even so, these tests will make sure that regressions in the processor's "time-to-export" decisions don't go unnoticed :)
TL;DR: I think leaving the test as-is is better both for comparisons against other processors and for the accuracy of the # of spans finished during the 1s benchmark. Even the "Batch" tests should be consistent enough to avoid sharp changes unless the algorithm changes.
Sounds good! Looking at the server stub, it looks pretty comprehensive except for draining the channel, so it should reproduce CPU load pretty well.
Left a comment. I presume you refer to BatchExportSpanProcessor, as SimpleExportSpanProcessor calls export on every call.
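That per-call export behaviour can be made concrete with a toy model (the classes below are stand-ins, not the real SDK's SimpleExportSpanProcessor or exporter): the simple processor invokes export synchronously once for every ended span, which is exactly what makes it so much slower than a batching design.

```python
# Toy contrast (not the real SDK classes): a "simple" processor calls
# export once per ended span, with no batching or background thread.

class CountingExporter:
    """Counts export calls and spans, standing in for a real exporter."""
    def __init__(self):
        self.calls = 0
        self.spans = 0

    def export(self, batch):
        self.calls += 1
        self.spans += len(batch)

class ToySimpleProcessor:
    def __init__(self, exporter):
        self.exporter = exporter

    def on_end(self, span):
        # export is invoked synchronously for every single span
        self.exporter.export([span])

exporter = CountingExporter()
proc = ToySimpleProcessor(exporter)
for i in range(100):
    proc.on_end(f"span-{i}")
print(exporter.calls, exporter.spans)  # → 100 100
```

One export call per span means the benchmark pays the full export cost on every iteration, which is why the per-second span counts above differ by roughly two orders of magnitude.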
@NathanielRN any thoughts on what's happening with CI here? It doesn't look like it's passing any of the checks.
@codeboten I'm not sure what's going on, I've tried force-pushing multiple times just to get the tests to run. It's in a permanent "queued" state, neither failing nor passing the tests :/ The only thing I can think of is that I changed the
@codeboten I've fixed the tests finally! This should be ready to merge :)
Description
According to the Performance testing Spec, we should have performance tests to test the throughput of the OTLP Exporter.
In the Java equivalent tests, they actually run the Collector.
However, for these tests, I thought it would be sufficient to turn the gRPC export call into a no-op. We can then say that these tests measure how much the SDK-side exporter can export in one second.
It's possible that some spans won't be exported before the benchmark finishes, but that doesn't appear to have been a concern for the Java tests; should we be concerned about it?
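One way to sketch the no-op approach (the exporter class below is a stand-in that only mirrors the shape of the real OTLP exporter; unittest.mock.patch.object is the real stdlib API): patch out the method that performs the network call, so export does all the SDK-side work except talk to the wire.

```python
# Sketch of making the gRPC export a no-op for benchmarking, using a
# fake exporter class; the real OTLP exporter's internals may differ.

from unittest import mock

class FakeOTLPSpanExporter:
    """Stand-in for the OTLP exporter's export path."""
    def _send(self, request):
        # In the real exporter this would be the gRPC network call.
        raise RuntimeError("would hit the network")

    def export(self, spans):
        self._send(spans)
        return "SUCCESS"

exporter = FakeOTLPSpanExporter()

# Patch the network call out so export becomes SDK-side work only:
with mock.patch.object(FakeOTLPSpanExporter, "_send",
                       lambda self, request: None):
    result = exporter.export(["span"])

print(result)  # → SUCCESS, with no network traffic
```

The benchmark then measures serialization and SDK bookkeeping rather than Collector round-trips, which matches the description above.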
Type of change
How Has This Been Tested?
tox -e test-exporter-otlp runs these benchmarking tests
Does This PR Require a Contrib Repo Change?
Checklist:
- [ ] Changelogs have been updated
- [ ] Unit tests have been added
- [ ] Documentation has been updated