Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a disruptor based async reporter #81

Closed
codefromthecrypt opened this issue Dec 7, 2017 · 12 comments
Closed

Add a disruptor based async reporter #81

codefromthecrypt opened this issue Dec 7, 2017 · 12 comments

Comments

@codefromthecrypt
Copy link
Member

We can calculate the size of a span without serializing it. We should be able to make an async reporter that defers serialization, but still allows us to drop spans that are too big to send. It should be fun work, if someone's up to it.

We could also consider making this disruptor based, or otherwise with a fancier ringbuffer opt-in given a dep like https://github.com/real-logic/agrona.

@wu-sheng
Copy link
Member

wu-sheng commented Dec 8, 2017

In SkyWalking, we put the Spans(actually Segment) into a ringbuffer, then consume it, finally send it. So I agree with you, defer serialization is a good idea.

For the ringbuffer, LMAX-Exchange/disruptor is really a good project. For SkyWalking, we use a array based queue, implement by ourself.

@codefromthecrypt codefromthecrypt changed the title defer serialization in the async reporter Add a disruptor based async reporter Dec 12, 2017
codefromthecrypt pushed a commit that referenced this issue Dec 12, 2017
We calculate a span size in bytes before offering it to the async queue
to ensure we don't add a poison message (a span too big to encode).
However, we don't have to encode it on the application calling thread.

This operation reduces overhead by about 18us in benchmarking.

End-to-end benchmarks from Brave, running on jdk9.01
```
Benchmark                               Mode  Cnt    Score   Error  Units
EndToEndBenchmarks.server_get           avgt   15  133.535 ± 5.378  us/op
EndToEndBenchmarks.unsampledServer_get  avgt   15  145.541 ± 2.757  us/op

Before:
Benchmark                               Mode  Cnt    Score   Error  Units
EndToEndBenchmarks.traced128Server_get  avgt   15  207.332 ± 3.377  us/op

After:
Benchmark                               Mode  Cnt    Score   Error  Units
EndToEndBenchmarks.traced128Server_get  avgt   15  189.703 ± 3.342  us/op
```

See #81
@codefromthecrypt
Copy link
Member Author

ignoring my other comment.. deferring is certainly worth doing, independent of this issue, which I've retitled to making things disruptor based #84

@codefromthecrypt
Copy link
Member Author

The related change out in zipkin-reporter 2.2 had much more impact in benchmark vs microbenchmark. Here's an example spring-cloud/spring-cloud-sleuth#808

@tramchamploo
Copy link

Well, I don't see any size restriction or overflow strategy in disruptor. So maybe using disruptor before ByteBoundedQueue layer, or reimplement it to behave just like a ringbuffer?

@codefromthecrypt
Copy link
Member Author

@tramchamploo I'm flexible, but before we make our own want to check something...

@mjpt777 are you aware of any size-based or otherwise bounded disruptor? We have means to calculate the size per item of things and would want to drop if what's currently unprocessed is at or above that

@mjpt777
Copy link

mjpt777 commented Jan 31, 2018

@adriancole In what dimensions are you referring to when you mention a "bounded disruptor"? Is this bounded backlog of items, or bounded size of individual entries, or some other bound? I think you are referring to bounded record size but want to be sure.

@codefromthecrypt
Copy link
Member Author

codefromthecrypt commented Jan 31, 2018 via email

@mjpt777
Copy link

mjpt777 commented Jan 31, 2018

If you are bounding in both backlog and record size then best to do that by limiting the overall bytes. There are a number of structures in Agrona for one-to-one and many-to-one ring buffers, and a broadcast buffer for one-to-many. You could also consider Aeron as it gives the best throughput and latency and can bound in both dimensions plus can can use in with shared memory or over the network. Aeron has many more useful features for this type of need with control over flow and congestion control if you are willing to investigate.

@codefromthecrypt
Copy link
Member Author

codefromthecrypt commented Jan 31, 2018 via email

@tramchamploo
Copy link

tramchamploo commented Mar 13, 2018

Another choice is JCTools used by Netty.

@codefromthecrypt
Copy link
Member Author

note: census uses a disruptor (they don't check bounds) it is prior art to consider

https://github.com/census-instrumentation/opencensus-java/blob/master/impl/src/main/java/io/opencensus/impl/internal/DisruptorEventQueue.java

I would caution that there have been a lot of small bugs about CPU which hopefully most are gone now. https://github.com/census-instrumentation/opencensus-java/search?q=disruptor&type=Issues

@codefromthecrypt
Copy link
Member Author

ps anything like this should be carefully reviewed to not cause more support issues than it solves. Multiple folks should ok the impl in a way they are happy to help support bugs in it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants