zipkin sampling percentage configuration #30
Comments
That's right. The zipkin library doesn't provide sampling in the way jaeger does. But I'd be happy to take a PR for it if it's something you'd like to work on. |
I'd really like to have support for this too- @rnburn I can try and take a look, do you have any suggestions about where you'd like to see support/how it would be implemented? I.e. perhaps some kind of Am I right in thinking that this should also set |
@pingles - yes that's correct it should set that sampled header. Additionally, it should look at the standard opentracing sampling.priority tag -- described here <https://github.com/opentracing/specification/blob/master/semantic_conventions.md>. You might check out
- Jaeger's probabilistic sampler <https://github.com/jaegertracing/jaeger-client-cpp/blob/master/src/jaegertracing/samplers/ProbabilisticSampler.h>
- LightStep's support for sampling.priority <lightstep/lightstep-tracer-cpp#94>
And I think adding a sampling_rate field to this structure <https://github.com/rnburn/zipkin-cpp-opentracing/blob/master/zipkin_opentracing/include/zipkin/opentracing.h#L6> would be a good place to control the behavior. It would also need to be added to the JSON configuration <https://github.com/rnburn/zipkin-cpp-opentracing/blob/master/zipkin_opentracing/src/tracer_factory.cc#L8>. Also, sampling is tracer-specific functionality, so there wouldn't be any opentracing interface to specify how the tracer should do it. Probabilistic sampling is only one particular strategy -- though probably the best one to start with. You can see Jaeger's documentation for examples of other ways to do it. |
Cool, thanks for the pointers, will try and see if I can get something going!! |
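To make the suggestion above concrete, here is a minimal sketch of how a sampling_rate setting and the sampling.priority tag could combine for a root span. Only the sampling_rate field name and the sampling.priority tag come from the thread; the SamplingOptions struct, the decideSampled function, and passing the random draw in as a parameter are assumptions made for illustration, not the library's API. It needs C++17 for std::optional.

```cpp
#include <optional>

// Hypothetical options struct; only the field name sampling_rate is from the thread.
struct SamplingOptions {
  double sampling_rate = 1.0;  // probability in [0, 1] that a new trace is sampled
};

// Decides whether a root span is sampled.
//  - sampling_priority: value of the span's sampling.priority tag, if one was set.
//  - uniform_draw: a number drawn uniformly at random from [0, 1).
bool decideSampled(const SamplingOptions& options,
                   std::optional<int> sampling_priority,
                   double uniform_draw) {
  if (sampling_priority.has_value()) {
    // An explicit hint overrides the probabilistic decision:
    // greater than 0 asks for capture, 0 asks for no capture.
    return *sampling_priority > 0;
  }
  // With no hint, sample with probability sampling_rate.
  return uniform_draw < options.sampling_rate;
}
```

Taking the draw as an argument keeps the decision deterministic in tests, which sidesteps the difficulty of controlling a sampler after the tracer has been constructed.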
I've started taking a look at this and trying to get my head around the interplay between the different modules. Looking at the Jaeger module, the determination of whether to propagate the sampling decision, or determine whether to sample is done inside Thanks @rnburn for your patience in helping me figure this out :) |
@pingles - yes, that would be a good place. |
Ok cool, I found https://github.com/opentracing/specification/blob/master/specification.md and wasn't sure whether sampling was something that's tracer specific (and on a |
I've started working on this and I've managed to (brutally) get it to do the right thing (at least from observing the downstream request headers): pingles/zipkin-cpp-opentracing@67d68c6 I'm going to try and carry on tomorrow- I need to fix everything that's referenced in the commit message, and additionally make sure that spans are only reported if they've been sampled (missed that out of my message). I'm certain that I've misunderstood the relationship between the classes so if anyone more familiar could have a scan and point out where I've made a mistake/should change something I'd really appreciate it. I'm by no means an expert C++ engineer either so please correct me on that also. Thanks again for supporting me to add this. |
Thanks @pingles - I went through and made a few suggestions. |
Awesome, thanks @rnburn- I'll make those changes tomorrow, appreciate the feedback. |
I've made a few more improvements- is there a recommended way to determine whether a span is the root or not? Looking at https://github.com/opentracing/opentracing-cpp/blob/master/include/opentracing/tracer.h#L39 suggests that if |
@rnburn I've copied the Jaeger implementation for determining whether a span is the root (checking whether references contains a Unfortunately, it doesn't seem to work when I test it inside the nginx module- passing sample headers with curl:
I have a dummy app running that's fronted by nginx and prints the incoming request headers and it shows:
So... close, but not quite enough :) I'll keep debugging to try and figure out where the missing chain is but if anyone else spots it before me please let me know! All work is in https://github.com/pingles/zipkin-cpp-opentracing/tree/sampling currently. |
Related to the above- the other B3 headers are propagated correctly so it's either an issue with the sampled header specifically, or, something to do with my implementation of:
bool hasParent(const ot::StartSpanOptions &options) const {
  for (auto ref : options.references) {
    if (ref.first == ot::SpanReferenceType::ChildOfRef) {
      return true;
    }
  }
  return false;
} |
@pingles - I think the parseBool function for reading the sampling header that I wrote might be the problem. (Sorry) |
Also, does
have a space in front of the header values? that might not work for propagation. |
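As an illustration of the two concerns raised here (the boolean parsing and stray whitespace around header values), the following is a sketch of a lenient parser for the B3 sampled header. The function name parseSampledHeader is hypothetical and this is not the library's parseBool; accepting "true"/"false" alongside "1"/"0" is a leniency assumed here for older tracers. It needs C++17.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: returns the sampling decision carried by a header value,
// or std::nullopt when the value is missing or not recognised.
std::optional<bool> parseSampledHeader(const std::string& raw) {
  std::size_t begin = 0;
  std::size_t end = raw.size();
  // Trim leading and trailing whitespace so " 1" is treated like "1".
  while (begin < end && std::isspace(static_cast<unsigned char>(raw[begin]))) {
    ++begin;
  }
  while (end > begin && std::isspace(static_cast<unsigned char>(raw[end - 1]))) {
    --end;
  }
  const std::string value = raw.substr(begin, end - begin);
  if (value == "1" || value == "true") return true;
  if (value == "0" || value == "false") return false;
  return std::nullopt;  // unknown value: let the caller decide what to do
}
```

Returning an empty optional rather than false keeps "no decision was sent" distinct from "the caller decided not to sample".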
@rnburn Ah... so
So that should change to:
|
Yes, that looks right. I think it's also hard-coded for inject: https://github.com/rnburn/zipkin-cpp-opentracing/blob/master/zipkin_opentracing/src/propagation.cc#L67 |
Awesome, I’ll try both of those a little bit later and let you know how I
get on.
|
Ok, made both of those changes but still no cigar... I tried:
I've pushed the changes again to my repository. rnburn/zipkin-cpp-opentracing@master...pingles:sampling shows current progress ;) I suspect now the issue is the behaviour in
if (hasParent) {
  span->setSampled(samplingState(references));
} else {
  span->setSampled(sampler.ShouldSample());
}
At least that's pretty close to the Jaeger module's equivalent. @rnburn does that sound right- that the job of the tracer when creating a span would be to check the referenced context and extract information from there? Am I also right to assume that, assuming the nginx module is using zipkin, any |
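The inherit-or-sample rule in that snippet can be sketched in isolation as below. The resolveSampled function and its signature are hypothetical, and treating a parent that carries no decision the same as having no parent is an assumption, not confirmed behaviour of either tracer. It needs C++17.

```cpp
#include <functional>
#include <optional>

// Hypothetical helper.
//  - parent_decision: the sampled flag extracted from the parent context,
//    or std::nullopt when there is no parent or the parent carried no decision.
//  - should_sample: the tracer's own sampler, consulted only when needed.
bool resolveSampled(std::optional<bool> parent_decision,
                    const std::function<bool()>& should_sample) {
  if (parent_decision.has_value()) {
    // A decision made upstream is propagated unchanged.
    return *parent_decision;
  }
  // No upstream decision: this span starts the trace, so make one here.
  return should_sample();
}
```

Passing the sampler as a callable means it is never invoked for spans that already have a decision, so every span in a trace agrees with the root.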
I've also just seen that |
I've done a few more tests and it appears that it's nondeterministic when incoming context information is missing some fields. When I was using curl and just setting I'll try and dig more into that before finally verifying that the sample decision is used to determine whether to send the span data to zipkin. |
I'm an idiot- just seen there's some nice tests to build on, will start there to figure this out. |
Writing some tests was super helpful (although they're a bit ugly- was tricky to control the behaviour of the sampler once the tracer had been constructed). @rnburn would you be able to have another review through the changes please- any style changes etc. are obviously welcomed also rnburn/zipkin-cpp-opentracing@master...pingles:sampling I'm still having issues getting it to behave correctly via this nginx module but the tests make me feel happier that I'm close :) |
For the shouldSample function, you don't want to be initializing every time like this -- that's going to be slow (see the notes in the example code here). Instead could you add a function to utility.cc and reuse this rand_source? (you can move the variable to be static to the translation unit instead of the function). |
@rnburn thanks. Is it fair to store them as members instead- and then it's only initialised when the sampler is constructed (when the tracer is)?
Or is it substantially better to use static? Also- is there any concern around thread-safe concurrent requests for numbers from the distribution, does it need to be guarded with some kind of mutex? |
Having read around it seems like this may be preferable:
|
Yeah, that's what's done here: https://github.com/rnburn/zipkin-cpp-opentracing/blob/master/zipkin/src/utility.cc. I was thinking add something like randomBool(double p) so that you can reuse that same rand_source variable that's already initialized. But I think what you have is ok too. |
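For reference, one way a randomBool(double p) helper could look is sketched below. The name comes from the comment above, but the body is an assumption and not the code in utility.cc. It uses a thread_local engine, which is one possible answer to the thread-safety question: each thread seeds its own engine once, so no mutex is needed.

```cpp
#include <random>

// Hypothetical helper: returns true with probability p.
bool randomBool(double p) {
  // Seeded once per thread on first use, then reused on every later call.
  // A per-thread engine avoids data races without locking.
  thread_local std::mt19937_64 rand_source{std::random_device{}()};

  // Handle the edges explicitly so out-of-range rates behave predictably.
  if (p <= 0.0) return false;
  if (p >= 1.0) return true;

  // The distribution object is cheap to construct; the engine is the costly part.
  std::bernoulli_distribution distribution(p);
  return distribution(rand_source);
}
```

Storing the engine as a sampler member also avoids reseeding on every call, but then concurrent calls on the same sampler would need a guard.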
Ok cool, I'll go with that for now :-) I think there's two things left to do: 1) make sure that spans aren't reported unless they're sampled, 2) figure out why the randomisation doesn't seem to work from the nginx module (pingles@e7b586c). Current behaviour is that whatever sampling decision is forwarded from the client is propagated correctly. However, without any existing context headers the tracer always sets spans to be unsampled. So very nearly there :-) |
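The first remaining item, reporting only sampled spans, could be sketched as follows. FinishedSpan and BufferingReporter are hypothetical stand-ins for illustration, not the library's span or reporter classes.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified view of a finished span.
struct FinishedSpan {
  std::string name;
  bool sampled;
};

// Hypothetical reporter that buffers spans before they are sent to the collector.
class BufferingReporter {
 public:
  void reportSpan(const FinishedSpan& span) {
    // Unsampled spans are dropped here, so they never reach the collector.
    if (!span.sampled) return;
    buffer_.push_back(span);
  }

  // Number of spans waiting to be sent.
  std::size_t pending() const { return buffer_.size(); }

 private:
  std::vector<FinishedSpan> buffer_;
};
```

Context headers are still propagated for unsampled spans; only the reporting step is skipped.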
Oh, and check the |
I've updated the lib to only report sampled spans and am now debugging the issue with all spans being unsampled when used by the nginx module. I've turned on the debug log and can see the following (with
It feels like there's something I've misunderstood in the way that spans are constructed around the request- that either the sampler is never called, or that it's not randomly drawing from the distribution. I don't suppose you have any good suggestions on where to look @rnburn please before I start debug logging everything |
Note to myself: it feels like it might be a case of checking whether a parent span is valid, not just that it exists: https://github.com/jaegertracing/jaeger-client-cpp/blob/master/src/jaegertracing/SpanContext.h#L150 |
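A sketch of that idea, checking that a ChildOf reference points at a usable context and not merely that it exists, is below. The types are hypothetical simplifications and not the opentracing or Jaeger classes; treating a context as valid when both IDs are non-zero is an assumption modeled loosely on the linked Jaeger check.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for the real reference and context types.
enum class ReferenceType { ChildOf, FollowsFrom };

struct Context {
  std::uint64_t trace_id = 0;
  std::uint64_t span_id = 0;
  // Assumption: a context extracted from a request with no tracing headers
  // has zero IDs and should not count as a parent.
  bool isValid() const { return trace_id != 0 && span_id != 0; }
};

struct Reference {
  ReferenceType type;
  const Context* context;
};

// Returns the first valid ChildOf parent, or nullptr when the span is a root.
const Context* findValidParent(const std::vector<Reference>& references) {
  for (const auto& reference : references) {
    if (reference.type == ReferenceType::ChildOf &&
        reference.context != nullptr && reference.context->isValid()) {
      return reference.context;
    }
  }
  return nullptr;
}
```

With this check, a request that arrives without context headers is treated as a root, so the sampler gets a chance to run.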
I think everything in the underlying zipkin opentracing lib is now updated to support probabilistic sampling from within the nginx module (aside from propagating information through the tags also). However, I'm still having issues using it within the nginx module: currently it looks like the sampling rate is incorrectly parsed- so ~1/10 requests are sampled when the sampling probability ought to be 1/1. I think I'll try and get the tag stuff completed on the zipkin lib and then open a PR as I'm pretty sure that is good to go- it'd be good to get some more feedback on the code to make sure I've not missed/misunderstood anything. After that I'll get back on trying to figure out what's happening within the nginx module. Current code is in: https://github.com/pingles/zipkin-cpp-opentracing/tree/sampling |
thanks @pingles - let me know when you're ready, and I can take another look at it. you might open up the PR, even if you're not finished - it will make it easier to review. |
Perfect- I'll just do that then |
I've just created the PR: rnburn/zipkin-cpp-opentracing#20 |
Currently there is no way (please correct me if I'm wrong) to specify a sampling percentage for the Zipkin tracer. This makes Zipkin support hard to use in high-load production environments, as the amount of trace data generated is more than the actual traffic. Would it be possible to provide a sampling parameter, the same as for e.g. Jaeger?