Signed-off-by: Ryan Burn <ryan.burn@gmail.com>
```cpp
static uint64_t computeSamplingBoundary(long double samplingRate)
{
    const long double maxRandNumber = kMaxRandomNumber;
```
Right. As you have pointed out, `long double` is not a great solution. There are only two fixes I would really consider. The first is to use a library that handles this sort of thing, such as Google's double-conversion; I'd hate to add more dependencies just for this operation. The second is to reconsider how we sample traces and use a new random number instead of the trace ID. The only issue is that the current approach is somewhat repeatable across all clients: the same trace ID is sampled or not sampled no matter which language is being used. Using a new random number means we cannot guarantee that a trace will be sampled across the board. Not sure how important this is, but something to consider.
Clarification: the trace ID is used for sampling in the Go client merely as a performance optimization. There's no functional dependency on doing it that way. In the Node.js client we do use a new random number for the sampling decision, since Node cannot do uint64 arithmetic.
Interesting. Makes sense. That is okay with me.
@isaachier This does not require `long double` to be 80-bit to work. All you need to handle the overflow is a conditional like this. `long double` will still give you a better answer if extended precision is available, but it will still work if `long double == double`.
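A minimal sketch of what such a guard could look like, assuming `kMaxRandomNumber` is `UINT64_MAX` (the constant's actual definition lives elsewhere in the client, so treat the names here as illustrative); it uses `>=` conservatively, which also covers the `samplingRate == 1.0` edge case raised below:

```cpp
#include <cstdint>
#include <limits>

// Assumed value; the client defines this constant elsewhere.
static constexpr uint64_t kMaxRandomNumber =
    std::numeric_limits<uint64_t>::max();

static uint64_t computeSamplingBoundary(long double samplingRate)
{
    const long double maxRandNumber = kMaxRandomNumber;
    const long double samplingBoundary = samplingRate * maxRandNumber;
    // If long double has only double precision, kMaxRandomNumber rounds
    // up to 2^64 on conversion, so for samplingRate == 1.0 the product
    // is one past what uint64_t can hold; casting it would be undefined
    // behavior. Clamp before casting.
    if (samplingBoundary >= maxRandNumber) {
        return kMaxRandomNumber;
    }
    return static_cast<uint64_t>(samplingBoundary);
}
```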
Shouldn't it be `>=`? And you wrote yourself that using a standard random number generator would make more sense here. Do you no longer feel that way?
Since `samplingRate <= 1.0`, `samplingRate * maxRandNumber` is always less than or equal to `maxRandNumber`, so it's not necessary.

Using `std::uniform_real_distribution` would be straightforward, but I don't see anything wrong with putting in a small fix to immediately address the overflow issue.
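For comparison, the `std::uniform_real_distribution` approach might look roughly like this; a sketch only, with illustrative names rather than the client's actual classes:

```cpp
#include <random>

// Illustrative sampler that draws a fresh random number per decision
// instead of reusing the trace ID, avoiding uint64 boundary math.
class RandomSampler {
public:
    explicit RandomSampler(double samplingRate)
        : _samplingRate(samplingRate)
        , _engine(std::random_device{}())
        , _distribution(0.0, 1.0)
    {
    }

    bool isSampled() { return _distribution(_engine) < _samplingRate; }

private:
    double _samplingRate;
    std::mt19937_64 _engine;
    std::uniform_real_distribution<double> _distribution;  // yields [0, 1)
};
```

As noted above, though, this would make the sampling decision non-reproducible across clients, and it would mean changing the existing unit tests.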
OK, got it. Overall I hate dealing with overflow, but switching to the random number generator would involve changing a number of unit tests as well. So I am happy to accept this; but if you can (again), please merge from master and push so we can see coverage info.
Merged master.
Codecov Report

```diff
@@            Coverage Diff             @@
##           master      #13      +/-   ##
==========================================
- Coverage   85.44%   85.43%   -0.01%
==========================================
  Files          93       93
  Lines        2219     2225       +6
  Branches      202      203       +1
==========================================
+ Hits         1896     1901       +5
  Misses        252      252
- Partials       71       72       +1
```

Continue to review full report at Codecov.
In my experience it is very hard to write a test case for overflow, so I'm just going to accept this despite the coverage decrease. Thanks for the update.
* hardcoded sampling rate of 50%
* sampling decision should only be performed at root or honour parent span decision
* allow more than probabilistic strategy for sampling
* pretty sure I've not understood the model, code tidying to come
This is a way of fixing #6 that keeps the same sampling logic. If `long double` is an 80-bit floating-point type (which it is on most platforms), it determines the sampling boundary more accurately, so that overflow isn't a problem; but it also adds a conditional to protect against overflow in case `long double` is a standard 64-bit floating-point type.
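To make the failure mode concrete, here is a small standalone program (my own illustration, not code from this PR) showing how the two representations convert `UINT64_MAX` differently:

```cpp
#include <cstdint>
#include <cstdio>
#include <limits>

int main()
{
    const uint64_t maxRand = std::numeric_limits<uint64_t>::max();

    // double has a 53-bit mantissa, so UINT64_MAX (2^64 - 1) rounds up
    // to exactly 2^64 when converted.
    const double d = static_cast<double>(maxRand);
    std::printf("UINT64_MAX as double:      %.1f\n", d);

    // An 80-bit long double has a 64-bit mantissa and represents
    // UINT64_MAX exactly.
    const long double ld = static_cast<long double>(maxRand);
    std::printf("UINT64_MAX as long double: %.1Lf\n", ld);

    // With plain double, samplingRate * maxRand can therefore equal 2^64,
    // and casting that back to uint64_t is undefined behavior -- which is
    // exactly what the added conditional guards against.
    return 0;
}
```

On x86-64 Linux, where `long double` is the 80-bit extended type, the two printed values differ by exactly 1.0; on platforms where `long double` is just `double`, both lines print 2^64 and the conditional is what prevents the out-of-range cast.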