Make Bytes & BytesMut compatible with ThreadSanitizer #405
@@ -1046,7 +1046,7 @@ unsafe fn release_shared(ptr: *mut Shared) {
     // > "acquire" operation before deleting the object.
     //
     // [1]: (www.boost.org/doc/libs/1_55_0/doc/html/atomic/usage_examples.html)
-    atomic::fence(Ordering::Acquire);
+    (*ptr).ref_cnt.load(Ordering::Acquire);
This will force a second load of the ref_cnt, won't it?
Also, if we diverge from how std::sync::Arc does things, we should probably have a good reason and explain why.
Yes, there will be an additional load when the reference count reaches zero.
I added a comment explaining why there is a load instead of a fence, so there is no need to go through the git history to understand that.
To be clear, the only reason for doing this is compatibility with ThreadSanitizer. std::sync::Arc is currently implemented as proposed here, although that implementation is used conditionally. Across the tokio ecosystem, this is the last remaining false positive I have seen reported with ThreadSanitizer.
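For readers following along, here is a minimal sketch of the pattern under discussion. It is not the crate's actual code: the `Shared` struct is a hypothetical stand-in with only a counter, and the `bool` return value and `demo` helper exist purely for illustration. The acquire load on `ref_cnt` plays the role that `atomic::fence(Ordering::Acquire)` played before the change.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical stand-in for the shared allocation header.
struct Shared {
    ref_cnt: AtomicUsize,
}

/// Drops one reference. Returns true if this call freed the allocation
/// (the return value is an illustration aid, not part of the real function).
unsafe fn release_shared(ptr: *mut Shared) -> bool {
    // Release ordering publishes this thread's earlier writes to whichever
    // thread performs the final decrement.
    if unsafe { (*ptr).ref_cnt.fetch_sub(1, Ordering::Release) } != 1 {
        return false;
    }
    // Acquire load in place of `atomic::fence(Ordering::Acquire)`: it
    // synchronizes with the release decrements made by other threads, and
    // ThreadSanitizer understands it, whereas it does not model fences.
    unsafe { (*ptr).ref_cnt.load(Ordering::Acquire) };
    // Last reference is gone, so the allocation can be freed.
    drop(unsafe { Box::from_raw(ptr) });
    true
}

// Illustration only: two references, released one after the other.
fn demo() -> (bool, bool) {
    let ptr = Box::into_raw(Box::new(Shared {
        ref_cnt: AtomicUsize::new(2),
    }));
    unsafe {
        let first = release_shared(ptr);
        let second = release_shared(ptr);
        (first, second)
    }
}
```

The extra load only happens on the path where the count has just reached zero, which is the cold path that goes on to deallocate.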
Oh I see, they've added this conditional compilation: https://github.com/rust-lang/rust/blob/f844ea1e561475e6023282ef167e76bc973773ef/src/liballoc/sync.rs#L43-L58
Could we do something similar?
cfg(sanitize = "thread") is unstable, so probably not. I don't think it is worth the complexity anyway.
On x86, the conditional approach avoids a single mov from a location that was written a few instructions earlier, on a cold path that has to do a bunch of work to deallocate the memory. Last time I looked, this was essentially unmeasurable for any real-world application. For weak memory models, where an acquire fence leads to actual codegen, the situation is even more ambiguous.
We could use cfg(bytes_ci_tsan) or something, setting the CARGO_CFG_BYTES_CI_TSAN=1 environment variable.
It seems to be worthwhile enough that the standard library does it. Are they wrong?
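A rough sketch of what such a custom cfg could look like, for the sake of discussion. The cfg name `bytes_ci_tsan` comes from the comment above; the helper `acquire_before_free` and its string return value are hypothetical and exist only to make the selected branch observable. This sketch assumes the cfg would be passed to the compiler explicitly, for example with `RUSTFLAGS="--cfg bytes_ci_tsan"`.

```rust
use std::sync::atomic::{self, AtomicUsize, Ordering};

// Hypothetical helper: picks the synchronization primitive at compile time.
// The returned label is an illustration aid only.
#[allow(unexpected_cfgs)] // `bytes_ci_tsan` is a custom, undeclared cfg
fn acquire_before_free(ref_cnt: &AtomicUsize) -> &'static str {
    if cfg!(bytes_ci_tsan) {
        // ThreadSanitizer builds: use a load that the tool can model.
        ref_cnt.load(Ordering::Acquire);
        "load"
    } else {
        // Regular builds: keep the original acquire fence.
        atomic::fence(Ordering::Acquire);
        "fence"
    }
}
```

Without the flag, the fence branch is selected, so ThreadSanitizer users would have to know about the cfg and opt in themselves.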
I don't think the approach from std is the best idea. Requiring a custom cfg would be impractical.
After reading a bit about the change in libstd (rust-lang/rust#65097), I think we should care about the performance here, and only enable a load instead of a fence via a conditional config, to be used with tsan.
I don't think this is measurable outside microbenchmarks. Note that most of that discussion is about a different implementation.
If doing this conditionally is the only acceptable implementation, then I suggest closing this. Doing this conditionally serves no purpose, because if it doesn't work out of the box, it doesn't work, period. The situation in std is different, because the cfg is automatically enabled when tsan is used during compilation.
To clarify one point: there is no measurable impact on any of the benchmarks here, and they do exercise this code. If you noticed anything that raised your concern, I can take a look again, but honestly, this change does not make any difference.
Replace atomic fences with atomic loads for compatibility with ThreadSanitizer.