Fix race conditions between polling threads and user threads pushing events #641
Force-pushed from b1151f3 to a8f07bf.
Hi @gscalamera and @lorenzopivetta. This is a PR that should fix #511. It contains the necessary changes extracted from #635 and refactored. Could you please try to apply this patch and see if it solves the crash in your environment?
I'm attaching an ABI compliance report I've generated using the script attached to #613. So strictly speaking this is not ABI/API compliant. Not a big surprise, as you remove symbols and turn LastAttrValue from a POD into a class.
compat-report.tar.gz
That report also misses the removal of `EventSupplier::detect_mutex`, which g++ 9.2 found.
Now I'm also aware that this PR is a guinea pig for the check so I think it is unfair to just reject the current solution.
The really strict solution would be to leave `LastAttrValue` as it is, implement its `store` member function as a free function, and keep `EventSupplier::detect_mutex` as well. But one could also argue that nobody outside Tango itself should use `LastAttrValue` or inherit from `EventSupplier`, and therefore the API/ABI breakage is not a problem. Thoughts?
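To make the "strict" option concrete, here is a minimal sketch of a store operation written as a free function next to an unchanged plain-data struct. The struct `LastValuePod`, its members, and the `store` signature are hypothetical stand-ins for illustration only; the real `LastAttrValue` has a different set of members.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical, reduced stand-in for a plain-data struct such as
// LastAttrValue. The member names here are assumptions.
struct LastValuePod
{
    bool   inited;
    bool   err;
    int    quality;
    double value;
};

// Free function instead of a member function: the struct definition is
// not touched at all, so its layout and its type traits stay as they were.
void store(LastValuePod &dst, double value, int quality, bool err)
{
    dst.value   = value;
    dst.quality = quality;
    dst.err     = err;
    dst.inited  = true;
}

// Compile-time check that the struct is still trivial and standard-layout,
// which together are the C++11 definition of a POD type.
static_assert(std::is_trivial<LastValuePod>::value
                  && std::is_standard_layout<LastValuePod>::value,
              "LastValuePod is expected to remain a POD type");
```

A caller would write `store(last, 1.5, 0, false);` instead of `last.store(1.5, 0, false);`. The trade-off is purely one of style versus strict compatibility, which is the question raised above.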
Codewise: Looks good and the commit messages also explain what is done. One minor enhancement could be done though.
Replacing

```cpp
if (except
    || quality == Tango::ATTR_INVALID
    || ((! except) && prev_change_event.err)
    || (quality != Tango::ATTR_INVALID && old_quality == Tango::ATTR_INVALID))
```

with

```cpp
if (except
    || quality == Tango::ATTR_INVALID
    || prev_change_event.err
    || old_quality == Tango::ATTR_INVALID)
```

as short-circuiting is used anyway.
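The equivalence of the two conditions above can be checked exhaustively, since only four boolean facts are involved. The sketch below is not part of the patch; the function names and the boolean parameters (`invalid` for `quality == Tango::ATTR_INVALID`, `prev_err` for `prev_change_event.err`, `old_invalid` for `old_quality == Tango::ATTR_INVALID`) are hypothetical stand-ins.

```cpp
#include <cassert>

// Original form: the third and fourth terms repeat the negation of the
// first and second terms.
bool original_condition(bool except, bool invalid, bool prev_err, bool old_invalid)
{
    return except
        || invalid
        || ((!except) && prev_err)
        || ((!invalid) && old_invalid);
}

// Simplified form: the repeated negations are dropped. When the third term
// is evaluated, `except` is already known to be false; when the fourth is
// evaluated, `invalid` is already known to be false.
bool simplified_condition(bool except, bool invalid, bool prev_err, bool old_invalid)
{
    return except || invalid || prev_err || old_invalid;
}

// Returns true if both forms agree on all 16 input combinations.
bool conditions_are_equivalent()
{
    for (int bits = 0; bits < 16; ++bits)
    {
        const bool except      = (bits & 1) != 0;
        const bool invalid     = (bits & 2) != 0;
        const bool prev_err    = (bits & 4) != 0;
        const bool old_invalid = (bits & 8) != 0;

        if (original_condition(except, invalid, prev_err, old_invalid)
            != simplified_condition(except, invalid, prev_err, old_invalid))
        {
            return false;
        }
    }
    return true;
}
```

Calling `conditions_are_equivalent()` walks the whole truth table, so the simplification does not rely on any assumption about which combinations occur in practice.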
Thanks a lot, Thomas, for the detailed review and analysis!
@mliszcz Sounds good.
Hi @mliszcz. Applied to the 9-lts branch and compiled on PPC (requires tricking some configs). Need to wait for @gscalamera for testing.
We encountered the same issue #511 in one of our device servers here at the ESRF, which was crashing several times a day.
Fixes data race between:
* polling thread (EventSupplier::detect_and_push_xxx_event)
* and user thread pushing events (Attribute::fire_xxx_event)

Remove detect_mutex and protect detect_change method with event_mutex to synchronize access to the old attribute value. Fixes data race between:
* polling thread (EventSupplier::detect_and_push_xxx_event)
* and user thread pushing events (EventSupplier::detect_change)

Fixes data race between:
* omni worker thread (DServer::event_subscription)
* and user thread pushing events (Attribute::fire_xxx_event)
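The pattern described in these commit messages, a single mutex guarding the previous attribute value that both a polling thread and a user thread read and update, can be sketched as follows. This is an illustration under stated assumptions, not the code from the patch: `EventSupplierSketch`, `LastValue`, `pushed_count` and `run_two_threads` are hypothetical names, and `std::mutex` stands in for whatever mutex type the library actually uses.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical, simplified stand-in for the previous attribute value.
struct LastValue
{
    double value = 0.0;
};

class EventSupplierSketch
{
public:
    // Compares the new value with the stored one and updates the stored
    // value, all under one lock, so the read and the write cannot
    // interleave with another thread's call.
    bool detect_change(double new_value)
    {
        std::lock_guard<std::mutex> guard(event_mutex_);
        const bool changed = (new_value != last_.value);
        last_.value = new_value;
        if (changed)
        {
            ++pushed_;
        }
        return changed;
    }

    int pushed_count()
    {
        std::lock_guard<std::mutex> guard(event_mutex_);
        return pushed_;
    }

private:
    std::mutex event_mutex_; // single mutex for all shared state
    LastValue  last_;
    int        pushed_ = 0;
};

// Runs a "polling" thread and a "user" thread against the same supplier
// and returns how many changes were detected in total.
int run_two_threads(int iterations)
{
    EventSupplierSketch supplier;
    auto worker = [&supplier, iterations](double offset) {
        for (int i = 0; i < iterations; ++i)
        {
            supplier.detect_change(offset + i);
        }
    };
    std::thread polling(worker, 0.5);
    std::thread user(worker, 0.25);
    polling.join();
    user.join();
    return supplier.pushed_count();
}
```

Without the lock, the compare and the update of `last_` could interleave between the two threads, which is the kind of data race the commits address. Using one mutex for both paths, rather than a separate mutex for detection, avoids having two locks that each protect only part of the shared state.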
Force-pushed from a8f07bf to ee262d4.
Redundant checks are removed to simplify the condition that forces a change event to be sent. The new condition is equivalent to the original one. This is a non-functional change.
@mliszcz Nice!
Hi @lorenzopivetta and @gscalamera - any news from the test run? @bourtemb or @Ingvord - I think that at least one of you needs to review this PR before it can be merged. There is also a backport, #665, quite similar to this PR, so you can have a look at that as well.
Great work @mliszcz! Thanks!
Still no crash on our side since Tuesday with this patch.
Running for almost 24 hours without any issue. I would say it's OK.
Two approvals, no complaints, so I'll merge the PR. Thanks to everyone involved in the review and testing!
Changes proposed in comment in another PR: #635 (comment)