Skip to content
This repository has been archived by the owner on Jul 8, 2022. It is now read-only.

Fix issue where unsubscribing in push_events led to API_EventTimeout … #699

Merged
merged 2 commits into from
Apr 23, 2020

Conversation

bourtemb
Copy link
Member

…(#686)

Subscribing or unsubscribing events in an event callback when this callback
was called during a subscribe_event() phase was leading to events and
heartbeat events no longer received by the event client.
A bug was fixed when several ZMQ_DELAY_EVENT tasks were received consecutively
by the ZMQ control so.

Comment on lines +1086 to +1090
old_poll_nb = poll_nb;
poll_nb = 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these two variables be completely removed and replaced with just one boolean flag (e.g. this->delay_events = true;)?
Then it could be possible to change zmq::poll(items,nb_poll_item,-1); into:

const int ALL_SOCKETS = 3;
const int ONLY_CONTROL_SOCKET = 1;
zmq::poll(items, delay_events ? ONLY_CONTROL_SOCKET : ALL_SOCKETS, -1)

I think this bug would never appear in this case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose to do something like that for tango >= 9.4 (in tango-9-lts branch) because this implies dropping support for multicast events.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing that out! I was not aware that this will impact multicast events (I am not familiar with how those are implemented).

@@ -1075,8 +1075,17 @@ bool ZmqEventConsumer::process_ctrl(zmq::message_t &received_ctrl,zmq::pollitem_

case ZMQ_DELAY_EVENT:
{
old_poll_nb = poll_nb;
poll_nb = 1;
// If poll_nb == 1, then we are already in a situation where events are being delayed
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this double-delay scenario, shouldn't we count how many ZMQ_DELAY_EVENTs were received and wait for as many ZMQ_RELEASE_EVENTs?
Can we run into an issue when e.g. two threads create DelatEvent object?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I agree with you it would make sense to count the number of received ZMQ_DELAY_EVENTS and really release once we have received as many ZMQ_RELEASE_EVENTS as ZMQ_DELAY_EVENTS.
There might be other issues if we don't do that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mliszcz , I implemented this counter in a5d2c80
In cppTango >= 9.4, we could eventually make this variable a private member of ZmqEventConsumer class.
I didn't do it here to preserve the binary compatibility.
For the moment, this variable is useful/used only in ZmqEventConsumer::process_ctrl() method and the ZMQEventConsumer object is a singleton so it should be fine like this.
Could you please have a look?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This is exactly what I had in mind.

bourtemb added 2 commits April 8, 2020 17:44
…ango-controls#686)

Subscribing or unsubscribing events in an event callback when this callback
was called during a subscribe_event() phase was leading to events and
heartbeat events no longer received by the event client.
A bug was fixed when several ZMQ_DELAY_EVENT commands were received
consecutively by the ZMQ control socket.
@bourtemb bourtemb force-pushed the fix-686-9.3-backports branch from cd2f911 to a5d2c80 Compare April 8, 2020 15:44
@reszelaz
Copy link

Hi all,
I just tested this PR with the test_dev_gc macro repeating it 1000 times and it worked correctly. While without this PR it does not survive 20 repeats. I have not tried it with the sardanatestsuite yet but it already looks great! Many thanks!!

@reszelaz
Copy link

sardanatestsuite also works with this PR. Thanks a lot!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants