This repository has been archived by the owner on Apr 7, 2022. It is now read-only.

Monitor timeout due to random lags in State command #22

Closed
roger11 opened this issue Jun 13, 2016 · 4 comments


roger11 commented Jun 13, 2016

Hi to everyone,

Let me introduce a problem we have encountered at the ALBA synchrotron.

Error
We have recently stumbled upon the following error on our system:

DevFailed[
DevError[
    desc = Not able to acquire serialization (dev, class or process) monitor
  origin = TangoMonitor::get_monitor
  reason = API_CommandTimedOut
severity = ERR]

Problem description
It happens when an acquisition thread on the server side is pushing DATA_READY events and, meanwhile, a client executes a command on the server.

Isolate scenario
As our setup was quite complex, we created a reduced one that consists of a server and a client.
A zip file with the Server and the Client is attached: MonitorLockSerialization.zip

Let's review its components:

  • Server
    The server has three attributes: one that generates data_ready events, one that generates change events instead, and a sleep_time attribute.
    Both event-generating attributes start generating events when written. Their write method starts a thread that pushes events, waiting sleep_time seconds between each one.
    To stop the event generation we have implemented a StopThread command that stops all threads.
    Last but not least, we have implemented a command that sleeps (CommandSleep).
  • Client
    Our client takes a single parameter. Depending on its value, it starts one of the server's event generations, then subscribes to the corresponding attribute and, finally, enters a loop that calls command_inout on the device (executing CommandSleep).
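
To make the server-side pattern described above concrete, here is a minimal pure-Python sketch of a thread that pushes an event every sleep_time seconds until it is told to stop. It is not the code from the attached zip: the class name EventGenerator and the push callback are hypothetical stand-ins (in the real server the callback would be the device's push_change_event or push_data_ready_event call), and no Tango import is needed to run it.

```python
import threading
import time


class EventGenerator:
    """Hypothetical stand-in for the thread started by the attribute write method."""

    def __init__(self, push, sleep_time):
        self._push = push              # stand-in for a push_*_event call
        self._sleep_time = sleep_time  # seconds to wait between two events
        self._stop = threading.Event()
        self._thread = None

    def start(self):
        # Do nothing if a generation thread is already running.
        if self._thread is not None and self._thread.is_alive():
            return
        self._stop.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        counter = 0
        while not self._stop.is_set():
            counter += 1
            self._push(counter)
            # Wait sleep_time seconds, but wake up early if stop() is called.
            self._stop.wait(self._sleep_time)

    def stop(self):
        # Rough equivalent of the StopThread command, for a single thread.
        self._stop.set()
        if self._thread is not None:
            self._thread.join()


if __name__ == "__main__":
    pushed = []
    generator = EventGenerator(pushed.append, sleep_time=0.01)
    generator.start()
    time.sleep(0.1)
    generator.stop()
    print("events pushed:", len(pushed))
```

Waiting on the stop flag instead of calling time.sleep lets the stop request interrupt the pause between two events, so stopping does not have to wait for a full sleep_time.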

The setup is based on these scripts, but to build it, it is necessary to:

  1. Start the server:
    Open a console and define the device as follows:
    $ tango_admin --add-server MonitorLockSerializationServer/LockTest MonitorLockSerialization test/monitor_lock/1
    Once the device has been defined, start it from the directory where its file is located with:
    $ python MonitorLockSerializationServer.py LockTest -v4
  2. Start the client:
    From another console on a host with the same TANGO_DB, run the script from the directory where it is located:
    $ ./MonitorLockSerializationClient ChangeEvent
    if we want to test the setup with change events.
    $ ./MonitorLockSerializationClient DataReadyEvent
    if we want to test the setup with data_ready events instead.

Error generation
If we run the setup with the ChangeEvent parameter, no error occurs and the system works as expected.
If we run the setup with the DataReadyEvent parameter, after some loops (exactly 10, as CommandSleep executes 10 times faster than the data_ready push) the client crashes with a timeout while executing CommandSleep, and the server reports the "Not able to acquire serialization monitor" error. By adjusting the sleep times of the client we can force the problem to happen on the first loop (e.g. by increasing the sleep time of CommandSleep), but as it stands the error happens quite quickly either way.



That is all.
If you need any additional information, do not hesitate to ask.

Many thanks,
Roger

@vxgmichel
Contributor

Thanks for the report, that's a good ol' Monitor/GIL deadlock...

All push_X_event methods use a macro called SAFE_PUSH that first releases the GIL and only then acquires the monitor lock, so the two locks are taken in a safe order. All, that is, except push_data_ready_event, which simply calls the C++ method, presumably while still holding the GIL.
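
For readers unfamiliar with this kind of lock-ordering problem, here is a small simulation in plain Python. It is only a sketch under stated assumptions: the two threading.Lock objects are stand-ins for the real GIL and the Tango serialization monitor, run_scenario and MONITOR_TIMEOUT are hypothetical names, and the timeout is shortened so the example finishes quickly. It does not use PyTango or the actual SAFE_PUSH macro.

```python
import threading

MONITOR_TIMEOUT = 0.2  # seconds; short stand-in for the real monitor timeout


def run_scenario(safe_push):
    """Simulate one push while a client command is pending.

    Returns (push_ok, command_done).
    """
    gil = threading.Lock()      # stand-in for the Python GIL
    monitor = threading.Lock()  # stand-in for the serialization monitor
    monitor_held = threading.Event()
    outcome = {"command_done": False}

    def command_thread():
        # The command takes the monitor first, then needs the GIL
        # to run the Python body of the command.
        with monitor:
            monitor_held.set()
            with gil:
                outcome["command_done"] = True

    # The acquisition thread (here: the caller) is running Python code,
    # so it holds the GIL.
    gil.acquire()
    worker = threading.Thread(target=command_thread)
    worker.start()
    monitor_held.wait()  # the command now owns the monitor and waits for the GIL

    if safe_push:
        # Safe order: release the GIL first, then take the monitor.
        gil.release()
        push_ok = monitor.acquire(timeout=MONITOR_TIMEOUT)
        if push_ok:
            monitor.release()
        gil.acquire()
    else:
        # Unsafe order: wait for the monitor while still holding the GIL.
        # The command can never finish, so this wait times out.
        push_ok = monitor.acquire(timeout=MONITOR_TIMEOUT)
        if push_ok:
            monitor.release()

    gil.release()
    worker.join()
    return push_ok, outcome["command_done"]


if __name__ == "__main__":
    print("unsafe push:", run_scenario(safe_push=False))
    print("safe push:  ", run_scenario(safe_push=True))
```

In the unsafe case the push gives up after the timeout, which mirrors the "Not able to acquire serialization monitor" error, and the command only completes once the pushing thread finally lets go of the stand-in GIL. In the safe case both the push and the command complete.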

It should be quite easy to solve. Any volunteer?

@jairomoldes
Contributor

jairomoldes commented Jul 12, 2016

I can have a look at this. Did anyone already solve it or can I go on?

@vxgmichel
Contributor

No, I don't think so, so you can go ahead. You can assign me to the PR if you want me to review it.

@jairomoldes
Contributor

Fine. Thanks.

