Monitor timeout due to random lags in State command #22
Comments
Thanks for the report, that's a good ol' Monitor/GIL deadlock... It should be quite easy to solve. Any volunteer?
I can have a look at this. Did anyone already solve it or can I go on?
No, I don't think so, so you can go ahead. You can assign me to the PR if you want me to review it.
Fine. Thanks. On Tuesday 12 July 2016 00:48:11 Vincent Michel wrote:
Hi everyone,
let me introduce you to a problem we have encountered at the ALBA synchrotron.
Error
We have recently stumbled upon the following error on our system:
Problem description
It happens when we are executing DATA_READY events on an acquisition thread on the server side and, meanwhile, a client executes a command on the server.
Isolate scenario
As our setup was quite complex, we created a reduced one that consists of a server and a client.
Therefore, a zip file with a Server and a Client is attached: MonitorLockSerialization.zip
Let's review its components:
The server has three attributes: one that generates data_ready events, one that generates change events instead and, finally, a sleep_time variable.
Both event-generating attributes start to do so when written. Their write method starts a thread that generates events, waiting sleep_time seconds between each generation.
To stop the event generation we have implemented a StopThread command that stops all threads.
Last but not least, we have implemented a command that sleeps (CommandSleep).
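For readers without the zip at hand, the thread handling described above can be sketched in plain Python. This is only an illustration and not the attached server: it does not use PyTango, and the names EventGenerator, push_event, start and stop_all are hypothetical stand-ins for the attribute write methods, the event push call and the StopThread command.

```python
import threading
import time


class EventGenerator:
    """Hypothetical stand-in for the server's event threads (no PyTango)."""

    def __init__(self, push_event, sleep_time=0.01):
        self.push_event = push_event   # callback standing in for the event push
        self.sleep_time = sleep_time   # seconds to wait between two events
        self._stop = threading.Event()
        self._threads = []

    def start(self, attr_name):
        """What writing one of the event attributes would trigger."""
        thread = threading.Thread(target=self._run, args=(attr_name,), daemon=True)
        self._threads.append(thread)
        thread.start()

    def _run(self, attr_name):
        counter = 0
        # Event.wait returns False on timeout and True once stop is requested
        while not self._stop.wait(self.sleep_time):
            counter += 1
            self.push_event(attr_name, counter)

    def stop_all(self):
        """What the StopThread command would do: stop every running thread."""
        self._stop.set()
        for thread in self._threads:
            thread.join()
        self._threads.clear()
        self._stop.clear()             # allow a later restart


received = []
generator = EventGenerator(lambda name, value: received.append((name, value)))
generator.start("data_ready_attr")     # hypothetical attribute name
time.sleep(0.1)
generator.stop_all()
count_after_stop = len(received)
time.sleep(0.05)                       # no further events arrive after the stop
```

Waiting on a threading.Event instead of calling time.sleep lets stop_all interrupt the wait immediately, whatever the value of sleep_time.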
Our client simply receives a parameter and, depending on its value, starts one of the server's event generators. It then subscribes to the corresponding attribute and, finally, starts a loop that executes a command_inout on the device (the CommandSleep).
The setup is based on these scripts, but to build it, it is necessary to:
Open a console and define the device as follows:
$ tango_admin --add-server MonitorLockSerializationServer/LockTest MonitorLockSerialization test/monitor_lock/1
Once the device has been defined, start it from the directory where its file is located with:
$ python MonitorLockSerializationServer.py LockTest -v4
From another console, on a host with the same TANGO_DB, we run the client script from the directory where it is located:
$ ./MonitorLockSerializationClient ChangeEvent
if we want to test the setup with change events.
$ ./MonitorLockSerializationClient DataReadyEvent
if we want to test the setup with data_ready events instead.
Error generation
If we run the setup using the ChangeEvent parameter, no error occurs and the system works as expected.
If we run the setup using the DataReadyEvent parameter, after some loops (exactly 10, as the CommandSleep executes 10 times faster than the data_ready push) the client crashes with a timeout while executing the CommandSleep, and the server reports the "Not able to acquire serialization monitor" error. By adjusting the sleep times of the client we can force the problem to happen on the first loop (e.g. by increasing the sleep time of the CommandSleep), but as it stands the error appears quickly either way.
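As a rough analogy only, here is a plain-Python sketch of how a timed lock acquisition can fail in this kind of scenario. It assumes the deadlock named in the comments boils down to two threads taking two locks in opposite order. The two Lock objects are hypothetical stand-ins and do not model the real Tango monitor or the interpreter lock, and the timeouts are arbitrary.

```python
import threading

monitor = threading.Lock()            # stand-in for the device serialization monitor
interpreter_lock = threading.Lock()   # stand-in for the interpreter lock (GIL)
both_hold_first = threading.Barrier(2)
errors = []


def take_both(name, first, second, timeout):
    """Hold `first`, then try to take `second`, giving up after `timeout` seconds."""
    with first:
        both_hold_first.wait()        # make sure both threads hold their first lock
        if second.acquire(timeout=timeout):
            second.release()
        else:
            errors.append(name)       # timed out waiting for the second lock


# Assumed ordering: the event thread holds the interpreter lock and wants the
# monitor, while the command thread holds the monitor and wants the interpreter lock.
event_thread = threading.Thread(
    target=take_both, args=("event thread", interpreter_lock, monitor, 0.1))
command_thread = threading.Thread(
    target=take_both, args=("command thread", monitor, interpreter_lock, 2.0))
event_thread.start()
command_thread.start()
event_thread.join()
command_thread.join()
print(errors)
```

In this sketch the thread with the shorter timeout gives up on the monitor first, and only then can the other thread proceed. Without timeouts, both threads would block forever.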
That is all.
If any additional information is required, do not hesitate to ask for it.
Many thanks,
Roger