This repository has been archived by the owner on Jul 8, 2022. It is now read-only.

zmq_abort while executing Init command #592

Closed
gscalam opened this issue Oct 18, 2019 · 6 comments

@gscalam

gscalam commented Oct 18, 2019

When executing the Init command on a specific device server, I occasionally get a zmq_abort:

Operation not permitted (../../src/epoll.cpp:109)

Program received signal SIGABRT, Aborted.
[Switching to Thread 857928896 (LWP 5046)]
0x0e9ef41c in raise () from /lib/libc.so.6
(gdb) bt
#0  0x0e9ef41c in raise () from /lib/libc.so.6
#1  0x0e9f109c in abort () from /lib/libc.so.6
#2  0x0ed9f35c in zmq::zmq_abort () from /usr/local/zeromq-4.0.8/lib/libzmq.so.4
#3  0x0ed9cb54 in zmq::epoll_t::set_pollout () from /usr/local/zeromq-4.0.8/lib/libzmq.so.4
#4  0x0eda071c in zmq::io_object_t::set_pollout () from /usr/local/zeromq-4.0.8/lib/libzmq.so.4
#5  0x0edda618 in zmq::stream_engine_t::restart_output () from /usr/local/zeromq-4.0.8/lib/libzmq.so.4
#6  0x0edc7ffc in zmq::session_base_t::read_activated () from /usr/local/zeromq-4.0.8/lib/libzmq.so.4
#7  0x0edb91e4 in zmq::pipe_t::process_activate_read () from /usr/local/zeromq-4.0.8/lib/libzmq.so.4
#8  0x0edb21d4 in zmq::object_t::process_command () from /usr/local/zeromq-4.0.8/lib/libzmq.so.4
#9  0x0eda115c in zmq::io_thread_t::in_event () from /usr/local/zeromq-4.0.8/lib/libzmq.so.4
#10 0x0ed9c7e4 in zmq::epoll_t::loop () from /usr/local/zeromq-4.0.8/lib/libzmq.so.4
#11 0x0ed9c8f8 in zmq::epoll_t::worker_routine () from /usr/local/zeromq-4.0.8/lib/libzmq.so.4
#12 0x0ede42bc in thread_routine () from /usr/local/zeromq-4.0.8/lib/libzmq.so.4
#13 0x0e960b14 in start_thread () from /lib/libpthread.so.0
#14 0x0ea9a7d4 in clone () from /lib/libc.so.6

It is difficult to reproduce and I don't know whether it is related to other known issues, but I can add that this device server:

  • makes use of push_change_event and push_archive_event on some attributes
  • has polling with event thresholds configured on other attributes
  • has memorized attributes
  • has alarm thresholds configured on some attributes
  • ...
@bourtemb
Member

Hi @gscalamera, thanks for creating this issue.
Can you tell us which ZMQ version you are using?

@gscalam
Author

gscalam commented Oct 18, 2019

4.0.8 compiled from sources

@bourtemb
Member

Could you please try to use a debug version of the zmq library?
At first sight, it looks like the epoll_ctl call executed in the zmq::epoll_t::set_pollout(handle_t handle_) method returns an error (epoll.cpp:108).
It would be interesting to know which error code you get when this happens.
If you run man epoll_ctl, you will see that there are many different potential errors when invoking this function with EPOLL_CTL_MOD.

One of them is ENOMEM (There was insufficient memory to handle the requested op control operation). Did you notice any memory issues on your computer?
Some other potential errors are related to invalid file descriptors, and there are further possible causes as well.
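
To make the errno discussion concrete, here is a minimal Linux-only sketch of how an epoll_ctl(EPOLL_CTL_MOD) failure can be captured and reported instead of aborting. The helper names try_set_pollout and run_scenario and the Scenario enum are hypothetical and are not part of libzmq or Tango; the sketch only shows the kind of error codes the man page describes. As an aside, the abort message in the report, "Operation not permitted", is the text that glibc's strerror normally returns for EPERM, so that may already be the errno in question.

```cpp
// Linux-only sketch: report the errno from epoll_ctl(EPOLL_CTL_MOD)
// instead of aborting the process. Not libzmq code.
#include <sys/epoll.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>

// Hypothetical helper: enable EPOLLOUT on an fd that is expected to be
// registered with epfd already. Returns 0 on success, otherwise the errno
// value set by epoll_ctl.
int try_set_pollout(int epfd, int fd) {
    assert(epfd >= 0 && fd >= 0);  // negative descriptors are a caller bug
    epoll_event ev{};
    ev.events = EPOLLIN | EPOLLOUT;
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev) == -1) {
        int err = errno;  // save before any other call can overwrite it
        std::fprintf(stderr, "epoll_ctl(MOD, fd=%d) failed: %s (errno=%d)\n",
                     fd, std::strerror(err), err);
        return err;
    }
    return 0;
}

// Hypothetical scenarios used to provoke different results.
enum class Scenario { Registered, NotRegistered, Closed };

// Builds a fresh epoll instance and a pipe, prepares the write end of the
// pipe according to the scenario, and returns what try_set_pollout reports.
int run_scenario(Scenario s) {
    int epfd = epoll_create1(0);
    if (epfd == -1) return errno;
    int fds[2];
    if (pipe(fds) == -1) {
        int err = errno;
        close(epfd);
        return err;
    }
    if (s != Scenario::NotRegistered) {
        epoll_event ev{};
        ev.events = EPOLLIN;
        ev.data.fd = fds[1];
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[1], &ev) == -1) {
            int err = errno;
            close(fds[0]);
            close(fds[1]);
            close(epfd);
            return err;
        }
    }
    if (s == Scenario::Closed) close(fds[1]);  // descriptor becomes invalid
    int result = try_set_pollout(epfd, fds[1]);
    close(fds[0]);
    if (s != Scenario::Closed) close(fds[1]);
    close(epfd);
    return result;
}
```

Calling run_scenario(Scenario::NotRegistered) should report ENOENT, and run_scenario(Scenario::Closed) should report EBADF, which is the sort of "invalid file descriptor" case mentioned above. A debug build of the ZMQ library, or a breakpoint in zmq::epoll_t::set_pollout, would be needed to see which of these cases applies to the real crash.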

In your case, you get the problem when you invoke the INIT command on a device of your device server? Is this correct? You were not doing a "restart device (DevRestart)" or "restart server (RestartServer)"?

Is your device server subscribing to events from other devices?

@gscalam
Author

gscalam commented Oct 18, 2019

I'll try to reproduce with the debug version.
No problem with memory. The problem arises with the Init command; I have not tested with restart.
The code of this device server has been frozen for many years. We started seeing crashes with the Init command recently, after upgrading from Tango 8.1.2.c to 9.3.3.
The device server is not subscribing to events from other devices.

@bourtemb changed the title from "zmq_abort executing Init command" to "zmq_abort while executing Init command" on Oct 18, 2019
@bourtemb
Member

Hi @gscalamera, no update on this issue?
Did you manage to reproduce it with the ZMQ debug version?

@gscalam
Author

gscalam commented Nov 15, 2019

I am not able to reproduce it anymore.
Since this device has many clients (C++, Python, MATLAB, ...) distributed over the facility, possibly with different versions of Tango (8, 9, ...), could it be related to some client in a bad state?
It looks like the crashes stopped after we restarted all the clients that we found by looking at the black box.

@t-b closed this as completed on Nov 26, 2020