Commit
crimson/mgr: don't report if there is no connection available.
During a teuthology run [1] the following crash happened:

```
rzarzynski@teuthology:/home/teuthworker/archive/rzarzynski-2021-04-08_10:14:11-rados-master-distro-basic-smithi/6028696$ less remote/smithi052/log/ceph-osd.3.log.gz
...
DEBUG 2021-04-08 10:32:58,548 [shard 0] ms - [osd.3(client) v2:172.21.15.52:6813/30889@62168 >> mon.0 v2:172.21.15.52:3300/0] <== #3 === mgrmap(e 4) v1 (1796)
INFO  2021-04-08 10:32:58,549 [shard 0] ms - [osd.3(client) v2:172.21.15.52:6813/30889@62056 >> mgr.4100 v2:172.21.15.52:6800/30259] closing: reset no, replace no
DEBUG 2021-04-08 10:32:58,549 [shard 0] ms - [osd.3(client) v2:172.21.15.52:6813/30889@62056 >> mgr.4100 v2:172.21.15.52:6800/30259] TRIGGER CLOSING, was READY
INFO  2021-04-08 10:32:58,549 [shard 0] ms - [osd.3(client) v2:172.21.15.52:6813/30889@62056 >> mgr.4100 v2:172.21.15.52:6800/30259] execute_ready(): protocol aborted at CLOSING -- std::system_error (error crimson::net:4, read eof)
DEBUG 2021-04-08 10:32:58,549 [shard 0] ms - [osd.3(client) v2:172.21.15.52:6813/30889@62056 >> mgr.4100 v2:172.21.15.52:6800/30259] closed!
Segmentation fault on shard 0.
Backtrace:
  0x000000000151765c
  0x00000000014d9600
  0x00000000014d9902
  0x00000000014d9972
  /lib64/libpthread.so.0+0x0000000000012b1f
  0x0000000000e59cba
  0x00000000014dc8a6
  0x00000000014cdd1c
  0x0000000001503053
  0x000000000149fab7
  0x00000000006e0ef5
  /lib64/libc.so.6+0x00000000000237b2
  0x000000000072a23d
daemon-helper: command crashed with signal 11
```

[1]: http://pulpito.front.sepia.ceph.com/rzarzynski-2021-04-08_10:14:11-rados-master-distro-basic-smithi/6028696/

GDB shows that `conn` was null during the execution of `ceph::mgr:report()`:

```
(gdb) frame 7
154 in /usr/src/debug/ceph-17.0.0-2935.g4153f8c2.el8.x86_64/src/crimson/mgr/client.cc
(gdb) print conn
$1 = {_b = 0x0, _p = 0x0}
```

Taken together with the `mgr.4100 v2:172.21.15.52:6800/30259] closed!` debug message, this suggests that a call to `report()` occurred (likely from the timer) while we were in the middle of the non-atomic reconnect sequence:

```cpp
seastar::future<> Client::reconnect()
{
  if (conn) {
    conn->mark_down();
    conn = {};
  }
  // ...
  return seastar::sleep(a_while).then([this] {
    // ...
    conn = msgr.connect(peer, CEPH_ENTITY_TYPE_MGR);
  });
}
```

Between the reset of `conn` and the continuation that reassigns it, `conn` stays null, so any `report()` issued in that window dereferences a null pointer.

This commit alters `mgr::report()` to skip reporting if the `conn` is unavailable.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
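For illustration, the guard described in this message can be sketched in standalone C++ without Seastar. This is a minimal sketch under stated assumptions, not the actual patch: `FakeConnection`, `begin_reconnect()`, `finish_reconnect()` and `run_demo()` are hypothetical names, and a `std::shared_ptr` stands in for the real connection reference. The two reconnect halves are split into separate calls to model the window in which `conn` is null.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the messenger connection; not the crimson type.
struct FakeConnection {
  std::vector<std::string> sent;
  void send(std::string msg) { sent.push_back(std::move(msg)); }
};
using ConnRef = std::shared_ptr<FakeConnection>;

class Client {
public:
  // First half of the reconnect: the old connection is dropped immediately.
  void begin_reconnect() { conn.reset(); }

  // Second half: in the real code this runs later, after a sleep.
  void finish_reconnect() { conn = std::make_shared<FakeConnection>(); }

  // Returns true if a report was sent, false if it was skipped
  // because no connection is currently available.
  bool report() {
    if (!conn) {
      return false;  // mid-reconnect: skip instead of dereferencing null
    }
    conn->send("report");
    return true;
  }

private:
  ConnRef conn;
};

// Drives the sequence: report, reconnect window, report again.
// Returns the number of reports that were actually sent.
int run_demo() {
  Client c;
  c.finish_reconnect();              // initial connection established
  int sent = 0;
  sent += c.report() ? 1 : 0;        // connected: sent
  c.begin_reconnect();               // conn is now null
  sent += c.report() ? 1 : 0;        // timer fires in the window: skipped
  c.finish_reconnect();              // connection restored
  sent += c.report() ? 1 : 0;        // connected again: sent
  return sent;
}
```

Skipping the report in the null window loses at most one periodic report; the next timer tick after the reconnect completes sends fresh data.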