Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shutdown while there are live exchanges will crash #20880

Closed
bzbarsky-apple opened this issue Jul 18, 2022 · 4 comments · Fixed by #34035
Closed

Shutdown while there are live exchanges will crash #20880

bzbarsky-apple opened this issue Jul 18, 2022 · 4 comments · Fixed by #34035
Labels
p1 priority 1 work security V1.0

Comments

@bzbarsky-apple
Copy link
Contributor

Problem

During shutdown, we shut down the exchange manager, then the session manager.

Session manager shutdown evicts all sessions.

If there are exchanges waiting on a response timer, this will try to cancel the response timer, which will crash because we already shut down the exchange manager and that code is doing:

    System::Layer * lSystemLayer = mExchangeMgr->GetSessionManager()->SystemLayer();

so we null-deref mExchangeMgr->GetSessionManager().

Stack:
2022-07-18T16:08:12.0943970Z 2022-07-18 16:08:12.092 INFO    ThreadSanitizer:DEADLYSIGNAL
2022-07-18T16:08:12.1054250Z 2022-07-18 16:08:12.093 INFO    �[1m�[31m==49711==ERROR: ThreadSanitizer: SEGV on unknown address 0x000000000018 (pc 0x00010e546d22 bp 0x7ffee20750e0 sp 0x7ffee20750c0 T352348)
2022-07-18T16:08:12.1141960Z 2022-07-18 16:08:12.093 INFO    �[1m�[0m==49711==The signal is caused by a READ memory access.
2022-07-18T16:08:12.1226540Z 2022-07-18 16:08:12.093 INFO    ==49711==Hint: address points to the zero page.
2022-07-18T16:08:17.2819920Z 2022-07-18 16:08:17.279 INFO        #0 chip::SessionManager::SystemLayer() SessionManager.h:185 (chip-tool:x86_64+0x1009bed22)
2022-07-18T16:08:17.2821210Z 2022-07-18 16:08:17.279 INFO        #1 chip::Messaging::ExchangeContext::CancelResponseTimer() ExchangeContext.cpp:428 (chip-tool:x86_64+0x100aaee32)
2022-07-18T16:08:17.2821910Z 2022-07-18 16:08:17.280 INFO        #2 chip::Messaging::ExchangeContext::OnSessionReleased() ExchangeContext.cpp:392 (chip-tool:x86_64+0x100aafef5)
2022-07-18T16:08:17.2822650Z 2022-07-18 16:08:17.280 INFO        #3 chip::SessionHolderWithDelegate::SessionReleased() SessionHolder.h:98 (chip-tool:x86_64+0x100b6a151)
2022-07-18T16:08:17.2823320Z 2022-07-18 16:08:17.280 INFO        #4 chip::Transport::Session::NotifySessionReleased() Session.h:120 (chip-tool:x86_64+0x1000fd6c9)
2022-07-18T16:08:17.2838820Z 2022-07-18 16:08:17.280 INFO        #5 chip::Transport::SecureSession::MarkForEviction() SecureSession.cpp:149 (chip-tool:x86_64+0x100abce9a)
2022-07-18T16:08:17.2840730Z 2022-07-18 16:08:17.280 INFO        #6 auto chip::SessionManager::Shutdown()::$_0::operator()<chip::Transport::SecureSession*>(chip::Transport::SecureSession*) const SessionManager.cpp:119 (chip-tool:x86_64+0x100accfb1)
2022-07-18T16:08:17.2842710Z 2022-07-18 16:08:17.280 INFO        #7 chip::internal::LambdaProxy<chip::Transport::SecureSession, chip::SessionManager::Shutdown()::$_0>::Call(void*, void*) Pool.h:126 (chip-tool:x86_64+0x100accf15)
2022-07-18T16:08:17.2843950Z 2022-07-18 16:08:17.280 INFO        #8 chip::internal::HeapObjectList::ForEachNode(void*, chip::Loop (*)(void*, void*)) Pool.cpp:127 (chip-tool:x86_64+0x1009e1cca)
2022-07-18T16:08:17.2846490Z 2022-07-18 16:08:17.280 INFO        #9 chip::Loop chip::HeapObjectPool<chip::Transport::SecureSession>::ForEachActiveObject<chip::SessionManager::Shutdown()::$_0>(chip::SessionManager::Shutdown()::$_0&&) Pool.h:396 (chip-tool:x86_64+0x100acce5d)
2022-07-18T16:08:17.2847970Z 2022-07-18 16:08:17.280 INFO        #10 chip::Loop chip::Transport::SecureSessionTable::ForEachSession<chip::SessionManager::Shutdown()::$_0>(chip::SessionManager::Shutdown()::$_0&&) SecureSessionTable.h:84 (chip-tool:x86_64+0x100ac34b9)
2022-07-18T16:08:17.2849040Z 2022-07-18 16:08:17.280 INFO        #11 chip::SessionManager::Shutdown() SessionManager.cpp:118 (chip-tool:x86_64+0x100ac2e1e)
2022-07-18T16:08:17.2850200Z 2022-07-18 16:08:17.280 INFO        #12 chip::Controller::DeviceControllerSystemState::Shutdown() CHIPDeviceControllerFactory.cpp:439 (chip-tool:x86_64+0x100b9438f)
2022-07-18T16:08:17.2851710Z 2022-07-18 16:08:17.280 INFO        #13 chip::Controller::DeviceControllerSystemState::Release() CHIPDeviceControllerSystemState.h:160 (chip-tool:x86_64+0x100b4f160)
2022-07-18T16:08:17.2853550Z 2022-07-18 16:08:17.281 INFO        #14 chip::Controller::DeviceController::Shutdown() CHIPDeviceController.cpp:315 (chip-tool:x86_64+0x100b4f072)
2022-07-18T16:08:17.2854270Z 2022-07-18 16:08:17.281 INFO    
2022-07-18T16:08:17.2855120Z 2022-07-18 16:08:17.281 INFO        #15 chip::Controller::DeviceCommissioner::Shutdown() CHIPDeviceController.cpp:497 (chip-tool:x86_64+0x100b51432)
2022-07-18T16:08:17.2856420Z 2022-07-18 16:08:17.281 INFO        #16 CHIPCommand::ShutdownCommissioner(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >) CHIPCommand.cpp:318 (chip-tool:x86_64+0x1008facbe)
2022-07-18T16:08:17.2857390Z 2022-07-18 16:08:17.281 INFO        #17 CHIPCommand::MaybeTearDownStack() CHIPCommand.cpp:170 (chip-tool:x86_64+0x1008fac37)
2022-07-18T16:08:17.2858450Z 2022-07-18 16:08:17.281 INFO        #18 CHIPCommand::Run() CHIPCommand.cpp:195 (chip-tool:x86_64+0x1008fae53)
2022-07-18T16:08:17.2859330Z 2022-07-18 16:08:17.281 INFO        #19 Commands::RunCommand(int, char**, bool) Commands.cpp:147 (chip-tool:x86_64+0x1009251bf)
2022-07-18T16:08:17.2860240Z 2022-07-18 16:08:17.281 INFO        #20 Commands::Run(int, char**) Commands.cpp:51 (chip-tool:x86_64+0x100924335)
2022-07-18T16:08:17.2860970Z 2022-07-18 16:08:17.281 INFO        #21 main main.cpp:48 (chip-tool:x86_64+0x1000e5a8d)
2022-07-18T16:08:17.2861650Z 2022-07-18 16:08:17.281 INFO        #22 start <null>:2 (libdyld.dylib:x86_64+0x15f3c)

Proposed Solution

Shutdown of the exchange manager should probably shut down outstanding exchanges. Or we should evict all sessions much earlier in shutdown, before we start tearing down object relationships.

Or both: since shutting down exchanges notifies applications, we should be doing that as early during shutdown as we can, so that there is minimal risk that applications touch something that is already torn down.

@bzbarsky-apple
Copy link
Contributor Author

Easy way to reproduce: set kSessionEstablishmentTimeout in CHIPDeviceController.cpp to 1, and then try to use chip-tool pairing code.

@mrjerryjohns
Copy link
Contributor

Shutting down exchanges has nuances - if the exchange is awaiting a response, then the exchange management layer presumably owns the ref, and should call Abort on it. If it has the will send flag on it, then we can only do a soft close by calling DoClose and notify the app. (logic here should be very similar to the logic in OnSessionReleased...)

@achsoftathome
Copy link
Contributor

achsoftathome commented Aug 17, 2022

Hi, I had the same problem while doing a thread-ble pairing (succeeded, OnPairingComplete was called) .

After debugging as shown in the stack below, a call to the "SystemLayer" function is made after a call to "GetSessionManager()" which returns a null pointer.

I am on 1a25bbd / SVE - Ubuntu 20.04 64 bits

Callstack:
chip::SessionManager::SystemLayer (this=0x0) <- Oups!
chip::Messaging::ExchangeContext::CancelResponseTimer (this=0x7fffd001ca00)
chip::Messaging::ExchangeContext::OnSessionReleased (this=0x7fffd001ca00)
chip::Messaging::ExchangeContext::OnSessionReleased (this=0x7fffd001ca00)
chip::Transport::Session::NotifySessionReleased (this=0x7fffd0001a30, this@entry=0x7fffd0001a30)
chip::Transport::SecureSession::MarkForEviction (this=0x7fffd0001a30)
operator()chip::Transport::SecureSession* (session=, __closure=)
chip::internal::LambdaProxy<chip::Transport::SecureSession, chip::SessionManager::Shutdown()::<lambda(auto:5)> >::Call(void *, void *) (context=, target=)
chip::internal::HeapObjectList::ForEachNode (this=0x555555632778, this@entry=0x555555632778, context=0x7fffffffdc77, context@entry=0x7fffffffdc77, lambda=0x7ffff7dfbfa7 <chip::internal::LambdaProxy<chip::Transport::SecureSession, chip::SessionManager::Shutdown()::<lambda(auto:5)> >::Call(void *, void *)>, lambda@entry=0x7ffff7dfbfa7 <chip::internal::LambdaProxy<chip::Transport::SecureSession, chip::SessionManager::Shutdown()::<lambda(auto:5)> >::Call(void *, void *)>)
chip::HeapObjectPoolchip::Transport::SecureSession::ForEachActiveObject<chip::SessionManager::Shutdown()::<lambda(auto:5)> > (this=0x555555632768)
chip::Transport::SecureSessionTable::ForEachSession<chip::SessionManager::Shutdown()::<lambda(auto:5)> > (this=0x555555632760)
chip::SessionManager::Shutdown (this=0x555555632700)
chip::Controller::DeviceControllerSystemState::Shutdown (this=0x5555556378d0)
chip::Controller::DeviceControllerSystemState::Shutdown (this=0x5555556378d0)
chip::Controller::DeviceControllerSystemState::Release (this=)
chip::Controller::DeviceController::Shutdown (this=0x55555565a530)
chip::Controller::DeviceController::Shutdown (this=0x55555565a530)
CHIPCommand::MaybeTearDownStack (this=0x7fffffffddf0)
CHIPCommand::MaybeTearDownStack (this=0x7fffffffddf0)
CHIPCommand::Run (this=0x7fffffffddf0)
CHIPCommand::Run (this=0x7fffffffddf0, this@entry=0x7fffffffddf0)

bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue Aug 29, 2022
… outstanding.

Commissioner shutdown shuts down the CASE sessions associated with the
commissioner, but not the PASE sessions.  Those sessions then get shut down
much later in shutdown, at which point various objects that are needed to
handle the shutdown are no longer present.

The fix is to shut down PASE sessions when we destroy CommissioneeDeviceProxy
objects, and ensure that we always destroy CommissioneeDeviceProxy via
ReleaseCommissioneeDevice, so we don't end up with dangling pointers to the
objects.

Fixes project-chip#16440

Should vastly improve, if not completely fix,
project-chip#20880
bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue Aug 30, 2022
… outstanding.

Commissioner shutdown shuts down the CASE sessions associated with the
commissioner, but not the PASE sessions.  Those sessions then get shut down
much later in shutdown, at which point various objects that are needed to
handle the shutdown are no longer present.

The fix is to shut down PASE sessions when we destroy CommissioneeDeviceProxy
objects, and ensure that we always destroy CommissioneeDeviceProxy via
ReleaseCommissioneeDevice, so we don't end up with dangling pointers to the
objects.

Fixes project-chip#16440

Should vastly improve, if not completely fix,
project-chip#20880
bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue Aug 30, 2022
… outstanding.

Commissioner shutdown shuts down the CASE sessions associated with the
commissioner, but not the PASE sessions.  Those sessions then get shut down
much later in shutdown, at which point various objects that are needed to
handle the shutdown are no longer present.

The fix is to shut down PASE sessions when we destroy CommissioneeDeviceProxy
objects, and ensure that we always destroy CommissioneeDeviceProxy via
ReleaseCommissioneeDevice, so we don't end up with dangling pointers to the
objects.

Fixes project-chip#16440

Should vastly improve, if not completely fix,
project-chip#20880
bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue Aug 30, 2022
… outstanding.

Commissioner shutdown shuts down the CASE sessions associated with the
commissioner, but not the PASE sessions.  Those sessions then get shut down
much later in shutdown, at which point various objects that are needed to
handle the shutdown are no longer present.

The fix is to shut down PASE sessions when we destroy CommissioneeDeviceProxy
objects, and ensure that we always destroy CommissioneeDeviceProxy via
ReleaseCommissioneeDevice, so we don't end up with dangling pointers to the
objects.

Fixes project-chip#16440

Should vastly improve, if not completely fix,
project-chip#20880
bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue Aug 31, 2022
… outstanding.

Commissioner shutdown shuts down the CASE sessions associated with the
commissioner, but not the PASE sessions.  Those sessions then get shut down
much later in shutdown, at which point various objects that are needed to
handle the shutdown are no longer present.

The fix is to shut down PASE sessions when we destroy CommissioneeDeviceProxy
objects, and ensure that we always destroy CommissioneeDeviceProxy via
ReleaseCommissioneeDevice, so we don't end up with dangling pointers to the
objects.

Fixes project-chip#16440

Should vastly improve, if not completely fix,
project-chip#20880
bzbarsky-apple added a commit that referenced this issue Aug 31, 2022
… outstanding. (#22282)

Commissioner shutdown shuts down the CASE sessions associated with the
commissioner, but not the PASE sessions.  Those sessions then get shut down
much later in shutdown, at which point various objects that are needed to
handle the shutdown are no longer present.

The fix is to shut down PASE sessions when we destroy CommissioneeDeviceProxy
objects, and ensure that we always destroy CommissioneeDeviceProxy via
ReleaseCommissioneeDevice, so we don't end up with dangling pointers to the
objects.

Fixes #16440

Should vastly improve, if not completely fix,
#20880
bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue Aug 31, 2022
… outstanding.

Commissioner shutdown shuts down the CASE sessions associated with the
commissioner, but not the PASE sessions.  Those sessions then get shut down
much later in shutdown, at which point various objects that are needed to
handle the shutdown are no longer present.

The fix is to shut down PASE sessions when we destroy CommissioneeDeviceProxy
objects, and ensure that we always destroy CommissioneeDeviceProxy via
ReleaseCommissioneeDevice, so we don't end up with dangling pointers to the
objects.

Fixes project-chip#16440

Should vastly improve, if not completely fix,
project-chip#20880
@bzbarsky-apple
Copy link
Contributor Author

Or we should evict all sessions much earlier in shutdown

On the client, if you use DeviceController we now do that during DeviceController shutdown.... But on the server, or if not using DeviceController, you're on your own.

isiu-apple pushed a commit to isiu-apple/connectedhomeip that referenced this issue Sep 16, 2022
… outstanding. (project-chip#22282)

Commissioner shutdown shuts down the CASE sessions associated with the
commissioner, but not the PASE sessions.  Those sessions then get shut down
much later in shutdown, at which point various objects that are needed to
handle the shutdown are no longer present.

The fix is to shut down PASE sessions when we destroy CommissioneeDeviceProxy
objects, and ensure that we always destroy CommissioneeDeviceProxy via
ReleaseCommissioneeDevice, so we don't end up with dangling pointers to the
objects.

Fixes project-chip#16440

Should vastly improve, if not completely fix,
project-chip#20880
@franck-apple franck-apple added the p1 priority 1 work label Oct 24, 2022
bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue Jun 20, 2024
We could end up shutting down the exchange manager while we still had live
sessions/exchanges, at which point shutting down those would crash as they tried
to get information from the exchange manager.

The fix is to document the required shutdown ordering and enforce it correctly
in CHIPDeviceControllerFactory and Server.

Fixes project-chip#20880
@mergify mergify bot closed this as completed in #34035 Jun 21, 2024
@mergify mergify bot closed this as completed in 674dca8 Jun 21, 2024
pimpalemahesh pushed a commit to pimpalemahesh/connectedhomeip that referenced this issue Jun 24, 2024
We could end up shutting down the exchange manager while we still had live
sessions/exchanges, at which point shutting down those would crash as they tried
to get information from the exchange manager.

The fix is to document the required shutdown ordering and enforce it correctly
in CHIPDeviceControllerFactory and Server.

Fixes project-chip#20880
austina-csa pushed a commit to austina-csa/connectedhomeip that referenced this issue Jul 10, 2024
We could end up shutting down the exchange manager while we still had live
sessions/exchanges, at which point shutting down those would crash as they tried
to get information from the exchange manager.

The fix is to document the required shutdown ordering and enforce it correctly
in CHIPDeviceControllerFactory and Server.

Fixes project-chip#20880
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
p1 priority 1 work security V1.0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants