Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write after free of RetransTableEntry #14446

Closed
kpschoedel opened this issue Jan 27, 2022 · 5 comments · Fixed by #20058
Closed

Write after free of RetransTableEntry #14446

kpschoedel opened this issue Jan 27, 2022 · 5 comments · Fixed by #20058

Comments

@kpschoedel
Copy link
Contributor

kpschoedel commented Jan 27, 2022

TestReadInteraction::TestReadChunking() exhibits a use-after-free of a RetransTableEntry.

Original description (not that it is not necessary to use a heap pool for the use-after-free to occur, but it's harder to trace):


Problem

If ReliableMessageMgr::mRetransTable is a heap-allocated pool, TestReadInteraction aborts:

CHIP:SPT: VerifyOrDie failure at ../../src/lib/core/Optional.h:170: HasValue()

This is surprising and may indicate an underlying memory safety problem.

Proposed Solution

Investigate and resolve.

kpschoedel added a commit to kpschoedel/connectedhomeip that referenced this issue Jan 27, 2022
#### Problem

`ObjectPool` can use either in-line or heap allocation with a
platform-configurable default, but several cases still use the
original inline-only `BitMapObjectPool`.

#### Change overview

- Declare pools as `ObjectPool` rather than `BitMapObjectPool`
  where possible. (Issues for exceptions: project-chip#14444, project-chip#14446)

- Remove some empty-on-shutdown checks since the pool normally does this
  itself (inhibited for asan+heap testing):
    - ExchangeManager::Shutdown()
    - ~EndPointManagerImplPool()

- Fix many tests to support heap allocation.

#### Testing

CI; no change to visible functionality expected.
kpschoedel added a commit to kpschoedel/connectedhomeip that referenced this issue Jan 27, 2022
#### Problem

`ObjectPool` can use either in-line or heap allocation with a
platform-configurable default, but several cases still use the
original inline-only `BitMapObjectPool`.

#### Change overview

- Declare pools as `ObjectPool` rather than `BitMapObjectPool`
  where possible. (Issues for exceptions: project-chip#14444, project-chip#14446)

- Remove some empty-on-shutdown checks since the pool normally does this
  itself (inhibited for asan+heap testing):
    - ExchangeManager::Shutdown()
    - ~EndPointManagerImplPool()

- Fix many tests to support heap allocation.

#### Testing

CI; no change to visible functionality expected.
kpschoedel added a commit to kpschoedel/connectedhomeip that referenced this issue Feb 1, 2022
#### Problem

`ObjectPool` can use either in-line or heap allocation with a
platform-configurable default, but several cases still use the
original inline-only `BitMapObjectPool`.

#### Change overview

- Declare pools as `ObjectPool` rather than `BitMapObjectPool`
  where possible. (Issues for exceptions: project-chip#14444, project-chip#14446)

- Remove some empty-on-shutdown checks since the pool normally does this
  itself (inhibited for asan+heap testing):
    - ExchangeManager::Shutdown()
    - ~EndPointManagerImplPool()

- Fix many tests to support heap allocation.

#### Testing

CI; no change to visible functionality expected.
woody-apple pushed a commit to kpschoedel/connectedhomeip that referenced this issue Feb 2, 2022
#### Problem

`ObjectPool` can use either in-line or heap allocation with a
platform-configurable default, but several cases still use the
original inline-only `BitMapObjectPool`.

#### Change overview

- Declare pools as `ObjectPool` rather than `BitMapObjectPool`
  where possible. (Issues for exceptions: project-chip#14444, project-chip#14446)

- Remove some empty-on-shutdown checks since the pool normally does this
  itself (inhibited for asan+heap testing):
    - ExchangeManager::Shutdown()
    - ~EndPointManagerImplPool()

- Fix many tests to support heap allocation.

#### Testing

CI; no change to visible functionality expected.
bzbarsky-apple pushed a commit that referenced this issue Feb 2, 2022
* Replace BitMapObjectPool with configurable ObjectPool

#### Problem

`ObjectPool` can use either in-line or heap allocation with a
platform-configurable default, but several cases still use the
original inline-only `BitMapObjectPool`.

#### Change overview

- Declare pools as `ObjectPool` rather than `BitMapObjectPool`
  where possible. (Issues for exceptions: #14444, #14446)

- Remove some empty-on-shutdown checks since the pool normally does this
  itself (inhibited for asan+heap testing):
    - ExchangeManager::Shutdown()
    - ~EndPointManagerImplPool()

- Fix many tests to support heap allocation.

#### Testing

CI; no change to visible functionality expected.

* Leave some pools inline

- `ExchangeManager::mContextPool` and
  `ReliableMessageMgr::mContextPool`: issue #14509
- `GroupSessionTable::mEntries`: issue #14512

* Add chip::Test::PlatformMemoryUser

- Add PlatformMemoryUser to manage MemoryInit()/MemoryShutdown()
  for test contexts that may use heap.
- Revert related test changes

* restyle
@kpschoedel kpschoedel self-assigned this Feb 3, 2022
@kpschoedel kpschoedel changed the title TestReadInteraction aborts if ReliableMessageMgr::mRetransTable uses heap allocation Write after free of RetransTableEntry Feb 3, 2022
@kpschoedel
Copy link
Contributor Author

TestReadInteraction::TestReadChunking leads to a use-after-free of a RetransTableEntry, which has been masked by the use of a static BitMapObjectPool for ReliableMessageMgr::mRetransTable.

Complicated details to follow.

@kpschoedel
Copy link
Contributor Author

Enter TestReadInteraction::TestReadChunking(…)
At 817: InteractionModelEngine::GetInstance()->GetReportingEngine().Run();

Enter Engine::Run()
At 503: CHIP_ERROR err = BuildAndSendSingleReportData(readHandler);

Enter Engine::BuildAndSendSingleReportData(…)

CHIP:DMG: RE:Run Cluster fff10002, Attribute fff10004 is dirty
CHIP:DMG: Reading Mock Cluster fff10002, Field fff10004 is dirty
CHIP:DMG: Error retrieving data from clusterId: 0xFFF1_0002, err = ../../src/system/TLVPacketBufferBackingStore.cpp:88: Error 0x0000000B
CHIP:DMG: RE:Run We cannot put more chunks into this report. Enable chunking.
CHIP:DMG: Sending report (payload has 907 bytes)...

At 426: err = SendReport(apReadHandler, std::move(bufHandle), hasMoreChunks);

Enter Engine::SendReport(…)
At 638: err = apReadHandler->SendReportData(std::move(aPayload), aHasMoreChunks);

Enter ReadHandler::SendReportData(…)

CHIP:DMG: IM RH moving to [AwaitingReportResponse]

At 221: mpExchangeCtx->SendMessage(Protocols::InteractionModel::MsgType::ReportData, std::move(aPayload),
At 222: Messaging::SendFlags(noResponseExpected ? Messaging::SendMessageFlags::kNone
At 223: : Messaging::SendMessageFlags::kExpectResponse));

Enter SendMessage(…)
At 126: return SendMessage(Protocols::MessageTypeTraits<MessageType>::ProtocolId(), to_underlying(msgType), std::move(msgPayload), …)

Enter ExchangeContext::SendMessage(…)
At 179: CHIP_ERROR err = mDispatch.SendMessage(GetExchangeMgr()->GetSessionManager(), mSession.Get(), mExchangeId, IsInitiator(),
At 180: GetReliableMessageContext(), reliableTransmissionRequested, protocolId, msgType,
At 181: std::move(msgBuf));

Enter ExchangeMessageDispatch::SendMessage(…)

CHIP:EM: Piggybacking Ack for MessageCounter:12451190 on exchange: 22567r

Here the RetransTableEntry for MessageCounter:10983592 is created and retained in entryOwner:

At 80: ReturnErrorOnFailure(reliableMessageMgr->AddToRetransTable(reliableMessageContext, &entry));
At 84: std::unique_ptr<ReliableMessageMgr::RetransTableEntry, decltype(deleter)> entryOwner(entry, deleter);

CHIP:IN: Prepared secure message 0x7ffff7fb4b88 to 0x000000000001E306 (1) of type 0x5 and protocolId (0, 1) on exchange 22567r with MessageCounter:10983592.

At 87: CHIP_ERROR err = sessionManager->SendPreparedMessage(session, entryOwner->retainedBuf);

Enter SessionManager::SendPreparedMessage(…)

CHIP:IN: Sending encrypted msg 0x7ffff7fb4b88 with MessageCounter:10983592 to 0x000000000001E306 (1) at monotonic time: 1485439570 msec

At 289: return mTransportMgr->SendMessage(*destination, std::move(msgBuf));

Enter Transport::Tuple::SendMessage()

Enter LoopbackTransport::SendMessage()
At 112: HandleMessageReceived(address, std::move(receivedMessage));

Enter Transport::Base::HandleMessageReceived()

Enter TransportMgrBase::HandleMessageReceived(…)
At 71: mSessionManager->OnMessageReceived(peerAddress, std::move(msg));

Enter SessionManager::OnMessageReceived(…)
At 399: SecureUnicastMessageDispatch(packetHeader, peerAddress, std::move(msg));

Enter SessionManager::SecureUnicastMessageDispatch(…)
At 510: if (SecureMessageCodec::Decrypt(secureSession->GetCryptoContext(), payloadHeader, packetHeader, msg) != CHIP_NO_ERROR)

Here packetHeader.GetMessageCounter() = 10983592

At 557: mCB->OnMessageReceived(packetHeader, payloadHeader, session.Value(), peerAddress, isDuplicate, std::move(msg));

Enter ExchangeManager::OnMessageReceived(…)

CHIP:EM: Received message of type 0x5 with protocolId (0, 1) and MessageCounter:10983592 on exchange 22567i
CHIP:EM: Found matching exchange: 22567i, Delegate: 0x7fffffffab00

At 219: ec->HandleMessage(packetHeader.GetMessageCounter(), payloadHeader, source, msgFlags, std::move(msgBuf));

Enter ExchangeContext::HandleMessage(…)
At 455: mDispatch.OnMessageReceived(messageCounter, payloadHeader, peerAddress, msgFlags, GetReliableMessageContext()));

Enter ExchangeMessageDispatch::OnMessageReceived(…)
At 128: reliableMessageContext->HandleRcvdAck(payloadHeader.GetAckMessageCounter().Value());

Enter ReliableMessageContext::HandleRcvdAck(…)
At 86: if (!GetReliableMessageMgr()->CheckAndRemRetransTable(this, ackMessageCounter))

CHIP:EM: Rxd Ack; Removing MessageCounter:12451190 from Retrans Table on exchange 22567i
CHIP:EM: Removed CHIP MessageCounter:12451190 from RetransTable on exchange 22567i

Return to ExchangeMessageDispatch::OnMessageReceived(…)

Return to ExchangeContext::HandleMessage(…)
At 487: return mDelegate->OnMessageReceived(this, payloadHeader, std::move(msgBuf));

Enter RROR ReadClient::OnMessageReceived(…)
At 308: err = ProcessReportData(std::move(aPayload));

Enter ReadClient::ProcessReportData(…)

CHIP:DMG: ReportDataMessage =
CHIP:DMG: {
CHIP:DMG: AttributeReportIBs =
CHIP:DMG: [

CHIP:DMG: ],
CHIP:DMG:
CHIP:DMG: MoreChunkedMessages = true,
CHIP:DMG: InteractionModelRevision = 1
CHIP:DMG: }

At 504: err = StatusResponse::Send(err == CHIP_NO_ERROR ? Protocols::InteractionModel::Status::Success
At 505: : Protocols::InteractionModel::Status::InvalidSubscription,
At 506: mpExchangeCtx, !noResponseExpected);

Enter StatusResponse::Send(…)
At 39: ReturnErrorOnFailure(apExchangeContext->SendMessage(Protocols::InteractionModel::MsgType::StatusResponse, std::move(msgBuf),
At 40: aExpectResponse ? Messaging::SendMessageFlags::kExpectResponse
At 41: : Messaging::SendMessageFlags::kNone));

Enter ExchangeContext::SendMessage(…)
At 179: CHIP_ERROR err = mDispatch.SendMessage(GetExchangeMgr()->GetSessionManager(), mSession.Get(), mExchangeId, IsInitiator(),
At 180: GetReliableMessageContext(), reliableTransmissionRequested, protocolId, msgType,
At 181: std::move(msgBuf));

Enter ExchangeMessageDispatch::SendMessage(…)

CHIP:EM: Piggybacking Ack for MessageCounter:10983592 on exchange: 22567i
CHIP:IN: Prepared secure message 0x7ffff7fb4ae8 to 0x0000000006A11E3D (1) of type 0x1 and protocolId (0, 1) on exchange 22567i with MessageCounter:12451191.

At 87: CHIP_ERROR err = sessionManager->SendPreparedMessage(session, entryOwner->retainedBuf);

Enter SessionManager::SendPreparedMessage(…)

CHIP:IN: Sending encrypted msg 0x7ffff7fb4ae8 with MessageCounter:12451191 to 0x0000000006A11E3D (1) at monotonic time: 1489749050 msec

At 289: return mTransportMgr->SendMessage(*destination, std::move(msgBuf));

Enter Transport::Tuple::SendMessage()

Enter LoopbackTransport::SendMessage()
At 112: HandleMessageReceived(address, std::move(receivedMessage));

Enter Transport::Base::HandleMessageReceived()

Enter TransportMgrBase::HandleMessageReceived(…)
At 71: mSessionManager->OnMessageReceived(peerAddress, std::move(msg));

Enter SessionManager::OnMessageReceived(…)
At 399: SecureUnicastMessageDispatch(packetHeader, peerAddress, std::move(msg));

Enter SessionManager::SecureUnicastMessageDispatch(…)
At 557: mCB->OnMessageReceived(packetHeader, payloadHeader, session.Value(), peerAddress, isDuplicate, std::move(msg));

Enter ExchangeManager::OnMessageReceived(…)

CHIP:EM: Received message of type 0x1 with protocolId (0, 1) and MessageCounter:12451191 on exchange 22567r
CHIP:EM: Found matching exchange: 22567r, Delegate: 0x7ffff7fa2f00

At 219: ec->HandleMessage(packetHeader.GetMessageCounter(), payloadHeader, source, msgFlags, std::move(msgBuf));

Enter ExchangeContext::HandleMessage(…)
At 455: mDispatch.OnMessageReceived(messageCounter, payloadHeader, peerAddress, msgFlags, GetReliableMessageContext()));

Enter ExchangeMessageDispatch::OnMessageReceived(…)
At 128: reliableMessageContext->HandleRcvdAck(payloadHeader.GetAckMessageCounter().Value());

Enter ReliableMessageContext::HandleRcvdAck(uint32_t ackMessageCounter = __10983592__)

Here the RetransTableEntry for MessageCounter:10983592 is deleted:

At 86: if (!GetReliableMessageMgr()->CheckAndRemRetransTable(this, ackMessageCounter))

CHIP:EM: Rxd Ack; Removing MessageCounter:10983592 from Retrans Table on exchange 22567r
CHIP:EM: Removed CHIP MessageCounter:10983592 from RetransTable on exchange 22567r

Return to ExchangeMessageDispatch::OnMessageReceived(…)

Return to ExchangeContext::HandleMessage(…)

CHIP:DMG: StatusResponseMessage =
CHIP:DMG: {
CHIP:DMG: Status = 0x0,
CHIP:DMG: InteractionModelRevision = 1
CHIP:DMG: }
CHIP:IM: Received status response, status is 0
CHIP:DMG: OnReportConfirm: NumReports = 0

Return to ExchangeManager::OnMessageReceived()
Return to SessionManager::SecureUnicastMessageDispatch()
Return to SessionManager::OnMessageReceived()
Return to TransportMgrBase::HandleMessageReceived()
Return to LoopbackTransport::SendMessage()
Return to Transport::Tuple::SendMessage()
Return to SessionManager::SendPreparedMessage()
Return to ExchangeMessageDispatch::SendMessage()
Return to ExchangeContext::SendMessage()
Return to SendMessage()
Return to StatusResponse::Send()
Return to ReadClient::ProcessReportData()
Return to ReadClient::OnMessageReceived()
Return to ExchangeContext::HandleMessage()
Return to ExchangeManager::OnMessageReceived()
Return to SessionManager::SecureUnicastMessageDispatch()
Return to SessionManager::OnMessageReceived()
Return to TransportMgrBase::HandleMessageReceived()
Return to Transport::Base::HandleMessageReceived()
Return to LoopbackTransport::SendMessage()
Return to Transport::Tuple::SendMessage()
Return to SessionManager::SendPreparedMessage()
Return to ExchangeMessageDispatch::SendMessage()

Here we are back in ExchangeMessageDispatch::SendMessage(…) with entryOwner pointing to the deleted RetransTableEntry.

At 102: reliableMessageMgr->StartRetransmision(entryOwner.release());

Enter ReliableMessageMgr::StartRetransmision(…)

Here we have read and write of the deleted RetransTableEntry:

At 207: entry->nextRetransTime = System::SystemClock().GetMonotonicTimestamp() + entry->ec->GetSessionHandle()->GetMRPConfig().mIdleRetransTimeout;

@kpschoedel kpschoedel removed their assignment Feb 3, 2022
@bzbarsky-apple
Copy link
Contributor

So the fundamental problem, I expect, is that this test uses the sync-message-delivery loopback mode that I thought we had stopped using.... but it looks like a bunch of tests still use it:

  • TestWriteInteraction
  • TestCommandInteraction
  • TestEventLogging
  • TestReportingEngine
  • TestReadInteraction
  • TestInteractionModelEngine.cpp
  • TestBufferedReadCallback
  • TestAttributeCache

We should probably fix them all to not do that and see whether that addresses this issue.

@mrjerryjohns

@kpschoedel
Copy link
Contributor Author

Suspected it might be test-only when I hit LoopbackTransport.

The Interaction tests and TestReportingEngine hit this; the others pass ASAN.

bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue Jun 28, 2022
There were only two consumers (the retrans table and exchanges) left, which both
used to fail unit tests if using ObjectPool due to use-after-free.

One of those tests is now fixed; the other is fixed in this PR.  So we can move
to using ObjectPool everywhere.

Fixes project-chip#14509
Fixes project-chip#14446
@bzbarsky-apple
Copy link
Contributor

Looks like this is fixed now that all the unit tests use async messaging. I just tried making the retrans table into an ObjectPool and it passes for me. See #20058

@bzbarsky-apple bzbarsky-apple self-assigned this Jun 28, 2022
bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue Jun 28, 2022
There were only two consumers (the retrans table and exchanges) left, which both
used to fail unit tests if using ObjectPool due to use-after-free.

One of those tests is now fixed; the other is fixed in this PR.  So we can move
to using ObjectPool everywhere.

Fixes project-chip#14509
Fixes project-chip#14446
bzbarsky-apple added a commit that referenced this issue Jun 28, 2022
There were only two consumers (the retrans table and exchanges) left, which both
used to fail unit tests if using ObjectPool due to use-after-free.

One of those tests is now fixed; the other is fixed in this PR.  So we can move
to using ObjectPool everywhere.

Fixes #14509
Fixes #14446
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants