Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not reuse cache change if sample does not fit. [7609] #1013

Merged
merged 2 commits into from
Feb 15, 2020

Conversation

hidmic
Copy link

@hidmic hidmic commented Feb 13, 2020

Precisely what the title says. I have the impression this was the intent of the original author, but I may be wrong. Without this patch, I can reliably get SerializedPayload_t instances where length > max_size, though in most cases by a small margin (which would explain why this could've been missed). However, larger messages would often result in segfaults.

Note: I'm targeting 1.9.x because that's the Fast-RTPS version ROS 2 uses. It could be easily ported forward.

@dirk-thomas
Copy link
Contributor

This seems to be a regression of #819.

Reproducible case (not on every run but certainly a few times within 10 tries):

  • play this bag file: ros2 bag play test_pointcloud_sigabrt_0.db3
  • echo the point cloud: ros2 topic echo /velodyne_points3

I added this output in Fast-RTPS/include/fastrtps/rtps/common/CacheChange.h before line 273:

serializedPayload.length 709641 is larger than its max size 663593

Valgrind shows memory problems:

==1970191== Thread 4:
==1970191== Invalid write of size 2
==1970191==    at 0x4845B33: memmove (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==1970191==    by 0xB6A17B7: eprosima::fastrtps::rtps::CacheChange_t::add_fragments(eprosima::fastrtps::rtps::SerializedPayload_t const&, unsigned int, unsigned int) (CacheChange.h:277)
==1970191==    by 0xB6A5B79: eprosima::fastrtps::rtps::StatelessReader::processDataFragMsg(eprosima::fastrtps::rtps::CacheChange_t*, unsigned int, unsigned int, unsigned short) (StatelessReader.cpp:411)
==1970191==    by 0xB6C3400: eprosima::fastrtps::rtps::MessageReceiver::proc_Submsg_DataFrag(eprosima::fastrtps::rtps::CDRMessage_t*, eprosima::fastrtps::rtps::SubmessageHeader_t*)::{lambda(eprosima::fastrtps::rtps::RTPSReader*)#1}::operator()(eprosima::fastrtps::rtps::RTPSReader*) const (MessageReceiver.cpp:797)
==1970191==    by 0xB6C50C2: void eprosima::fastrtps::rtps::MessageReceiver::findAllReaders<eprosima::fastrtps::rtps::MessageReceiver::proc_Submsg_DataFrag(eprosima::fastrtps::rtps::CDRMessage_t*, eprosima::fastrtps::rtps::SubmessageHeader_t*)::{lambda(eprosima::fastrtps::rtps::RTPSReader*)#1}>(eprosima::fastrtps::rtps::EntityId_t const&, eprosima::fastrtps::rtps::MessageReceiver::proc_Submsg_DataFrag(eprosima::fastrtps::rtps::CDRMessage_t*, eprosima::fastrtps::rtps::SubmessageHeader_t*)::{lambda(eprosima::fastrtps::rtps::RTPSReader*)#1} const&) (MessageReceiver.cpp:472)
==1970191==    by 0xB6C3CCC: eprosima::fastrtps::rtps::MessageReceiver::proc_Submsg_DataFrag(eprosima::fastrtps::rtps::CDRMessage_t*, eprosima::fastrtps::rtps::SubmessageHeader_t*) (MessageReceiver.cpp:794)
==1970191==    by 0xB6C14A7: eprosima::fastrtps::rtps::MessageReceiver::processCDRMsg(eprosima::fastrtps::rtps::Locator_t const&, eprosima::fastrtps::rtps::CDRMessage_t*) (MessageReceiver.cpp:245)
==1970191==    by 0xB6D2D9A: eprosima::fastrtps::rtps::ReceiverResource::OnDataReceived(unsigned char const*, unsigned int, eprosima::fastrtps::rtps::Locator_t const&, eprosima::fastrtps::rtps::Locator_t const&) (ReceiverResource.cpp:99)
==1970191==    by 0xB720788: eprosima::fastrtps::rtps::UDPChannelResource::perform_listen_operation(eprosima::fastrtps::rtps::Locator_t) (UDPChannelResource.cpp:62)
==1970191==    by 0xB724B2C: void std::__invoke_impl<void, void (eprosima::fastrtps::rtps::UDPChannelResource::*)(eprosima::fastrtps::rtps::Locator_t), eprosima::fastrtps::rtps::UDPChannelResource*, eprosima::fastrtps::rtps::Locator_t>(std::__invoke_memfun_deref, void (eprosima::fastrtps::rtps::UDPChannelResource::*&&)(eprosima::fastrtps::rtps::Locator_t), eprosima::fastrtps::rtps::UDPChannelResource*&&, eprosima::fastrtps::rtps::Locator_t&&) (invoke.h:73)
==1970191==    by 0xB7249D8: std::__invoke_result<void (eprosima::fastrtps::rtps::UDPChannelResource::*)(eprosima::fastrtps::rtps::Locator_t), eprosima::fastrtps::rtps::UDPChannelResource*, eprosima::fastrtps::rtps::Locator_t>::type std::__invoke<void (eprosima::fastrtps::rtps::UDPChannelResource::*)(eprosima::fastrtps::rtps::Locator_t), eprosima::fastrtps::rtps::UDPChannelResource*, eprosima::fastrtps::rtps::Locator_t>(void (eprosima::fastrtps::rtps::UDPChannelResource::*&&)(eprosima::fastrtps::rtps::Locator_t), eprosima::fastrtps::rtps::UDPChannelResource*&&, eprosima::fastrtps::rtps::Locator_t&&) (invoke.h:95)
==1970191==    by 0xB7248E8: void std::thread::_Invoker<std::tuple<void (eprosima::fastrtps::rtps::UDPChannelResource::*)(eprosima::fastrtps::rtps::Locator_t), eprosima::fastrtps::rtps::UDPChannelResource*, eprosima::fastrtps::rtps::Locator_t> >::_M_invoke<0ul, 1ul, 2ul>(std::_Index_tuple<0ul, 1ul, 2ul>) (thread:244)
==1970191==  Address 0xe476068 is 663,592 bytes inside a block of size 663,593 alloc'd
==1970191==    at 0x4840FAF: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==1970191==    by 0xB64C3BF: eprosima::fastrtps::rtps::SerializedPayload_t::reserve(unsigned int) (SerializedPayload.h:161)
==1970191==    by 0xB688DE9: eprosima::fastrtps::rtps::CacheChangePool::reserve_Cache(eprosima::fastrtps::rtps::CacheChange_t**, unsigned int) (CacheChangePool.cpp:149)
==1970191==    by 0xB6AC716: eprosima::fastrtps::rtps::History::reserve_Cache(eprosima::fastrtps::rtps::CacheChange_t**, unsigned int) (History.h:77)
==1970191==    by 0xB6AA284: eprosima::fastrtps::rtps::RTPSReader::reserveCache(eprosima::fastrtps::rtps::CacheChange_t**, unsigned int) (RTPSReader.cpp:73)
==1970191==    by 0xB6A5AAA: eprosima::fastrtps::rtps::StatelessReader::processDataFragMsg(eprosima::fastrtps::rtps::CacheChange_t*, unsigned int, unsigned int, unsigned short) (StatelessReader.cpp:391)
==1970191==    by 0xB6C3400: eprosima::fastrtps::rtps::MessageReceiver::proc_Submsg_DataFrag(eprosima::fastrtps::rtps::CDRMessage_t*, eprosima::fastrtps::rtps::SubmessageHeader_t*)::{lambda(eprosima::fastrtps::rtps::RTPSReader*)#1}::operator()(eprosima::fastrtps::rtps::RTPSReader*) const (MessageReceiver.cpp:797)
==1970191==    by 0xB6C50C2: void eprosima::fastrtps::rtps::MessageReceiver::findAllReaders<eprosima::fastrtps::rtps::MessageReceiver::proc_Submsg_DataFrag(eprosima::fastrtps::rtps::CDRMessage_t*, eprosima::fastrtps::rtps::SubmessageHeader_t*)::{lambda(eprosima::fastrtps::rtps::RTPSReader*)#1}>(eprosima::fastrtps::rtps::EntityId_t const&, eprosima::fastrtps::rtps::MessageReceiver::proc_Submsg_DataFrag(eprosima::fastrtps::rtps::CDRMessage_t*, eprosima::fastrtps::rtps::SubmessageHeader_t*)::{lambda(eprosima::fastrtps::rtps::RTPSReader*)#1} const&) (MessageReceiver.cpp:472)
==1970191==    by 0xB6C3CCC: eprosima::fastrtps::rtps::MessageReceiver::proc_Submsg_DataFrag(eprosima::fastrtps::rtps::CDRMessage_t*, eprosima::fastrtps::rtps::SubmessageHeader_t*) (MessageReceiver.cpp:794)
==1970191==    by 0xB6C14A7: eprosima::fastrtps::rtps::MessageReceiver::processCDRMsg(eprosima::fastrtps::rtps::Locator_t const&, eprosima::fastrtps::rtps::CDRMessage_t*) (MessageReceiver.cpp:245)
==1970191==    by 0xB6D2D9A: eprosima::fastrtps::rtps::ReceiverResource::OnDataReceived(unsigned char const*, unsigned int, eprosima::fastrtps::rtps::Locator_t const&, eprosima::fastrtps::rtps::Locator_t const&) (ReceiverResource.cpp:99)
==1970191==    by 0xB720788: eprosima::fastrtps::rtps::UDPChannelResource::perform_listen_operation(eprosima::fastrtps::rtps::Locator_t) (UDPChannelResource.cpp:62)
==1970191==

valgrind: m_mallocfree.c:305 (get_bszB_as_is): Assertion 'bszB_lo == bszB_hi' failed.
valgrind: Heap block lo/hi size mismatch: lo = 663664, hi = 4577179683108720591.
This is probably caused by your program erroneously writing past the
end of a heap block and corrupting heap metadata.  If you fix any
invalid writes reported by Memcheck, this assertion failure will
probably go away.  Please try that before reporting this as a bug.

I can confirm that the patch fixes the segfault / memory corruption for me.

I suggest to simplify the patch to just this one line change (keeping the order of the if / else blocks:

-                        if (work_change->serializedPayload.max_size <= sampleSize)
+                        if (sampleSize <= work_change->serializedPayload.max_size)

Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
@hidmic hidmic force-pushed the hidmic/stateless-reader-fix branch from ebc0602 to e18db02 Compare February 13, 2020 22:33
@hidmic
Copy link
Author

hidmic commented Feb 13, 2020

Patch simplified as per @dirk-thomas suggestion. Thanks for all the additional evidence!

@richiware
Copy link
Member

Build status:

  • Linux Build Status
  • Mac Build Status
  • Windows Build Status

@MiguelCompany
Copy link
Member

MiguelCompany commented Feb 14, 2020

@hidmic @dirk-thomas Thanks for the catch and the patch. The change is perfect!

But, before merging, it would be better to have a regression test in place. We will do it first thing today

@MiguelCompany MiguelCompany changed the title Do not reuse cache change if sample does not fit. Do not reuse cache change if sample does not fit. [7609] Feb 14, 2020
@MiguelCompany MiguelCompany self-assigned this Feb 14, 2020
@MiguelCompany MiguelCompany added the bug Issue to report a bug label Feb 14, 2020
@adolfomarver
Copy link
Contributor

Test added!

@richiware
Copy link
Member

richiware commented Feb 14, 2020

Build status:

  • Linux Build Status
  • Mac Build Status
  • Windows Build Status (unrelated failures)

Copy link
Contributor

@adolfomarver adolfomarver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@MiguelCompany MiguelCompany merged commit 59de387 into eProsima:1.9.x Feb 15, 2020
IkerLuengo added a commit that referenced this pull request Feb 19, 2020
This is a port of PR #1013 to v1.9.3
Cherry-pick of commit 59de387

Slight modification on the new regression test to make it
work as asynchronous publisher, because
v1.9.3 did not allow large messages on synchronous mode
IkerLuengo added a commit that referenced this pull request Feb 21, 2020
MiguelCompany pushed a commit that referenced this pull request Feb 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issue to report a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants