Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bionic Fast-RTPS SEGFAULT on Parameters #464

Closed
mjcarroll opened this issue Apr 24, 2018 · 12 comments
Closed

Bionic Fast-RTPS SEGFAULT on Parameters #464

mjcarroll opened this issue Apr 24, 2018 · 12 comments
Assignees
Labels
bug Something isn't working hitlist

Comments

@mjcarroll
Copy link
Member

mjcarroll commented Apr 24, 2018

Bug report

Required Info:

  • Operating System:
    • Ubuntu 18.04 Bionic
  • Installation type:
    • Source
  • Version or commit hash:
    • master
  • DDS implementation:
    • Fast-RTPS
  • Client library (if applicable):
    • rclcpp

In the migration to Bionic, we uncovered a few failing tests, all involving Fast-RTPS, which involves segfaults in the use of Parameters. ros2/ros2#481

To reproduce:
ros2 run demo_nodes_cpp set_and_get_parameters

Backtrace:

#0  0x00007ffff72d8ca1 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x00007ffff72d910d in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::operator=(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x00007ffff6545c78 in eprosima::fastcdr::Cdr::deserialize (this=0x7fffffff8080, string_t="") at /home/michael/workspaces/ros2_ws/install/fastcdr/include/fastcdr/Cdr.h:1594
#3  0x00007ffff654597f in eprosima::fastcdr::Cdr::operator>> (this=0x7fffffff8080, string_t="") at /home/michael/workspaces/ros2_ws/install/fastcdr/include/fastcdr/Cdr.h:469
#4  0x00007ffff654a5b7 in rmw_fastrtps_cpp::deserialize_field<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (member=0x7ffff5b21680 <rcl_interfaces::msg::rosidl_typesupport_introspection_cpp::ParameterValue_message_member_array+384>, field=0x5555567
9c678, deser=..., call_new=true) at /home/michael/workspaces/ros2_ws/src/ros2/rmw_fastrtps/rmw_fastrtps_cpp/include/rmw_fastrtps_cpp/TypeSupport_impl.hpp:411
#5  0x00007ffff654826d in rmw_fastrtps_cpp::TypeSupport<rosidl_typesupport_introspection_cpp::MessageMembers>::deserializeROSmessage (this=0x555555dade00, deser=..., members=0x7ffff5b21740 <rcl_interfaces::msg::rosidl_typesupport_introspection_cpp::ParameterValue_message_members>, ro
s_message=0x55555679c660, call_new=true) at /home/michael/workspaces/ros2_ws/src/ros2/rmw_fastrtps/rmw_fastrtps_cpp/include/rmw_fastrtps_cpp/TypeSupport_impl.hpp:576
#6  0x00007ffff65482b3 in rmw_fastrtps_cpp::TypeSupport<rosidl_typesupport_introspection_cpp::MessageMembers>::deserializeROSmessage (this=0x555555dade00, deser=..., members=0x7ffff5b21480 <rcl_interfaces::msg::rosidl_typesupport_introspection_cpp::Parameter_message_members>, ros_mes
sage=0x55555679c640, call_new=true) at /home/michael/workspaces/ros2_ws/src/ros2/rmw_fastrtps/rmw_fastrtps_cpp/include/rmw_fastrtps_cpp/TypeSupport_impl.hpp:582
#7  0x00007ffff6548391 in rmw_fastrtps_cpp::TypeSupport<rosidl_typesupport_introspection_cpp::MessageMembers>::deserializeROSmessage (this=0x555555dade00, deser=..., members=0x7ffff5b21d40 <rcl_interfaces::srv::rosidl_typesupport_introspection_cpp::SetParameters_Request_message_membe
rs>, ros_message=0x55555679c2c0, call_new=false) at /home/michael/workspaces/ros2_ws/src/ros2/rmw_fastrtps/rmw_fastrtps_cpp/include/rmw_fastrtps_cpp/TypeSupport_impl.hpp:601
#8  0x00007ffff65470f8 in rmw_fastrtps_cpp::TypeSupport<rosidl_typesupport_introspection_cpp::MessageMembers>::deserializeROSmessage (this=0x555555dade00, buffer=0x7fffdc001a40, ros_message=0x55555679c2c0) at /home/michael/workspaces/ros2_ws/src/ros2/rmw_fastrtps/rmw_fastrtps_cpp/inc
lude/rmw_fastrtps_cpp/TypeSupport_impl.hpp:723
#9  0x00007ffff6545078 in _deserialize_ros_message (buffer=0x7fffdc001a40, ros_message=0x55555679c2c0, untyped_typesupport=0x555555dade00, typesupport_identifier=0x7ffff4cca5c8 "rosidl_typesupport_introspection_cpp") at /home/michael/workspaces/ros2_ws/src/ros2/rmw_fastrtps/rmw_fastr
tps_cpp/src/ros_message_serialization.cpp:46
#10 0x00007ffff653876f in rmw_take_request (service=0x55555619b650, request_header=0x55555638acc0, ros_request=0x55555679c2c0, taken=0x7fffffff81e0) at /home/michael/workspaces/ros2_ws/src/ros2/rmw_fastrtps/rmw_fastrtps_cpp/src/rmw_request.cpp:101
#11 0x00007ffff699405f in rcl_take_request (service=0x5555560cbbb0, request_header=0x55555638acc0, ros_request=0x55555679c2c0) at /home/michael/workspaces/ros2_ws/src/ros2/rcl/rcl/src/rcl/service.c:287
#12 0x00007ffff7a561c5 in rclcpp::executor::Executor::execute_service (service=std::shared_ptr<rclcpp::ServiceBase> (use count 3, weak count 1) = {...}) at /home/michael/workspaces/ros2_ws/src/ros2/rclcpp/rclcpp/src/rclcpp/executor.cpp:337
#13 0x00007ffff7a55ae1 in rclcpp::executor::Executor::execute_any_executable (this=0x55555638b280, any_exec=std::shared_ptr<rclcpp::executor::AnyExecutable> (use count 2, weak count 0) = {...}) at /home/michael/workspaces/ros2_ws/src/ros2/rclcpp/rclcpp/src/rclcpp/executor.cpp:262
#14 0x00007ffff7a55776 in rclcpp::executor::Executor::spin_once (this=0x55555638b280, timeout=...) at /home/michael/workspaces/ros2_ws/src/ros2/rclcpp/rclcpp/src/rclcpp/executor.cpp:224
#15 0x00007ffff7abafb9 in rclcpp::executor::Executor::spin_until_future_complete<std::vector<rcl_interfaces::msg::SetParametersResult_<std::allocator<void> >, std::allocator<rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > > >, std::ratio<1l, 1000l> > (this=0x55555638
b280, future=..., timeout=...) at /home/michael/workspaces/ros2_ws/src/ros2/rclcpp/rclcpp/include/rclcpp/executor.hpp:238
#16 0x00007ffff7ab5a49 in rclcpp::executors::spin_node_until_future_complete<std::vector<rcl_interfaces::msg::SetParametersResult_<std::allocator<void> >, std::allocator<rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > > >, std::ratio<1l, 1000l> > (executor=..., node_
ptr=std::shared_ptr<rclcpp::node_interfaces::NodeBaseInterface> (use count 5, weak count 1) = {...}, future=..., timeout=...) at /home/michael/workspaces/ros2_ws/src/ros2/rclcpp/rclcpp/include/rclcpp/executors.hpp:79
#17 0x00007ffff7aad824 in rclcpp::SyncParametersClient::set_parameters (this=0x5555562c1b80, parameters=std::vector of length 4, capacity 4 = {...}) at /home/michael/workspaces/ros2_ws/src/ros2/rclcpp/rclcpp/src/rclcpp/parameter_client.cpp:394
#18 0x000055555555e954 in main (argc=1, argv=0x7fffffff8b18) at /home/michael/workspaces/ros2_ws/src/ros2/demos/demo_nodes_cpp/src/parameters/set_and_get_parameters.cpp:49
@mjcarroll mjcarroll added bug Something isn't working in progress Actively being worked on (Kanban column) hitlist labels Apr 24, 2018
@mjcarroll mjcarroll self-assigned this Apr 24, 2018
@mjcarroll
Copy link
Member Author

After some inspection, I think that the issue arises from the fact that by default, the rcl_interfaces::msg::SetParametersResult field reason is an empty string. If I populate the reason field with an arbitrary (non-zero-length) string, then the program no longer crashes.

I also see that there was a modification to handle a bug in zero length arrays upstream, I'm going to see if it's related (eProsima/Fast-CDR@581bc96)

@mikaelarguedas
Copy link
Member

Thanks @mjcarroll for picking this up.

To give slightly more context on this based on our investigations from last week:

  • the issue seems to be on our side in the typesupport introspection.
  • we seem to be giving invalid memory to FastRTPS to stuff the messages, it can be that we don't allocate the structures properly or that we have a bug in our offset calculations in the typesupport structures.
  • I don't think this is related to rclcpp or Fast-CDR and this issue likely belongs on the rmw_fastrtps repo (if the issue lives in the typesupport implementation) or rosidl_typesupport (if the issue comes from the typesupport structures)

@mjcarroll
Copy link
Member Author

Upstream fix was not related.

@mjcarroll
Copy link
Member Author

I built a small test program to isolate the serialization/deserialization mechanisms:
https://gist.github.com/mjcarroll/5ef1437a2a4ab95e8d14e13c53c9fcfd

The problem is reproducible, with a segfault occurring in the deserialization phase.

@mjcarroll
Copy link
Member Author

ASAN Results:

➜  test_rclcpp ./test_typesupport 
AddressSanitizer:DEADLYSIGNAL
=================================================================
==22880==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7ffb48981ca1 bp 0x000000000000 sp 0x7ffdf61e3ca0 T0)
==22880==The signal is caused by a WRITE memory access.
==22880==Hint: address points to the zero page.
    #0 0x7ffb48981ca0 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0x128ca0)
    #1 0x7ffb4898210c in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::operator=(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&) (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0x12910c)
    #2 0x54eb4b in eprosima::fastcdr::Cdr::deserialize(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) /home/michael/workspaces/ros2_ws2/install/include/fastcdr/Cdr.h:1594:34
    #3 0x54e81c in eprosima::fastcdr::Cdr::operator>>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) /home/michael/workspaces/ros2_ws2/install/include/fastcdr/Cdr.h:469:70
    #4 0x533f54 in void rmw_fastrtps_cpp::deserialize_field<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(rosidl_typesupport_introspection_cpp::MessageMember const*, void*, eprosima::fastcdr::Cdr&, bool) /home/michael/workspaces/ros2_ws2/install/include/rmw_fastrtps_cpp/TypeSupport_impl.hpp:412:11
    #5 0x532672 in rmw_fastrtps_cpp::TypeSupport<rosidl_typesupport_introspection_cpp::MessageMembers>::deserializeROSmessage(eprosima::fastcdr::Cdr&, rosidl_typesupport_introspection_cpp::MessageMembers const*, void*, bool) /home/michael/workspaces/ros2_ws2/install/include/rmw_fastrtps_cpp/TypeSupport_impl.hpp:578:9
    #6 0x532a1b in rmw_fastrtps_cpp::TypeSupport<rosidl_typesupport_introspection_cpp::MessageMembers>::deserializeROSmessage(eprosima::fastcdr::Cdr&, rosidl_typesupport_introspection_cpp::MessageMembers const*, void*, bool) /home/michael/workspaces/ros2_ws2/install/include/rmw_fastrtps_cpp/TypeSupport_impl.hpp:603:15
    #7 0x525cec in rmw_fastrtps_cpp::TypeSupport<rosidl_typesupport_introspection_cpp::MessageMembers>::deserializeROSmessage(eprosima::fastcdr::FastBuffer*, void*) /home/michael/workspaces/ros2_ws2/install/include/rmw_fastrtps_cpp/TypeSupport_impl.hpp:725:18
    #8 0x523ee9 in main /home/michael/workspaces/ros2_ws2/src/ros2/system_tests/test_rclcpp/test/test_typesupport.cpp:61:12
    #9 0x7ffb47ac7b96 in __libc_start_main /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:310
    #10 0x427109 in _start (/home/michael/workspaces/ros2_ws2/build/test_rclcpp/test_typesupport+0x427109)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0x128ca0) in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
==22880==ABORTING

@Karsten1987
Copy link
Contributor

Karsten1987 commented Apr 25, 2018

Does it work with the topic typesupports? I am currently running similar tests with topic ts and it seems fine. However, I am running this test on my mac. Does the same code run successfully on 16.04?

@mikaelarguedas
Copy link
Member

this seem to happen only on parameters so far. All the topic tests from test_communication pass (once identified we should add a regression test case in test communication to cover a similar message type though).
It seems related to bionic (and it's gcc / stl versions) as we cannot reproduce it on xenial.

Last bit of information @mjcarroll found is that if we resize the container before passing it to the deserialize functions it works but if we rely on the resize happening in the rmw_fastrtps typesupport implementation it fails regardless of how big we resize the container.

@mjcarroll do you have a repo with your custom test typesupport script?

@Karsten1987
Copy link
Contributor

just fyi:
https://github.com/ros2/system_tests/blob/expose_cdr/test_communication/test/test_message_serialization.cpp
These are the tests I am currently running for my PR, it seems to work fine so far on osx.

However, there is this PR open for FastCDR which resizes the buffer before taking the payload from the wire. Don't think it's really related to this question, but I link it here for completeness.
eProsima/Fast-CDR#16

@mjcarroll
Copy link
Member Author

@mikaelarguedas it's in the gist above, I hadn't landed it anywhere more permanent before, but seems reasonable to make it a regression test once we figure it out.

@mikaelarguedas
Copy link
Member

Duh 😖, thanks

@mjcarroll
Copy link
Member Author

@Karsten1987 If you get a chance, you should include a message that includes an array of sub-messages that also includes a string, which is what the parameters response message is. That would execute the same code paths I'm currently debugging.

@mjcarroll
Copy link
Member Author

Just kidding, I read your code and you are already doing it :)

I will test yours on Bionic and see what happens.

@mjcarroll mjcarroll added in review Waiting for review (Kanban column) and removed in progress Actively being worked on (Kanban column) labels May 1, 2018
@mjcarroll mjcarroll removed the in review Waiting for review (Kanban column) label May 3, 2018
nnmm pushed a commit to ApexAI/rclcpp that referenced this issue Jul 9, 2022
Signed-off-by: ivanpauno <ivanpauno@ekumenlabs.com>
DensoADAS pushed a commit to DensoADAS/rclcpp that referenced this issue Aug 5, 2022
…os2#530)

* Fixes ros2#464 - caches by message size and not message count

Signed-off-by: Jaison Titus <jaisontj@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working hitlist
Projects
None yet
Development

No branches or pull requests

3 participants