Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure rviz_common::MessageFilterDisplay processes messages in the main thread #620

Merged
merged 3 commits into from
Nov 2, 2020

Conversation

hidmic
Copy link
Contributor

@hidmic hidmic commented Oct 29, 2020

Since tf2_ros::MessageFilter callback is called from tf2_ros::TransformListener dedicated thread, concurrency issues will arise if a derived display (like rviz_defaults_plugins::Polygon) attempts to modify the UI from rviz_common::MessageFilterDisplay::processMessage().

CI up to rviz_default_plugins and rviz_common:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

Unfortunately, I do not have a regression test nor a reliable repro for the segfaults this causes. Below you'll find the partial backtrace that led me here.

Backtrace

#0 0x00007ffff214f649 in ?? () from /usr/lib/x86_64-linux-gnu/libGLdispatch.so.0
#1 0x00007fffdc2d484e in Ogre::GLHardwareVertexBuffer::GLHardwareVertexBuffer(Ogre::HardwareBufferManagerBase*, unsigned long, unsigned long, Ogre::HardwareBuffer::Usage, bool) () from /opt/ros/foxy/opt/rviz_ogre_vendor/lib/OGRE/RenderSystem_GL.so.1.12.1
#2 0x00007fffdc2cb907 in Ogre::GLHardwareBufferManager::createVertexBuffer(unsigned long, unsigned long, Ogre::HardwareBuffer::Usage, bool) ()
from /opt/ros/foxy/opt/rviz_ogre_vendor/lib/OGRE/RenderSystem_GL.so.1.12.1
#3 0x00007ffff670f775 in Ogre::ManualObject::end() () from /opt/ros/foxy/opt/rviz_ogre_vendor/lib/libOgreMain.so.1.12.1
#4 0x00007fffc7e4ba3a in rviz_default_plugins::displays::PolygonDisplay::processMessage(std::shared_ptr<geometry_msgs::msg::PolygonStamped_<std::allocator > const>) () from /opt/ros/foxy/lib/librviz_default_plugins.so
#5 0x00007fffc7c7ffb5 in rviz_common::MessageFilterDisplay<geometry_msgs::msg::PolygonStamped_<std::allocator > >::incomingMessage(std::shared_ptr<geometry_msgs::msg::PolygonStamped_<std::allocator > const>) () from /opt/ros/foxy/lib/librviz_default_plugins.so
#6 0x00007fffc7c8022e in std::Function_handler<void (std::shared_ptr<geometry_msgs::msg::PolygonStamped<std::allocator > const> const&), std::Bind<void (rviz_common::MessageFilterDisplay<geometry_msgs::msg::PolygonStamped<std::allocator > >::(rviz_common::MessageFilterDisplay<geometry_msgs::msg::PolygonStamped_<std::allocator > >, std::Placeholder<1>))(std::shared_ptr<geometry_msgs::msg::PolygonStamped<std::allocator > const>)> >::M_invoke(std::Any_data const&, std::shared_ptr<geometry_msgs::msg::PolygonStamped<std::allocator > const> const&) ()
from /opt/ros/foxy/lib/librviz_default_plugins.so
#7 0x00007fffc7c904a4 in message_filters::CallbackHelper1T<std::shared_ptr<geometry_msgs::msg::PolygonStamped
<std::allocator > const> const&, geometry_msgs::msg::PolygonStamped_<std::allocator > >::call(message_filters::MessageEvent<geometry_msgs::msg::PolygonStamped_<std::allocator > const> const&, bool) () from /opt/ros/foxy/lib/librviz_default_plugins.so
#8 0x00007fffc7c8b6cb in tf2_ros::MessageFilter<geometry_msgs::msg::PolygonStamped_<std::allocator >, rviz_common::transformation::FrameTransformer>::transformReadyCallback(std::shared_future<geometry_msgs::msg::TransformStamped_<std::allocator > > const&, unsigned long) ()
from /opt/ros/foxy/lib/librviz_default_plugins.so
#9 0x00007fffc7660430 in ?? () from /opt/ros/foxy/lib/libtf2_ros.so
#10 0x00007fffdc0224b1 in tf2::BufferCore::testTransformableRequests() () from /opt/ros/foxy/lib/libtf2.so
#11 0x00007fffdc023d34 in tf2::BufferCore::setTransformImpl(tf2::Transform const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::chrono::time_point<std::chrono::_V2::system_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, std::cxx11::basic_string<char, std::char_traits, std::allocator > const&, bool) ()
from /opt/ros/foxy/lib/libtf2.so
#12 0x00007fffdc02576d in tf2::BufferCore::setTransform(geometry_msgs::msg::TransformStamped
<std::allocator > const&, std::cxx11::basic_string<char, std::char_traits, std::allocator > const&, bool) () from /opt/ros/foxy/lib/libtf2.so
#13 0x00007fffc7666826 in tf2_ros::TransformListener::subscription_callback(std::shared_ptr<tf2_msgs::msg::TFMessage
<std::allocator > >, bool) ()
from /opt/ros/foxy/lib/libtf2_ros.so
#14 0x00007fffc7ea643d in std::Function_handler<void (std::shared_ptr<tf2_msgs::msg::TFMessage<std::allocator > >), std::Bind<void (tf2_ros::TransformListener::(tf2_ros::TransformListener, std::Placeholder<1>, bool))(std::shared_ptr<tf2_msgs::msg::TFMessage<std::allocator > >, bool)> >::M_invoke(std::Any_data const&, std::shared_ptr<tf2_msgs::msg::TFMessage<std::allocator > >&&) ()
from /opt/ros/foxy/lib/librviz_default_plugins.so
--Type for more, q to quit, c to continue without paging--
#15 0x00007fffc7eabc81 in rclcpp::AnySubscriptionCallback<tf2_msgs::msg::TFMessage
<std::allocator >, std::allocator >::dispatch(std::shared_ptr<tf2_msgs::msg::TFMessage
<std::allocator > >, rclcpp::MessageInfo const&) () from /opt/ros/foxy/lib/librviz_default_plugins.so
#16 0x00007fffc7eac4ef in rclcpp::Subscription<tf2_msgs::msg::TFMessage<std::allocator >, std::allocator, rclcpp::message_memory_strategy::MessageMemoryStrategy<tf2_msgs::msg::TFMessage<std::allocator >, std::allocator > >::handle_message(std::shared_ptr&, rclcpp::MessageInfo const&) () from /opt/ros/foxy/lib/librviz_default_plugins.so
#17 0x00007ffff70e728c in ?? () from /opt/ros/foxy/lib/librclcpp.so
#18 0x00007ffff70e7b4b in rclcpp::Executor::execute_subscription(std::shared_ptrrclcpp::SubscriptionBase) () from /opt/ros/foxy/lib/librclcpp.so
#19 0x00007ffff70e830a in rclcpp::Executor::execute_any_executable(rclcpp::AnyExecutable&) () from /opt/ros/foxy/lib/librclcpp.so
#20 0x00007ffff70ecb0c in rclcpp::executors::SingleThreadedExecutor::spin() () from /opt/ros/foxy/lib/librclcpp.so
#21 0x00007fffc7666a72 in ?? () from /opt/ros/foxy/lib/libtf2_ros.so
#22 0x00007ffff6eedcb4 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#23 0x00007ffff598c609 in start_thread (arg=) at pthread_create.c:477
#24 0x00007ffff6d2c103 in clone () from /usr/lib/x86_64-linux-gnu/libc.so.6

You can see on frame #4 that the crash occurs within rviz_default_plugins::Polygon implementation, which in frame #21 we see it's being called from tf2_ros dedicated thread implementation.

…in thread.

Since tf2_ros::MessageFilter callback is called from tf2_ros::TransformListener
dedicated thread, concurrency issues will arise if a derived display (like
rviz_defaults_plugins::Polygon) attempts to modify the UI from
rviz_common::MessageFilterDisplay::processMessage().

Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
@hidmic hidmic added the bug Something isn't working label Oct 29, 2020
@hidmic hidmic requested review from cottsay and wjwwood October 29, 2020 19:22
@hidmic
Copy link
Contributor Author

hidmic commented Oct 29, 2020

FYI @clalancette @IanTheEngineer

@hidmic
Copy link
Contributor Author

hidmic commented Oct 29, 2020

On a related note, ros-visualization/rviz#1456 appears to have a similar origin.

Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
@hidmic
Copy link
Contributor Author

hidmic commented Oct 29, 2020

Argh the linter...

Copy link
Contributor

@clalancette clalancette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look reasonable to me; I know that Qt gets very upset if you process UI changes off the main thread, and that looks to be the case here. It would be nice to get an additional review from @mjeronimo or @jacobperron just for sanity.

@hidmic
Copy link
Contributor Author

hidmic commented Oct 30, 2020

Oh boy, it causes problems on Windows.

Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
@clalancette
Copy link
Contributor

Oh boy, it causes problems on Windows.

FYI: there is some trouble on running rviz CI against Windows. I often see that when I run CI, though we do not see it in the nightlies. I never did track down what the problem is, but it's not totally clear to me that it is caused by this PR. I don't know how to tell for sure, though.

@hidmic
Copy link
Contributor Author

hidmic commented Oct 30, 2020

Good to know. I'm adding d4088ca to be on the safe side. Let's try again:

  • Windows Build Status (using this branch)
  • Windows Build Status (using ros2 branch)

Copy link
Member

@wjwwood wjwwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, @jacobperron and/or @mjeronimo may want to have a look.

@hidmic
Copy link
Contributor Author

hidmic commented Oct 30, 2020

Alright, yeah, same Windows CI issues on the ros2 branch.

@jacobperron
Copy link
Member

Alright, yeah, same Windows CI issues on the ros2 branch.

Something must have changed recently, Windows CI used to be green 14 days ago (for example) 🙁

@clalancette
Copy link
Contributor

Something must have changed recently, Windows CI used to be green 14 days ago (for example) slightly_frowning_face

Every PR that I've run CI on in this repository for the last several months has had the same set of failing tests on Windows. I can only guess that it has to do with the options chosen to build with; maybe whatever set was used on that other PR happened to avoid the problem. It needs investigation.

@jacobperron
Copy link
Member

jacobperron commented Oct 31, 2020

Interesting. The only difference that I spot is that when I run CI for RViz, I'm building and testing a super-set of packages. I'll continue looking into it.

Copy link
Member

@jacobperron jacobperron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. It's too bad the fix isn't ABI compatible for the purposes of backporting.

@hidmic
Copy link
Contributor Author

hidmic commented Nov 2, 2020

It's too bad the fix isn't ABI compatible for the purposes of backporting.

I can't think of an ABI compatible fix. We could work around it by using a queue in each display plugin implementation. It wouldn't be extensible to plugins outside our control though.

@hidmic
Copy link
Contributor Author

hidmic commented Nov 2, 2020

Thanks for the reviews guys! Merging.

@hidmic hidmic merged commit 323378a into ros2 Nov 2, 2020
@hidmic hidmic deleted the hidmic/fix-message-filter-display branch November 2, 2020 14:32
Kettenhoax pushed a commit to Kettenhoax/rviz that referenced this pull request May 5, 2021
…in thread (ros2#620)

Since tf2_ros::MessageFilter callback is called from tf2_ros::TransformListener
dedicated thread, concurrency issues will arise if a derived display (like
rviz_defaults_plugins::Polygon) attempts to modify the UI from
rviz_common::MessageFilterDisplay::processMessage().

Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
@spurnvoj
Copy link

spurnvoj commented Sep 8, 2021

Hi, are you planning to do backport of this PR to foxy?

@ahcorde
Copy link
Contributor

ahcorde commented Sep 13, 2021

@spurnvoj, if the backport commit is not open feel free to open it, I will be happy to review it.

spurnvoj pushed a commit to spurnvoj/rviz that referenced this pull request Sep 13, 2021
…in thread (ros2#620)

Since tf2_ros::MessageFilter callback is called from tf2_ros::TransformListener
dedicated thread, concurrency issues will arise if a derived display (like
rviz_defaults_plugins::Polygon) attempts to modify the UI from
rviz_common::MessageFilterDisplay::processMessage().

Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
@spurnvoj
Copy link

@ahcorde, see #765

ahcorde pushed a commit that referenced this pull request Oct 5, 2021
…ssages in the main thread (#620) (#765)

* Ensure rviz_common::MessageFilterDisplay processes messages in the main thread (#620)

Since tf2_ros::MessageFilter callback is called from tf2_ros::TransformListener
dedicated thread, concurrency issues will arise if a derived display (like
rviz_defaults_plugins::Polygon) attempts to modify the UI from
rviz_common::MessageFilterDisplay::processMessage().

Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>

* Attempt to make this ABI compliant

Signed-off-by: Greg Balke <greg@openrobotics.org>

Co-authored-by: Michel Hidalgo <michel@ekumenlabs.com>
Co-authored-by: Greg Balke <greg@openrobotics.org>
gbalke added a commit that referenced this pull request Oct 8, 2021
…esses messages in the main thread (#620) (#765)"

This reverts commit 154feac.
gbalke added a commit that referenced this pull request Oct 11, 2021
…esses messages in the main thread (#620) (#765)" (#783)

This reverts commit 154feac.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants