Periodic crash in services test #11
Comments
Looking at our Jenkins jobs, it seems that we're seeing intermittent crashes in multiple tests when using Fast-RTPS, e.g.:
We don't see such crashes with any other RMW implementation. It seems like either there are memory management issues internal to Fast-RTPS or we're using it incorrectly via the RMW layer. Any idea what's happening or where to look for a fix?
We will analyze the problem. Thanks for the information.
I believe I found the problem. I'm testing a solution.
That sounds promising. Let me know if we can help with testing.
I've updated FastCDR and FastRTPS. This update includes changes to resolve the segmentation fault.
Thanks for looking into the problem. I updated Fast-CDR and Fast-RTPS and rebuilt everything. I'm still seeing occasional crashes from the
In this situation, I'm not getting a core file, so I can't provide any more detail (maybe nosetests and/or asyncio are getting in the way of the core file production?).
Also, more often than the segfaults, I'm still seeing deadlocks in this test. I.e., both client and server are still running (haven't crashed), but they're not meeting the termination condition for the test. Any idea what's causing that problem? Here's a backtrace from attaching to the service server:
Here's a backtrace from attaching to the service client:
Turns out that I wasn't getting a core file because I had an old one already present in the same directory; rookie mistake. After rebuilding in
In the stack trace I posted, there's a missing step between frame 2, where
After adding some debug prints, I can see that, when the segfault occurs, this line is about to execute (i.e., I see a print from immediately before), but it doesn't complete (i.e., I don't see a print from immediately after). Looks like somebody is trying to use the service object before it's finished being created.
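To illustrate the kind of race suspected here (an object being used by another thread before its creation has finished), here is a minimal C++ sketch. It is not Fast-RTPS or rmw code: `Service`, `ServiceSlot`, `run_demo`, and the service name `add_two_ints` are hypothetical names chosen for illustration. The idea is to build the object completely first and only then publish the pointer, so a listener thread either sees nothing or sees a finished object.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for a service object; not a Fast-RTPS or rmw type.
struct Service {
  std::string name;
  int handled = 0;
  explicit Service(std::string n) : name(std::move(n)) {}
  void handle_request() { ++handled; }
};

// Hypothetical holder shared between the creating thread and a listener thread.
class ServiceSlot {
public:
  // Build the object completely, then publish it with release semantics.
  void create(const std::string & name) {
    Service * s = new Service(name);
    ptr_.store(s, std::memory_order_release);
  }

  // Listener side: returns false (request dropped) until the service is published.
  bool dispatch() {
    Service * s = ptr_.load(std::memory_order_acquire);
    if (s == nullptr) {
      return false;
    }
    s->handle_request();
    return true;
  }

  ~ServiceSlot() { delete ptr_.load(); }

private:
  std::atomic<Service *> ptr_{nullptr};
};

// Runs a listener concurrently with creation; returns how many requests were handled.
int run_demo(int attempts) {
  ServiceSlot slot;
  int handled = 0;
  std::thread listener([&] {
    for (int i = 0; i < attempts; ++i) {
      if (slot.dispatch()) {
        ++handled;
      }
    }
  });
  slot.create("add_two_ints");  // hypothetical service name
  listener.join();
  return handled;
}
```

How many requests `run_demo` handles depends on thread scheduling, but the listener never touches a half-built `Service`. Whether the actual crash follows this pattern is an assumption based on the debug prints described above.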
Hi, I couldn't replicate the segmentation fault, but I know what is happening. I've updated the FastRTPS and ROS-RMW-Fast-RTPS-cpp repositories. I also fixed a deadlock.
Thanks! The nightly build (which runs each test 20 times to check for sporadic failures) has been passing on Linux for several days now, which is very promising: http://ci.ros2.org/view/nightly/job/nightly_linux/. There are still some FastRTPS test failures in the nightly for OSX: http://ci.ros2.org/view/nightly/job/nightly_osx/182/testReport/, but that might not be the fault of FastRTPS. I'll look into it, and reopen this ticket or open a new one if needed.
When built in Debug mode (pass
With the latest changes, we haven't seen any failures in our nightly CI in some time, which is great! When locally testing on Linux, I'm still able to get the test to hang sometimes, but I'm not seeing any crashes. I'll close this ticket and open a new one if and when I can provide more information about the hangs.
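For the remaining hangs, one option is to make a stuck exchange fail after a deadline instead of blocking forever. The C++ sketch below uses only the standard library; `finished_within`, `demo_completes`, and `demo_hangs` are hypothetical helpers, not part of the test suite discussed here, and the timeout values are arbitrary.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical helper: true if `done` becomes ready before `timeout` expires.
bool finished_within(std::future<void> & done, std::chrono::milliseconds timeout) {
  return done.wait_for(timeout) == std::future_status::ready;
}

// Simulates an exchange that completes: the worker fulfils the promise.
bool demo_completes() {
  std::promise<void> p;
  std::future<void> f = p.get_future();
  std::thread worker([&p] { p.set_value(); });
  bool ok = finished_within(f, std::chrono::milliseconds(2000));
  worker.join();
  return ok;
}

// Simulates a hang: nobody fulfils the promise, so the wait times out.
bool demo_hangs() {
  std::promise<void> p;
  std::future<void> f = p.get_future();
  return finished_within(f, std::chrono::milliseconds(50));
}
```

A test built this way reports a failure when the termination condition is not met in time, which makes a hang visible in CI and leaves the process available for attaching a debugger.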
Use data types when setting callbacks
Following #10, I'm trying to get system_tests.test_rclcpp.test_services_cpp__rmw_fastrtps_cpp to pass reliably, and I can't. I've seen a variety of segfaults, aborts for double-free, and deadlocks. I just spent some time digging through the code and have failed to figure out the problem. If you have everything built, then you can reproduce the problem like so (I've been testing on Linux):
You should, after a short period of time, see a problem. If there's a crash, you should get a core file. I haven't yet gotten any crashes to happen in gdb or valgrind (most of the time, I just get a deadlock in that situation).
I'm happy to provide more information to help with the investigation.