Use wait_for_service after creating parameters_client #219
I don't think that wait_for_service is the issue here anymore. I'll provide a more detailed description tomorrow but will merge the connected PRs tonight before the nightlies start to get fewer test failures in the demo_nodes_cpp package in the meantime.
OK, following up. This only happens with fastrtps (here's connext passing 100 times without any other modifications:
It seems to wait for the Nth service call indefinitely and never receive it. This is fixed by ros2/rmw_fastrtps#143

Other times the service call is complete and the executor is trying to return success, but then it hangs. This is fixed by ros2/rmw_fastrtps#142

While investigating, I've fixed a potential memory error in ros2/rcl_interfaces#18 and also prevented another crash that occurs at shutdown with eProsima/Fast-DDS#134, but neither of those fixes the hanging by itself.

On my machine, which at this point has lots of zombie nodes lying around, eProsima/Fast-DDS#134 is needed to prevent crashing on teardown when running the tests repeatedly. On the buildfarm it didn't seem to be necessary though:

These jobs include this PR and both of the rmw_fastrtps PRs. They do NOT include the fastrtps PR or the rcl_interfaces PR.

The (For the record, here are jobs that also include the fastrtps PR)
2e3da85 to ce72ada
Redid the OSX job. This is rerunning This is repeating all of the In summary, with the other PRs allowing this PR to be tested thoroughly I'm confident in it and am putting this PR back in review.
@@ -379,6 +404,7 @@ TEST(CLASSNAME(test_local_parameters, RMW_IMPLEMENTATION), set_parameter_if_not_
 int main(int argc, char ** argv)
 {
   // NOTE: use custom main to ensure that rclcpp::init is called only once
+  setvbuf(stdout, NULL, _IONBF, BUFSIZ);
Please move this line one line higher. Otherwise it separates the comment line from the line it is commenting.
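For illustration, the ordering requested above could look like the following minimal sketch. `fake_init` and `run_main` are hypothetical stand-ins (the real test calls `rclcpp::init` from its own `main`); the only point is that the `setvbuf` call sits above the NOTE comment, so the comment stays directly attached to the init call it describes.

```cpp
#include <cstdio>

// Hypothetical stand-in for rclcpp::init; counts how often it is called.
static int g_init_calls = 0;

void fake_init(int /*argc*/, char ** /*argv*/)
{
  ++g_init_calls;
}

// Sketch of the custom main body with the suggested line order.
int run_main(int argc, char ** argv)
{
  // _IONBF makes stdout unbuffered.
  std::setvbuf(stdout, NULL, _IONBF, BUFSIZ);
  // NOTE: use custom main to ensure that init is called only once
  fake_init(argc, argv);
  return 0;
}
```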
test_rclcpp/CMakeLists.txt
Outdated
@@ -140,15 +140,15 @@ if(BUILD_TESTING)
       TIMEOUT 70)
     custom_gtest(gtest_local_parameters
       "test/test_local_parameters.cpp"
-      TIMEOUT 30)
+      TIMEOUT 90)
Nice that we can reduce it to 90!
CI in ros2/rmw_fastrtps#147 (comment) showed that local parameter tests don't need wait_for_service, it must just be the remote ones. Indeed, https://github.com/ros2/system_tests/pull/159/files only ever added the workaround for the remote tests. I didn't realise that when the service constructor returns, it must be good to go (I figured there was middleware work going on asynchronously). I'll remove the wait_for_service calls from the local_parameters tests.
…ow the service is there)" This reverts commit dce810a.
Never mind, there may/may not still be middleware work going on asynchronously. Regardless, we decided that the Potentially, adding these adding so many standard CI repeating parameter tests 100 times:
connects to ros2/rclcpp#356
I don't think it's necessary to do any waiting when calling node->set_parameters because FWIU there are no service calls involved (correct me if I'm wrong).
Also, now that we wait in so many places, the timeout has to be increased significantly (15 waits of 20s)
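As a rough check of that figure, here is a small sketch of the worst-case arithmetic. It assumes every one of the 15 waits runs to its full 20 s timeout; the constant names are made up for illustration and do not appear in the test code.

```cpp
#include <chrono>

// Hypothetical names; the counts come from the comment above (15 waits of 20s).
constexpr int kNumWaits = 15;
constexpr std::chrono::seconds kPerWaitTimeout{20};

// Worst case: every wait runs until its timeout expires.
constexpr std::chrono::seconds kWorstCaseWait = kNumWaits * kPerWaitTimeout;

static_assert(kWorstCaseWait.count() == 300, "15 waits of 20 s add up to 300 s");
```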
CI including other branches that add the implementation of wait_for_service and use it in demos:
This should fix flaky tests, e.g. http://ci.ros2.org/view/nightly/job/nightly_linux_repeated/744/testReport/junit/(root)/projectroot/gtest_local_parameters__rmw_fastrtps_cpp, but it doesn't fix them completely yet (the test can hang while waiting for the service instead of the assertion being raised after at most 20s).
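The intended behaviour (give up after a bounded time so the assertion can fire, rather than hang) can be sketched in plain C++ as below. `wait_until_ready` is a hypothetical helper written for this illustration, not the rclcpp `wait_for_service` implementation, and it only bounds the total wait if the `is_ready` predicate itself returns promptly.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper, not part of rclcpp: polls `is_ready` until it returns
// true or `timeout` has elapsed. Returns whether readiness was observed.
bool wait_until_ready(
  const std::function<bool()> & is_ready,
  std::chrono::milliseconds timeout,
  std::chrono::milliseconds poll_interval = std::chrono::milliseconds(10))
{
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (!is_ready()) {
    if (std::chrono::steady_clock::now() >= deadline) {
      // Timed out: the caller can now fail an assertion instead of hanging.
      return false;
    }
    std::this_thread::sleep_for(poll_interval);
  }
  return true;
}
```

In a test, the return value would typically be checked with something like `ASSERT_TRUE(wait_until_ready(..., std::chrono::seconds(20)))`, so a missing service produces a failure after the timeout.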