parameter client bug: wait_for_service sometimes failed #660
Comments
Has no one checked this issue? I have a similar issue, #661; please check whether you can reproduce it using my method. These two issues have stopped me from moving on to ROS 2. |
There is some possibility that this issue will be solved by the Fast-RTPS backport for 1.7.2 we're going to do for the next Crystal patch release. It's worth a try, though it is going to be a bit complicated at the moment. If you want to try it before we have the next official patch release, you can do something like:
A report back on whether this helps would be useful. Another test you can run is to try out OpenSplice, to see whether this is a problem in the RTPS/DDS layer or in ROS 2. To do that, you should:
Again, a report on whether this helps would be useful. |
OK, I will try the two tests you mentioned. |
Here are my test results. Test 1 (update Fast-RTPS): does not work. It still hangs in rare cases. Worse, when it fails in this test, wait_for_service never returns on timeout. And @clalancette, you mentioned the backport ros2/rmw_fastrtps#264 twice; is something wrong? Test 2: it works. After switching to OpenSplice, wait_for_service responds quickly and always succeeds. |
I am trying navigation2 right now. There is a fatal error when I launch nav2_bringup**_launch.py with OpenSplice. The output is:
With the default Fast-RTPS, things are OK. What is wrong with OpenSplice in nav2? |
Sorry, that was a typo. The second one should have been ros2/rmw_fastrtps#256; will update the original comment with that. |
So the question I still have; is this better, worse, or the same as with Fast-RTPS v1.7.0? That will give us some more information about whether v1.7.2 is going the right direction. |
I would say v1.7.2 is at best the same as v1.7.0 in my test, if not worse. In v1.7.0, wait_for_service returns on timeout when it fails, but in v1.7.2 it blocks in wait_for_service forever. |
All right, thanks for the feedback. @richiware , it seems like this is probably a bug in Fast-RTPS (or rmw_fastrtps), but I haven't had time to reproduce myself. A look would be appreciated. |
Update: I found that it was actually blocking in 'get_parameters' rather than in wait_for_service. Also, I ran my test on another PC: there I can reproduce the timeout problem in v1.7.0 but cannot reproduce the blocking problem in v1.7.2. This seems weird to me. |
All right, thanks for the update. At this point, we are going to go ahead with updating Crystal to 1.7.2. Once that is out, we can retest with the official binaries and ensure that everything is working. I'll leave this open until then. |
There is no benefit in waiting to test until that happens. I would recommend just building the latest version of Fast-RTPS and its RMW packages from source and testing it now. If the new version doesn't address the problem, any solution would otherwise be delayed another month. |
My point is that that was already done above, and we have two different results from two different machines. I haven't seen this particular problem myself, so my thought is that it is an environmental issue which will be resolved with the official binaries. |
@huchhong Binaries of Fast-RTPS v1.7.2 and an updated rmw_fastrtps with the fixes are available. We would appreciate a retest of your problem with the binary packages; to get them, you'll have to do a few things:
The last step is very important as this update is an ABI break; it's probably easiest to completely remove your |
I have retested my problem according to your procedure. I made sure of the following:
On my previously failing PC, the problem still exists: the program blocks in the 'get_parameter' function, just as it did in my custom-compiled ROS 2 version. |
I tested my case on a virtual machine that had never had ROS 2 installed. After I installed ROS 2 from the ros2-testing source and compiled my code against it, I could reproduce this blocking problem. |
Sorry for the delay, I will try to check this week and find a solution. |
Thanks for doing the test again. Looks like there is still something of a problem.
We appreciate you looking into it. Let us know when you have something that we can test. |
I was not able to check it. We are working on the next release and I didn't have time. We have this issue tracked on our tracker system. We will address it at some point. Sorry. |
I just want to mention that it would be really good to address the problem in the upcoming release. The earlier we can test a potential fix the more likely will it make it into the upcoming release. |
Hi everybody, we are not able to reproduce this issue using a fresh installation of ROS 2 Crystal patch 4. |
I think this may have been solved by eProsima/Fast-DDS#532. @huchhong Could you test with the current master to check if the issue is still there? |
@clalancette This was reported for crystal and, though we know there is room for improvement on services, I think it can be closed. |
All right, closing. If someone is still having an issue with this, please feel free to open another bug. |
Bug report
Required Info:
Steps to reproduce issue
Yesterday, I wrote a simple test to familiarize myself with the parameter client in ROS 2. My code is hosted at example. There is a launch script that launches one node to set a parameter and another node to get that parameter from the first node using the SyncParametersClient interface.
I ran this launch script multiple times and found that the second node sometimes waits too long for the service after the first node has launched. This does not seem right to me.
In a rarer case, after the first node has launched, the second node never succeeds in wait_for_service. The log is given below:
To reproduce this error, you can execute the launch file multiple times, maybe 10 or 20 times, and this should happen.
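The repeated runs described above could be automated with a small harness like the following. This is only a sketch: `run_repeatedly`, the outcome labels, and the default timeout are hypothetical and not part of the original report, and it assumes the launch command exits on its own when the run succeeds. Note that `subprocess.run` only kills the direct child on timeout, so nodes started by a launch process may need separate cleanup.

```python
import subprocess
import sys
from collections import Counter


def run_repeatedly(cmd, runs=20, timeout_s=30.0):
    """Run `cmd` `runs` times and count how each run ended.

    Outcomes (hypothetical labels):
      "ok"     - the process exited with return code 0
      "failed" - the process exited with a non-zero return code
      "hung"   - the process did not exit within `timeout_s` seconds
    """
    outcomes = Counter()
    for _ in range(runs):
        try:
            result = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run has already killed the direct child here.
            outcomes["hung"] += 1
            continue
        outcomes["ok" if result.returncode == 0 else "failed"] += 1
    return outcomes
```

It would be called with the real launch command as an argument list, for example `run_repeatedly(["ros2", "launch", "<package>", "<launch file>"])`, where the package and launch file names are placeholders. A non-zero "hung" count would correspond to the case where the second node never gets past the parameter client calls.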
Expected behavior
Quick and successful response for wait_for_service in parameter client
Actual behavior
Waiting too long or, worse, never succeeding.