Agent crash #157
Comments
Please provide instructions for replicating |
Well, that's part of the problem. I cannot reliably reproduce the issue on demand, and I don't know what is triggering the failure. But here's what I did, verbatim. Windows: Open command prompt.
Robot connects to agent. Open another command prompt
Agent crashes. I'll continue testing and see if I can come up with something more definitive. |
I have found a reliable way to make it crash. It seems to be related to the hard-liveliness-check. If I turn off my robot and wait ten seconds, the agent stops with
I don't know that this is the same issue as the first crash I experienced. But, it is a reproducible issue. EDIT: (This is on Windows 10) |
I'll try to reproduce it. |
Could you confirm if this issue is replicable under Linux? |
@pablogs9: apologies for the 'vague' description here. It's just that we don't have any additional information ourselves, so we cannot share anything more than @ted-miller already did.
In the meantime I've been able to reproduce this on Linux as well, specifically using the
A sequence of starting the
Sometimes the Agent just hangs (no symptoms other than a non-responding Agent), other times it hangs while consuming 100% of a CPU core (and being non-responsive). Sometimes it crashes with a
It never really seems to print anything related to this, but I haven't run it with increased debug level (so there might be something, but I haven't seen it).
I cannot reproduce it using the
Both Docker images were pulled last Friday (the 1st).
One complicating factor: we are not completely up-to-date on the Client side, and given the fact there seems to be some implication it has something to do with the liveness check (which appears to have had some PRs merged), it could be our Client is misbehaving (at least from the perspective of the Agent).
Again, our apologies for the vague report. I'll update when we have more information.
Edit: oh and when the Agent becomes unresponsive, the Client application receives non-zero return values from
And to clarify: while the Agent is unresponsive on the PC side, apparently it is responsive enough to still let the Client progress through at least a few phases in the connection handshake. It doesn't complete it though, as evidenced by the failures in |
I'm going to try with some subscribers on the client-side. Update: Working as expected with subscribers. Which version of the Micro XRCE-DDS Client are you on? |
eProsima/Micro-XRCE-DDS-Client@e3f6439 we had to track
The subscribers I mentioned are on the Agent side (ie: ROS PC).
just to make sure: it takes a few disappearances of the Client to trigger this. It's not 100% reproducible all the time. |
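Since the failure only shows up after a few Client disappearances, it can help to record exactly how the Agent process ended on each attempt. What follows is a minimal sketch, not something used in this thread: `run_and_report` is a hypothetical helper, and the command passed to it is a stand-in that you would replace with your own Agent command line. Note that if the Agent is started through a launcher, the status seen here is the launcher's, which may differ from the Agent's own.

```python
import signal
import subprocess
import sys


def run_and_report(cmd, timeout_s=None):
    """Run cmd, wait for it to end, and describe how it terminated."""
    proc = subprocess.Popen(cmd)
    try:
        rc = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The process neither exited nor crashed within the time limit.
        proc.kill()
        proc.wait()
        return "still running at timeout (killed by this script)"
    if rc == 0:
        return "exited cleanly"
    if rc < 0:
        # POSIX only: a negative return code means "killed by signal -rc".
        try:
            return f"killed by {signal.Signals(-rc).name}"
        except ValueError:
            return f"killed by signal {-rc}"
    # Show the code in hex too, since Windows crash codes are NTSTATUS values.
    return f"exited with code {rc} (0x{rc & 0xFFFFFFFF:08X})"


if __name__ == "__main__":
    # Stand-in child process; replace with the real Agent command line.
    print(run_and_report([sys.executable, "-c", "raise SystemExit(3)"]))
```

Running this once per attempt (power-cycling the Client in between) gives a log of clean exits, signals, and timeouts that can be attached to a report like this one.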
How many, approximately? 10? 100? |
2 to 5 when I tried to reproduce it on Friday. And we're using UDPv4. Command used to start the Agent (in my case, @ted-miller's is different, as he's on Windows, but he's shown the command he's using in #157 (comment)):
|
I've just updated our XRCE-DDS Client to
Sequence of steps/commands:
I'll see if I can run the Agent in |
Ok, we are going to replicate this scenario. |
Replicated, working on it. |
Could you check if you can replicate this using a bare Micro XRCE-DDS Agent instead of the micro-ROS Agent? |
I'm participating in World ROS-I day today, but I believe @ted-miller will be online in about 6 hours from now. Would you have an idea / hunch already? |
I can try this today |
BTW: Last week, I opened the uros Agent Visual Studio solution in the hopes that it would show me where the crash occurs. The call stack seems to indicate that the exception is coming from the fastrtps library. But, I didn't have the debug symbols, so I'm not getting any real useful info. I tried building a version with debug symbols. But, I couldn't get it to link everything properly. EDIT: The call stack also included the micro xrce-dds agent library too. So, it could be an issue in the xrce agent that passed invalid data to fastrtps. |
I was not able to reproduce the issue using the micro-xrce-dds agent.
I have connected (using the same client application on the robot as previous tests) 10 times without issue. |
@ted-miller: but there were no subscribers that time, were there? |
Correct, there were no subscribers. But, I didn't have any subscribers in previous reproductions either. |
Ah, ok. I've always had subscribers, and that seems to 'reliably' trigger it. |
Another data point: just had the Agent
So same steps as in #157 (comment), but I didn't make it past step 3. Same Client version, same version of the Docker image ( |
We replicated this using your instructions, and the issue seems to be identified by the Fast DDS team. CC: @EduPonz @MiguelCompany @jsantiago-eProsima |
@pablogs9: are eProsima/Fast-DDS#2794, eProsima/Fast-DDS#2801 and eProsima/Fast-DDS#2828 related? |
friendly ping @pablogs9. |
Not sure @gavanderhoorn, maybe @EduPonz @MiguelCompany @jsantiago-eProsima can tell you |
It sure looks like it! We are bundling a Fast DDS v2.6.2 by the end of this week so it'll be included in the next Humble sync. |
@EduPonz thanks. Was this a problem specific to Humble (ie: Fast-DDS on Humble)? I've not been able to reproduce the problem(s) with a Galactic image. |
The Agent Docker images also get FastDDS from the OR repositories, correct? |
@gavanderhoorn micro-ROS Agent uses the installed Fast DDS version if it is available. So in the docker, it uses the OSRF distributed binary, yes. |
Have the PRs related to the problem discussed in this issue been merged upstream already? I'd like to test whether the crash has been solved. From your description it sounds like I could force using a from-source build of FastRTPS by avoiding installing the binary packages. Correct? |
Could you check if #169 solves this issue? |
Yes, #169 appears to have fixed the issue. Thank you. |
Thanks for the fix. Have/will the docker image(s) be(en) updated? |
@gavanderhoorn ongoing generation: https://github.com/micro-ROS/docker/actions/runs/2912224500 Sorry for the delay in the fix! |
Describe the bug
The agent crashes.
On Windows:
[ros2run]: Process exited with failure 3221225477
On Ubuntu (using docker image):
[ros2run]: Segmentation fault
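For reference, the Windows exit code above is easier to read in hexadecimal: 3221225477 is 0xC0000005, the NTSTATUS value STATUS_ACCESS_VIOLATION, which is roughly the Windows counterpart of a segmentation fault. The sketch below shows the conversion; `describe_windows_exit_code` and the small lookup table are illustrative only and are not part of the Agent or of ROS 2.

```python
# A few well-known NTSTATUS values; deliberately tiny, not exhaustive.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def describe_windows_exit_code(code: int) -> str:
    """Format a Windows process exit code as hex plus a known NTSTATUS name."""
    code &= 0xFFFFFFFF  # also accept the signed form, e.g. -1073741819
    name = KNOWN_NTSTATUS.get(code, "unknown")
    return f"0x{code:08X} ({name})"


print(describe_windows_exit_code(3221225477))
# prints: 0xC0000005 (STATUS_ACCESS_VIOLATION)
```

So both reports describe an invalid memory access, which is at least consistent with the Windows and Ubuntu crashes being the same kind of fault.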
To Reproduce
Not sure.
For the first crash (using Windows), my robot connected to the agent just fine. Then I opened another command prompt and tried ros2 topic echo /joint_states. The agent immediately failed with [ros2run]: Process exited with failure 3221225477.
Then I switched to my Ubuntu machine and started up the docker image.
It seemed to be working, so I assumed it was something wrong with the Windows version.
I left the agent connected with the robot running. (Robot might have been rebooted at some point; I don't really know.) Some time later, I went to shut down the Ubuntu machine and saw the agent had failed.
[ros2run] Segmentation fault
So, I figured I would try it on Windows again to get a procedure to reproduce the error. But, I can't make it happen again.
System information (please complete the following information):
Additional context
Up until now, I had been using a galactic version of the Agent. I had not had any problems with it. This is my first time using Humble.