Nodes don't reconnect to TCP discovery server [13706] #2299

Closed
amfern opened this issue Oct 31, 2021 · 5 comments · Fixed by #2470

Comments

@amfern

amfern commented Oct 31, 2021

Stopping and restarting a node does not reconnect it to the other nodes when using TCP discovery.

Expected Behavior

Nodes connect to other publishers

Current Behavior

A node that is stopped and started again does not reconnect when using a TCP discovery server

Steps to Reproduce

  1. start TCP discovery server: RMW_IMPLEMENTATION=rmw_fastrtps_dynamic_cpp FASTRTPS_DEFAULT_PROFILES_FILE=./discovery_server_tcp.xml ros2 run demo_nodes_cpp listener --ros-args -r __ns:=/aaaaaaaaa -r __node:=discovery_server
  2. start listener: RMW_IMPLEMENTATION=rmw_fastrtps_dynamic_cpp FASTRTPS_DEFAULT_PROFILES_FILE=./discovery_client_tcp.xml ros2 run demo_nodes_cpp listener
  3. start talker: RMW_IMPLEMENTATION=rmw_fastrtps_dynamic_cpp FASTRTPS_DEFAULT_PROFILES_FILE=./discovery_client_tcp_2.xml ros2 run demo_nodes_cpp talker
  4. observe communication between listener and talker
  5. stop talker with ctrl+c
  6. start talker with the same command as above in step 3
  7. observe no communication between listener and talker
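
For anyone replaying the steps above, a small wrapper can cut down on retyping. This is only a sketch: `run_node` and the `DRY_RUN` switch are hypothetical names introduced here, while the environment variables, profile files and `ros2 run` arguments are the ones from the steps.

```shell
# Hypothetical helper: run a demo_nodes_cpp executable with a given
# Fast DDS XML profile. With DRY_RUN=1 it only prints the command.
run_node() {
    profile=$1
    shift
    set -- env RMW_IMPLEMENTATION=rmw_fastrtps_dynamic_cpp \
        FASTRTPS_DEFAULT_PROFILES_FILE="$profile" \
        ros2 run demo_nodes_cpp "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage, one terminal per node (steps 1 to 3):
#   run_node ./discovery_server_tcp.xml listener --ros-args -r __ns:=/aaaaaaaaa -r __node:=discovery_server
#   run_node ./discovery_client_tcp.xml listener
#   run_node ./discovery_client_tcp_2.xml talker
# Steps 5 and 6: stop the talker with ctrl+c, then repeat the last line.
```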

System information

Testing inside container, all nodes are running inside the same container

  • Fast-RTPS version: ii ros-galactic-fastrtps 2.3.4-1focal.20210805.154711 amd64 Implementation of RTPS standard.
  • Also tested against `bugfix/ds-reconnection`, commit c779aee662a1c8ff4276abdd941f521efde0b380
  • OS: Linux ilya.linux 5.13.13-zen1-1-zen #1 ZEN SMP PREEMPT Thu, 26 Aug 2021 19:14:35 +0000 x86_64 GNU/Linux
  • Network interfaces:
root@ilya:/home/ilya/ros2# ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
       inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
       ether 02:42:1f:d8:56:ec  txqueuelen 0  (Ethernet)
       RX packets 0  bytes 0 (0.0 B)
       RX errors 0  dropped 0  overruns 0  frame 0
       TX packets 0  bytes 0 (0.0 B)
       TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp58s0u1u1i5: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
       ether 0c:37:96:0a:04:9e  txqueuelen 1000  (Ethernet)
       RX packets 0  bytes 0 (0.0 B)
       RX errors 0  dropped 0  overruns 0  frame 0
       TX packets 0  bytes 0 (0.0 B)
       TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp60s0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
       ether 8c:04:ba:9b:6d:cc  txqueuelen 1000  (Ethernet)
       RX packets 0  bytes 0 (0.0 B)
       RX errors 0  dropped 0  overruns 0  frame 0
       TX packets 0  bytes 0 (0.0 B)
       TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
       inet 127.0.0.1  netmask 255.0.0.0
       inet6 ::1  prefixlen 128  scopeid 0x10<host>
       loop  txqueuelen 1000  (Local Loopback)
       RX packets 83913  bytes 17280514 (17.2 MB)
       RX errors 0  dropped 0  overruns 0  frame 0
       TX packets 83913  bytes 17280514 (17.2 MB)
       TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

virbr0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
       inet 192.168.122.1  netmask 255.255.255.0  broadcast 192.168.122.255
       ether 52:54:00:6b:01:42  txqueuelen 1000  (Ethernet)
       RX packets 0  bytes 0 (0.0 B)
       RX errors 0  dropped 0  overruns 0  frame 0
       TX packets 0  bytes 0 (0.0 B)
       TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

wlo1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
       inet 192.168.18.107  netmask 255.255.255.0  broadcast 192.168.18.255
       inet6 fe80::78c9:537c:e6e8:5440  prefixlen 64  scopeid 0x20<link>
       ether c0:b8:83:57:58:64  txqueuelen 1000  (Ethernet)
       RX packets 252686  bytes 306869491 (306.8 MB)
       RX errors 0  dropped 0  overruns 0  frame 0
       TX packets 135186  bytes 27771508 (27.7 MB)
       TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

  • ROS2: ros2 galactic docker

Additional context

video recording of the problem: https://youtu.be/NvcrguDneug

Additional resources

  • Wireshark capture
  • XML profiles file

discovery_server_tcp.xml

<?xml version="1.0" encoding="UTF-8" ?>
<profiles xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
    <transport_descriptors>
        <transport_descriptor>
            <transport_id>TCPv4_SERVER</transport_id>
            <type>TCPv4</type>
            <listening_ports>
                <port>27811</port>
            </listening_ports>
            <calculate_crc>false</calculate_crc>
            <check_crc>false</check_crc>
        </transport_descriptor>
    </transport_descriptors>

    <participant profile_name="TCP_server" is_default_profile="true">
        <rtps>
            <userTransports>
                <transport_id>TCPv4_SERVER</transport_id>
            </userTransports>
            <useBuiltinTransports>false</useBuiltinTransports>
            <prefix>4d.49.47.55.45.4c.5f.42.41.52.52.4f</prefix>
            <builtin>
                <discovery_config>
                    <discoveryProtocol>SERVER</discoveryProtocol>
                    <leaseAnnouncement><sec>1</sec><nanosec>0</nanosec></leaseAnnouncement>
                    <leaseDuration><sec>3</sec><nanosec>0</nanosec></leaseDuration>
                    <clientAnnouncementPeriod>
                        <nanosec>250000000</nanosec>
                    </clientAnnouncementPeriod>
                </discovery_config>
                <metatrafficUnicastLocatorList>
                    <locator>
                        <tcpv4>
                            <address>127.0.0.1</address>
                            <physical_port>27811</physical_port>
                            <port>6339</port>
                        </tcpv4>
                    </locator>
                </metatrafficUnicastLocatorList>
            </builtin>
        </rtps>
    </participant>
</profiles>

discovery_client_tcp.xml

<?xml version="1.0" encoding="UTF-8" ?>
<profiles xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
    <transport_descriptors>
        <transport_descriptor>
            <transport_id>LAN subscriber tcp transport</transport_id>
            <type>TCPv4</type>
            <listening_ports>
                <port>64863</port> <!-- subscriber devoted tcp port -->
            </listening_ports>
        </transport_descriptor>
    </transport_descriptors>

    <participant profile_name="TCP_client_1" is_default_profile="true">
        <rtps>
            <prefix>63.6c.69.65.6e.74.31.5f.73.31.5f.5f</prefix>
            <userTransports>
                <transport_id>LAN subscriber tcp transport</transport_id>
            </userTransports>
            <useBuiltinTransports>false</useBuiltinTransports>
            <builtin>
                <discovery_config>
                    <discoveryProtocol>CLIENT</discoveryProtocol>
                    <leaseAnnouncement><sec>1</sec><nanosec>0</nanosec></leaseAnnouncement>
                    <leaseDuration><sec>3</sec><nanosec>0</nanosec></leaseDuration>
                    <clientAnnouncementPeriod>
                        <nanosec>250000000</nanosec>
                    </clientAnnouncementPeriod>
                    <discoveryServersList>
                        <RemoteServer prefix="4d.49.47.55.45.4c.5f.42.41.52.52.4f">
                            <metatrafficUnicastLocatorList>
                                <locator>
                                    <tcpv4>
                                        <address>127.0.0.1</address>
                                        <physical_port>27811</physical_port>
                                        <port>6339</port>
                                    </tcpv4>
                                </locator>
                            </metatrafficUnicastLocatorList>
                        </RemoteServer>
                    </discoveryServersList>
                </discovery_config>
            </builtin>
        </rtps>
    </participant>
</profiles>

discovery_client_tcp_2.xml

<?xml version="1.0" encoding="UTF-8" ?>
<profiles xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
    <transport_descriptors>
        <transport_descriptor>
            <transport_id>LAN publisher tcp transport</transport_id>
            <type>TCPv4</type>
            <listening_ports>
                <port>64753</port> <!-- publisher devoted tcp port -->
            </listening_ports>
        </transport_descriptor>
    </transport_descriptors>

    <participant profile_name="TCP_client_1" is_default_profile="true">
        <rtps>
            <prefix>63.6c.69.65.6e.74.32.5f.73.31.5f.5f</prefix>
            <userTransports>
                <transport_id>LAN publisher tcp transport</transport_id>
            </userTransports>
            <useBuiltinTransports>false</useBuiltinTransports>
            <builtin>
                <discovery_config>
                    <discoveryProtocol>CLIENT</discoveryProtocol>
                    <leaseAnnouncement><sec>1</sec><nanosec>0</nanosec></leaseAnnouncement>
                    <leaseDuration><sec>3</sec><nanosec>0</nanosec></leaseDuration>
                    <clientAnnouncementPeriod>
                        <nanosec>250000000</nanosec>
                    </clientAnnouncementPeriod>
                    <discoveryServersList>
                        <RemoteServer prefix="4d.49.47.55.45.4c.5f.42.41.52.52.4f">
                            <metatrafficUnicastLocatorList>
                                <locator>
                                    <tcpv4>
                                        <address>127.0.0.1</address>
                                        <physical_port>27811</physical_port>
                                        <port>6339</port>
                                    </tcpv4>
                                </locator>
                            </metatrafficUnicastLocatorList>
                        </RemoteServer>
                    </discoveryServersList>
                </discovery_config>
            </builtin>
        </rtps>
    </participant>
</profiles>
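
One thing worth double-checking in profiles like these is that each client's `RemoteServer prefix` is identical to the server's own `<prefix>`. In the three files above they are. A rough way to check this from a shell is sketched below; the function names are hypothetical, and the `sed` matching assumes each tag sits on a single line as in the files above, so it is not a real XML parser.

```shell
# Hypothetical helpers; own_prefix and remote_server_prefix read one
# XML profile on stdin and assume one tag per line.

# Print the participant's own GUID prefix (<prefix>...</prefix>).
own_prefix() {
    sed -n 's:.*<prefix>\(.*\)</prefix>.*:\1:p' | head -n 1
}

# Print the prefix of the discovery server a client profile points at.
remote_server_prefix() {
    sed -n 's:.*<RemoteServer prefix="\([^"]*\)".*:\1:p' | head -n 1
}

# Succeed only if the client profile ($2) targets the server profile ($1).
prefixes_match() {
    server=$(own_prefix < "$1")
    [ -n "$server" ] && [ "$server" = "$(remote_server_prefix < "$2")" ]
}

# Usage:
#   prefixes_match discovery_server_tcp.xml discovery_client_tcp.xml && echo "prefixes match"
```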
@JLBuenoLopez
Contributor

Hi @amfern,

Would you mind retrying with the latest Fast DDS release? #2246 may have solved the issue

@JLBuenoLopez JLBuenoLopez changed the title Nodes don't reconnect to TCP discovery server Nodes don't reconnect to TCP discovery server [13706] Feb 1, 2022
@amfern
Author

amfern commented Feb 1, 2022

Hi @JLBuenoLopez-eProsima

I can still reproduce it on the Rolling release. Here are the versions:

ii  ros-rolling-fastrtps                               2.3.4-1focal.20220120.180137         amd64        *eprosima Fast DDS* (formerly Fast RTPS) is a C++ implementation of the DDS (Data Distribution Service) standard of the OMG (Object Management Group).
ii  ros-rolling-fastrtps-cmake-module                  2.0.4-1focal.20220120.194458         amd64        Provide CMake module to find eProsima FastRTPS.
ii  ros-rolling-rmw-fastrtps-cpp                       6.1.2-1focal.20220121.220647         amd64        Implement the ROS middleware interface using eProsima FastRTPS static code generation in C++.
ii  ros-rolling-rmw-fastrtps-shared-cpp                6.1.2-1focal.20220121.214111         amd64        Code shared on static and dynamic type support of rmw_fastrtps_cpp.
ii  ros-rolling-rosidl-typesupport-fastrtps-c          2.0.4-1focal.20220121.211520         amd64        Generate the C interfaces for eProsima FastRTPS.
ii  ros-rolling-rosidl-typesupport-fastrtps-cpp        2.0.4-1focal.20220121.210627         amd64        Generate the C++ interfaces for eProsima FastRTPS.

@EduPonz

EduPonz commented Feb 2, 2022

Hi @amfern ,

First of all, ROS 2 Rolling is not using the latest version of Fast DDS. The latest release is v2.5.0 from December 2021, whereas ROS 2 Rolling is using v2.3.4 from August 2021, which does not include the latest features and bug fixes. We intend to update the ROS 2 Rolling version in the coming weeks, but the process is not straightforward.

That being said, I have been able to reproduce your issue with Fast DDS v2.5.0, but I've also verified that the issue is gone when using this branch. I'm attaching a Dockerfile you can use to build the latest Fast DDS on top of ROS 2 Galactic. As for the branch, we are currently adding some regression tests and will open a PR in the upcoming week, so the fix will be included in Fast DDS v2.5.1.

Be aware that killing the talker node will most likely leave the socket in a TIME_WAIT state, so I only restarted the talker once the OS had freed that resource (normally it takes about a minute). You can check the sockets by running:

netstat -tac

When everything is connected, it should look something like:

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 0.0.0.0:27811           0.0.0.0:*               LISTEN     
tcp        0      0 0.0.0.0:64753           0.0.0.0:*               LISTEN     
tcp        0      0 0.0.0.0:64863           0.0.0.0:*               LISTEN     
tcp        0      0 localhost:40660         localhost:27811         ESTABLISHED
tcp        0      0 localhost:27811         localhost:40660         ESTABLISHED
tcp        0      0 localhost:64753         localhost:52634         ESTABLISHED
tcp        0      0 localhost:52634         localhost:64753         ESTABLISHED
tcp        0      0 localhost:27811         localhost:40650         ESTABLISHED
tcp        0      0 localhost:40650         localhost:27811         ESTABLISHED

When you kill the talker:

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 0.0.0.0:27811           0.0.0.0:*               LISTEN     
tcp        0      0 0.0.0.0:64863           0.0.0.0:*               LISTEN     
tcp        0      0 localhost:64753         localhost:52626         TIME_WAIT  
tcp        0      0 localhost:27811         localhost:40650         ESTABLISHED
tcp        0      0 localhost:40650         localhost:27811         ESTABLISHED
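
The wait for TIME_WAIT to clear could also be scripted instead of watched by eye. The following is a minimal sketch, not part of the original test: `time_wait_count` is a hypothetical helper name, and it assumes the net-tools `netstat` column layout shown above (local address in the fourth column, state in the last).

```shell
# Hypothetical helper: count sockets in TIME_WAIT whose local port is $1.
# Reads netstat output on stdin.
time_wait_count() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $NF == "TIME_WAIT" {
            n = split($4, parts, ":")       # local address is host:port
            if (parts[n] == port) count++
        }
        END { print count + 0 }
    '
}

# Usage: block until the talker's listening port (64753) is free again,
# then restart the talker.
#   while [ "$(netstat -tan | time_wait_count 64753)" -gt 0 ]; do sleep 1; done
```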

As for the listener shell, it looked like this:

root@1d4f36fc76c5:/overlay_ws# FASTRTPS_DEFAULT_PROFILES_FILE=./discovery_client_tcp.xml ros2 run demo_nodes_cpp listener
[INFO] [1643784156.384590288] [listener]: I heard: [Hello World: 1]
[INFO] [1643784157.383331681] [listener]: I heard: [Hello World: 2]
[INFO] [1643784158.384098321] [listener]: I heard: [Hello World: 3]
[INFO] [1643784159.384088010] [listener]: I heard: [Hello World: 4]
[INFO] [1643784160.384136011] [listener]: I heard: [Hello World: 5]
[INFO] [1643784161.384125410] [listener]: I heard: [Hello World: 6]
[INFO] [1643784162.384071661] [listener]: I heard: [Hello World: 7]
[INFO] [1643784163.384128530] [listener]: I heard: [Hello World: 8]
[INFO] [1643784164.384088499] [listener]: I heard: [Hello World: 9]
[INFO] [1643784165.383959092] [listener]: I heard: [Hello World: 10]
[INFO] [1643784166.383895661] [listener]: I heard: [Hello World: 11]
[INFO] [1643784167.383925390] [listener]: I heard: [Hello World: 12]

<------------------ I killed the talker here ------------------>

[INFO] [1643784237.203362908] [listener]: I heard: [Hello World: 1]
[INFO] [1643784238.202179323] [listener]: I heard: [Hello World: 2]
[INFO] [1643784239.202667758] [listener]: I heard: [Hello World: 3]
[INFO] [1643784240.202679300] [listener]: I heard: [Hello World: 4]
[INFO] [1643784241.202597198] [listener]: I heard: [Hello World: 5]
[INFO] [1643784242.202497975] [listener]: I heard: [Hello World: 6]
[INFO] [1643784243.202318467] [listener]: I heard: [Hello World: 7]
[INFO] [1643784244.202657345] [listener]: I heard: [Hello World: 8]
[INFO] [1643784245.202603421] [listener]: I heard: [Hello World: 9]
[INFO] [1643784246.202005817] [listener]: I heard: [Hello World: 10]
[INFO] [1643784247.202612419] [listener]: I heard: [Hello World: 11]
[INFO] [1643784248.202436682] [listener]: I heard: [Hello World: 12]
^C[INFO] [1643784248.813415190] [rclcpp]: signal_handler(signal_value=2)
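
The timestamps in this log show how long the talker was down: the last message before the kill is stamped 1643784167.38 and the first one after the restart 1643784237.20, a gap of roughly 70 seconds, which fits the "about a minute" TIME_WAIT wait. A small sketch for pulling that number out of a saved listener log follows; `max_listener_gap` is a hypothetical name, and it assumes the log line format shown above.

```shell
# Hypothetical helper: print the largest gap, in seconds, between
# consecutive "I heard" lines of a listener log read from stdin.
max_listener_gap() {
    awk '
        /I heard:/ {
            ts = $2                 # second field looks like [1643784167.383925390]
            gsub(/\[/, "", ts)
            gsub(/\]/, "", ts)
            if (seen && ts - prev > max) max = ts - prev
            prev = ts
            seen = 1
        }
        END { printf "%.1f\n", max }
    '
}

# Usage:
#   max_listener_gap < listener.log
```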

As a final comment, note that I'm using rmw_fastrtps_cpp instead of rmw_fastrtps_dynamic_cpp. Is there a reason to use the dynamic rmw? Mind that rmw_fastrtps_dynamic_cpp is a Tier 3 rmw implementation in Galactic, while rmw_fastrtps_cpp is Tier 1 (see here).

Build the Docker image

Put all the following files under a common directory, and from within said directory run:

docker build -t galactic-fastdds:tcp-fix -f Dockerfile .

Dockerfile

FROM ros:galactic-ros-base

# Needed for a dependency that forces to set timezone
ENV TZ=Europe/Madrid
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone

SHELL ["/bin/bash", "-c"]

# Install ROS 2 packages
RUN apt update && apt install -y \
    ros-galactic-demo-nodes-cpp \
    ros-galactic-osrf-testing-tools-cpp \
    ros-galactic-performance-test-fixture \
    ros-galactic-test-msgs

RUN apt-get update && apt-get install --yes --no-install-recommends \
    build-essential \
    cmake \
    git \
    libasio-dev \
    libssl-dev

# ROS 2 overlay: Fast DDS (and dependencies), and rmw_fastrtps_cpp
WORKDIR /overlay_ws
RUN mkdir src

COPY overlay.repos .
COPY colcon.meta .

RUN vcs import src < overlay.repos && \
    source /opt/ros/galactic/setup.bash && \
    colcon build --packages-up-to rmw_fastrtps_cpp && \
    source install/setup.bash && \
    colcon build && \
    source install/setup.bash

# Set Fast DDS as ROS 2 middleware
ENV RMW_IMPLEMENTATION=rmw_fastrtps_cpp

RUN echo 'export RMW_IMPLEMENTATION=rmw_fastrtps_cpp' >> ~/.bashrc
RUN echo 'source /opt/ros/galactic/setup.bash' >> ~/.bashrc
RUN echo 'source /overlay_ws/install/local_setup.bash' >> ~/.bashrc

overlay.repos

repositories:
    foonathan_memory_vendor:
        type: git
        url: https://github.com/eProsima/foonathan_memory_vendor.git
        version: v1.2.0
    fastcdr:
        type: git
        url: https://github.com/eProsima/Fast-CDR.git
        version: v1.0.23
    fastrtps:
        type: git
        url: https://github.com/eProsima/Fast-DDS.git
        version: bugfix/tcp_client_block
    rmw_fastrtps:
        type: git
        url: https://github.com/ros2/rmw_fastrtps.git
        version: galactic
    rosidl_typesupport_fastrtps:
        type: git
        url: https://github.com/ros2/rosidl_typesupport_fastrtps.git
        version: galactic

colcon.meta

{
    "names":
    {
        "fastrtps":
        {
            "cmake-args":
            [
                "-DSECURITY=ON"
            ]
        },
        "rmw_fastrtps_cpp":
        {
            "cmake-args":
            [
                "-DSECURITY=ON"
            ]
        }
    }
}

@amfern
Author

amfern commented Feb 2, 2022

Yes!! It is fixed.
The Dockerfile is a blessing, it made testing this fix a breeze 👍

It would really help if you could add this Dockerfile to the source code for future reference.
I was following the guide at https://fast-dds.docs.eprosima.com/en/latest/installation/sources/sources_linux.html?highlight=compile, but I failed to compile fastrtps_rmw.

@EduPonz

EduPonz commented Feb 2, 2022

Hi @amfern ,

It is great to hear that! I'll leave this ticket open until the proper PR is merged into master. Regarding the Dockerfile, we are planning something along those lines. Stay tuned!
