[IntfMgrd] Retry adding ipv6 prefix to iface if it fails because of disabled_ipv6 flag set to 1 #9

Closed
wants to merge 8 commits into from

@vivekrnv vivekrnv commented May 4, 2022

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>

What I did

If the first attempt to assign an IPv6 address to an interface fails, intfmgrd sets the interface's disable_ipv6 flag to 0 itself and retries the assignment.

Why I did it

There can be a race between intfmgrd and the Mellanox SDK: the SDK has created the Linux netdev interface but has not yet set its disable_ipv6 flag to 0. If intfmgrd tries to assign an IPv6 address to the interface in that window, the attempt fails.
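
The intended flow can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the intfmgrd code itself: `CommandRunner`, `addIpv6WithRetry`, and the `ip -6 address add` command string are hypothetical names chosen for the example. Only the `sysctl -w net.ipv6.conf.<iface>.disable_ipv6=0` command mirrors what appears in the syslog output in this PR.

```cpp
#include <functional>
#include <string>

// Hypothetical stand-in for a command executor such as swss::exec:
// runs a shell command and returns its exit status (0 on success).
using CommandRunner = std::function<int(const std::string &)>;

// Try to add an IPv6 prefix; if that fails, clear disable_ipv6 on the
// interface and retry exactly once. Returns true if the prefix was added.
bool addIpv6WithRetry(const CommandRunner &run,
                      const std::string &iface,
                      const std::string &prefix)
{
    // Illustrative command strings; real code must shell-quote its arguments.
    const std::string addCmd = "ip -6 address add " + prefix + " dev " + iface;
    const std::string enableCmd =
        "sysctl -w net.ipv6.conf." + iface + ".disable_ipv6=0";

    if (run(addCmd) == 0)
    {
        return true;            // common case: no race, nothing else to do
    }
    if (run(enableCmd) != 0)
    {
        return false;           // could not enable IPv6, so a retry is pointless
    }
    return run(addCmd) == 0;    // single retry after enabling IPv6
}
```

Passing the executor in as a parameter is only a convenience for the sketch: it lets the flow be exercised with a fake runner instead of real shell commands.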

How I verified it

Unit tests:

vkarri@7f13d1e426de:/sonic/src/sonic-swss/tests/mock_tests$ ./tests_intfmgrd 
Running main() from /usr/src/gtest/src/gtest_main.cc
[==========] Running 2 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 2 tests from IntfMgrTest
[ RUN      ] IntfMgrTest.testSettingIpv6Flag
[       OK ] IntfMgrTest.testSettingIpv6Flag (1 ms)
[ RUN      ] IntfMgrTest.testNoSettingIpv6Flag
[       OK ] IntfMgrTest.testNoSettingIpv6Flag (0 ms)
[----------] 2 tests from IntfMgrTest (1 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 1 test case ran. (1 ms total)
[  PASSED  ] 2 tests.

Verified on the switch:

root@r-leopard-simx-69:/home/admin# cat /proc/sys/net/ipv6/conf/Ethernet0/disable_ipv6
0
root@r-leopard-simx-69:/home/admin# sysctl -w net.ipv6.conf.Ethernet0.disable_ipv6=1
net.ipv6.conf.Ethernet0.disable_ipv6 = 1
root@r-leopard-simx-69:/home/admin# config interface ip add Ethernet0 2001:0db8:0:f101::1/64
root@r-leopard-simx-69:/home/admin# sysctl net.ipv6.conf.Ethernet0.disable_ipv6
net.ipv6.conf.Ethernet0.disable_ipv6 = 0
root@r-leopard-simx-69:/home/admin# ifconfig Ethernet0
Ethernet0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9100
        inet6 fe80::1e34:daff:fe1d:4200  prefixlen 64  scopeid 0x20<link>
        inet6 2001:db8:0:f101::1  prefixlen 64  scopeid 0x0<global>
        ether 1c:34:da:1d:42:00  txqueuelen 1000  (Ethernet)
        RX packets 18  bytes 3366 (3.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 22  bytes 4604 (4.4 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

In Syslog:

May  5 08:31:56.889599 r-leopard-simx-69 INFO swss#/supervisord: intfmgrd RTNETLINK answers: Permission denied
May  5 08:31:56.901305 r-leopard-simx-69 INFO swss#intfmgrd: :- enableIpv6Flag: disable_ipv6 flag to 0 for iface: Ethernet0, cmd: sysctl -w net.ipv6.conf.\"Ethernet0\".disable_ipv6=0, ret = 0

Details if related

vivekrnv added 2 commits May 4, 2022 08:36
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
@vivekrnv vivekrnv changed the title from "[IntfMgrd] Add Checks and retry mechanism before configuring ipv6 address to iface" to "[IntfMgrd] Try setting disabled_ipv6 flag to 0 is setting adding ipv6 prefix to iface fails" May 5, 2022
@vivekrnv vivekrnv changed the title from "[IntfMgrd] Try setting disabled_ipv6 flag to 0 is setting adding ipv6 prefix to iface fails" to "[IntfMgrd] Retry adding ipv6 prefix to iface if it fails first time" May 5, 2022
@vivekrnv vivekrnv changed the title from "[IntfMgrd] Retry adding ipv6 prefix to iface if it fails first time" to "[IntfMgrd] Retry adding ipv6 prefix to iface if it fails because of disabled_ipv6 flag set to 1" May 5, 2022
vivekrnv added 3 commits May 5, 2022 08:13
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
@vivekrnv vivekrnv requested a review from dgsudharsan May 5, 2022 08:50
@@ -115,6 +144,16 @@ void IntfMgr::setIntfIp(const string &alias, const string &opCmd,
    int ret = swss::exec(cmd.str(), res);
    if (ret)
    {
        /*
            ipv6 setting on the iface might still haven't been enabled by SDK.

This comment refers to the Nvidia SDK. It is better not to mention it in common code.

vivekrnv (Owner, Author):

Will remove

        */
        if (retryV6 && enableIpv6Flag(alias))
        {
            SWSS_LOG_INFO("Setting disable_ipv6 flag to 0 for iface: %s", alias.c_str());

The log does not reflect what this code block is doing

vivekrnv (Owner, Author):

Moved to the relevant method

}

 void IntfMgr::setIntfIp(const string &alias, const string &opCmd,
-                        const IpPrefix &ipPrefix)
+                        const IpPrefix &ipPrefix, bool retryV6)

I don't see any retry logic. Why is this flag called retryV6?

vivekrnv (Owner, Author):

I'm recursively calling this function again (and also enabling ipv6 on the iface) if this boolean is set. Will rename the argument.

            ipv6 setting on the iface might still haven't been enabled by SDK.
            Retry adding Ipv6 Prefix after enabling the setting
        */
        if (retryV6 && enableIpv6Flag(alias))

I don't think you need a flag: you can check whether opCmd is "add" and ipPrefix.isV6(), and then enable IPv6 on the interface.

vivekrnv (Owner, Author):

In the meeting with MSFT, they suggested trying to set the IP prefix first and setting the disable_ipv6 flag to zero only if that fails, since this is a relatively rare condition.

I see

Comment on lines 1059 to 1063
if (!ip_prefix.isV4() && !g_ipv6Flag)
{
    SWSS_LOG_ERROR("Tried to set ipv6 address %s to the iface %s, disable_ipv6 flag is set", ip_prefix.to_string().c_str(), alias.c_str());
    return true;
}

Why do we care about the global IPv6 setting? We have a per-interface check inside setIntfIp.

vivekrnv (Owner, Author):

will remove

@@ -74,10 +101,12 @@ IntfMgr::IntfMgr(DBConnector *cfgDb, DBConnector *appDb, DBConnector *stateDb, c
    {
        mySwitchType = swtype;
    }

    g_ipv6Flag = inferIPV6Capability();

Why do we care about global configuration?

vivekrnv (Owner, Author):

Will remove

Comment on lines 45 to 46
cmd << "sysctl -w " << "net.ipv6.conf." << shellquote(alias) << ".disable_ipv6=0";
return (swss::exec(cmd.str(), temp_res) == 0) ? true : false;

Would it be more efficient to write to the file using ofstream instead of spawning a bash shell?

vivekrnv (Owner, Author), May 6, 2022:

Link: spawning a bash shell is seemingly the norm in the code.

Moreover, the chances of this race happening are already low, so I think it's okay to use the standard approach.

Ok
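
For comparison, the file-based alternative raised in this thread could look roughly like the sketch below. It is not part of this PR: `enableIpv6ViaProcfs` and its `procRoot` parameter are hypothetical names, and `procRoot` exists only so the function can be tried against a scratch directory. The `/proc/sys/net/ipv6/conf/<iface>/disable_ipv6` path is the one shown in the verification output above.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: clear disable_ipv6 for one interface by writing "0"
// to its procfs entry instead of spawning `sysctl` through a shell.
bool enableIpv6ViaProcfs(const std::string &iface,
                         const std::string &procRoot = "/proc/sys")
{
    const std::string path =
        procRoot + "/net/ipv6/conf/" + iface + "/disable_ipv6";

    std::ofstream flag(path);   // open fails if the path is not writable
    if (!flag.is_open())
    {
        return false;
    }
    flag << "0" << std::endl;   // endl flushes, so a write error shows up here
    return flag.good();
}
```

This avoids the cost of a shell, at the price of diverging from the `swss::exec` pattern used elsewhere in the code, which is the trade-off discussed above.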

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Comment on lines 117 to 125
int ret = swss::exec(cmd.str(), res);
if (ret)
{
    if (allow_retry && enableIpv6Flag(alias))
    {
        return setIntfIp(alias, opCmd, ipPrefix, false);
    }
    SWSS_LOG_ERROR("Command '%s' failed with rc %d", cmd.str().c_str(), ret);
}

The introduction of an additional flag complicates the setIntfIp API. The code:

return setIntfIp(alias, opCmd, ipPrefix, false);

is not readable if you aren't aware of what "false" means for this API.

I suggest thinking of a way to do this without the flag. For example:

Suggested change
-int ret = swss::exec(cmd.str(), res);
-if (ret)
-{
-    if (allow_retry && enableIpv6Flag(alias))
-    {
-        return setIntfIp(alias, opCmd, ipPrefix, false);
-    }
-    SWSS_LOG_ERROR("Command '%s' failed with rc %d", cmd.str().c_str(), ret);
-}
+int failed = swss::exec(cmd.str(), res);
+if (failed && ipPrefix.isV6() && opCmd == "add")
+{
+    SWSS_LOG_NOTICE("Failed to assign IPv6 on interface %s with return code %d, trying to enable IPv6 and retry", alias.c_str(), failed);
+    if (!enableIpv6Flag(alias))
+    {
+        SWSS_LOG_ERROR("Failed to enable IPv6 on interface %s", alias.c_str());
+        return;
+    }
+    failed = swss::exec(cmd.str(), res);
+}
+if (failed)
+{
+    SWSS_LOG_ERROR("Command '%s' failed with rc %d", cmd.str().c_str(), failed);
+}

Also note that I've added a check that the prefix is IPv6, since there is no point in retrying for IPv4 addresses.

What do you think?

vivekrnv (Owner, Author):

This seems like a cleaner approach. Updated.

Comment on lines 1 to 3
#include <iostream>
#include <string.h>
#include <fstream>

Looks like the added headers aren't used.

vivekrnv (Owner, Author):

Removed

Comment on lines +17 to +21
int exec(const std::string &cmd, std::string &stdout)
{
    mockCallArgs.push_back(cmd);
    return callback(cmd, stdout);
}

If my understanding is correct, this mock works because of how the dynamic linker resolves symbols. Since swss::exec is defined inside the test binary, it is used instead of swss::exec from libswsscommon.so. That means every test, not just the add_ipv6_prefix_ut tests, will use this version of swss::exec; however, they won't have an API to provide a mock implementation. Is my understanding correct?

vivekrnv (Owner, Author):

tests_intfmgrd is compiled into a separate binary, so the other orchagent tests are not affected.
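
The shape of the mock under discussion can be sketched in a single file. This is a simplified illustration, not the test code from this PR: the `exec` signature and the `mockCallArgs` and `callback` names follow the snippet quoted above, while the `enableIpv6Flag` shown here is a reduced stand-in that omits `shellquote`. With only one translation unit there is no library definition to override, so the sketch shows how the mock records and delegates calls rather than the symbol-resolution behavior itself.

```cpp
#include <functional>
#include <string>
#include <vector>

// Commands recorded by the mock, and its pluggable behaviour.
std::vector<std::string> mockCallArgs;
std::function<int(const std::string &, std::string &)> callback;

namespace swss
{
    // Test-local definition standing in for the library's exec.
    // The output parameter is named `out` here because `stdout` is
    // also a standard macro from <cstdio>.
    int exec(const std::string &cmd, std::string &out)
    {
        mockCallArgs.push_back(cmd);
        return callback(cmd, out);
    }
}

// Reduced stand-in for the code under test (no shell quoting).
bool enableIpv6Flag(const std::string &alias)
{
    std::string res;
    const std::string cmd =
        "sysctl -w net.ipv6.conf." + alias + ".disable_ipv6=0";
    return swss::exec(cmd, res) == 0;
}
```

A test then assigns `callback` to simulate success or failure and inspects `mockCallArgs` to check which commands were issued.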

vivekrnv added 2 commits May 10, 2022 17:43
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
@vivekrnv vivekrnv closed this May 12, 2022
vivekrnv added a commit that referenced this pull request Oct 10, 2023
**What I did**

Fix the Mem Leak by moving the raw pointers in type_maps to use smart pointers

**Why I did it**

```
Indirect leak of 83776 byte(s) in 476 object(s) allocated from:
    #0 0x7f0a2a414647 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:99
    #1 0x5555590cc923 in __gnu_cxx::new_allocator, std::allocator > const, referenced_object> > >::allocate(unsigned long, void const*) /usr/include/c++/10/ext/new_allocator.h:115
    #2 0x5555590cc923 in std::allocator_traits, std::allocator > const, referenced_object> > > >::allocate(std::allocator, std::allocator > const, referenced_object> > >&, unsigned long) /usr/include/c++/10/bits/alloc_traits.h:460
    #3 0x5555590cc923 in std::_Rb_tree, std::allocator >, std::pair, std::allocator > const, referenced_object>, std::_Select1st, std::allocator > const, referenced_object> >, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::_M_get_node() /usr/include/c++/10/bits/stl_tree.h:584
    #4 0x5555590cc923 in std::_Rb_tree_node, std::allocator > const, referenced_object> >* std::_Rb_tree, std::allocator >, std::pair, std::allocator > const, referenced_object>, std::_Select1st, std::allocator > const, referenced_object> >, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::_M_create_node, std::allocator > const&>, std::tuple<> >(std::piecewise_construct_t const&, std::tuple, std::allocator > const&>&&, std::tuple<>&&) /usr/include/c++/10/bits/stl_tree.h:634
    #5 0x5555590cc923 in std::_Rb_tree_iterator, std::allocator > const, referenced_object> > std::_Rb_tree, std::allocator >, std::pair, std::allocator > const, referenced_object>, std::_Select1st, std::allocator > const, referenced_object> >, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::_M_emplace_hint_unique, std::allocator > const&>, std::tuple<> >(std::_Rb_tree_const_iterator, std::allocator > const, referenced_object> >, std::piecewise_construct_t const&, std::tuple, std::allocator > const&>&&, std::tuple<>&&) /usr/include/c++/10/bits/stl_tree.h:2461
    #6 0x5555590e8757 in std::map, std::allocator >, referenced_object, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::operator[](std::__cxx11::basic_string, std::allocator > const&) /usr/include/c++/10/bits/stl_map.h:501
    #7 0x5555590d48b0 in Orch::setObjectReference(std::map, std::allocator >, std::map, std::allocator >, referenced_object, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >*, std::less, std::allocator > >, std::allocator, std::allocator > const, std::map, std::allocator >, referenced_object, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >*> > >&, std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) orchagent/orch.cpp:450
    #8 0x5555594ff66b in QosOrch::handleQueueTable(Consumer&, std::tuple, std::allocator >, std::__cxx11::basic_string, std::allocator >, std::vector, std::allocator >, std::__cxx11::basic_string, std::allocator > >, std::allocator, std::allocator >, std::__cxx11::basic_string, std::allocator > > > > >&) orchagent/qosorch.cpp:1763
    #9 0x5555594edbd6 in QosOrch::doTask(Consumer&) orchagent/qosorch.cpp:2179
    #10 0x5555590c8743 in Consumer::drain() orchagent/orch.cpp:241
    #11 0x5555590c8743 in Consumer::drain() orchagent/orch.cpp:238
    #12 0x5555590c8743 in Consumer::execute() orchagent/orch.cpp:235
    #13 0x555559090dad in OrchDaemon::start() orchagent/orchdaemon.cpp:755
    #14 0x555558e9be25 in main orchagent/main.cpp:766
    #15 0x7f0a299b6d09 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x23d09)
```
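
The kind of change that commit message describes can be illustrated with a small sketch. It is an assumption-laden example, not the actual commit: `referenced_object` is reduced to a single hypothetical field, `setReference` is a hypothetical helper, and the use of `std::shared_ptr` (rather than another smart pointer) is a guess. The nested map-of-pointer-to-map layout is taken from the types visible in the trace above.

```cpp
#include <map>
#include <memory>
#include <string>

// Simplified stand-in for orchagent's referenced_object.
struct referenced_object
{
    int ref_count = 0;  // hypothetical field, for illustration only
};

using object_map = std::map<std::string, referenced_object>;

// Leak-prone layout: the outer map holds owning raw pointers, so each
// inner map must be deleted by hand or it is leaked with its nodes.
using raw_type_map = std::map<std::string, object_map *>;

// Smart-pointer layout: inner maps are released when the outer map goes away.
using owned_type_map = std::map<std::string, std::shared_ptr<object_map>>;

// Hypothetical helper mirroring the operator[] chain seen in the trace.
void setReference(owned_type_map &maps, const std::string &table,
                  const std::string &key)
{
    auto &inner = maps[table];
    if (!inner)
    {
        inner = std::make_shared<object_map>();
    }
    (*inner)[key].ref_count++;
}
```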
vivekrnv added a commit that referenced this pull request Dec 7, 2023