-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[IntfMgrd] Retry adding ipv6 prefix to iface if it fails because of disabled_ipv6 flag set to 1 #9
Conversation
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
cfgmgr/intfmgr.cpp
Outdated
@@ -115,6 +144,16 @@ void IntfMgr::setIntfIp(const string &alias, const string &opCmd, | |||
int ret = swss::exec(cmd.str(), res); | |||
if (ret) | |||
{ | |||
/* | |||
ipv6 setting on the iface might still haven't been enabled by SDK. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment refers to Nvidia SDK. It is better not to mention that in common code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will remove
cfgmgr/intfmgr.cpp
Outdated
*/ | ||
if (retryV6 && enableIpv6Flag(alias)) | ||
{ | ||
SWSS_LOG_INFO("Setting disable_ipv6 flag to 0 for iface: %s", alias.c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The log does not reflect what this code block is doing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to the relevant method
cfgmgr/intfmgr.cpp
Outdated
} | ||
|
||
void IntfMgr::setIntfIp(const string &alias, const string &opCmd, | ||
const IpPrefix &ipPrefix) | ||
const IpPrefix &ipPrefix, bool retryV6) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any retry logic, why this flag is called retryV6?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm recursively calling this function again (and also enabling ipv6 on the iface) if this boolean is set. Will rename the argument.
cfgmgr/intfmgr.cpp
Outdated
ipv6 setting on the iface might still haven't been enabled by SDK. | ||
Retry adding Ipv6 Prefix after enabling the setting | ||
*/ | ||
if (retryV6 && enableIpv6Flag(alias)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need a flag because you can check whether opCmd is "add" & ipPrefix.isV6() and enable ipv6 on interface
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the meeting with MSFT, they've suggested in trying setting the IP prefix first and then set the disable_ipv6 flag to zero if it fails, since this is relatively a rare condition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see
cfgmgr/intfmgr.cpp
Outdated
if (!ip_prefix.isV4() && !g_ipv6Flag) | ||
{ | ||
SWSS_LOG_ERROR("Tried to set ipv6 address %s to the iface %s, disable_ipv6 flag is set", ip_prefix.to_string().c_str(), alias.c_str()); | ||
return true; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we care about global IPv6 setting? We have a per interface check inside setIntfIp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will remove
cfgmgr/intfmgr.cpp
Outdated
@@ -74,10 +101,12 @@ IntfMgr::IntfMgr(DBConnector *cfgDb, DBConnector *appDb, DBConnector *stateDb, c | |||
{ | |||
mySwitchType = swtype; | |||
} | |||
|
|||
g_ipv6Flag = inferIPV6Capability(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we care about global configuration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will remove
cfgmgr/intfmgr.cpp
Outdated
cmd << "sysctl -w " << "net.ipv6.conf." << shellquote(alias) << ".disable_ipv6=0"; | ||
return (swss::exec(cmd.str(), temp_res) == 0) ? true : false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will it be more optimal to write to a file using ifstream instead of spawning a bash shell
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Link Using bash shell is seemingly the norm in the code.
Moreover the chances of this race happening are low already, so i think it's okay to use the standard approach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
cfgmgr/intfmgr.cpp
Outdated
int ret = swss::exec(cmd.str(), res); | ||
if (ret) | ||
{ | ||
if (allow_retry && enableIpv6Flag(alias)) | ||
{ | ||
return setIntfIp(alias, opCmd, ipPrefix, false); | ||
} | ||
SWSS_LOG_ERROR("Command '%s' failed with rc %d", cmd.str().c_str(), ret); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The introduction of an additional flag complicates the setIntfIp API. The code:
return setIntfIp(alias, opCmd, ipPrefix, false);
is not readable if you aren't aware what "false" for this API means.
I suggest to think of a way without the flag. for example:
int ret = swss::exec(cmd.str(), res); | |
if (ret) | |
{ | |
if (allow_retry && enableIpv6Flag(alias)) | |
{ | |
return setIntfIp(alias, opCmd, ipPrefix, false); | |
} | |
SWSS_LOG_ERROR("Command '%s' failed with rc %d", cmd.str().c_str(), ret); | |
} | |
int failed = swss::exec(cmd.str(), res); | |
if (failed && ipPrefix.isV6() && opCmd == "add") | |
{ | |
SWSS_LOG_NOTICE("Failed to assign IPv6 on interface %s with return code %d, trying to enable IPv6 and retry", alias.c_str(), failed); | |
if (!enableIpv6Flag(alias)) | |
{ | |
SWSS_LOG_ERROR("Failed to enable IPv6 on interface %s", alias.c_str()); | |
return; | |
} | |
failed = swss::exec(cmd.str(), res); | |
} | |
if (failed) | |
{ | |
SWSS_LOG_ERROR("Command '%s' failed with rc %d", cmd.str().c_str(), failed); | |
} |
Also, note, that I've put an additional check if the IP is IPv6 since there is no point to retry IPv4 addresses.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a cleaner approach, Updated
cfgmgr/intfmgr.cpp
Outdated
#include <iostream> | ||
#include <string.h> | ||
#include <fstream> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like added headers aren't used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
int exec(const std::string &cmd, std::string &stdout) | ||
{ | ||
mockCallArgs.push_back(cmd); | ||
return callback(cmd, stdout); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If my understanding is correct, this mock works due to how dynamic linker resolves symbols. Since swss::exec is defined inside tests binary it is getting used instead of swss::exec from libswsscommon.so. That means every test, not just add_ipv6_prefix_ut tests will use this version of swss::exec, however, they won't have an API to provide mock implementation. Is my undestanding correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests_intfmgrd will be compiled into a different binary so the other orchgent tests are not affected.
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
**What I did** Fix the Mem Leak by moving the raw pointers in type_maps to use smart pointers **Why I did it** ``` Indirect leak of 83776 byte(s) in 476 object(s) allocated from: #0 0x7f0a2a414647 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:99 #1 0x5555590cc923 in __gnu_cxx::new_allocator, std::allocator > const, referenced_object> > >::allocate(unsigned long, void const*) /usr/include/c++/10/ext/new_allocator.h:115 #2 0x5555590cc923 in std::allocator_traits, std::allocator > const, referenced_object> > > >::allocate(std::allocator, std::allocator > const, referenced_object> > >&, unsigned long) /usr/include/c++/10/bits/alloc_traits.h:460 #3 0x5555590cc923 in std::_Rb_tree, std::allocator >, std::pair, std::allocator > const, referenced_object>, std::_Select1st, std::allocator > const, referenced_object> >, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::_M_get_node() /usr/include/c++/10/bits/stl_tree.h:584 #4 0x5555590cc923 in std::_Rb_tree_node, std::allocator > const, referenced_object> >* std::_Rb_tree, std::allocator >, std::pair, std::allocator > const, referenced_object>, std::_Select1st, std::allocator > const, referenced_object> >, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::_M_create_node, std::allocator > const&>, std::tuple<> >(std::piecewise_construct_t const&, std::tuple, std::allocator > const&>&&, std::tuple<>&&) /usr/include/c++/10/bits/stl_tree.h:634 #5 0x5555590cc923 in std::_Rb_tree_iterator, std::allocator > const, referenced_object> > std::_Rb_tree, std::allocator >, std::pair, std::allocator > const, referenced_object>, std::_Select1st, std::allocator > const, referenced_object> >, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::_M_emplace_hint_unique, std::allocator > const&>, std::tuple<> >(std::_Rb_tree_const_iterator, std::allocator > const, referenced_object> >, std::piecewise_construct_t const&, std::tuple, std::allocator > const&>&&, std::tuple<>&&) /usr/include/c++/10/bits/stl_tree.h:2461 #6 0x5555590e8757 in std::map, std::allocator >, referenced_object, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::operator[](std::__cxx11::basic_string, std::allocator > const&) /usr/include/c++/10/bits/stl_map.h:501 #7 0x5555590d48b0 in Orch::setObjectReference(std::map, std::allocator >, std::map, std::allocator >, referenced_object, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >*, std::less, std::allocator > >, std::allocator, std::allocator > const, std::map, std::allocator >, referenced_object, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >*> > >&, std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) orchagent/orch.cpp:450 #8 0x5555594ff66b in QosOrch::handleQueueTable(Consumer&, std::tuple, std::allocator >, std::__cxx11::basic_string, std::allocator >, std::vector, std::allocator >, std::__cxx11::basic_string, std::allocator > >, std::allocator, std::allocator >, std::__cxx11::basic_string, std::allocator > > > > >&) orchagent/qosorch.cpp:1763 #9 0x5555594edbd6 in QosOrch::doTask(Consumer&) orchagent/qosorch.cpp:2179 #10 0x5555590c8743 in Consumer::drain() orchagent/orch.cpp:241 #11 0x5555590c8743 in Consumer::drain() orchagent/orch.cpp:238 #12 0x5555590c8743 in Consumer::execute() orchagent/orch.cpp:235 #13 0x555559090dad in OrchDaemon::start() orchagent/orchdaemon.cpp:755 #14 0x555558e9be25 in main orchagent/main.cpp:766 #15 0x7f0a299b6d09 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x23d09) ```
**What I did** Fix the Mem Leak by moving the raw pointers in type_maps to use smart pointers **Why I did it** ``` Indirect leak of 83776 byte(s) in 476 object(s) allocated from: #0 0x7f0a2a414647 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:99 #1 0x5555590cc923 in __gnu_cxx::new_allocator, std::allocator > const, referenced_object> > >::allocate(unsigned long, void const*) /usr/include/c++/10/ext/new_allocator.h:115 #2 0x5555590cc923 in std::allocator_traits, std::allocator > const, referenced_object> > > >::allocate(std::allocator, std::allocator > const, referenced_object> > >&, unsigned long) /usr/include/c++/10/bits/alloc_traits.h:460 #3 0x5555590cc923 in std::_Rb_tree, std::allocator >, std::pair, std::allocator > const, referenced_object>, std::_Select1st, std::allocator > const, referenced_object> >, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::_M_get_node() /usr/include/c++/10/bits/stl_tree.h:584 #4 0x5555590cc923 in std::_Rb_tree_node, std::allocator > const, referenced_object> >* std::_Rb_tree, std::allocator >, std::pair, std::allocator > const, referenced_object>, std::_Select1st, std::allocator > const, referenced_object> >, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::_M_create_node, std::allocator > const&>, std::tuple<> >(std::piecewise_construct_t const&, std::tuple, std::allocator > const&>&&, std::tuple<>&&) /usr/include/c++/10/bits/stl_tree.h:634 #5 0x5555590cc923 in std::_Rb_tree_iterator, std::allocator > const, referenced_object> > std::_Rb_tree, std::allocator >, std::pair, std::allocator > const, referenced_object>, std::_Select1st, std::allocator > const, referenced_object> >, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::_M_emplace_hint_unique, std::allocator > const&>, std::tuple<> >(std::_Rb_tree_const_iterator, std::allocator > const, referenced_object> >, std::piecewise_construct_t const&, std::tuple, std::allocator > const&>&&, std::tuple<>&&) /usr/include/c++/10/bits/stl_tree.h:2461 #6 0x5555590e8757 in std::map, std::allocator >, referenced_object, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::operator[](std::__cxx11::basic_string, std::allocator > const&) /usr/include/c++/10/bits/stl_map.h:501 #7 0x5555590d48b0 in Orch::setObjectReference(std::map, std::allocator >, std::map, std::allocator >, referenced_object, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >*, std::less, std::allocator > >, std::allocator, std::allocator > const, std::map, std::allocator >, referenced_object, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >*> > >&, std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) orchagent/orch.cpp:450 #8 0x5555594ff66b in QosOrch::handleQueueTable(Consumer&, std::tuple, std::allocator >, std::__cxx11::basic_string, std::allocator >, std::vector, std::allocator >, std::__cxx11::basic_string, std::allocator > >, std::allocator, std::allocator >, std::__cxx11::basic_string, std::allocator > > > > >&) orchagent/qosorch.cpp:1763 #9 0x5555594edbd6 in QosOrch::doTask(Consumer&) orchagent/qosorch.cpp:2179 #10 0x5555590c8743 in Consumer::drain() orchagent/orch.cpp:241 #11 0x5555590c8743 in Consumer::drain() orchagent/orch.cpp:238 #12 0x5555590c8743 in Consumer::execute() orchagent/orch.cpp:235 #13 0x555559090dad in OrchDaemon::start() orchagent/orchdaemon.cpp:755 #14 0x555558e9be25 in main orchagent/main.cpp:766 #15 0x7f0a299b6d09 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x23d09) ```
Signed-off-by: Vivek Reddy Karri vkarri@nvidia.com
What I did
intfmgrd sets the flag by itself and retries setting the ip address if the ipv6 assignment fails for the first time
Why I did it
There might be a race condition b/w intfmgrd and Mellanox SDK where the SDK created Linux Netdev iface but still doesn't yet set disable_ipv6 flag to 0. If intfmgrd tries to assign ip to the iface, the attempt fails.
How I verified it
Ut's
Verified on the switch:
In Syslog:
Details if related