Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SFlow] Sflow Crash on 200G ports handled #3

Closed
wants to merge 3 commits into from
Closed

Conversation

vivekrnv
Copy link
Owner

@vivekrnv vivekrnv commented Mar 23, 2021

What I did
Added 200G entry into the speed-rate map. Also handled the case which programs empty string into APP-DB and thus leading to a failure of Orchagent.

Default Sampling rate for 200G is set to 20000

Why I did it

How I verified it


root@sonic:/home/admin# show sflow interface 

sFlow interface configurations
+-------------+---------------+-----------------+
| Interface   | Admin State   |   Sampling Rate |
+=============+===============+=================+
| Ethernet0   | up            |           10000 |
+-------------+---------------+-----------------+
| Ethernet8   | up            |           40000 |
+-------------+---------------+-----------------+
| Ethernet16  | up            |           40000 |
+-------------+---------------+-----------------+
| Ethernet24  | up            |           20000 |
+-------------+---------------+-----------------+
| Ethernet28  | up            |           20000 |
+-------------+---------------+-----------------+
| Ethernet32  | up            |           20000 |
+-------------+---------------+-----------------+
| Ethernet36  | up            |           20000 |
+-------------+---------------+-----------------+
| Ethernet40  | up            |           10000 |
+-------------+---------------+-----------------+
| Ethernet42  | up            |           10000 |
+-------------+---------------+-----------------+
<Truncated>

admin@sonic:~$ docker ps -a
CONTAINER ID        IMAGE                                COMMAND                  CREATED             STATUS              PORTS               NAMES
752b20dd8c35        docker-sflow:latest                  "/usr/local/bin/supe…"   5 hours ago         Up 18 minutes                           sflow
e696295db82b        docker-snmp:latest                   "/usr/local/bin/supe…"   5 hours ago         Up 22 minutes                           snmp
d0fc7b5196d5        docker-sonic-mgmt-framework:latest   "/usr/local/bin/supe…"   5 hours ago         Up 22 minutes                           mgmt-framework
fa56887d234f        docker-sonic-telemetry:latest        "/usr/local/bin/supe…"   5 hours ago         Up 22 minutes                           telemetry
b30e1b421d8b        docker-router-advertiser:latest      "/usr/bin/docker-ini…"   5 hours ago         Up 22 minutes                           radv
cbb1b430d2ad        docker-platform-monitor:latest       "/usr/bin/docker_ini…"   5 hours ago         Up 22 minutes                           pmon
b954e4ef41bb        docker-lldp:latest                   "/usr/bin/docker-lld…"   5 hours ago         Up 22 minutes                           lldp
610a476cdfa7        docker-dhcp-relay:latest             "/usr/bin/docker_ini…"   5 hours ago         Up 22 minutes                           dhcp_relay
7f70fb3c1705        docker-syncd-mlnx:latest             "/usr/local/bin/supe…"   5 hours ago         Up 22 minutes                           syncd
8ac03c99f34b        docker-teamd:latest                  "/usr/local/bin/supe…"   5 hours ago         Up 23 minutes                           teamd
9cf4d11bdedf        docker-orchagent:latest              "/usr/bin/docker-ini…"   5 hours ago         Up 23 minutes                           swss
d186df1e6553        docker-fpm-frr:latest                "/usr/bin/docker_ini…"   5 hours ago         Up 23 minutes                           bgp
c847477d8fa7        docker-database:latest               "/usr/local/bin/dock…"   5 hours ago         Up 5 hours                              database


admin@r-leopard-41:~$ show sflow

sFlow Global Information:
  sFlow Admin State:          up
  sFlow Polling Interval:     default
  sFlow AgentID:              default

  0 Collectors configured:

Details if related

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
@dgsudharsan dgsudharsan self-requested a review March 23, 2021 14:44
@liat-grozovik
Copy link

@vivekreddynv does the sflow test by community is now passing? if so, let create a public PR and set it to be backported to 202012 please.

BTW, recommend to remove the reference from your private PR to the public issue as the references are kept when the private one is also closed.

@vivekrnv
Copy link
Owner Author

Hi @liat-grozovik, I've ran this test on a setup which contains 200G+ ports and although some of the tests were failing (I'm working on them right now), the orchangent crash in not happening and which is what this PR is addressing.

This change is either way required. Please let me know if i should raise this PR.

@vivekrnv vivekrnv closed this Mar 26, 2021
@vivekrnv vivekrnv deleted the sflow_crash branch March 18, 2022 04:19
vivekrnv pushed a commit that referenced this pull request Aug 2, 2022
Currently, ASAN sometimes reports the BufferOrch::m_buffer_type_maps and QosOrch::m_qos_maps as leaked. However, their lifetime is the lifetime of a process so they are not really 'leaked'.
This also adds a simple way to add more suppressions later if required.

Example of ASAN report:

Direct leak of 48 byte(s) in 1 object(s) allocated from:
    #0 0x7f96aa952d30 in operator new(unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xead30)
    #1 0x55ca1da9f789 in __static_initialization_and_destruction_0 /__w/2/s/orchagent/bufferorch.cpp:39
    #2 0x55ca1daa02af in _GLOBAL__sub_I_bufferorch.cpp /__w/2/s/orchagent/bufferorch.cpp:1321
    #3 0x55ca1e2a9cd4  (/usr/bin/orchagent+0xe89cd4)

Direct leak of 48 byte(s) in 1 object(s) allocated from:
    #0 0x7f96aa952d30 in operator new(unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xead30)
    #1 0x55ca1da6d2da in __static_initialization_and_destruction_0 /__w/2/s/orchagent/qosorch.cpp:80
    #2 0x55ca1da6ecf2 in _GLOBAL__sub_I_qosorch.cpp /__w/2/s/orchagent/qosorch.cpp:2000
    #3 0x55ca1e2a9cd4  (/usr/bin/orchagent+0xe89cd4)

- What I did
Added an lsan suppression config with static variable leak suppression

- Why I did it
To suppress ASAN false positives

- How I verified it
Run a test that produces the static variable leaks report and checked that report has these leaks suppressed.

Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
vivekrnv pushed a commit that referenced this pull request Sep 14, 2022
Currently, ASAN sometimes reports the BufferOrch::m_buffer_type_maps and QosOrch::m_qos_maps as leaked. However, their lifetime is the lifetime of a process so they are not really 'leaked'.
This also adds a simple way to add more suppressions later if required.

Example of ASAN report:

Direct leak of 48 byte(s) in 1 object(s) allocated from:
    #0 0x7f96aa952d30 in operator new(unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xead30)
    #1 0x55ca1da9f789 in __static_initialization_and_destruction_0 /__w/2/s/orchagent/bufferorch.cpp:39
    #2 0x55ca1daa02af in _GLOBAL__sub_I_bufferorch.cpp /__w/2/s/orchagent/bufferorch.cpp:1321
    #3 0x55ca1e2a9cd4  (/usr/bin/orchagent+0xe89cd4)

Direct leak of 48 byte(s) in 1 object(s) allocated from:
    #0 0x7f96aa952d30 in operator new(unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xead30)
    #1 0x55ca1da6d2da in __static_initialization_and_destruction_0 /__w/2/s/orchagent/qosorch.cpp:80
    #2 0x55ca1da6ecf2 in _GLOBAL__sub_I_qosorch.cpp /__w/2/s/orchagent/qosorch.cpp:2000
    #3 0x55ca1e2a9cd4  (/usr/bin/orchagent+0xe89cd4)

- What I did
Added an lsan suppression config with static variable leak suppression

- Why I did it
To suppress ASAN false positives

- How I verified it
Run a test that produces the static variable leaks report and checked that report has these leaks suppressed.

Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
vivekrnv added a commit that referenced this pull request Oct 10, 2023
**What I did**

Fix the Mem Leak by moving the raw pointers in type_maps to use smart pointers

**Why I did it**

```
Indirect leak of 83776 byte(s) in 476 object(s) allocated from:
    #0 0x7f0a2a414647 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:99
    #1 0x5555590cc923 in __gnu_cxx::new_allocator, std::allocator > const, referenced_object> > >::allocate(unsigned long, void const*) /usr/include/c++/10/ext/new_allocator.h:115
    #2 0x5555590cc923 in std::allocator_traits, std::allocator > const, referenced_object> > > >::allocate(std::allocator, std::allocator > const, referenced_object> > >&, unsigned long) /usr/include/c++/10/bits/alloc_traits.h:460
    #3 0x5555590cc923 in std::_Rb_tree, std::allocator >, std::pair, std::allocator > const, referenced_object>, std::_Select1st, std::allocator > const, referenced_object> >, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::_M_get_node() /usr/include/c++/10/bits/stl_tree.h:584
    #4 0x5555590cc923 in std::_Rb_tree_node, std::allocator > const, referenced_object> >* std::_Rb_tree, std::allocator >, std::pair, std::allocator > const, referenced_object>, std::_Select1st, std::allocator > const, referenced_object> >, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::_M_create_node, std::allocator > const&>, std::tuple<> >(std::piecewise_construct_t const&, std::tuple, std::allocator > const&>&&, std::tuple<>&&) /usr/include/c++/10/bits/stl_tree.h:634
    #5 0x5555590cc923 in std::_Rb_tree_iterator, std::allocator > const, referenced_object> > std::_Rb_tree, std::allocator >, std::pair, std::allocator > const, referenced_object>, std::_Select1st, std::allocator > const, referenced_object> >, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::_M_emplace_hint_unique, std::allocator > const&>, std::tuple<> >(std::_Rb_tree_const_iterator, std::allocator > const, referenced_object> >, std::piecewise_construct_t const&, std::tuple, std::allocator > const&>&&, std::tuple<>&&) /usr/include/c++/10/bits/stl_tree.h:2461
    #6 0x5555590e8757 in std::map, std::allocator >, referenced_object, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::operator[](std::__cxx11::basic_string, std::allocator > const&) /usr/include/c++/10/bits/stl_map.h:501
    #7 0x5555590d48b0 in Orch::setObjectReference(std::map, std::allocator >, std::map, std::allocator >, referenced_object, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >*, std::less, std::allocator > >, std::allocator, std::allocator > const, std::map, std::allocator >, referenced_object, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >*> > >&, std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) orchagent/orch.cpp:450
    #8 0x5555594ff66b in QosOrch::handleQueueTable(Consumer&, std::tuple, std::allocator >, std::__cxx11::basic_string, std::allocator >, std::vector, std::allocator >, std::__cxx11::basic_string, std::allocator > >, std::allocator, std::allocator >, std::__cxx11::basic_string, std::allocator > > > > >&) orchagent/qosorch.cpp:1763
    #9 0x5555594edbd6 in QosOrch::doTask(Consumer&) orchagent/qosorch.cpp:2179
    #10 0x5555590c8743 in Consumer::drain() orchagent/orch.cpp:241
    #11 0x5555590c8743 in Consumer::drain() orchagent/orch.cpp:238
    #12 0x5555590c8743 in Consumer::execute() orchagent/orch.cpp:235
    #13 0x555559090dad in OrchDaemon::start() orchagent/orchdaemon.cpp:755
    #14 0x555558e9be25 in main orchagent/main.cpp:766
    #15 0x7f0a299b6d09 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x23d09)
```
vivekrnv added a commit that referenced this pull request Dec 7, 2023
**What I did**

Fix the Mem Leak by moving the raw pointers in type_maps to use smart pointers

**Why I did it**

```
Indirect leak of 83776 byte(s) in 476 object(s) allocated from:
    #0 0x7f0a2a414647 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:99
    #1 0x5555590cc923 in __gnu_cxx::new_allocator, std::allocator > const, referenced_object> > >::allocate(unsigned long, void const*) /usr/include/c++/10/ext/new_allocator.h:115
    #2 0x5555590cc923 in std::allocator_traits, std::allocator > const, referenced_object> > > >::allocate(std::allocator, std::allocator > const, referenced_object> > >&, unsigned long) /usr/include/c++/10/bits/alloc_traits.h:460
    #3 0x5555590cc923 in std::_Rb_tree, std::allocator >, std::pair, std::allocator > const, referenced_object>, std::_Select1st, std::allocator > const, referenced_object> >, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::_M_get_node() /usr/include/c++/10/bits/stl_tree.h:584
    #4 0x5555590cc923 in std::_Rb_tree_node, std::allocator > const, referenced_object> >* std::_Rb_tree, std::allocator >, std::pair, std::allocator > const, referenced_object>, std::_Select1st, std::allocator > const, referenced_object> >, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::_M_create_node, std::allocator > const&>, std::tuple<> >(std::piecewise_construct_t const&, std::tuple, std::allocator > const&>&&, std::tuple<>&&) /usr/include/c++/10/bits/stl_tree.h:634
    #5 0x5555590cc923 in std::_Rb_tree_iterator, std::allocator > const, referenced_object> > std::_Rb_tree, std::allocator >, std::pair, std::allocator > const, referenced_object>, std::_Select1st, std::allocator > const, referenced_object> >, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::_M_emplace_hint_unique, std::allocator > const&>, std::tuple<> >(std::_Rb_tree_const_iterator, std::allocator > const, referenced_object> >, std::piecewise_construct_t const&, std::tuple, std::allocator > const&>&&, std::tuple<>&&) /usr/include/c++/10/bits/stl_tree.h:2461
    #6 0x5555590e8757 in std::map, std::allocator >, referenced_object, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::operator[](std::__cxx11::basic_string, std::allocator > const&) /usr/include/c++/10/bits/stl_map.h:501
    #7 0x5555590d48b0 in Orch::setObjectReference(std::map, std::allocator >, std::map, std::allocator >, referenced_object, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >*, std::less, std::allocator > >, std::allocator, std::allocator > const, std::map, std::allocator >, referenced_object, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >*> > >&, std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) orchagent/orch.cpp:450
    #8 0x5555594ff66b in QosOrch::handleQueueTable(Consumer&, std::tuple, std::allocator >, std::__cxx11::basic_string, std::allocator >, std::vector, std::allocator >, std::__cxx11::basic_string, std::allocator > >, std::allocator, std::allocator >, std::__cxx11::basic_string, std::allocator > > > > >&) orchagent/qosorch.cpp:1763
    #9 0x5555594edbd6 in QosOrch::doTask(Consumer&) orchagent/qosorch.cpp:2179
    #10 0x5555590c8743 in Consumer::drain() orchagent/orch.cpp:241
    #11 0x5555590c8743 in Consumer::drain() orchagent/orch.cpp:238
    #12 0x5555590c8743 in Consumer::execute() orchagent/orch.cpp:235
    #13 0x555559090dad in OrchDaemon::start() orchagent/orchdaemon.cpp:755
    #14 0x555558e9be25 in main orchagent/main.cpp:766
    #15 0x7f0a299b6d09 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x23d09)
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants