Custom ACL Based Metering #12

Open: wants to merge 44 commits into master from policy_based_metering

@shaygol commented Dec 30, 2024

  • YANG updates

@shaygol force-pushed the policy_based_metering branch 2 times, most recently from 436e863 to f277b14 January 15, 2025 13:46
@shaygol marked this pull request as ready for review January 15, 2025 13:48
@shaygol self-assigned this Jan 15, 2025
VladimirKuk pushed a commit that referenced this pull request Jan 21, 2025
#### Why I did it

To fix errors that happen when reading from and writing to the queue:

```
Jun  5 23:04:41.798613 r-leopard-56 NOTICE healthd: Caught SIGTERM - exiting...
Jun  5 23:04:41.798985 r-leopard-56 NOTICE healthd: Caught SIGTERM - exiting...
Jun  5 23:04:41.799535 r-leopard-56 NOTICE healthd: Caught SIGTERM - exiting...
Jun  5 23:04:41.806010 r-leopard-56 NOTICE healthd: Caught SIGTERM - exiting...
Jun  5 23:04:41.814075 r-leopard-56 ERR healthd: system_service[Errno 104] Connection reset by peer
Jun  5 23:04:41.824135 r-leopard-56 ERR healthd: Traceback (most recent call last):#012  File "/usr/local/lib/python3.9/dist-packages/health_checker/sysmonitor.py", line 484, in system_service#012    msg = self.myQ.get(timeout=QUEUE_TIMEOUT)#012  File "<string>", line 2, in get#012  File "/usr/lib/python3.9/multiprocessing/managers.py", line 809, in _callmethod#012    kind, result = conn.recv()#012  File "/usr/lib/python3.9/multiprocessing/connection.py", line 255, in recv#012    buf = self._recv_bytes()#012  File "/usr/lib/python3.9/multiprocessing/connection.py", line 419, in _recv_bytes#012    buf = self._recv(4)#012  File "/usr/lib/python3.9/multiprocessing/connection.py", line 384, in _recv#012    chunk = read(handle, remaining)#012ConnectionResetError: [Errno 104] Connection reset by peer
Jun  5 23:04:41.826489 r-leopard-56 INFO healthd[8494]: ERROR:dbus.connection:Exception in handler for D-Bus signal:
Jun  5 23:04:41.826591 r-leopard-56 INFO healthd[8494]: Traceback (most recent call last):
Jun  5 23:04:41.826640 r-leopard-56 INFO healthd[8494]:   File "/usr/lib/python3/dist-packages/dbus/connection.py", line 232, in maybe_handle_message
Jun  5 23:04:41.826686 r-leopard-56 INFO healthd[8494]:     self._handler(*args, **kwargs)
Jun  5 23:04:41.826738 r-leopard-56 INFO healthd[8494]:   File "/usr/local/lib/python3.9/dist-packages/health_checker/sysmonitor.py", line 82, in on_job_removed
Jun  5 23:04:41.826785 r-leopard-56 INFO healthd[8494]:     self.task_notify(msg)
Jun  5 23:04:41.826831 r-leopard-56 INFO healthd[8494]:   File "/usr/local/lib/python3.9/dist-packages/health_checker/sysmonitor.py", line 110, in task_notify
Jun  5 23:04:41.826877 r-leopard-56 INFO healthd[8494]:     self.task_queue.put(msg)
Jun  5 23:04:41.826923 r-leopard-56 INFO healthd[8494]:   File "<string>", line 2, in put
Jun  5 23:04:41.826973 r-leopard-56 INFO healthd[8494]:   File "/usr/lib/python3.9/multiprocessing/managers.py", line 808, in _callmethod
Jun  5 23:04:41.827018 r-leopard-56 INFO healthd[8494]:     conn.send((self._id, methodname, args, kwds))
Jun  5 23:04:41.827065 r-leopard-56 INFO healthd[8494]:   File "/usr/lib/python3.9/multiprocessing/connection.py", line 211, in send
Jun  5 23:04:41.827115 r-leopard-56 INFO healthd[8494]:     self._send_bytes(_ForkingPickler.dumps(obj))
Jun  5 23:04:41.827158 r-leopard-56 INFO healthd[8494]:   File "/usr/lib/python3.9/multiprocessing/connection.py", line 416, in _send_bytes
Jun  5 23:04:41.827199 r-leopard-56 INFO healthd[8494]:     self._send(header + buf)
Jun  5 23:04:41.827254 r-leopard-56 INFO healthd[8494]:   File "/usr/lib/python3.9/multiprocessing/connection.py", line 373, in _send
Jun  5 23:04:41.827322 r-leopard-56 INFO healthd[8494]:     n = write(self._handle, buf)
Jun  5 23:04:41.827368 r-leopard-56 INFO healthd[8494]: BrokenPipeError: [Errno 32] Broken pipe
Jun  5 23:04:42.800216 r-leopard-56 NOTICE healthd: Caught SIGTERM - exiting...
```

When the multiprocessing.Manager is shut down, operations on its queue raise the errors above. This happens during system shutdown, for example on fast-reboot and warm-reboot.


With the fix, the system-health service stops without hanging:

```
root@sonic:/home/admin# sudo systemctl start system-health ; sleep 10; echo "$(date): Stopping..."; sudo systemctl stop system-health; echo "$(date): Stopped"
Thu Oct 17 01:07:56 PM IDT 2024: Stopping...
Thu Oct 17 01:07:58 PM IDT 2024: Stopped
root@sonic:/home/admin# sudo systemctl start system-health ; sleep 10; echo "$(date): Stopping..."; sudo systemctl stop system-health; echo "$(date): Stopped"
Thu Oct 17 01:08:13 PM IDT 2024: Stopping...
Thu Oct 17 01:08:14 PM IDT 2024: Stopped
root@sonic:/home/admin# sudo systemctl start system-health ; sleep 10; echo "$(date): Stopping..."; sudo systemctl stop system-health; echo "$(date): Stopped"
Thu Oct 17 01:09:05 PM IDT 2024: Stopping...
Thu Oct 17 01:09:06 PM IDT 2024: Stopped
```

##### Work item tracking
- Microsoft ADO **(number only)**:

#### How I did it

Remove the call to shutdown(). The manager is cleaned up automatically when it is garbage collected, as described in the documentation: https://docs.python.org/3/library/multiprocessing.html
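A minimal sketch, not the sysmonitor.py code, of the pattern the fix relies on: a consumer reads from a Manager-backed queue with a timeout, and the code never calls manager.shutdown() explicitly. The QUEUE_TIMEOUT value, the drain and run_demo helpers, and the use of the "fork" start method are assumptions for illustration.

```python
import multiprocessing
import queue

# Hypothetical value; sysmonitor.py defines its own QUEUE_TIMEOUT.
QUEUE_TIMEOUT = 0.1


def drain(q, max_items):
    """Collect up to max_items messages, stopping when a get() times out."""
    msgs = []
    while len(msgs) < max_items:
        try:
            msgs.append(q.get(timeout=QUEUE_TIMEOUT))
        except queue.Empty:
            break
    return msgs


def run_demo():
    # "fork" is Linux-only (as on SONiC); it avoids re-importing __main__
    # in the manager's server process.
    ctx = multiprocessing.get_context("fork")
    manager = ctx.Manager()
    q = manager.Queue()
    for i in range(3):
        q.put(f"job-{i}")
    msgs = drain(q, max_items=10)
    # No explicit manager.shutdown(): the manager process is stopped once the
    # manager (and every proxy referencing it) is garbage collected, or when
    # this process exits.
    return msgs
```

Because proxies created by a manager keep a reference to it, the manager process is not torn down while another thread (such as a D-Bus signal handler) still holds the queue proxy, which is the situation that produced the BrokenPipeError above.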

#### How to verify it

<!--
If PR needs to be backported, then the PR must be tested against the base branch and the earliest backport release branch and provide tested image version on these two branches. For example, if the PR is requested for master, 202211 and 202012, then the requester needs to provide test results on master and 202012.
-->

Run warm-reboot and fast-reboot multiple times and verify that no errors appear in the log.

#### Which release branch to backport (provide reason below if selected)

<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [x] 202205
- [x] 202311
- [x] 202405

#### Tested branch (Please provide the tested image version)

<!--
- Please provide tested image version
- e.g.
- [x] 20201231.100
-->

- [ ] <!-- image version 1 -->
- [ ] <!-- image version 2 -->

#### Description for the changelog
<!--
Write a short (one line) summary that describes the changes in this
pull request for inclusion in the changelog:
-->

<!--
 Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.
-->

#### Link to config_db schema for YANG module changes
<!--
Provide a link to config_db schema for the table for which YANG model
is defined
Link should point to correct section on https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/doc/Configuration.md
-->

#### A picture of a cute animal (not mandatory but encouraged)
VladimirKuk pushed a commit that referenced this pull request Jan 21, 2025
…et#21095)

Add the following fix from FRR (FRRouting/frr#17297).

This fixes the following crash, which occurs intermittently:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/frr/zebra -A 127.0.0.1 -s 90000000 -M dplane_fpm_nl -M snmp'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007fccd7351e2c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x7fccd6faf7c0 (LWP 36))]
(gdb) bt
#0  0x00007fccd7351e2c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fccd7302fb2 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007fccd72ed472 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007fccd75bb3a9 in _zlog_assert_failed (xref=xref@entry=0x7fccd7652380 <_xref.16>, extra=extra@entry=0x0) at ../lib/zlog.c:678
#4  0x00007fccd759b2fe in route_node_delete (node=<optimized out>) at ../lib/table.c:352
#5  0x00007fccd759b445 in route_unlock_node (node=0x0) at ../lib/table.h:258
#6  route_next (node=<optimized out>) at ../lib/table.c:436
#7  route_next (node=node@entry=0x56029d89e560) at ../lib/table.c:410
#8  0x000056029b6b6b7a in if_lookup_by_name_per_ns (ns=ns@entry=0x56029d873d90, ifname=ifname@entry=0x7fccc0029340 "PortChannel1020")
    at ../zebra/interface.c:312
#9  0x000056029b6b8b36 in zebra_if_dplane_ifp_handling (ctx=0x7fccc0029310) at ../zebra/interface.c:1867
#10 zebra_if_dplane_result (ctx=0x7fccc0029310) at ../zebra/interface.c:2221
#11 0x000056029b7137a9 in rib_process_dplane_results (thread=<optimized out>) at ../zebra/zebra_rib.c:4810
#12 0x00007fccd75a0e0d in thread_call (thread=thread@entry=0x7ffe8e553cc0) at ../lib/thread.c:1990
#13 0x00007fccd7559368 in frr_run (master=0x56029d65a040) at ../lib/libfrr.c:1198
#14 0x000056029b6ac317 in main (argc=9, argv=0x7ffe8e5540d8) at ../zebra/main.c:478
cscarpitta and others added 18 commits January 21, 2025 09:10
The FRR CLI for SRv6 static SIDs was merged into FRR mainline in FRRouting/frr#16894 and ported into SONiC mainline in sonic-net#21380.
This PR verifies the SRv6 static SIDs configured through that FRR CLI. It checks that the block and node parts of each configured SID match the block and node parts of the locator the SID belongs to, and computes the parameters that will be installed with the SID into APPL DB. These changes will also be added to FRR mainline.
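A minimal sketch of the block/node check described above, written in Python with the standard ipaddress module rather than FRR's C code. The function names and the example lengths (32-bit block, 16-bit node) are assumptions chosen for illustration; the check compares the leading block_len + node_len bits of the SID with those of the locator prefix.

```python
import ipaddress


def prefix_bits(addr, nbits):
    """Return the top nbits of an IPv6 address as an integer."""
    value = int(ipaddress.IPv6Address(addr))
    return value >> (128 - nbits)


def sid_matches_locator(sid, locator, block_len, node_len):
    """True if the SID's block and node parts equal the locator's."""
    nbits = block_len + node_len
    if not 0 < nbits <= 128:
        raise ValueError("block_len + node_len must be in 1..128")
    # strict=False tolerates host bits set in the locator string.
    loc_net = ipaddress.IPv6Network(locator, strict=False)
    return prefix_bits(sid, nbits) == prefix_bits(loc_net.network_address, nbits)
```

For example, with a hypothetical locator fcbb:bbbb:1::/48, the SID fcbb:bbbb:1:fe00:: matches (same block fcbb:bbbb and node 1), while fcbb:bbbb:2:fe00:: does not.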

Signed-off-by: Carmine Scarpitta <cscarpit@cisco.com>
Change the YANG schema of the SRv6 module to use an ipv6-prefix type for the key of the SRV6_MY_SIDS table.
…#21425)

- Why I did it
Most thermal sensors have contiguous indices, for example module1_temp_input, module2_temp_input. However, some thermal sensors may have discrete (non-contiguous) indices. For example, a platform may contain a thermal sensor for sodimm2_temp_input but none for sodimm1_temp_input.

This PR adds support for thermal sensors with discrete indices.

- How I did it
Allow sensors with discrete indices and create a thermal object for each of them.
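A minimal sketch of index discovery that tolerates gaps, assuming sensor files follow the <prefix><N>_temp_input naming shown above. The helper name and regular expression are hypothetical, not the Mellanox platform API code; it takes a plain list of file names (for example from os.listdir) so it can be exercised without hardware.

```python
import re

# Hypothetical naming pattern, e.g. "sodimm2_temp_input" -> ("sodimm", "2").
SENSOR_RE = re.compile(r"^([a-z]+)(\d+)_temp_input$")


def discover_sensors(file_names, prefix):
    """Return the sorted indices of existing <prefix><N>_temp_input files.

    Rather than assuming indices run 1..count, every index that actually
    exists is returned, so gaps (e.g. only sodimm2 present) are handled.
    """
    indices = []
    for name in file_names:
        m = SENSOR_RE.match(name)
        if m and m.group(1) == prefix:
            indices.append(int(m.group(2)))
    return sorted(indices)
```

A thermal object would then be created for each returned index, instead of for each value in range(1, count + 1).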

- How to verify it
manual test
unit test
…nic-net#21312)

- Why I did it
Set the default CPU frequency governor to performance.

- How I did it
Add the cpufreq.default_governor=performance kernel command-line parameter.
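The kernel exposes each CPU's active governor in /sys/devices/system/cpu/cpuN/cpufreq/scaling_governor. A small sketch of a check (not part of this change) that could confirm the parameter took effect after boot; the function name and the injectable root path are assumptions made so it can be tested against a temporary directory.

```python
import glob
import os


def read_governors(sysfs_cpu_root="/sys/devices/system/cpu"):
    """Map each CPU directory name (cpu0, cpu1, ...) to its current governor."""
    pattern = os.path.join(sysfs_cpu_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    governors = {}
    for path in sorted(glob.glob(pattern)):
        # .../cpuN/cpufreq/scaling_governor -> "cpuN"
        cpu = os.path.basename(os.path.dirname(os.path.dirname(path)))
        with open(path) as f:
            governors[cpu] = f.read().strip()
    return governors
```

On a switch booted with the new parameter, every value returned would be expected to read "performance".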
…ime change (sonic-net#21446)

- Why I did it
The Mellanox platform API uses the standard Python function time.time() in many places. time.time() reads the system clock, which can be changed by NTP or by a user. Adjusting the system clock affects the code logic and causes bugs. For example, the Timer class in platform/mellanox/mlnx-platform-api/sonic_platform/utils.py triggers events at unexpected intervals if a user or NTP changes the system clock. This PR replaces time.time() with time.monotonic() to avoid such issues.

- How I did it
Use time.monotonic() instead of time.time().
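A minimal sketch of an interval timer driven by time.monotonic(), which Python documents as not affected by system clock updates. This is a hypothetical class, not the Timer in sonic_platform/utils.py; the clock is injectable so the behaviour can be tested without sleeping.

```python
import time


class IntervalTimer:
    """Reports when a fixed interval has elapsed on a monotonic clock."""

    def __init__(self, interval, clock=time.monotonic):
        self.interval = interval
        self._clock = clock
        self._next = clock() + interval

    def due(self):
        """Return True (and schedule the next tick) once the interval elapsed."""
        now = self._clock()
        if now >= self._next:
            self._next = now + self.interval
            return True
        return False
```

With time.time() as the clock, setting the wall clock back by an hour would make due() return False for an hour; with time.monotonic() the interval is unaffected.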

- How to verify it
Manual test.
Unit test.
- Why I did it
Update Mellanox MFT version to 4.30.2-23

- How I did it
Update the mft.mk makefile to consume the new MFT version

- How to verify it
Run sonic-mgmt tests
…K and firmware updates. (sonic-net#21483)

* [ufispace][platforms] Remove the high threshold of the PSU, as the BMC 11.8 firmware no longer supports it.
Remove the high threshold of the PSU on the following platforms, as the BMC 11.8 firmware no longer supports it.
* s7801-54xs
* s8901-54xc
* s9110-32x

* [ufispace][s9110-32x] Update bcm port configuration file
[Mellanox] Update SAI version to SAIBuild2411.245.30.1
[Broadcom] Upgrade xgs SAI to 12.3.0.3
…6 prefix (sonic-net#21468)

To adapt bgpcfgd to the new schema of SRV6_MY_SIDS

Signed-off-by: BYGX-wcr <wcr@live.cn>
…tically (sonic-net#21472)

#### Why I did it
src/sonic-sairedis
```
* 9137103d - (HEAD -> master, origin/master, origin/HEAD) Update SAI to v1.15.3 (sonic-net#1495) (3 days ago) [Riff]
```
#### How I did it
#### How to verify it
#### Description for the changelog
* Add buffer configs for TH5 C224 and C256 SKUs

* BCM SAI temp changes

* Update cable length as 0m for 100G breakout SKUs

* Add BUFFER_QUEUE profile

* Add dscp to tc, tc to queue, and scheduler mappings

* Update the DSCP to TC mapping

* Fixes for yaml, queue index validation

---------

Co-authored-by: Rick Robbins <rick@arista.com>
* Remove support for cavium platform

Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
…les that does not support this API (sonic-net#21196)

- Why I did it
Not all xcvr APIs support get_error_description (for example, sff8636). For those API types, get_error_description should return "Not supported".

- How I did it
Make get_error_description return "Not supported" for those API types.
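One way to express this behaviour is a base-class default that only capable module types override. The sketch below uses hypothetical class names and does not claim to mirror how sonic-platform-common implements it.

```python
class XcvrApiSketch:
    """Hypothetical base class standing in for a transceiver API."""

    def get_error_description(self):
        # Default for module types that cannot report an error description.
        return "Not supported"


class Sff8636ApiSketch(XcvrApiSketch):
    """Hypothetical SFF-8636 API: inherits the "Not supported" default."""


class ErrorReportingApiSketch(XcvrApiSketch):
    """Hypothetical module type that does report an error description."""

    def __init__(self, description):
        self._description = description

    def get_error_description(self):
        return self._description
```

Callers such as a show command can then call get_error_description() on any module without first checking its type.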

- How to verify it
Manual test
unit test
- Why I did it
Support running hw-management service on SN5640 emulation platform.

- How I did it
Use physical EEPROM instead of the fake one
Do not skip PSUd, PCId, thermal control daemon
Adjust PCIe and thermal configuration files

- How to verify it
Run Nvidia simulation on SN5640 (ASIC and Platform)
skip ipinip tunnel creation if many interfaces
@shaygol changed the title from Policy Base Metering to Custom ACL Based Metering Jan 26, 2025
cscarpitta and others added 25 commits January 27, 2025 08:37
[dplane_fpm_sonic]: Fix for SRv6 SIDs learnt from the kernel
… config (sonic-net#21475)

Why I did it
The DHCP default route should be an optional configuration for the DHCP client.

Work item tracking
Microsoft ADO (number only): 30877295

How I did it
Make the configuration optional in the YANG model

How to verify it
UTs
…et#21462)

Why I did it
The DHCP default route should be an optional configuration for the DHCP client.

Work item tracking
Microsoft ADO (number only): 30877295

How I did it
Support not sending the default route to the DHCP client

How to verify it
UT
Install new image to test
[master] Upgrade SONiC package Versions
sonic-net#21520)

Why I did it
This change is needed because DPUs are initialized with the SonicDpu type by sonic-config-engine, in sonic-buildimage/src/sonic-config-engine/config_samples.py (line 148 in 9b9da85):

`data['DEVICE_METADATA']['localhost']['type'] = 'SonicDpu'`

This type is added to the YANG models so that YANG validation does not fail.
Fixes: sonic-net#21111
- Why I did it
To fix buffers_defaults_object.j2 issues:
1. missing comma
2. missing table name
3. use of a removed profile

- How I did it
Updated the file to add the missing comma and table name, and to use an existing profile

- How to verify it
config load_minigraph on the switch with Mellanox-SN5600-C256S1 SKU
SONiC-FRR communication channel support srv6 vpn
Why I did it
To add support for Z9664F platform

How I did it
Implemented the support for the platform Z9664F

Switch Vendor: Dell
Switch SKU: Z9664F
ASIC Vendor: Broadcom
SONiC Image: sonic-broadcom.bin

How to verify it
Verified the platform show commands and also executed the sonic-mgmt testcases.
logs.txt

Added PDDF changes as well and attached the logs.
syncd does not come up yet because it requires SAI support; this will be raised with Broadcom.
logs.zip
…omatically (sonic-net#21573)

#### Why I did it
src/sonic-swss-common
```
* e64d2b9 - (HEAD -> master, origin/master, origin/HEAD) Add new software bfd state db table in schema (sonic-net#957) (2 days ago) [Abdel Baig]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…tically (sonic-net#21571)

#### Why I did it
src/sonic-sairedis
```
* d3b2503f - (HEAD -> master, origin/master, origin/HEAD) Fix pipeline errors related to rsyslogd and libswsscommon installation (sonic-net#1514) (5 hours ago) [Saikrishna Arcot]
* 8c47d772 - [syncd] Support bulk set in INIT_VIEW mode (sonic-net#1496) (3 days ago) [Stepan Blyshchak]
```
#### How I did it
#### How to verify it
#### Description for the changelog
… automatically (sonic-net#21570)

#### Why I did it
src/sonic-platform-common
```
* bead25d - (HEAD -> master, origin/master, origin/HEAD) Add 800G innolight PNs (sonic-net#529) (34 hours ago) [Dylan Godwin]
* 2c0f9ed - [cmis] Optimize cmis.get_error_description speed for passive module (sonic-net#526) (34 hours ago) [Junchao-Mellanox]
* e729c72 - support DSFP (sonic-net#532) (35 hours ago) [Philo]
* fc91c36 - Override MaxDurationDPInit through software for values <= 1s (sonic-net#533) (5 days ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…lly (sonic-net#21564)

#### Why I did it
src/sonic-gnmi
```
* a023991 - (HEAD -> master, origin/master, origin/HEAD) GNOI Implementation of OS.Verify (sonic-net#342) (33 hours ago) [Dawei Huang]
* a538f49 - Enable Pfcwd Queries (sonic-net#332) (2 days ago) [Zain Budhwani]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…omatically (sonic-net#21471)

#### Why I did it
src/sonic-mgmt-common
```
* dca2e83 - (HEAD -> master, origin/master, origin/HEAD) [oc-system.yang : upgrade] Upgrading openconfig-system.yang version from 0.7.0 to 2.1.0 (openconfig community latest revision) (sonic-net#147) (13 days ago) [Anukul Verma]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…up.py (sonic-net#21560)

The earlier change did not add an entry in setup.py to install the SRv6 YANG model.
This PR adds the missing entry for sonic-srv6.yang in sonic-yang-models/setup.py.
Why I did it
Previously, the critical processes were defined redundantly, as below:
group:sonic-bmp
program:openbmpd
program:bmpcfgd

which broke some mgmt test cases.

How I did it
Remove the group entry and, as most other dockers do, define the programs directly.

How to verify it
Verified on a DUT that the programs work correctly.
…tically (sonic-net#21585)

#### Why I did it
src/sonic-sairedis
```
* 77d82e82 - (HEAD -> master, origin/master, origin/HEAD) Revert "Revert back to SAI version 1 15 (sonic-net#1481)" (sonic-net#1507) (32 minutes ago) [prabhataravind]
```
#### How I did it
#### How to verify it
#### Description for the changelog
… automatically (sonic-net#21584)

#### Why I did it
src/sonic-platform-common
```
* cb5564c - (HEAD -> master, origin/master, origin/HEAD) Create is_transceiver_vdm_supported API for CMIS transceivers (sonic-net#527) (11 hours ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…lly (sonic-net#21595)

#### Why I did it
src/sonic-gnmi
```
* 34b97ce - (HEAD -> master, origin/master, origin/HEAD) Add a Close method to DBusClient and use it in GNMI server (5 hours ago) [Dawei Huang]
* 0c6099f - add a testcase for outstanding channel. (12 hours ago) [Dawei Huang]
* 34c7a43 - initial commit. (2 days ago) [Dawei Huang]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…tomatically (sonic-net#21567)

#### Why I did it
src/sonic-linux-kernel
```
* ce810e3 - (HEAD -> master, origin/master, origin/HEAD) Integrate HW-MGMT 7.0040.2104 Changes (sonic-net#458) (10 days ago) [Dror Prital]
```
#### How I did it
#### How to verify it
#### Description for the changelog
sonic-net#21558)

- Why I did it
The Mellanox SN5600 and SN5640 SIMX platforms do not support CPU thermal sensors

- How I did it
Update DEVICE_DATA configuration for the SIMX platform

- How to verify it
Check that no errors appear in syslog
- Why I did it
To have the latest sai.xml for Mellanox SN5640 SIMX platform

- How I did it
Update sai.xml for SN5640 SIMX platform

- How to verify it
Deploy an image on Mellanox SN5640 SIMX
- Why I did it
During SmartSwitch initialization, an error is observed at switch bootup: ztp disable runs decode-eeprom.
This happens during ZTP because ZTP sets DEBUG="" here: https://github.com/sonic-net/sonic-ztp/blob/202411/src/etc/default/ztp#L6

- How I did it
Fixed the import in inotify

- How to verify it
Verified by running decode-eeprom during init
- Why I did it
On nvidia-bluefield, there is an eMMC in addition to the default NVMe disk. However, the ssdhealth command currently picks the eMMC by default. Thus, this PR adds a new field to platform.json.

Related to sonic-net/sonic-utilities#3693

- How I did it
The infrastructure to read this field is updated in the sonic-utilities show CLI.

- How to verify it
Verified that show platform ssdhealth reads the correct disk by default
- Why I did it
To support new applications supported by QSFP-DD modules on Mellanox platforms.

- How I did it
Updated the media_settings.json file with the relevant applications data.

- How to verify it
Manual testing.
New leaf POLICER_ACTION
@shaygol force-pushed the policy_based_metering branch from f277b14 to a60b552 February 27, 2025 10:39