[202012] Fix issue: sometimes PFC WD unable to create zero buffer pool #2185

stephenxs · 2022-03-16T23:42:44Z

This backports PR #2164 to the 202012 branch.

What I did

Fix issue: PFC WD is sometimes unable to create the zero buffer pool.
On some platforms, an ingress/egress zero buffer profile is applied to the PGs and queues that are under PFC storm. The zero buffer profile is created based on a zero buffer pool. However, creating the zero buffer pool sometimes fails because too many buffer pools already exist in the system.
In some cases a zero buffer pool already exists on the system for reclaiming buffer. When it does, PFC WD can reuse it to create its zero buffer profile.

Why I did it

Fix the issue by sharing the zero buffer pool between PFC WD and buffer orchagent.

How I verified it

- Manual testing
- Ran the PFC WD test and the PFC WD warm reboot test
- Ran the unit tests

Details if related
The detailed flow is as follows.
PFC storm detected:

1. If there is a zero pool in PFC WD's cache, create the zero buffer profile based on it.
2. Otherwise, fetch the zero pool from buffer orchagent.
   - If one is returned, create the zero buffer profile based on it.
   - Otherwise,
     - create a zero buffer pool
     - notify buffer orch about the zero buffer pool
   - In both cases, PFC WD should notify buffer orch to increase the reference number of the zero buffer pool.
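
The PFC WD side of this flow can be sketched roughly as below. This is a minimal model, not the code from the PR: `BufferOrchModel`, `PfcWdModel`, `acquireZeroPool`, `PoolId` and the simulated `createPool` are all hypothetical names introduced for illustration, and no real SAI call is made.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a SAI object ID; 0 plays the role of a null object.
using PoolId = std::uint64_t;
constexpr PoolId kNullPool = 0;

// Hypothetical model of the buffer orchagent side: it holds the shared
// zero pool handle and its reference number.
struct BufferOrchModel {
    PoolId zeroPool = kNullPool;
    unsigned refCount = 0;

    PoolId getZeroPool() const { return zeroPool; }

    // Record a pool that was created elsewhere (here: by PFC WD).
    void setZeroPool(PoolId id) { zeroPool = id; }

    // Take one reference on the shared zero pool.
    void lockZeroPool() {
        assert(zeroPool != kNullPool);
        ++refCount;
    }
};

// Hypothetical model of the PFC WD side.
struct PfcWdModel {
    BufferOrchModel &orch;
    PoolId cachedPool = kNullPool;
    unsigned poolsCreated = 0;  // counts simulated create calls

    explicit PfcWdModel(BufferOrchModel &o) : orch(o) {}

    // Simulated pool creation; a real implementation would call SAI here.
    PoolId createPool() {
        ++poolsCreated;
        return 0x1000 + poolsCreated;
    }

    // Returns the pool on which the zero buffer profile should be based.
    PoolId acquireZeroPool() {
        if (cachedPool != kNullPool) {
            return cachedPool;           // step 1: pool already cached
        }
        PoolId id = orch.getZeroPool();  // step 2: ask buffer orch
        if (id == kNullPool) {
            id = createPool();           // nothing to share: create one
            orch.setZeroPool(id);        // and tell buffer orch about it
        }
        orch.lockZeroPool();             // both branches take a reference
        cachedPool = id;
        return id;
    }
};
```

In this sketch a cache hit does not take another reference, which matches our reading that the "in both cases" step applies only to the two branches under step 2.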

Buffer orchagent:

When creating the zero buffer pool:
- Check whether one already exists. If yes, skip the SAI API create_buffer_pool.
- Increase the reference number.
Before removing the zero buffer pool, decrease and check the reference number. If it is still non-zero after the decrease, skip the SAI API destroy_buffer_pool.
When PFC WD decreases the reference number: remove the zero buffer pool if the reference number becomes zero.
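
The reference counting on the buffer orchagent side can be illustrated with a small model. It is a sketch under assumptions: `ZeroPoolRef`, `acquire`, `release` and the call counters are hypothetical, the counters stand in for the SAI create and destroy calls, and the pool is destroyed only when the last reference is dropped, which is how we read the steps above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a SAI object ID; 0 plays the role of a null object.
using ObjectId = std::uint64_t;
constexpr ObjectId kNullObject = 0;
constexpr ObjectId kFakeZeroPool = 0x2000;  // made-up handle for the model

// Hypothetical model of a zero buffer pool shared by buffer orch and PFC WD.
class ZeroPoolRef {
public:
    // Used when the zero pool is requested: create it only if it does not
    // exist yet, then increase the reference number.
    ObjectId acquire() {
        if (pool_ == kNullObject) {
            pool_ = kFakeZeroPool;  // stands in for the SAI create call
            ++createCalls_;
        }
        ++refCount_;
        return pool_;
    }

    // Used when either owner drops the pool: decrease the reference number
    // and destroy the pool only when it reaches zero.
    // Returns true if the pool was actually destroyed.
    bool release() {
        assert(refCount_ > 0);
        --refCount_;
        if (refCount_ > 0) {
            return false;           // still in use: skip the destroy call
        }
        pool_ = kNullObject;        // stands in for the SAI destroy call
        ++destroyCalls_;
        return true;
    }

    unsigned refCount() const { return refCount_; }
    unsigned createCalls() const { return createCalls_; }
    unsigned destroyCalls() const { return destroyCalls_; }

private:
    ObjectId pool_ = kNullObject;
    unsigned refCount_ = 0;
    unsigned createCalls_ = 0;
    unsigned destroyCalls_ = 0;
};
```

With two holders, for example buffer orch and PFC WD, the second `acquire()` reuses the existing pool, and only the second `release()` destroys it. That ordering is what would let a pool referenced by PFC WD outlive its removal from the buffer configuration.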

Notes
We do not leverage the object_reference_map infrastructure to track the dependency because:

- It assumes the dependency will eventually be removed if an object is removed. That is NOT true in this scenario because the PFC storm can last for a relatively long time and can even persist across a warm reboot.
- The interfaces differ.


stephenxs · 2022-03-16T23:44:31Z

This backports PR #2164 to the 202012 branch.

liat-grozovik · 2022-03-21T13:06:17Z

@neethajohn LGTM. Can you please sign off?

liat-grozovik · 2022-03-27T06:29:15Z

@neethajohn kind reminder to review and sign off.

svsivm · 2022-06-07T08:25:50Z

Hi, while I understand the motivation behind the buffer pool changes, the commit message does not explain why the zero buffer profile attribute values have been changed. For example, the SAI_BUFFER_PROFILE_ATTR_THRESHOLD_MODE attribute was set to SAI_BUFFER_PROFILE_THRESHOLD_MODE_DYNAMIC, with the threshold SAI_BUFFER_PROFILE_ATTR_SHARED_DYNAMIC_TH programmed as '-8' for the zero buffer profile. Why has this been changed now?

stephenxs requested a review from neethajohn March 16, 2022 23:42

stephenxs marked this pull request as draft March 16, 2022 23:43

stephenxs added the Bug 🐛 label Mar 16, 2022

stephenxs marked this pull request as ready for review March 20, 2022 23:33

stephenxs added the Included in 202012 Branch label Mar 20, 2022

keboliu removed the Included in 202012 Branch label Mar 21, 2022

neethajohn approved these changes Apr 2, 2022


liat-grozovik merged commit 13ccaba into sonic-net:202012 Apr 2, 2022

stephenxs deleted the fix-pfcwd-202012 branch April 2, 2022 07:32

stephenxs mentioned this pull request Apr 2, 2022

[202012] Update sonic-swss sonic-net/sonic-buildimage#10448

Closed

6 tasks

This was referenced Jun 9, 2022

Attribute values changed for Zero Buffer Profile #2319

Open

Don't query buffer profile attributes before APPLY_VIEW #2231

Open
