
[Dynamic buffer calc] Bug fix: Remove PGs from an administratively down port. #1652

Merged
merged 6 commits into sonic-net:master from remove-pg-admin-down
Mar 18, 2021

Conversation

stephenxs
Collaborator

@stephenxs stephenxs commented Feb 25, 2021

What I did
Bug fix: Remove PGs from an administratively down port.

Signed-off-by: Stephen Sun <stephens@nvidia.com>

Why I did it
To fix the bugs described below.

How I verified it
Ran regression and vs tests.

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012

Details if related

  • Remove PGs from an administratively down port.
    • Introduce a new state, PORT_ADMIN_DOWN, which represents that the port is administratively down.
    • Remove all PGs when the port is shut down and re-add all configured PGs when the port is started up.
    • Only record the new value, without touching BUFFER_PG_TABLE, if any of the following events occurs while a port is administratively down (see the sketch after this list):
      • a port's MTU, speed, or cable length is updated
      • a new PG is added to a port or an existing PG is removed from a port
    • Optimize the port event handling flow so that refreshPriorityGroupsForPort is called only once when more than one field is updated.
    • Optimize the Lua plugin which calculates the buffer pool size accordingly.
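A minimal C++ sketch of the deferred-update flow described above, with hypothetical names (`PortInfo`, `onPortConfigChange`, `lastFieldInBatch`); only `refreshPriorityGroupsForPort` and `BUFFER_PG_TABLE` come from the actual change. New values are always recorded, but the PG refresh runs only for an administratively up port, and only once per batch of field updates:

```cpp
#include <iostream>
#include <string>

// Hypothetical in-memory record of a port's buffer-related configuration.
struct PortInfo {
    bool adminDown = true;
    std::string speed, mtu, cableLength;
};

// Stand-in for the real refresh routine that recalculates headroom and
// writes BUFFER_PG_TABLE.
void refreshPriorityGroupsForPort(const PortInfo &port)
{
    std::cout << "refresh PGs: speed=" << port.speed
              << " mtu=" << port.mtu
              << " cable_length=" << port.cableLength << "\n";
}

// Always record the new value; defer the table update while admin down,
// and refresh only once even if several fields changed in one event.
void onPortConfigChange(PortInfo &port, const std::string &field,
                        const std::string &value, bool lastFieldInBatch)
{
    if (field == "speed")
        port.speed = value;
    else if (field == "mtu")
        port.mtu = value;
    else if (field == "cable_length")
        port.cableLength = value;

    if (!port.adminDown && lastFieldInBatch)
        refreshPriorityGroupsForPort(port);
}

int main()
{
    PortInfo port;
    port.adminDown = false;

    // Speed and MTU arrive in one event: the refresh runs once, at the end.
    onPortConfigChange(port, "speed", "100000", false);
    onPortConfigChange(port, "mtu", "9100", true);
}
```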

Bug:
The buffermgrd can keep appending a suffix to the buffer pool reference if
the buffer pool isn't found when it is referenced. In most cases this is
caused by an incorrect configuration.
Cause:
In handleBufferProfileTable, the value of each field is bound by an
(lvalue) reference to the object in the tuple, which means the object in
the tuple is modified whenever the value is changed inside the function.
Fix:
Pass the value of each field by value instead of by reference.
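To make the cause concrete, here is a self-contained C++ sketch; `handleFieldBuggy`, `handleFieldFixed`, and the two-string tuple are hypothetical stand-ins for the code in handleBufferProfileTable. Binding a tuple field by lvalue reference means a local edit, such as appending a retry suffix, mutates the stored tuple, so each retry appends again; taking the field by value leaves the stored tuple intact:

```cpp
#include <iostream>
#include <string>
#include <tuple>

using FieldValueTuple = std::tuple<std::string, std::string>;

// Buggy variant: 'value' aliases the tuple element, so appending the
// suffix rewrites the stored tuple and every retry appends once more.
void handleFieldBuggy(FieldValueTuple &fv)
{
    std::string &value = std::get<1>(fv);  // lvalue reference into the tuple
    value += "_suffix";                    // mutates the tuple itself
}

// Fixed variant: 'value' is a copy, so the stored tuple is untouched.
void handleFieldFixed(FieldValueTuple &fv)
{
    std::string value = std::get<1>(fv);   // taken by value
    value += "_suffix";                    // local change only
}

int main()
{
    FieldValueTuple fv{"pool", "ingress_lossless_pool"};
    handleFieldBuggy(fv);
    handleFieldBuggy(fv);  // a retry
    std::cout << std::get<1>(fv) << "\n";  // ingress_lossless_pool_suffix_suffix

    FieldValueTuple fv2{"pool", "ingress_lossless_pool"};
    handleFieldFixed(fv2);
    handleFieldFixed(fv2);
    std::cout << std::get<1>(fv2) << "\n"; // ingress_lossless_pool
}
```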

Signed-off-by: Stephen Sun <stephens@nvidia.com>
@stephenxs stephenxs closed this Feb 25, 2021
@stephenxs stephenxs reopened this Feb 26, 2021
@stephenxs stephenxs marked this pull request as draft February 26, 2021 23:33
@stephenxs stephenxs changed the title from "[Dynamic buffer calc] Fix bugs: Remove PGs when port is admin down" to "[Dynamic buffer calc] Fix bugs" on Feb 28, 2021
Introduce a new state, `PORT_ADMIN_DOWN`, which represents that the port is administratively down.
Remove all PGs when the port is shut down and re-add all configured PGs when the port is started up.
Only record the new value, without touching `BUFFER_PG_TABLE`, if any of the following events occurs while a port is administratively down:
 - a port's MTU, speed, or cable length is updated
 - a new PG is added to a port or an existing PG is removed from a port
Optimize the port event handling flow so that `refreshPriorityGroupsForPort` is called only once when more than one field is updated.
Optimize the Lua plugin which calculates the buffer pool size accordingly.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
@stephenxs stephenxs force-pushed the remove-pg-admin-down branch from f5bb63c to 6566369 on March 1, 2021 05:25
@stephenxs stephenxs marked this pull request as ready for review March 1, 2021 05:26
@stephenxs stephenxs changed the title from "[Dynamic buffer calc] Fix bugs" to "[Dynamic buffer calc] Bug fix: Remove PGs from an administratively down port." on Mar 1, 2021
Identify the case where the referenced profile doesn't exist and
exit the plugin gracefully.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
Signed-off-by: Stephen Sun <stephens@nvidia.com>
@liat-grozovik
Collaborator

/AzurePipelines run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@stephenxs
Collaborator Author

The VS test failed on the following cases. Can anyone help check them?

test_route_nhg failed (1 runs remaining out of 2).
	<class 'AttributeError'>
	'DockerVirtualSwitch' object has no attribute 'appldb'
	[<TracebackEntry /home/vsts/work/1/s/tests/conftest.py:1546>, <TracebackEntry /home/vsts/work/1/s/tests/conftest.py:386>, <TracebackEntry /home/vsts/work/1/s/tests/conftest.py:429>, <TracebackEntry /home/vsts/work/1/s/tests/conftest.py:393>]
test_route_nhg failed; it passed 0 out of the required 1 times.
	<class 'AttributeError'>
	'DockerVirtualSwitch' object has no attribute 'appldb'
	[<TracebackEntry /usr/lib/python3/dist-packages/six.py:702>, <TracebackEntry /home/vsts/work/1/s/tests/conftest.py:1546>, <TracebackEntry /home/vsts/work/1/s/tests/conftest.py:386>, <TracebackEntry /home/vsts/work/1/s/tests/conftest.py:429>, <TracebackEntry /home/vsts/work/1/s/tests/conftest.py:393>]
test_route_nhg_exhaust failed (1 runs remaining out of 2).
	<class 'AttributeError'>
	'DockerVirtualSwitch' object has no attribute 'appldb'
	[<TracebackEntry /usr/lib/python3/dist-packages/six.py:702>, <TracebackEntry /home/vsts/work/1/s/tests/conftest.py:1546>, <TracebackEntry /home/vsts/work/1/s/tests/conftest.py:386>, <TracebackEntry /home/vsts/work/1/s/tests/conftest.py:429>, <TracebackEntry /home/vsts/work/1/s/tests/conftest.py:393>]
test_route_nhg_exhaust failed; it passed 0 out of the required 1 times.
	<class 'AttributeError'>
	'DockerVirtualSwitch' object has no attribute 'appldb'
	[<TracebackEntry /usr/lib/python3/dist-packages/six.py:702>, <TracebackEntry /home/vsts/work/1/s/tests/conftest.py:1546>, <TracebackEntry /home/vsts/work/1/s/tests/conftest.py:386>, <TracebackEntry /home/vsts/work/1/s/tests/conftest.py:429>, <TracebackEntry /home/vsts/work/1/s/tests/conftest.py:393>]

test_VlanAddRemove failed (1 runs remaining out of 2).
	<class 'AssertionError'>
	
	[<TracebackEntry /home/vsts/work/1/s/tests/test_vlan.py:34>, <TracebackEntry /home/vsts/work/1/s/tests/dvslib/dvs_vlan.py:61>]

@liat-grozovik
Collaborator

/AzurePipelines run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@dprital
Collaborator

dprital commented Mar 8, 2021

@neethajohn - Can you please approve and merge? It is also required for 202012. Thanks.

@dprital
Collaborator

dprital commented Mar 10, 2021

@neethajohn - Kind reminder. Thanks.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
@stephenxs stephenxs requested a review from liat-grozovik March 11, 2021 11:31
liat-grozovik previously approved these changes Mar 14, 2021
@stephenxs
Collaborator Author

/AzurePipelines run

@azure-pipelines

Commenter does not have sufficient privileges for PR 1652 in repo Azure/sonic-swss

Signed-off-by: Stephen Sun <stephens@nvidia.com>
@neethajohn neethajohn merged commit 908e0c6 into sonic-net:master Mar 18, 2021
@stephenxs stephenxs deleted the remove-pg-admin-down branch March 19, 2021 00:09
@dprital
Collaborator

dprital commented Mar 19, 2021

@daall - Can you please merge to 202012 ? Thanks.

@stephenxs
Collaborator Author

We are waiting for #1630 to be merged first.

daall pushed a commit that referenced this pull request Mar 19, 2021
[Dynamic buffer calc] Bug fix: Remove PGs from an administratively down port. (#1652)

Remove PGs from an administratively down port.
- Introduce a new state: PORT_ADMIN_DOWN, which represents that the port is administratively down.
- Remove all PGs when the port is shut down and re-add all configured PGs when the port is started up.
- Only record the new value, without touching BUFFER_PG_TABLE, if any of the following events occurs while a port is administratively down: a port's MTU, speed, or cable length is updated; a new PG is added to a port or an existing PG is removed from a port.
- Optimize the port event handling flow so that refreshPriorityGroupsForPort is called only once when more than one field is updated.
- Optimize the Lua plugin which calculates the buffer pool size accordingly.

Signed-off-by: Stephen Sun <stephens@nvidia.com>

How I verified it
Ran regression and vs tests.
@lguohan
Contributor

lguohan commented Mar 19, 2021

How come this bug fix is so large? Is this really a bug fix? @daall, let's not merge this into 202012 for now.

@neethajohn
Contributor

@stephenxs, I am reverting this merge. Please break this PR into multiple PRs of smaller commits: optimizations, port down handling, renaming func/log messages, etc.

@stephenxs
Collaborator Author

stephenxs commented Mar 20, 2021

> @stephenxs, I am reverting this merge. Please break this PR into multiple PRs of smaller commits: optimizations, port down handling, renaming func/log messages, etc.

OK. Will do.

neethajohn added a commit that referenced this pull request Mar 20, 2021
daall pushed a commit that referenced this pull request Mar 22, 2021
raphaelt-nvidia pushed a commit to raphaelt-nvidia/sonic-swss that referenced this pull request Oct 5, 2021