[Buffer orch] Fix maximum headroom check failure in cold reboot #2948
The accumulative headroom on a port is compared with the maximum headroom supported on the port whenever a buffer priority group is created or updated. This check depends on the maximum headroom being exposed to STATE_DB during orchagent initialization. However, during a cold reboot, orchagent starts slowly, which prevents the threshold from being exposed in time. In this case, the buffer manager cannot perform the headroom check, and the buffer orchagent should handle the possible failure from SAI if the accumulative headroom exceeds the threshold. Signed-off-by: Stephen Sun <stephens@nvidia.com>
@stephenxs, @dgsudharsan, if it fails, how does it recover?
Hi,
This is caused by incorrect user configuration. The user should correct the configuration if it fails.
It doesn't align with this description: "The accumulative headroom on a port is compared with the maximum headroom supported on the port whenever a buffer priority group is created/updated in the dynamic buffer model which depends on the maximum headroom being exposed to the STATE_DB during orchagent initialization."
Hi,
I mentioned the exception in the second paragraph.
To avoid confusion, I will rephrase it as: "The accumulative headroom on a port should be compared with the maximum headroom supported on the port whenever a buffer priority group is created/updated in the dynamic buffer model which depends on the maximum headroom being exposed to the STATE_DB during orchagent initialization."
/azpw run
/AzurePipelines run
Azure Pipelines successfully started running 1 pipeline(s).
@prsunny any further comments?
From what I understand, simulating this failure requires an uncommon scenario (such as manually editing config_db.json), so this handling is, IMO, unnecessary.
Hi @prsunny
Verified on top of 202305 commit sonic-net/sonic-buildimage@a3f8153 |
Cherry-pick PR to 202205: #3001
@stephenxs cherry pick PR didn't pass PR checker. Please check!!!
The cherry-pick to 202205 is no longer needed. I just removed the flag.
What I did
Fix the maximum headroom check failure during cold reboot.
The accumulative headroom on a port should be compared with the maximum headroom supported on the port whenever a buffer priority group is created/updated in the dynamic buffer model which depends on the maximum headroom being exposed to the STATE_DB during orchagent initialization.
However, during a cold reboot, orchagent starts slowly, which prevents the threshold from being exposed in time. In this case, the buffer manager cannot perform the headroom check, and the buffer orchagent should handle the possible failure from SAI if the accumulative headroom exceeds the threshold.
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Why I did it
To fix the issue.
How I verified it
Manually/regression test.
Details if related
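The flow described under "What I did" can be illustrated with a minimal, self-contained sketch. This is not the sonic-swss code: every name below (`checkHeadroom`, `mockSaiApply`, `applyPriorityGroups`, and the enums) is hypothetical, the STATE_DB threshold is modelled as a `std::optional`, and the SAI call is replaced by a stand-in that rejects a configuration exceeding the hardware limit. It only shows the intended division of labour: the buffer manager checks when it can, and the orchagent side treats a SAI rejection as a handled outcome when the check had to be skipped.

```cpp
#include <cstdint>
#include <numeric>
#include <optional>
#include <vector>

// All identifiers here are hypothetical and chosen for illustration only.
enum class CheckResult { Pass, Fail, Skipped };
enum class ApplyResult { Applied, RejectedByManager, RejectedBySai };

// Sum of the headroom of all priority groups on one port.
static uint64_t accumulativeHeadroom(const std::vector<uint64_t>& pgHeadrooms)
{
    return std::accumulate(pgHeadrooms.begin(), pgHeadrooms.end(), uint64_t{0});
}

// Buffer-manager side: compare the accumulative headroom with the port
// maximum, but only if the maximum has already been published.
// An empty optional models "threshold not in STATE_DB yet".
CheckResult checkHeadroom(const std::vector<uint64_t>& pgHeadrooms,
                          std::optional<uint64_t> publishedMax)
{
    if (!publishedMax) {
        return CheckResult::Skipped;
    }
    return accumulativeHeadroom(pgHeadrooms) <= *publishedMax
               ? CheckResult::Pass
               : CheckResult::Fail;
}

// Stand-in for the SAI call (assumption): it rejects a configuration whose
// accumulative headroom exceeds the hardware limit.
bool mockSaiApply(const std::vector<uint64_t>& pgHeadrooms, uint64_t hwMax)
{
    return accumulativeHeadroom(pgHeadrooms) <= hwMax;
}

// Orchagent side: a SAI rejection is reported to the caller instead of
// being treated as a fatal error.
ApplyResult applyPriorityGroups(const std::vector<uint64_t>& pgHeadrooms,
                                std::optional<uint64_t> publishedMax,
                                uint64_t hwMax)
{
    if (checkHeadroom(pgHeadrooms, publishedMax) == CheckResult::Fail) {
        return ApplyResult::RejectedByManager;
    }
    if (!mockSaiApply(pgHeadrooms, hwMax)) {
        return ApplyResult::RejectedBySai;
    }
    return ApplyResult::Applied;
}
```

In this sketch, an oversized configuration is caught by the manager when the threshold is available and by the SAI stand-in when it is not. In both cases the caller receives a result it can act on, for example by correcting the configuration.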