[quagga bgp] set quagga graceful restart timeout to 180 seconds #2362
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
@yxieca Can you please also adjust the default timer we are using at the fpmsyncd level? Both timers should ideally match, at least until we have a synchronization mechanism between bgpd and fpmsyncd.
@rodnymolina Since this only changes the Quagga default timer, it might not be exactly right to change the fpmsyncd timer alone. Alternatively, we could keep the timer configuration in the config-db, per platform.
@rodnymolina @zhenggen-xu This is changing the timer for the peer. That's completely unrelated to the local fpmsyncd timer. The two timers are not required to match, nor should they be compared against each other. The synchronization you are referring to has no bearing on the peer restart timer the local system has sent.
@nikos-github I think you are right, the two timers are not strongly related. I was thinking about FRR/Quagga consistency. The fpmsyncd timer is more about convergence time than BGP down time. It should be tuned based on the worst case for the routes learnt etc., not necessarily on the graceful restart timer.
@nikos-github I don't agree with that. As you know, this timer is used as an estimate of the amount of time required (by each node) to re-establish the sessions with its peers. In this case, we are increasing this value to 180 secs, so this is the time that it may take (in the worst-case scenario) for the restarting router to re-learn state from the helper nodes. We don't want the fpmsyncd reconciliation process to kick off before we have had a chance to receive all the pending state. This artificial correlation between the bgp-gr timer and the fpmsyncd timer is only needed today because, at the fpmsyncd level, we don't have a deterministic way to identify when the "re-learning" phase has concluded. Once we have this missing glue, I agree that both timers can run independently. @zhenggen-xu Not sure I fully got your point, but it looks like having separate per-platform/per-routing-stack values won't help in this case, as there seems to be a system fast-reboot limitation that is forcing us to increase this timer, and that will impact FRR in the same way that it affects Quagga. And yes, I agree that we will also need to change the FRR values to be fully consistent.
@rodnymolina The artificial correlation you are making between the timers is not correct, irrespective of whether or not there is an EoR signal. We can discuss offline.
@rodnymolina, I feel the two timers are not related. The GR timer covers the interval between BGP shutdown and BGP session setup after reboot; the local timer starts after BGP session setup after reboot.
@lguohan I agree/understand that both timers can potentially measure different things, but in the absence of a mechanism to sync up bgp and fpmsyncd (through an EoR/EOIU message), I feel it's a good idea to keep both timers more or less in sync. My point is especially valid for typical warm-restart use cases (daemon/docker restart), as in these scenarios both the bgp-gr timer and the fpmsyncd timer are going to be measuring similar things. On the other hand, I understand that this correlation is much weaker in the system-warm-reboot case, as both timers can/will diverge. The question is, which case should we optimize for? I feel that warm-restart scenarios (daemon/docker restart) are much more frequent than system-warm-reboot ones, so I'd rather cover the first scenario as best as we can. And perhaps we could go even further: forget about the system-warm-reboot case altogether, so that we can set more reasonable bgp-gr timers (~30 secs) and reduce the suboptimal-routing window. If we are interested in optimizing for the warm-restart case, the bgp-gr timer and fpmsyncd timer values would need to be (more or less) in sync.
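To make the timing argument in this thread concrete, here is a minimal Python sketch of the two checks being debated. It is only a toy model, not code from bgpd or fpmsyncd: the class, field names, and function names are all hypothetical, and the point at which the fpmsyncd timer starts is left as an input because the commenters above disagree on how closely it tracks the GR window.

```python
from dataclasses import dataclass


@dataclass
class RestartTimeline:
    """Hypothetical timeline of one restart; all values are seconds from t=0."""
    bgp_down_at: float         # bgpd stops (shutdown or reboot begins)
    sessions_up_at: float      # BGP sessions with the peers are re-established
    relearn_done_at: float     # helpers have re-sent all routes (unknown to fpmsyncd today)
    gr_restart_time: float     # restart time advertised to the peers (180 in this PR)
    fpmsyncd_starts_at: float  # assumed start of the local fpmsyncd timer
    fpmsyncd_timer: float      # assumed length of the local fpmsyncd timer


def peers_keep_routes(t: RestartTimeline) -> bool:
    """True if the sessions come back before the peers' restart timer expires."""
    return (t.sessions_up_at - t.bgp_down_at) <= t.gr_restart_time


def reconciles_too_early(t: RestartTimeline) -> bool:
    """True if the fpmsyncd timer fires before all routes have been re-learnt."""
    return (t.fpmsyncd_starts_at + t.fpmsyncd_timer) < t.relearn_done_at
```

With made-up numbers for a system warm reboot (BGP down at 0, sessions up at 150, re-learning done at 165, local timer started at 150), `peers_keep_routes` depends only on `gr_restart_time`, while `reconciles_too_early` depends only on the fpmsyncd values. In a daemon/docker restart where the local timer starts at roughly the same moment BGP goes down, the two windows overlap, which is the case the "keep them in sync" argument is about.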
- What I did
Set the Quagga graceful restart timeout to 180 seconds.
We need a graceful restart timeout of 180 seconds for warm reboot.