You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Mar 5, 2024. It is now read-only.
The iptables implementation tries to add a rule when KIAM-Agent daemonset is started, if that fails (due to lock, or other issue), the container restarts, the command is retried and usually this is not a problem.
But in the case of shutdown, this can be problematic because the rule might get left behind after the daemonset is deleted.
In our use case, we have implemented a solution for "blue/green" of sort for KIAM in order to avoid the downtime when we rotate cert, and we do this by deploying a second full stack of KIAM (new certs, new agents, new server, new service), and then delete the old stack of KIAM which removes it's rule and cuts over the traffic to the new stack -> cert rotated and 0 downtime.
However this gets problematic in cases where the IPTable rule is not properly removed by the deferred call (either a fatal exit, or iptables lock fails).
We should evaluate either having retries around iptables.Remove() or possibly use the binary's --wait / --wait-interval if the iptables-go pkg allows it, which will let the underlying iptables binary do the waiting for the lock.
Also we should make sure in cases of fatal exit we also remove the iptables rule - Ref PR: #300
Will it be possible to address this? I can send a PR if you agree with this.
The text was updated successfully, but these errors were encountered:
@pingles@rhysemmas
Do you have any thoughts on this? I think a simple solution is to add a some retries around this code instead of a best-effort defer.
The iptables implementation tries to add a rule when KIAM-Agent daemonset is started, if that fails (due to lock, or other issue), the container restarts, the command is retried and usually this is not a problem.
But in the case of shutdown, this can be problematic because the rule might get left behind after the daemonset is deleted.
In our use case, we have implemented a solution for "blue/green" of sort for KIAM in order to avoid the downtime when we rotate cert, and we do this by deploying a second full stack of KIAM (new certs, new agents, new server, new service), and then delete the old stack of KIAM which removes it's rule and cuts over the traffic to the new stack -> cert rotated and 0 downtime.
However this gets problematic in cases where the IPTable rule is not properly removed by the deferred call (either a fatal exit, or iptables lock fails).
We should evaluate either having retries around
iptables.Remove()
or possibly use the binary's--wait
/--wait-interval
if theiptables-go
pkg allows it, which will let the underlying iptables binary do the waiting for the lock.Also we should make sure in cases of fatal exit we also remove the iptables rule - Ref PR: #300
Will it be possible to address this? I can send a PR if you agree with this.
The text was updated successfully, but these errors were encountered: