Skip to content
This repository has been archived by the owner on Mar 5, 2024. It is now read-only.

iptables - should have retries in cases where it fails to update #396

Closed
eytan-avisror opened this issue Apr 23, 2020 · 1 comment · Fixed by #402
Closed

iptables - should have retries in cases where it fails to update #396

eytan-avisror opened this issue Apr 23, 2020 · 1 comment · Fixed by #402

Comments

@eytan-avisror
Copy link
Contributor

eytan-avisror commented Apr 23, 2020

The iptables implementation tries to add a rule when KIAM-Agent daemonset is started, if that fails (due to lock, or other issue), the container restarts, the command is retried and usually this is not a problem.

But in the case of shutdown, this can be problematic because the rule might get left behind after the daemonset is deleted.

In our use case, we have implemented a solution for "blue/green" of sort for KIAM in order to avoid the downtime when we rotate cert, and we do this by deploying a second full stack of KIAM (new certs, new agents, new server, new service), and then delete the old stack of KIAM which removes it's rule and cuts over the traffic to the new stack -> cert rotated and 0 downtime.

However this gets problematic in cases where the IPTable rule is not properly removed by the deferred call (either a fatal exit, or iptables lock fails).

We should evaluate either having retries around iptables.Remove() or possibly use the binary's --wait / --wait-interval if the iptables-go pkg allows it, which will let the underlying iptables binary do the waiting for the lock.

Also we should make sure in cases of fatal exit we also remove the iptables rule - Ref PR: #300

Will it be possible to address this? I can send a PR if you agree with this.

@eytan-avisror
Copy link
Contributor Author

eytan-avisror commented May 13, 2020

@pingles @rhysemmas
Do you have any thoughts on this? I think a simple solution is to add a some retries around this code instead of a best-effort defer.

func (r *rules) Remove() error {
ipt, err := iptables.New()
if err != nil {
return err
}
return ipt.Delete("nat", "PREROUTING", r.ruleSpec()...)

Using the iptables --wait option might not be a good idea because the iptables command can still fail for retryable reasons other than locking.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant