iptables - should have retries in cases where it fails to update #396

eytan-avisror · 2020-04-23T23:28:40Z

The iptables implementation tries to add a rule when KIAM-Agent daemonset is started, if that fails (due to lock, or other issue), the container restarts, the command is retried and usually this is not a problem.

But in the case of shutdown, this can be problematic because the rule might get left behind after the daemonset is deleted.

In our use case, we have implemented a solution for "blue/green" of sort for KIAM in order to avoid the downtime when we rotate cert, and we do this by deploying a second full stack of KIAM (new certs, new agents, new server, new service), and then delete the old stack of KIAM which removes it's rule and cuts over the traffic to the new stack -> cert rotated and 0 downtime.

However this gets problematic in cases where the IPTable rule is not properly removed by the deferred call (either a fatal exit, or iptables lock fails).

We should evaluate either having retries around iptables.Remove() or possibly use the binary's --wait / --wait-interval if the iptables-go pkg allows it, which will let the underlying iptables binary do the waiting for the lock.

Also we should make sure in cases of fatal exit we also remove the iptables rule - Ref PR: #300

Will it be possible to address this? I can send a PR if you agree with this.

The text was updated successfully, but these errors were encountered:

eytan-avisror · 2020-05-13T21:22:34Z

@pingles @rhysemmas
Do you have any thoughts on this? I think a simple solution is to add a some retries around this code instead of a best-effort defer.

kiam/cmd/kiam/iptables.go

Lines 62 to 68 in 6e07571

    
           func (r *rules) Remove() error { 
        
           	ipt, err := iptables.New() 
        
           	if err != nil { 
        
           		return err 
        
           	} 
        
           	return ipt.Delete("nat", "PREROUTING", r.ruleSpec()...)

Using the iptables --wait option might not be a good idea because the iptables command can still fail for retryable reasons other than locking.

eytan-avisror mentioned this issue May 13, 2020

add retries around iptable rule removal #402

Merged

Joseph-Irving closed this as completed in #402 May 18, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

iptables - should have retries in cases where it fails to update #396

iptables - should have retries in cases where it fails to update #396

eytan-avisror commented Apr 23, 2020 •

edited

Loading

eytan-avisror commented May 13, 2020 •

edited

Loading

iptables - should have retries in cases where it fails to update #396

iptables - should have retries in cases where it fails to update #396

Comments

eytan-avisror commented Apr 23, 2020 • edited Loading

eytan-avisror commented May 13, 2020 • edited Loading

eytan-avisror commented Apr 23, 2020 •

edited

Loading

eytan-avisror commented May 13, 2020 •

edited

Loading