-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🐛 fix issue when webhook server refreshing cert #260
Conversation
/cc @anfernee |
@mengqiy: GitHub didn't allow me to request PR reviews from the following users: anfernee. Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
5b51c4f
to
e05117f
Compare
/lgtm it doesn't fix #191 tho. |
@anfernee: changing LGTM is restricted to assignees, and only kubernetes-sigs/controller-runtime repo collaborators may be assigned issues. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change looks good. will wait for the tests before lgtm.
e05117f
to
5c464d1
Compare
/hold |
5c464d1
to
6c8993d
Compare
Made some additional changes in pkg/webhook/server.go and added tests. |
@@ -0,0 +1,148 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a test case to test the case where the server exceeds cert refresh interval? you can change defaultCertRefreshInterval
to a smaller interview. I am worried that if you start a server on the same port immediately after shutdown, you will have port conflict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC
https://github.com/kubernetes-sigs/controller-runtime/pull/260/files#diff-8b0412ec4fd52af1419bd0fcd0f2e101R121 is doing what you ask. The server restarts multiple times and rotates the certs.
go serveFn() | ||
case <-stop: | ||
return nil | ||
case e := <-errCh: | ||
return e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ideally if you start server here, it should just work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO if there is an unexpected error, we should surface the error instead of keeping retrying.
continue | ||
} | ||
log.Info("server is shutting down to reload the certificates.") | ||
err = srv.Shutdown(context.Background()) | ||
shutdownHappend = true | ||
err = s.httpServer.Shutdown(context.Background()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is Shutdown
is async, meaning return of shutdown doesn't necessarily mean the server is already done. But return of ListenAndServeTLS
guarantees it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shutdown gracefully shuts down the server without interrupting any active connections. Shutdown works by first closing all open listeners, then closing all idle connections, and then waiting indefinitely for connections to return to idle and then shut down
Per https://golang.org/pkg/net/http/#Server.Shutdown, it is graceful shutdown and synchronized call. Did I miss anything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused of those 2 methods, you are right. pls ignore it.
if err != nil { | ||
log.Error(err, "encountering error when shutting down") | ||
return err | ||
} | ||
timer = time.Tick(wait.Jitter(defaultCertRefreshInterval, 0.1)) | ||
go serveFn() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might have port conflict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the question here is that shutting down the server will unbind the port, but if it will complete in time before the server tries to bind the same port next time.
I have been searching online for this question for quite a while. But I'm still not sure what the 100% correct thing to do here :/
It seems when the Listener is closed, it should have unbinded the port.
Some tests I added in this PR rotates the cert and reloads the server for multiple times. I haven't seen any issue about port conflict.
I'm open to suggestions if you have a better solution. If not, I will probably merge it as is :)
43a3f31
to
acd82f6
Compare
} | ||
|
||
cg := &generator.SelfSignedCertGenerator{} | ||
s.CertDir, err = ioutil.TempDir("/tmp", "controller-runtime-") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we doing cleanup of this temp dir ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch.
Added code to cleanup the temp dir.
https://github.com/kubernetes-sigs/controller-runtime/pull/260/files#diff-8b0412ec4fd52af1419bd0fcd0f2e101R159
acd82f6
to
09998a3
Compare
PTAL |
/lgtm |
@anfernee: changing LGTM is restricted to assignees, and only kubernetes-sigs/controller-runtime repo collaborators may be assigned issues. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
[APPROVALNOTIFIER] This PR is APPROVED Approval requirements bypassed by manually added approval. This pull-request has been approved by: mengqiy The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
fixes #191
fixes #192
Tests are on the way.