Start/stop components in a more synchronised manner #2226
pkg/component/runtime/manager.go
```go
// stop is async, wait for operation to finish,
// otherwise new instance may be started and components
// may fight for resources (e.g ports, files, locks)
m.waitForStopped(existing)
```
Nitpicking here: can't we stop and wait for stopped services in parallel using goroutines and a `WaitGroup`? That would be faster than shutting down one component at a time. Maybe this can be a further optimization down the line?
Yes, this should absolutely happen in parallel.
This should handle the stops in parallel, as @pchila stated.
This also really needs a test. We have many tests that cover the runtime manager, and since this is critical behavior, it needs to be tested and covered.
It should be easy to subscribe to component state changes from the runtime and check that all components that should be stopped are stopped before any new components are spawned.
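One way to express the check proposed here, assuming the test records the observed state changes as an ordered event log: every old component must reach the stopped state before any new component starts. The `event` type, its fields and the `stoppedBeforeStarted` helper are hypothetical; a real test would build the log from the runtime manager's state subscription.

```go
package main

import "fmt"

// event is a hypothetical record of one observed component state change.
type event struct {
	componentID string
	state       string // "stopped" or "starting" in this sketch
}

// stoppedBeforeStarted returns nil if every id in oldIDs reported "stopped"
// before the first "starting" event of any id in newIDs.
func stoppedBeforeStarted(events []event, oldIDs, newIDs []string) error {
	pending := make(map[string]bool, len(oldIDs)) // old components not yet stopped
	for _, id := range oldIDs {
		pending[id] = true
	}
	isNew := make(map[string]bool, len(newIDs))
	for _, id := range newIDs {
		isNew[id] = true
	}
	for i, ev := range events {
		switch {
		case ev.state == "stopped":
			delete(pending, ev.componentID)
		case ev.state == "starting" && isNew[ev.componentID] && len(pending) > 0:
			return fmt.Errorf("event %d: %s started while %d old component(s) still running",
				i, ev.componentID, len(pending))
		}
	}
	if len(pending) > 0 {
		return fmt.Errorf("%d old component(s) never reported stopped", len(pending))
	}
	return nil
}

func main() {
	oldIDs, newIDs := []string{"old-1", "old-2"}, []string{"new-1"}
	good := []event{{"old-1", "stopped"}, {"old-2", "stopped"}, {"new-1", "starting"}}
	bad := []event{{"old-1", "stopped"}, {"new-1", "starting"}, {"old-2", "stopped"}}
	fmt.Println(stoppedBeforeStarted(good, oldIDs, newIDs))
	fmt.Println(stoppedBeforeStarted(bad, oldIDs, newIDs))
}
```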
@blakerouse added parallelism and a test
Thanks for adding the parallelism and the test!
@pchila @blakerouse were you able to test this new behaviour locally? @cmacknz @michalpristas any corner cases we might have missed?
What does this PR do?
Two things are being fixed here:
The first is the order of start/stop.
Currently we start new components before stopping the old ones. This is usually fine unless the component is a service, where start/stop is slower.
When the new service is started, the old one can still be running and holding resources that the new service needs in order to start.
We can then see errors like `bind port already in use`.
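A minimal sketch of why the order matters, using a simplified model in which each service claims a port from a shared table: starting the new instance while the old one still holds the port fails with a "bind port already in use"-style error, whereas stopping the old instance first succeeds. `portTable`, `bind`, `release` and `update` are illustrative names, not part of the runtime manager, and `release` here is synchronous, unlike the real asynchronous stop.

```go
package main

import (
	"fmt"
	"sync"
)

// portTable is a hypothetical stand-in for an exclusive resource such as a
// listening port: at most one service may hold each port.
type portTable struct {
	mu    sync.Mutex
	owner map[int]string
}

// bind claims port for id, failing if another service still holds it.
func (p *portTable) bind(port int, id string) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	if holder, ok := p.owner[port]; ok {
		return fmt.Errorf("%s: bind port %d already in use (held by %s)", id, port, holder)
	}
	p.owner[port] = id
	return nil
}

// release frees port, but only if id is its current holder.
func (p *portTable) release(port int, id string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.owner[port] == id {
		delete(p.owner, port)
	}
}

// update replaces oldID with newID on the same port. With stopFirst the old
// service is stopped before the new one starts; without it the new service
// is started while the old one is still running.
func update(p *portTable, port int, oldID, newID string, stopFirst bool) error {
	if stopFirst {
		p.release(port, oldID)
		return p.bind(port, newID)
	}
	err := p.bind(port, newID) // fails: the old service still holds the port
	p.release(port, oldID)
	return err
}

func main() {
	for _, stopFirst := range []bool{false, true} {
		p := &portTable{owner: map[int]string{8080: "old-service"}}
		err := update(p, 8080, "old-service", "new-service", stopFirst)
		fmt.Printf("stopFirst=%v err=%v owner=%q\n", stopFirst, err, p.owner[8080])
	}
}
```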
The second thing fixed here is waiting for uninstall to finish.
Again, the problem appears when we change the output: we try to uninstall component `service-output-one` and start `service-output-two`.
We kick off the uninstall of `service-output-one` but don't actually wait for the operation to finish before starting `service-output-two`.
This may look fine, since `service-output-one` may already be stopped, but the uninstall may still be in progress, and it will then remove the service even though it's needed for `service-output-two`.
Why is it important?
Service failures when
Checklist
`./changelog/fragments` using the changelog tool