construct a single PanDomainAuthSettingsRefresher per app instance #27108
What is the value of this and can you measure success?
Fewer (hopefully no) unhealthy preview instance replacements.
What does this change?
We have seen frequent healthcheck failures and unhealthy instances in preview since #27012.
There haven't been conclusive error logs, but since that PR was merged there has been a substantial increase in "request timeout" and "connection refused" logs, which suggests some form of resource exhaustion. Going through that PR, I noticed that the `PanDomainAuthSettingsRefresher` is declared as a `def` (i.e. a function) rather than a `val`, which means a new instance is constructed every time the settings are referenced. On construction, each instance schedules updates to itself on a new thread pool, so these quickly mount up: many schedulers each try to update their own instance of the refresher, clogging up memory, threads and sockets.
Instead, declare it as a `val` to ensure the refresher is constructed only once per EC2 instance.
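The difference between the two declarations can be sketched in isolation. This is a minimal illustration, not the code from this PR: `FakeRefresher`, `WithDef`, `WithVal` and `DefVsValDemo` are hypothetical names, and the stand-in class only counts constructions where the real refresher would also start its scheduled updates.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for PanDomainAuthSettingsRefresher: it only counts
// constructions. The real class would also schedule its own refreshes here.
object FakeRefresher {
  val constructed = new AtomicInteger(0)
}

class FakeRefresher {
  FakeRefresher.constructed.incrementAndGet()
}

// `def`: the right-hand side is re-evaluated on every reference.
class WithDef {
  def refresher: FakeRefresher = new FakeRefresher
}

// `val`: the right-hand side is evaluated once, when WithVal is constructed.
class WithVal {
  val refresher: FakeRefresher = new FakeRefresher
}

object DefVsValDemo {
  // Reference `refresher` three times and return how many instances were built.
  def countWithDef(): Int = {
    FakeRefresher.constructed.set(0)
    val app = new WithDef
    (1 to 3).foreach(_ => app.refresher)
    FakeRefresher.constructed.get()
  }

  def countWithVal(): Int = {
    FakeRefresher.constructed.set(0)
    val app = new WithVal
    (1 to 3).foreach(_ => app.refresher)
    FakeRefresher.constructed.get()
  }

  def main(args: Array[String]): Unit = {
    println(s"def: ${countWithDef()} instances") // def: 3 instances
    println(s"val: ${countWithVal()} instances") // val: 1 instances
  }
}
```

With `def`, the count grows with every reference, which is how a per-request access could end up creating one scheduler per request. With `val`, the count stays at one for the lifetime of the enclosing object.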