[Fleet] Fix Fleet Setup to handle concurrent calls across nodes in HA Kibana deployment #118423
Pinging @elastic/fleet (Team:Fleet)
Dec 1: moved to description
This could be another option instead of the idempotent route: we could have a saved object (SO) that is used only for this purpose. My concern is that this type of approach isn't guaranteed to always work, though if we need a quick solution it's better than what we have today. Where this approach doesn't work: in the failure scenario, both Node A and Node B end up running setup operations simultaneously.
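To illustrate where the race lives, here's a minimal sketch of the naive SO-based lock (the type name, ID, and function are hypothetical; the client calls follow Kibana's `SavedObjectsClientContract`):

```typescript
import type { SavedObjectsClientContract } from 'src/core/server';

// Hypothetical SO type/ID used purely as a Fleet setup lock.
const LOCK_TYPE = 'fleet-setup-lock';
const LOCK_ID = 'lock';

// Naive check-then-create locking. The window between `get` and
// `create` is the race: Node A and Node B can both find no lock,
// both "acquire" it, and both run setup operations simultaneously.
export async function tryAcquireSetupLock(
  soClient: SavedObjectsClientContract
): Promise<boolean> {
  try {
    await soClient.get(LOCK_TYPE, LOCK_ID);
    return false; // Lock document exists: another node is running setup.
  } catch (e) {
    // Assume not-found: the lock appears free. Both nodes can reach
    // this branch before either has written the lock document.
  }
  // `overwrite: true` avoids a conflict error here, but it also means
  // the second node silently overwrites the first node's lock.
  await soClient.create(
    LOCK_TYPE,
    { acquiredAt: new Date().toISOString() },
    { id: LOCK_ID, overwrite: true }
  );
  return true;
}
```

Dropping `overwrite` would make the `create` an atomic claim (the loser gets a 409 conflict), but the lock would still need expiry or cleanup handling for nodes that crash mid-setup, which is why this approach isn't guaranteed to always work.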
@kpollich FYI, I think we can and should backport all of these fixes to 7.16.1 if they're clean backports. That should also help the customers who are already hitting this issue.
@joshdover Everything we initially tracked here has been addressed. Do we want to start thinking about how to test this in an HA environment?
Good thinking. I'm thinking we should try running several instances of Kibana concurrently in a Jest integration test and asserting that Fleet setup completes successfully.
However, I know we've had some challenges with the Jest integration tests in #118797. I can't remember if the issues we were encountering there would affect this test as well or not. Any other ideas?
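Something like this rough shape is what I have in mind (the helper names come from `src/core/test_helpers/kbn_server`; whether two roots can actually boot side by side, and how ports/ES hosts get wired, is exactly the open question, so treat this as a sketch only):

```typescript
import * as kbnTestServer from 'src/core/test_helpers/kbn_server';

describe('Fleet setup across concurrent Kibana nodes', () => {
  let esServer: kbnTestServer.TestElasticsearchUtils;

  beforeAll(async () => {
    const { startES } = kbnTestServer.createTestServers({
      adjustTimeout: (t) => jest.setTimeout(t),
    });
    esServer = await startES();
  });

  afterAll(async () => {
    await esServer.stop();
  });

  it('boots two Kibana roots against one ES cluster without setup errors', async () => {
    // Each root would need its own server.port, and both would need to
    // point at esServer's hosts (wiring omitted here for brevity).
    const roots = [
      kbnTestServer.createRootWithCorePlugins({ server: { port: 5601 } }),
      kbnTestServer.createRootWithCorePlugins({ server: { port: 5602 } }),
    ];

    // Boot both nodes in parallel so their Fleet setup calls race.
    await Promise.all(
      roots.map(async (root) => {
        await root.preboot();
        await root.setup();
        await root.start();
      })
    );

    // Assertions would go here: setup succeeded on both nodes and no
    // duplicate policies/outputs/enrollment keys were created.

    await Promise.all(roots.map((root) => root.shutdown()));
  });
});
```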
I think that approach is definitely sound, but as I recall we weren't able to actually get Kibana to boot up when running Jest integration tests against the x-pack directory. I probably won't have time this week to actually start addressing this, but I do think documenting that approach is a good start.
@kpollich Let me give it a try later this week and see if I can't get something working.
Currently, Fleet tracks the status of the `setup` process in memory. Because of this, multiple concurrent calls to `setup` can occur in environments with multiple Kibana instances, which introduces multiple issues.

In #111858 (comment), we discussed how we could solve this problem by making all of the setup operations idempotent. Here's the summary of what we concluded in that thread (a sketch of the idempotent-creation pattern follows the list):

- Use a hardcoded ID of `default`.
- Use hardcoded IDs `default_policy` and `default_fleet_server_policy` for those two policies, since they do not require an ID in preconfiguration.
- `.fleet-enrollment-api-keys` uses uuidv4, but creating duplicates should not be a problem.
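To make the pattern concrete, here's a minimal sketch of idempotent creation under a hardcoded ID, assuming Kibana's SO client and its conflict-error helper (the policy attributes are illustrative, not Fleet's actual schema):

```typescript
import { SavedObjectsErrorHelpers } from 'src/core/server';
import type { SavedObjectsClientContract } from 'src/core/server';

// Creating the default policy under a hardcoded ID makes the operation
// safe to run from several Kibana nodes at once: whichever node loses
// the race gets a 409 conflict and treats the policy as already set up.
export async function ensureDefaultAgentPolicy(soClient: SavedObjectsClientContract) {
  try {
    await soClient.create(
      'ingest-agent-policies', // Fleet's agent policy SO type
      { name: 'Default policy', is_default: true }, // illustrative attributes
      { id: 'default_policy' } // hardcoded ID from the checklist above
    );
  } catch (e) {
    if (!SavedObjectsErrorHelpers.isConflictError(e)) {
      throw e;
    }
    // Conflict: another node created it first. This is the idempotent
    // "already done" path rather than an error.
  }
}
```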
Original description (see why this may not work in all cases in the discussion above):
Our architecture today looks like this: [diagram: each Kibana instance tracks Fleet setup status in its own memory while connected to a shared Elasticsearch cluster]
We should consider our options for moving the status of Fleet's `setup` process into some kind of persistent state in order to avoid these issues. It's likely that these issues have been exacerbated by #111858, in which we moved Fleet's `setup` process to Kibana boot.

If we store the status of Fleet setup in Elasticsearch, we'd have an architecture more like this: [diagram: Kibana instances coordinating setup through a shared status document in Elasticsearch]
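If we go this route, the persisted state could be as simple as a single status document claimed atomically. A rough sketch, with all names and the document shape hypothetical:

```typescript
import type { SavedObjectsClientContract } from 'src/core/server';

// Hypothetical persisted status document for Fleet setup.
interface FleetSetupStatusAttributes {
  status: 'in_progress' | 'completed' | 'failed';
  startedAt: string;
  completedAt?: string;
}

const STATUS_TYPE = 'fleet-setup-status';
const STATUS_ID = 'status';

// Claim setup by creating the status document with a fixed ID and no
// `overwrite`: the create is atomic, so exactly one node wins, while the
// others see a conflict and poll the document until it reads 'completed'.
export async function claimFleetSetup(
  soClient: SavedObjectsClientContract
): Promise<boolean> {
  try {
    await soClient.create<FleetSetupStatusAttributes>(
      STATUS_TYPE,
      { status: 'in_progress', startedAt: new Date().toISOString() },
      { id: STATUS_ID }
    );
    return true; // This node runs setup and later updates the document.
  } catch (e) {
    // A real implementation would distinguish 409 conflicts from other
    // errors, and handle stale 'in_progress' documents from crashed nodes.
    return false;
  }
}
```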
It would be helpful to stand up an environment with multiple Kibana instances and report findings on boot in this issue to further crystallize these issues.