
Validate that upgrade to 8.0 is going through 7.last #5258

Closed · pebrc opened this issue Jan 19, 2022 · 6 comments · Fixed by #5261
Labels: >bug Something isn't working, v2.0.0
pebrc (Collaborator) commented Jan 19, 2022

Upgrades to Elastic Stack 8.0 will only be possible from the last version of the 7.x series (~7.17). We are currently not enforcing or validating this behaviour. The lack of any validation can lead to very tricky situations for our users, as "downgrades" are not allowed in ECK. (A sketch of the missing guard follows the scenario below.)

  1. user upgrades from a lower version of 7.x say 7.16.0 to 8.0
  2. 8.0 Pods keep boot looping due to bootstrap checks in Elasticsearch
  3. user tries to go back to 7.16.0
  4. ECK validation forbids downgrade in the webhook
  5. user disables webhook
  6. ECK validation in the reconciliation loop prevents the downgrade from rolling out, since we compare the currently deployed Pods against the desired version for wire compatibility
  7. to my knowledge, the user cannot successfully take any other action, short of recreating the cluster or manually deleting Pods
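
A minimal sketch of the missing guard, assuming we can compare the currently deployed version with the proposed one (the function name and the 7.17 constant are illustrative, not ECK's actual code):

```go
// Hypothetical sketch, not ECK's actual webhook code: reject a direct
// upgrade to 8.x unless the cluster is currently on the last 7.x minor.
package validation

import (
	"fmt"

	"github.com/blang/semver/v4"
)

// assumption: 7.17 is the final minor of the 7.x series ("7.last")
const lastSevenMinor = 17

func validateUpgradeTo8(current, proposed semver.Version) error {
	if proposed.Major == 8 && current.Major == 7 && current.Minor < lastSevenMinor {
		return fmt.Errorf("upgrade to %s must go through 7.%d first (currently on %s)",
			proposed, lastSevenMinor, current)
	}
	return nil
}
```
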
@pebrc pebrc added the v2.0.0 label Jan 19, 2022
@botelastic botelastic bot added the triage label Jan 19, 2022
@pebrc pebrc added the >bug Something isn't working label Jan 19, 2022
@botelastic botelastic bot removed the triage label Jan 19, 2022
@pebrc pebrc self-assigned this Jan 19, 2022
sebgl (Contributor) commented Jan 19, 2022

Do we want to enforce that check at both the webhook level and the reconciliation level?
(I think yes, as #5259 would be the workaround in both cases, and it keeps things consistent whether or not the webhook is used.)

pebrc (Collaborator, Author) commented Jan 19, 2022

This already happens through the supportedVersions mechanism, AFAIK:

```go
func SupportedVersions(v version.Version) *version.MinMaxVersion
```

The problem is of course that this is not watertight either: a quick upgrade from 7.16.0 through 7.17.0 to 8.0.0, without waiting for the rollout to complete, leaves the user in a similar situation to the one described above.
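
For illustration only, roughly how such a min/max table could raise the floor for 8.x; the real implementation lives behind the linked function, and these types and literals are assumptions:

```go
// Sketch: a per-target-version support table in which 7.17.0 becomes the
// minimum source version for any 8.x target. Redefined here for
// self-containment; not ECK's actual types.
package esversion

import "github.com/blang/semver/v4"

type MinMaxVersion struct {
	Min semver.Version // lowest version allowed to coexist with the target
	Max semver.Version // highest version allowed
}

func SupportedVersions(v semver.Version) *MinMaxVersion {
	switch v.Major {
	case 7:
		// 7.x clusters accept nodes from 6.8.0 onwards (wire compatibility)
		return &MinMaxVersion{Min: semver.MustParse("6.8.0"), Max: semver.MustParse("7.99.99")}
	case 8:
		// 8.x is only reachable from the last 7.x minor
		return &MinMaxVersion{Min: semver.MustParse("7.17.0"), Max: semver.MustParse("8.99.99")}
	default:
		return nil
	}
}
```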

sebgl (Contributor) commented Jan 19, 2022

> a quick upgrade from 7.16.0 through 7.17.0 to 8.0.0, without waiting for the rollout to complete, leaves the user in a similar situation to the one described above.

Could we also check the lowest version currently running by inspecting the Pods, and only allow an upgrade to 8.0 if that lowest running version is 7.last? (Which makes me think that with this approach we might ignore the version from the previous manifest entirely, and only focus on retrieving a version from the current Pods, if there are any.)
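
A minimal sketch of that idea, assuming the version is readable from a Pod label (the label key and helper are assumptions, not ECK's actual API):

```go
// Hypothetical sketch: compute the lowest Elasticsearch version currently
// running by reading a version label off each Pod.
package validation

import (
	"fmt"

	"github.com/blang/semver/v4"
	corev1 "k8s.io/api/core/v1"
)

// assumed label key carrying the node's Elasticsearch version
const versionLabel = "elasticsearch.k8s.elastic.co/version"

func lowestRunningVersion(pods []corev1.Pod) (*semver.Version, error) {
	var lowest *semver.Version
	for _, pod := range pods {
		raw, ok := pod.Labels[versionLabel]
		if !ok {
			continue // Pod not labelled (yet); skip it
		}
		v, err := semver.Parse(raw)
		if err != nil {
			return nil, fmt.Errorf("pod %s: invalid version label: %w", pod.Name, err)
		}
		if lowest == nil || v.LT(*lowest) {
			lowest = &v
		}
	}
	// nil means no labelled Pods exist, e.g. a brand-new cluster
	return lowest, nil
}
```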

pebrc (Collaborator, Author) commented Jan 19, 2022

> Could we also check the lowest version currently running by inspecting the Pods, and only allow an upgrade to 8.0 if that lowest running version is 7.last?

I was wondering whether we should just deny a version upgrade while .status is still ApplyingChanges (this approach has its own set of loopholes, but I like the simplicity).
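
Something like this minimal sketch (the phase value and function are assumptions about the status field, not actual ECK code):

```go
// Sketch only: deny any spec version change while the status still reports
// an in-flight rollout.
package validation

import "errors"

// assumed phase string reported in .status while a rollout is in progress
const phaseApplyingChanges = "ApplyingChanges"

func denyUpgradeWhileApplying(statusPhase string, versionChanged bool) error {
	if versionChanged && statusPhase == phaseApplyingChanges {
		return errors.New("a rollout is in progress; retry the version change once the cluster is Ready")
	}
	return nil
}
```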

The version compatibility check we already have in place here:

```go
func (d *defaultDriver) verifySupportsExistingPods(pods []corev1.Pod) error {
	for _, pod := range pods {
		// read the Elasticsearch version recorded in the Pod's labels
		v, err := label.ExtractVersion(pod.Labels)
		if err != nil {
			return err
		}
		// fail if the Pod's version falls outside the supported min/max range
		if err := d.SupportedVersions.WithinRange(v); err != nil {
			return errors.Wrapf(err, "%s has incompatible version", pod.Name)
		}
	}
	return nil
}
```

already looks at the version of each Pod and compares it with the proposed version.

sebgl (Contributor) commented Jan 19, 2022

> I was wondering whether we should just deny a version upgrade while .status is still ApplyingChanges (this approach has its own set of loopholes, but I like the simplicity).

Or maybe look at status.version, which I think already matches the lowest running version? The status may indicate ApplyingChanges for various other reasons we may not want to account for.
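
Sketched out, assuming status.version reflects the lowest running version (names and the version-handling are illustrative, not ECK's actual API):

```go
// Hypothetical sketch: validate the proposed spec version against the
// version currently reported in .status, treating 7.17.0 as the floor
// for any 8.x target.
package validation

import (
	"fmt"

	"github.com/blang/semver/v4"
)

func validateAgainstStatusVersion(statusVersion, specVersion semver.Version) error {
	if specVersion.Major == 8 && statusVersion.LT(semver.MustParse("7.17.0")) {
		return fmt.Errorf("cannot upgrade to %s while the cluster still runs %s; upgrade to 7.17 first",
			specVersion, statusVersion)
	}
	return nil
}
```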

pebrc (Collaborator, Author) commented Jan 19, 2022

> Or maybe look at status.version, which I think already matches the lowest running version? The status may indicate ApplyingChanges for various other reasons we may not want to account for.

Yes, that is better. ApplyingChanges is actually not updated until the first node has been rolled, for some reason; something to look into as part of improving the status reporting, I would say.
