Support Elasticsearch volumes expansion #3752
We'll reintroduce it later, in a form that allows modifying PVC storage reqs.
...when setting the initial replicas values for StatefulSet re-creation. Otherwise, if we're unlucky and one Pod is missing for some reason (e.g. pod-0), using the Pod count could lead to removing the Pods with the highest ordinals.
Nice work @sebgl. I did some testing and left some comments, nothing major.
There's a concern that some users may not be allowed to deploy ECK if it needs read access to storage classes, which are a cluster-wide resource.
Jenkins test this please.
I'm still doing some tests but overall it looks good 👍
Also, I just want to be sure we don't want to handle the case where the Elasticsearch resource is deleted before the StatefulSet is re-created, leaving some Pods orphaned?
…ated If the Elasticsearch resource is removed while the StatefulSet is being recreated, orphan Pods will stay around forever. To avoid that situation, we can append a temporary owner ref to the Pods before deleting their StatefulSet, so Pods don't stay orphans if Elasticsearch is deleted. We can then remove the temporary owner ref after the StatefulSet is recreated.
@barkbay thanks for pointing out that particular case again!
I left some comments re. the unit tests, I think they are not complete.
LGTM 👍
Great work!
It worked well through all my attempts to break it :) nice work. Just one scenario I'm concerned about, the rest are nits.
if err != nil {
	return results.WithError(fmt.Errorf("StatefulSet recreation: %w", err))
}
if recreations > 0 {
Is it possible that the spec provided in the ES resource results in an invalid StatefulSet? If yes, then I think, since we use the StatefulSet from the spec when annotating, we won't make progress past this point because the operator will keep trying to recreate the StatefulSet from the annotation and keep failing.
If the above is valid then I'd be a little uncomfortable, because we could effectively get the operator stuck on a wrong config. I think that would be the first such case (e.g. changing the resource won't work, because we process the annotation first).
What we set in the annotation always matches an "actual StatefulSet", ie. a StatefulSet that exists in the apiserver, for which an update (or create) operation was successful in the past. We don't set the annotation based on the "expected StatefulSet".
I think in the case you mention where the StatefulSet cannot possibly be created, the update call would probably have failed beforehand? I don't know if there are some cases where a spec would lead to a successful update but an invalid creation. It seems weird.
WDYT?
I was convinced that we set the expected sset in the annotation, which I see now isn't true. Yes, I think we can dismiss the case of successful update and failed create.
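The point settled in this thread, that the annotation is built from a StatefulSet that actually exists in the apiserver rather than from the expected one, can be sketched in Go as below. The `statefulSet` type, the `annotationPrefix` value and both function names are assumptions made for illustration; they are not the operator's real identifiers or annotation keys.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// statefulSet is a hypothetical, heavily simplified StatefulSet.
type statefulSet struct {
	Name     string `json:"name"`
	Replicas int32  `json:"replicas"`
	Storage  string `json:"storage"`
}

// annotationPrefix is a made-up annotation key prefix for this sketch.
const annotationPrefix = "example.com/recreate-statefulset-"

// annotateForRecreation serializes the *actual* StatefulSet (as read from
// the apiserver) into the owner's annotations.
func annotateForRecreation(annotations map[string]string, actual statefulSet) error {
	raw, err := json.Marshal(actual)
	if err != nil {
		return err
	}
	annotations[annotationPrefix+actual.Name] = string(raw)
	return nil
}

// statefulSetsToRecreate decodes every StatefulSet stored in the annotations.
func statefulSetsToRecreate(annotations map[string]string) ([]statefulSet, error) {
	var out []statefulSet
	for key, value := range annotations {
		if !strings.HasPrefix(key, annotationPrefix) {
			continue
		}
		var sset statefulSet
		if err := json.Unmarshal([]byte(value), &sset); err != nil {
			return nil, fmt.Errorf("StatefulSet recreation: %w", err)
		}
		out = append(out, sset)
	}
	return out, nil
}

func main() {
	annotations := map[string]string{"unrelated": "value"}
	actual := statefulSet{Name: "es-default", Replicas: 3, Storage: "1Gi"}
	if err := annotateForRecreation(annotations, actual); err != nil {
		panic(err)
	}
	ssets, err := statefulSetsToRecreate(annotations)
	fmt.Println(ssets, err)
}
```

Because the stored object was accepted by the apiserver at least once, recreating it from the annotation is unlikely to fail on validation, which is the argument made in the reply above.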
LGTM! 🚀
Document volume expansion and volume claim immutability
This commit modifies the existing documentation in order to account for the volume expansion feature introduced in #3752. With those changes, it is now possible to resize storage requests if the storage size allows it. Every other change is forbidden. The doc also mentions how Pods may need to be restarted if the volume driver does not support online expansion. I added lower-level titles to better structure the volume claim templates page.
* Improvements from PR review
This commit adds support for expanding Logstash volumes by editing the storage requests in spec.VolumeClaimTemplate, based on the existing Elasticsearch implementation from elastic#3752.
The same constraints hold:
* Volume size can only be increased, not decreased
* Storage class must specify allowVolumeExpansion: true
* Filesystem resize without Pod recreation is only possible if the storage driver allows it
This works in the same way as the Elasticsearch implementation. An update of the storage request in the volumeClaimTemplate will:
* Update the storage requests spec of all existing PVCs: they are immediately resized by the storage driver, if online expansion is supported. Otherwise Pods need to be recreated.
* Delete the StatefulSet, but not the Pods that it owns, storing recreation details in an annotation on the owning Logstash resource
* Recreate the StatefulSet with the new volumeClaimTemplate spec
* Remove the recreation annotation from the Logstash resource
This PR moves some of the PVC-expansion code from Elasticsearch into a common area to avoid code duplication.
This adds support for expanding Logstash volumes by editing the storage requests in spec.VolumeClaimTemplate, based on the existing Elasticsearch implementation from #3752.
The same constraints hold:
* Volume size can only be increased, not decreased
* Storage class must specify allowVolumeExpansion: true
* Filesystem resize without Pod recreation is only possible if the storage driver allows it
This is based on the Elasticsearch implementation. An update of the storage request in the volumeClaimTemplate will:
* Update the storage requests spec of all existing PVCs: they are immediately resized by the storage driver, if online expansion is supported. Otherwise Pods need to be recreated.
* Delete the StatefulSet, but not the Pods that it owns, storing recreation details in an annotation on the owning Logstash resource
* Recreate the StatefulSet with the new volumeClaimTemplate spec
* Remove the recreation annotation from the Logstash resource
This also reworks the webhook validation to check that storage updates fulfill the requirements (only increasing storage, using a valid storage class that allows storage expansion). This required moving the webhook validation into the controller package to allow use of the k8sclient without a dependency cycle, and requires some rework in webhook registration.
Co-authored-by: Karen Metts <35154725+karenzone@users.noreply.github.com>
Co-authored-by: Peter Brachwitz <peter.brachwitz@gmail.com>
Add support for resizing Elasticsearch volumes by simply editing the storage requests in the volumeClaimTemplates section of the spec.
Demo
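Constraints
* Volume size can only be increased, not decreased
* The storage class must specify allowVolumeExpansion: true
* Filesystem resize without Pod recreation is only possible if the storage driver allows it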
Implementation overview
Resizing PVCs is not supported by the StatefulSet controller at this time. The volume section of the StatefulSet spec is immutable.
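However, it is possible to work around the StatefulSet limitations this way:
* Update the storage requests spec of all existing PVCs: they are immediately resized by the storage driver, if online expansion is supported. Otherwise Pods need to be recreated.
* Delete the StatefulSet, but not the Pods that it owns, storing recreation details in an annotation on the owning Elasticsearch resource
* Recreate the StatefulSet with the new volumeClaimTemplates spec
* Remove the recreation annotation from the Elasticsearch resource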
Follow-up (in other PRs)
Relates #325.