Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable node-expansion to be called on all nodes for RWX volumes #108693

Merged
merged 12 commits into from
Mar 30, 2022

Conversation

gnufied
Copy link
Member

@gnufied gnufied commented Mar 14, 2022

This PR implements support for calling NodeExpandVolume on all nodes in case of RWX volumes. It does so by storing pvSpecSize in desiredStateofWorld and pvcStatusSize in actualStateofWorld.

I tested with CSI Mock driver and RWX enabled (branch here - https://github.com/gnufied/csi-test/tree/make-rwx-possible )

and it seems to be working as expected:

Events:
  Type     Reason                      Age                From                                                                           Message
  ----     ------                      ----               ----                                                                           -------
  Normal   ExternalProvisioning        35m (x2 over 35m)  persistentvolume-controller                                                    waiting for a volume to be created, either by external provisioner "io.kubernetes.storage.mock" or manually created by system administrator
  Normal   Provisioning                35m                io.kubernetes.storage.mock_fast.home.lan_7225b6d9-9cd7-4110-a36f-7734e4cf937c  External provisioner is provisioning volume for claim "default/myclaim"
  Normal   ProvisioningSucceeded       35m                io.kubernetes.storage.mock_fast.home.lan_7225b6d9-9cd7-4110-a36f-7734e4cf937c  Successfully provisioned volume pvc-be470238-a485-4ac8-a7f9-2aeae74664b2
  Normal   Resizing                    7m55s              external-resizer io.kubernetes.storage.mock                                    External resizer is resizing volume pvc-be470238-a485-4ac8-a7f9-2aeae74664b2
  Warning  ExternalExpanding           7m55s              volume_expand                                                                  Ignoring the PVC: didn't find a plugin capable of expanding the volume; waiting for an external controller to process this PVC.
  Normal   FileSystemResizeRequired    7m55s              external-resizer io.kubernetes.storage.mock                                    Require file system resize of volume on node
  Normal   FileSystemResizeSuccessful  7m11s              kubelet                                                                        MountVolume.NodeExpandVolume succeeded for volume "pvc-be470238-a485-4ac8-a7f9-2aeae74664b2"
  Normal   FileSystemResizeSuccessful  6m42s              kubelet                                                                        MountVolume.NodeExpandVolume succeeded for volume "pvc-be470238-a485-4ac8-a7f9-2aeae74664b2"
Call NodeExpand on all nodes in case of RWX volumes

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 14, 2022
@k8s-ci-robot
Copy link
Contributor

@gnufied: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/kubelet labels Mar 14, 2022
@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 14, 2022
@gnufied gnufied force-pushed the enable-rwx-call-all-nodes branch from 4efb040 to 2b9af72 Compare March 21, 2022 16:56
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gnufied

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 21, 2022
@gnufied gnufied force-pushed the enable-rwx-call-all-nodes branch from 2b9af72 to 99adb3b Compare March 21, 2022 18:05
@gnufied gnufied changed the title {WIP} Enable node-expansion to be called on all nodes for RWX volumes Enable node-expansion to be called on all nodes for RWX volumes Mar 22, 2022
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 22, 2022
@gnufied
Copy link
Member Author

gnufied commented Mar 22, 2022

/retest

@gnufied
Copy link
Member Author

gnufied commented Mar 22, 2022

/assign @jsafrane @jingxu97

@jsafrane
Copy link
Member

/retest
vendor/k8s.io/client-go/util/flowcontrol TestMultithreadedThrottling is flaky !?

@gnufied gnufied force-pushed the enable-rwx-call-all-nodes branch from 195c666 to dee48d3 Compare March 28, 2022 15:59
@gnufied gnufied force-pushed the enable-rwx-call-all-nodes branch from 92dab4e to 6bc8275 Compare March 28, 2022 16:59
@gnufied
Copy link
Member Author

gnufied commented Mar 28, 2022

@jsafrane Updated events to include node names:

  Type     Reason                      Age    From                                                                           Message
  ----     ------                      ----   ----                                                                           -------
  Normal   ExternalProvisioning        2m17s  persistentvolume-controller                                                    waiting for a volume to be created, either by external provisioner "io.kubernetes.storage.mock" or manually created by system administrator
  Normal   Provisioning                2m17s  io.kubernetes.storage.mock_fast.home.lan_65f913eb-bc22-4896-889a-5326a0fd9014  External provisioner is provisioning volume for claim "default/myclaim"
  Normal   ProvisioningSucceeded       2m17s  io.kubernetes.storage.mock_fast.home.lan_65f913eb-bc22-4896-889a-5326a0fd9014  Successfully provisioned volume pvc-3a17631d-3e5a-48bf-9f29-0682b726b844
  Normal   Resizing                    87s    external-resizer io.kubernetes.storage.mock                                    External resizer is resizing volume pvc-3a17631d-3e5a-48bf-9f29-0682b726b844
  Normal   FileSystemResizeRequired    87s    external-resizer io.kubernetes.storage.mock                                    Require file system resize of volume on node
  Warning  ExternalExpanding           87s    volume_expand                                                                  Ignoring the PVC: didn't find a plugin capable of expanding the volume; waiting for an external controller to process this PVC.
  Normal   FileSystemResizeSuccessful  44s    kubelet                                                                        MountVolume.NodeExpandVolume succeeded for volume "pvc-3a17631d-3e5a-48bf-9f29-0682b726b844" 192.168.121.206
  Normal   FileSystemResizeSuccessful  41s    kubelet                                                                        MountVolume.NodeExpandVolume succeeded for volume "pvc-3a17631d-3e5a-48bf-9f29-0682b726b844" 127.0.0.1

@gnufied
Copy link
Member Author

gnufied commented Mar 28, 2022

I also tested with multiple kubelets and restarts. It is working as expected and whichever kubelet did not observe resize event while it was up, will not call NodeExpand on the volume even when the kubelet comes up. cc @jingxu97

resizeResponse.err = msg
return resizeResponse
}
func (og *operationGenerator) nodeExpandVolume(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where this function is called?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It gets called from doOnlineExpansion function which in turn gets called from ExpandInUsePersistentVolume function.

@@ -712,7 +703,7 @@ func (og *operationGenerator) GenerateMountVolumeFunc(
klog.V(verbosity).InfoS(detailedMsg, "pod", klog.KObj(volumeToMount.Pod))
resizeOptions.DeviceMountPath = volumeMounter.GetPath()

_, resizeError = og.nodeExpandVolume(volumeToMount, actualStateOfWorld, resizeOptions)
_, resizeError = og.expandVolumeDuringMount(volumeToMount, actualStateOfWorld, resizeOptions)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you changing the function name from nodeExpandVolume to expandVolumeDuringMount? The tests should also updated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So logic for calling expansion during mount is slightly different from online expansion later on. During mount - we fetch PVC size from api-server and check if volume requires expansion, because during mount this would be the first time volume was observed and hence value in ASOW will be same as value in pvc.Status.

For online expansion while pod is running, however we strictly compare desired and actual sizes. There are tests in reconciler_test.go, which exercise this at higher level.

@jsafrane
Copy link
Member

/lgtm
/hold
for @jingxu97 to have her approval too

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Mar 29, 2022
@jingxu97
Copy link
Contributor

/lgtm
/cancel hold

@gnufied
Copy link
Member Author

gnufied commented Mar 29, 2022

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 29, 2022
@msau42
Copy link
Member

msau42 commented Mar 29, 2022

/milestone v1.24

@liggitt
Copy link
Member

liggitt commented Apr 4, 2022

#109264 noted tests failing with nil pointer errors that might be related to this PR

@gnufied
Copy link
Member Author

gnufied commented Apr 4, 2022

Thanks I am looking into those. posted some updates - #109264 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
Development

Successfully merging this pull request may close these issues.

7 participants