-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
distributed resizing #142
Comments
In the resizer, the sidecar probably needs to check whether the volume is local before invoking the CSI driver. This can be achieved via the existing topology field. kubernetes-csi/external-provisioner#592 might help to reduce traffic by configuring the informer so that it filters by label and thus only receives PVs that are of interest. See kubernetes-csi/external-provisioner#583 for code which does such server-side filtering for CSIStorageCapacity objects. |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-contributor-experience at kubernetes/community. |
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-contributor-experience at kubernetes/community. |
/remove-lifecycle rotten |
How much effort is required for this? |
I suspect it'll be easier than distributed provisioning because the sidecar can already determine from the PV whether it is responsible for the operation. See kubernetes-csi/external-snapshotter#585 for the corresponding feature in the snapshotter. |
I am not convinced that we need distributed resizing in external-resizer. I see a clear use case of why distributed provisioning is needed for local volumes but I don't see a similar use case for resizing. Compared to provisioning - resizing is a two step process and Is there a use case, where both |
A fair point. My sense is that controllers generally have no knowledge or understanding of how the device is being used and is just expanding raw capacity while node operations (when necessary) are meant to deal with the specifics of ensuring the volume (generally a block device if node resizing is involved) capacity is readily accessible to the workload (generally resizing the fs as necessary). I prefer to keep these logical separations. With that in mind, I have several common code paths that are used by both distributed and centralized deployments (my project currently has 17 distinct 'drivers') so I'd love to be able to keep those logically in the same buckets. From a broader (than k8s) perspective, I think the 'deploy the controller alongside the node' workflow is clearly a goal of csi so, regardless of any perceived value/arguments/points I make above, making assumptions about what makes sense or not probably doesn't serve the interests of the spec in the long-term. The spec is interested in facilitating node+controller deployments jointly and part of controller deployments is resizing...by extension csi compliant COs should be capable of invoking resize operations on controllers deployed in such a fashion. In short, IMO regardless of specific merits for my use-case, it seems a supported csi deployment style that should 'just work' with k8s and/or any other COs. |
The spec does not say anything about node+controller deployment on same nodes. I am still unsure, what problems distributed resizing will solve that is not already solved by calling |
Isn't that figure 2 and 3 here?: https://github.com/container-storage-interface/spec/blob/master/spec.md Again, regardless of that, IMO the provisioner and snapshotter both support node+controller deployments. The fact that k8s has various clients/code-bases to invoke the grpc methods is an implementation details. If k8s supports node+controller it should support it fully and not selectively decide which grpc methods it will or will not invoke based on supposition about the merits/requirements of the feature by csi driver implementations from specific SPs. |
@gnufied: if the CSI driver only runs on the nodes, where would you run external-resizer? If you only run one instance of it per cluster, then it has no CSI driver instance that it can call. Or is the expectation that a CSI driver then has an implementation that runs on the node and another implementation that runs alongside the external-resizer, with the second implementation not doing any actual work? |
Currently - if a CSI driver supports node-only volume expansion, the external-resizer does provide a no-op trivial resizer which is used for updating PV (kubelet can't update PVs) and rest of the volume expansion happens on kubelet via |
I agree that this seems to be sufficient for node-local storage. Is this documented sufficiently so that CSI driver developers can find out about it? |
While I agree it can be done (at least for my use-cases), I disagree with what I consider divergence from the spec. I don’t think it’s appropriate to selectively decide which controller methods will/will not be invoked in node+controller setups based on supposition. Moving on, I’m happy to test the deployment scenario but I’m unsure what it would look like. Is this available with current releases or is some change still required to support the method? Do I deploy resizer with nodes or just 1 instance? Which args should I be using? etc |
I get the point about deploying controller side of the plugin on the node and not having any explicit control-plane associated with CSI driver. I need to think some more about it. I have scheduled this item for discussion on Monday's CSI implementation meeting. If you can manage, please join so as we can talk about it. |
Of course! Send over meeting details here or on slack. |
My understanding is that the controller side will only be skipped if a CSI driver developer explicitly asks for it by deploying the CSI driver in a suitable way. That seems okay to me. How that works specifically I don't know (hence my question about documentation). |
https://kubernetes-csi.github.io/docs/volume-expansion.html documents the deployment workflow. Esp. following bits:
|
OK thanks for pointing out the doc! IMO this furthers my points about the spec. Stating the perhaps obvious, the sidecar (to my knowledge) must be deployed as..a sidecar. It cannot be deployed standalone without the/a controller deployed along with it. That deployment story does not match any of the architectures from the spec. It's a sort of mashing together of 1+2 and/or 1+3 where 'node' hosts are running controller+node plugins and then a 'master' host is running a controller plugin. Or said differently, 'controller' plugin is running both on 'nodes' and 'master' hosts. |
In case when there's an error during expanding, shouldn't this cause update to fail? CSI Spec also requires specific errors to be returned when Controller isn't able to expand the volume, so no-op expansion is not in sync with the spec. |
…ERS-and-OWNERS_ALIASES OCPBUGS-14812: Chore: Update OWNERS and OWNERS_ALIASES
For local storage (e.g. LVM disks) it makes sense to support a deployment model where the driver only runs locally on nodes together with sidecars. This is already support for provisioning (kubernetes-csi/external-provisioner#524, https://github.com/kubernetes-csi/external-provisioner#deployment-on-each-node).
The text was updated successfully, but these errors were encountered: