Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GetVolumeCapabilities incorrectly errors when multiple volume capabilities are specified in the PV #153

Open
davidz627 opened this issue Jun 7, 2019 · 6 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@davidz627
Copy link
Contributor

davidz627 commented Jun 7, 2019

A PersistentVolume can contain multiple access modes:
https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes

"providers will have different capabilities and each PV’s access modes are set to the specific modes supported by that particular volume. For example, NFS can support multiple read/write clients, but a specific NFS PV might be exported on the server as read-only. Each PV gets its own set of access modes describing that specific PV’s capabilities."

When multiple access modes are given to the CreateVolume call it is the drivers responsibility to provision a volume that satisfies all access modes:

// The capabilities that the provisioned volume MUST have. SP MUST
// provision a volume that will satisfy ALL of the capabilities
// specified in this list. Otherwise SP MUST return the appropriate
// gRPC error code.
// The Plugin MUST assume that the CO MAY use the provisioned volume
// with ANY of the capabilities specified in this list.

Then once the driver provisions a volume the external provisioner then copies the multiple access modes to the PV:

https://github.com/kubernetes-csi/external-provisioner/blob/1730a1e31d92b454e72c8c443369b3338d2f3599/pkg/controller/controller.go#L521

However, when we try to attach the volume, the external attacher rejects multiple access modes on the PV:

return nil, fmt.Errorf("CSI does not support ReadOnlyMany and ReadWriteOnce on the same PersistentVolume")

Since the PV supports multiple AccessModes in Kubernetes, but the CSI call to ControllerPublishVolume accepts only one AccessMode. It is actually the external-attacher's job to "pick" the correct access mode based on other information (or some heuristic) before passing it to the CSI Driver. Throwing an error is not intended behavior.

/cc @saad-ali @msau42 @jsafrane

@jsafrane
Copy link
Contributor

It is actually the external-attacher's job to "pick" the correct access mode based on other information (or some heuristic)

That is the heuristics. See

// Translate array of modes into single VolumeCapability
switch {
case m[v1.ReadWriteMany]:
// ReadWriteMany trumps everything, regardless what other modes are set
cap.AccessMode.Mode = csi.VolumeCapability_AccessMode_MULTI_NODE_MULTI_WRITER
case m[v1.ReadOnlyMany] && m[v1.ReadWriteOnce]:
// This is no way how to translate this to CSI...
return nil, fmt.Errorf("CSI does not support ReadOnlyMany and ReadWriteOnce on the same PersistentVolume")
case m[v1.ReadOnlyMany]:
// There is only ReadOnlyMany set
cap.AccessMode.Mode = csi.VolumeCapability_AccessMode_MULTI_NODE_READER_ONLY
case m[v1.ReadWriteOnce]:
// There is only ReadWriteOnce set
cap.AccessMode.Mode = csi.VolumeCapability_AccessMode_SINGLE_NODE_WRITER
default:
return nil, fmt.Errorf("unsupported AccessMode combination: %+v", pv.Spec.AccessModes)
}
return cap, nil
}

For [ReadOnlyMany, ReadWriteOnce] PV, the external attacher simply does not know if the attachment is going to be consumed as read-only(-many) or as read-write(-once) and that's where its heuristics ends. The nearest usable CSI mode that accommodate both Kubernetes modes is MULTI_NODE_MULTI_WRITER, with all its drawbacks. It won't help you on GCE.

I am afraid that Kubernetes and CSI access modes aren't really compatible. Kubernetes tracks what PV supports and does not really track how it's used in a pod (does a pod with non-RO VolumeMount want ReadWriteOnce or ReadWriteMany?).

In this case, we could add ReadOnly flag from Pod's VolumeMount / PersistentVolumeClaimVolumeSource to VolumeAttachment, but then we need to be extra careful as multiple VolumeMounts may use a volume several times in a pod with different ReadOnly / non-ReadOnly flags.

In addition, what should A/D controller and CSI driver do when a pod with RW volumeMount is scheduled to a node to a node that already has RO attachment? IMO, CSI does not support modification of existing attachments and it does not support attaching the same volume twice.

@davidz627
Copy link
Contributor Author

That is the heuristic

Right, that makes sense.

I guess this is more of a migration compatibility case then. It looks like for in-tree GCE PD if [ReadOnlyMany, ReadWriteOnce] is set the disk is just attached in Read/Write mode. However, currently with CSI and with Migration on, using the same AccessModes will cause the attach to fail because of the external-attacher.

It may be that the "correct" solution for migration is to add some more logic to backwardCompatibleAccesModes for GCE PD specifically:
https://github.com/kubernetes/kubernetes/blob/d873167e8f816f5f83846bc4587c3047452293bd/staging/src/k8s.io/csi-translation-lib/plugins/gce_pd.go#L145

I agree with your assessment, with the current state of our accessmodes and readonly flags there is no way to "fix" this in a satisfying way. This is something we should consider when working on this fix for: kubernetes/kubernetes#70505 as well

@msau42
Copy link
Collaborator

msau42 commented Jun 10, 2019

According to kubernetes/kubernetes#70505, external-attacher also uses readOnly field in CSIPersistentVolumeSource.

Can we not ignore/override accessmode check in this case?

@jsafrane
Copy link
Contributor

According to kubernetes/kubernetes#70505, external-attacher also uses readOnly field in CSIPersistentVolumeSource.

Can we not ignore/override accessmode check in this case?

PV.Spec.CSI.ReadOnly=true and AccessMode=[ReadOnlyMany, ReadWriteOnce] don't go well together - the volume is read-only, so why ReadWriteOnce? The attacher could use MULTI_NODE_READER_ONLY if ReadOnly is true and SINGLE_NODE_WRITER otherwise as a workaround and "best effort" heuristics, but it's not 100% accurate one.

@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 9, 2019
@davidz627
Copy link
Contributor Author

/remove-lifecycle stale
/lifecycle frozen

Important open issue that still has an undefined solution. Freezing so that we don't lose it

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.
Projects
None yet
Development

No branches or pull requests

5 participants