docs: healer controller design proposal #94
Conversation
As discussed, I had opened ceph/ceph-csi#2794 for the …
* The side-car works as a relay: upon receiving a gRPC call from the healer controller, the CSI-Addons side-car checks the volume's mounting conditions collected via the CSI NodeGetVolumeStats RPC.
* If any volume condition is reported abnormal, the CSI-Addons side-car acts on it: it performs internal operations like fetching secrets and constructing the staging path, sends a NodeStageVolume request to the CSI driver, and finally returns the response to the controller.
Just send a heal call to the CSI driver and let it decide whether a node stage request is required or not.
I know it looks simple, but I think this needs more cycles for fetching secrets, formatting the staging path, and capturing/collecting more details needed to make the NodeStageVolume() call, for almost all the volumes even if they are healthy.
The sidecar is just a relay. It's better to have more API calls in a single request flow from controller to sidecar to CSI driver than to have back and forth between the sidecar and the CSI driver while the controller sits waiting.
I agree that this is a debatable topic on its own.
- For the restart-of-the-nodeplugin case, it will mostly need NodeStageVolume() for 100% of the PVs.
- For periodic checks, I'm optimistic that most of the time the health of the volume is in a good state; the only corner cases are process restarts (considering userspace mounters). Hence I think sending NodeStageVolume() all the time is a bit costly.

So maybe a hybrid model is needed to satisfy both and save some jiffies here, like:
- For the restart case, send a NodeStageVolume().
- For periodic health check scenarios, send a NodeGetVolumeStats() first, and then send a NodeStageVolume() only if the status of the PV is set to abnormal.
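If such a hybrid model were adopted, the controller-side decision could be sketched roughly as below. Everything here (`Trigger`, `nextCall()`, the status values) is hypothetical illustration, not an existing API:

```go
package main

import "fmt"

// Trigger describes why the healer controller woke up (hypothetical).
type Trigger int

const (
	TriggerRestart  Trigger = iota // nodeplugin pod restarted
	TriggerPeriodic                // periodic health check tick
)

// VolumeStatus is a made-up result of a stats/health probe.
type VolumeStatus int

const (
	StatusNormal VolumeStatus = iota
	StatusAbnormal
)

// nextCall sketches the hybrid model: on restart, go straight to
// NodeStageVolume; on a periodic check, probe first and only stage
// the volume when its condition is reported abnormal.
func nextCall(t Trigger, probe func() VolumeStatus) string {
	if t == TriggerRestart {
		return "NodeStageVolume"
	}
	if probe() == StatusAbnormal {
		return "NodeStageVolume"
	}
	return "skip"
}

func main() {
	fmt.Println(nextCall(TriggerRestart, nil))
	fmt.Println(nextCall(TriggerPeriodic, func() VolumeStatus { return StatusNormal }))
	fmt.Println(nextCall(TriggerPeriodic, func() VolumeStatus { return StatusAbnormal }))
}
```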
CSI-Addons cannot send CSI commands; those are reserved for the CO.
CSI-drivers can decide on their own implementation of healing; maybe they want to call their own NodeStageVolume procedure. Don't mix standard CSI procedures with CSI-Addons procedures.
There is a difference between regular health checks and an informational message "you restarted, and you should make sure the volume is still available". The last part is expected when a CSIAddonsNode is detected; the first is for https://github.com/kubernetes-csi/external-health-monitor/
I was thinking we could implement a function like HealVolume() that can be called from the sidecar, which does NodeGetVolumeStats() and Heal() in one go, as @Rakshith-R pointed out above.
A function like `HealVolume()` would need an implementation in the CSI-driver (called from the sidecar). CSI-drivers need to handle healing differently, I expect.
`NodeGetVolumeStats()` is a CSI call, so it can not be called from CSI-Addons operations. You need to design an operation that the CSI-Addons sidecar can call on the CSI-driver. The designs for a `Heal()` and a `HealthCheck()` function need to go to csi-addons/spec.
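As a rough illustration of what such csi-addons/spec operations might look like on the driver side, here is a sketch with a stub implementation. All names (`NodeHealer`, `HealthCheck`, `Heal`, the state strings) are hypothetical, pending the actual spec:

```go
package main

import "fmt"

// HealthState is a hypothetical result type for a CSI-Addons health probe.
type HealthState string

const (
	HealthOK  HealthState = "HEALTH_OK"
	HealthBad HealthState = "HEALTH_BAD"
)

// NodeHealer sketches the operations the CSI-Addons sidecar could call
// on the CSI-driver's CSI-Addons server (not on the CSI server).
type NodeHealer interface {
	HealthCheck(volumeID string) (HealthState, error)
	Heal(volumeID string) error
}

// fakeDriver is a stub implementation, purely for illustration.
type fakeDriver struct {
	healed map[string]bool
}

func (d *fakeDriver) HealthCheck(volumeID string) (HealthState, error) {
	if d.healed[volumeID] {
		return HealthOK, nil
	}
	return HealthBad, nil
}

func (d *fakeDriver) Heal(volumeID string) error {
	d.healed[volumeID] = true
	return nil
}

func main() {
	var h NodeHealer = &fakeDriver{healed: map[string]bool{}}
	s, _ := h.HealthCheck("pvc-1")
	fmt.Println(s) // bad before healing
	_ = h.Heal("pvc-1")
	s, _ = h.HealthCheck("pvc-1")
	fmt.Println(s) // ok after healing
}
```

Keeping these as a separate interface makes it explicit that the sidecar never touches the CSI-spec server.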
A function like HealVolume() would need an implementation in the CSI-driver (called from the sidecar). CSI-drivers need to handle healing differently, I expect.
Yes
NodeGetVolumeStats() is a CSI call, so it can not be called from CSI-Addons operations. You need to design an operation that the CSI-Addons sidecar can call on the CSI-driver. The designs for Heal() and a HealthCheck() function need to go to csi-addons/spec.
@nixpanic
Oh! Sorry if that confused you. I now understand why we do not want to use the NodeGetVolumeStats() served by the CSI driver, which is for the CSI spec. I meant a NodeGetVolumeStats() RPC that is also served by the CSI driver, but for the CSI-Addons sidecar. Do we see any reason why we would not want to use the same RPC signature, logically, but served by a second server? Do you think it just creates some additional confusion? If so, we can just name it NodeGetVolumeStatus().
Or maybe I will just add the new RPC, naming it NodeGetVolumeStatus(), which is mostly a copy of NodeGetVolumeStats().
The design for Heal() and a HealthCheck() function need to go to csi-addons/spec.
@nixpanic I expect I will have more clarity on the Heal() and HealthCheck() functions while I work on this RFE, so I hope to start on the draft implementation first and then send a PR to csi-addons/spec; only after the spec gets merged will I send the actual controller PR.
Hope that won't hurt.
We still need to have a description of the procedures that a CSI-driver should implement for this functionality. That description needs to get reviewed and merged in the csi-addons/spec repository before a Kubernetes implementation can be proposed.
I had already opened ceph/ceph-csi#2794 and will update the implementation details there soon. I will also send the spec changes after some more clarity on the implementation-side details (more on the internal structures), maybe in parallel to the implementation PR. This is just the overall design idea, for everyone's reference.
I'm a bit surprised to hear this @nixpanic; is it restricted anywhere, or at least documented?
Yes, definitely CSI-Addons can implement its own RPCs; I just wanted to avoid another copy-paste.
Drivers (at least Ceph-CSI) offer different gRPC servers for different protocols, so it is technically prevented. Mixing protocols does not look like good practice, or very clean. If both the CO and the CSI-Addons controller send CSI operations, we break the spec: https://github.com/container-storage-interface/spec/blob/master/spec.md#concurrency A CSI-driver will need to take care that healing does not interfere with actions the CO performs. The only practical way to do so is by using different operations for CSI and CSI-Addons that can coordinate (block/abort/..) concurrent access.
@nixpanic IIUC, this means even sidecars calling NodeGetVolumeStats is bad?
It is expected that the CSI-driver will run two different servers, one for the CSI spec and one for the CSI-Addons spec.
I think, from the CSI-Addons sidecar perspective, we should only call CSI-Addons spec RPCs and not mix the two.
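The two-server split can be illustrated with plain listeners. The socket paths below are made up for the sketch; real drivers would run gRPC servers on such endpoints:

```go
package main

import (
	"fmt"
	"net"
	"os"
	"path/filepath"
)

// listenBoth creates the two endpoints a driver pod would expose:
// one unix socket for the CSI-spec server and a separate one for the
// CSI-Addons server. The sidecar only ever dials the CSI-Addons one.
func listenBoth() (csi, addons net.Listener, cleanup func(), err error) {
	dir, err := os.MkdirTemp("", "endpoints")
	if err != nil {
		return nil, nil, nil, err
	}
	csi, err = net.Listen("unix", filepath.Join(dir, "csi.sock"))
	if err != nil {
		os.RemoveAll(dir)
		return nil, nil, nil, err
	}
	addons, err = net.Listen("unix", filepath.Join(dir, "csi-addons.sock"))
	if err != nil {
		csi.Close()
		os.RemoveAll(dir)
		return nil, nil, nil, err
	}
	cleanup = func() { csi.Close(); addons.Close(); os.RemoveAll(dir) }
	return csi, addons, cleanup, nil
}

func main() {
	csiLis, addonsLis, cleanup, err := listenBoth()
	if err != nil {
		panic(err)
	}
	defer cleanup()
	// Distinct endpoints keep CSI and CSI-Addons traffic separated.
	fmt.Println(csiLis.Addr().String() != addonsLis.Addr().String())
}
```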
* Watch for the `CSIAddonsNode` object; if the object is created/modified by the side-car, do the below operations
* Fetch the full `VolumeAttachment` list
* Filter the `VolumeAttachment` list by matching driver name, node name, and attached status
* For each `VolumeAttachment`, get the respective PersistentVolume information and check the criteria of the PersistentVolume being Bound (we can also have configuration options like mounter type)
Send the request irrespective of any particular PV config option?
Or
let the driver include more information in its capabilities response, so this decision can be made with the help of the extra info provided by the driver itself?
My idea is similar to the latter one: somehow get the choice of which mounter types need which services.
Example:
```
{
  rbd-nbd: healing
  krbd: monitoring
  xyz: disable
}
```
I would like to leave this more for an implementation-specific detail.
CSI-Addons capabilities from a CSI-driver should include whether it supports healing a volume. If the CSI-driver does not need healing, it can omit the capability.
In case the CSI-driver needs to perform heal checks on a volume (depending on the volume options?), it can return things like `HEALTH_OK` (to recheck later), `HEALTH_UNSUPPORTED` (no need to check, this volume does not support healing), `HEALTH_BAD` (needs healing), and so on. But this is a discussion for csi-addons/spec.
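On the controller side, handling such states could reduce to a small dispatch. The state strings mirror the hypothetical values discussed above; the action names are invented for the sketch:

```go
package main

import "fmt"

// action maps a reported volume state to what the healer controller
// would do next. States and actions are illustrative only.
func action(state string) string {
	switch state {
	case "HEALTH_OK":
		return "recheck-later"
	case "HEALTH_UNSUPPORTED":
		return "stop-checking" // volume does not support healing
	case "HEALTH_BAD":
		return "heal"
	default:
		return "unknown"
	}
}

func main() {
	for _, s := range []string{"HEALTH_OK", "HEALTH_UNSUPPORTED", "HEALTH_BAD"} {
		fmt.Println(s, "->", action(s))
	}
}
```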
CSI-Addons should not need to know about different mounter types, that would require extending the list of types and the implementation for each CSI-driver. Better have a generic approach that works for all CSI-drivers, so that any CSI-driver can opt-in without changing CSI-Addons itself.
This sounds reasonable.
Are `HEALTH_OK`, `HEALTH_UNSUPPORTED`, `HEALTH_BAD` some kind of volume attributes that the CSI driver sets on the persistent volume?
In my previous comment, I meant something like:
- the capability response returns some string that holds
```
{
  Mounter1: NEED_HEALING
  Mounter2: NEED_MONITORING
  Mounter3: UNKNOWN
}
```
- If the PV attribute mounter matches `Mounter1`, attempt to heal it.
- If it is of type `Mounter2`, only check the health.
- If it is of type `Mounter3`, do not attempt anything from the healer controller.

Isn't it generic enough? The CSI-Addons controller doesn't have to know what `Mounter1` is; it just reads the capability options and matches them with the PV attributes, so we don't hardcode the mounter types in the addons.
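The matching described above is just a lookup. A minimal sketch, reusing the made-up mounter names and service values from the example:

```go
package main

import "fmt"

// capabilities is the (hypothetical) capability response:
// mounter type -> service wanted for volumes using that mounter.
var capabilities = map[string]string{
	"Mounter1": "NEED_HEALING",
	"Mounter2": "NEED_MONITORING",
	"Mounter3": "UNKNOWN",
}

// decide maps a PV's "mounter" attribute to the controller's action
// without the controller knowing what the mounter actually is.
func decide(pvMounter string) string {
	switch capabilities[pvMounter] {
	case "NEED_HEALING":
		return "heal"
	case "NEED_MONITORING":
		return "monitor"
	default:
		return "ignore"
	}
}

func main() {
	fmt.Println(decide("Mounter1"))   // heal
	fmt.Println(decide("Mounter2"))   // monitor
	fmt.Println(decide("Mounter3"))   // ignore
	fmt.Println(decide("not-listed")) // ignore
}
```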
Interactions between CSI-Addons and the CSI-driver need to be documented in a csi-addons/spec. The protocol then describes what a CSI-driver needs to implement for supporting the healer operations. This will probably include a description for capabilities of the CSI-driver. Certain operations on a volume can return attributes that healing is not supported for the volume, but other volumes from the same driver might support more features.
I do not think CSI-Addons should know about the internals of the CSI-driver, so mounters would not be needed.
@Madhu-1 @nixpanic @Rakshith-R Can we finalize this, please? Let me know if we are missing any major clarifications in the design. If not, I hope we can discuss and address any nitty-gritty details in the implementation PR. Thanks!
Yes. CSI-Addons should not mix in calling CSI operations. A CSI-driver will need to provide dedicated CSI-Addons operations that can be called instead. The CSI-driver is then in charge of making sure no conflicting calls from the CO are being processed.
Just one more question, and I think we can do without the gRPC description in the doc?
Everything else looks good to me.
* Log the final response returned from the side-car

## CSI-Addons side-car
* When a nodeplugin pod is restarted, the side-car will create a [CSIAddonsNode](https://github.com/csi-addons/kubernetes-csi-addons/blob/main/api/v1alpha1/csiaddonsnode_types.go) object.
Will create or update the object, as per kubernetes-csi-addons/sidecar/internal/csiaddonsnode/csiaddonsnode.go, lines 135 to 138 in 57ed4f6:
```go
if err != nil && !apierrors.IsAlreadyExists(err) {
	return fmt.Errorf("failed to create csiaddonsnode object: %w", err)
}
```
I'm convinced by a comment from Niels earlier:
I think you are concerned that the CSI-driver container in the pod restarted, and not that the pod was re-created.
Only in that case would the CSIAddonsNode object exist. It is not clear to me if this is something that can happen in reality.
If the object already exists, there should be an update operation to let the controller know that the sidecar restarted. Am I missing this point?
@Madhu-1
As far as I understand from Niels' previous comments, the object will exist only when the container is restarted, and it will be absent when the whole pod is restarted. If that is the case, when do we practically expect the container to restart? Hence, is it worth mentioning it here?
Yes, correct, I don't have a practical answer for this one either. I was thinking an object update might help, which is not present currently.
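The create-or-update behavior being discussed can be sketched against a toy in-memory store (this is not the real client-go API; the types and the generation counter are invented to show the idea):

```go
package main

import "fmt"

// store is a toy stand-in for the API server's CSIAddonsNode objects.
type store struct {
	objects map[string]int // name -> generation, bumped on update
}

var errAlreadyExists = fmt.Errorf("already exists")

func (s *store) create(name string) error {
	if _, ok := s.objects[name]; ok {
		return errAlreadyExists
	}
	s.objects[name] = 1
	return nil
}

// createOrUpdate creates the object; if it already exists (container
// restart, pod not re-created), it bumps the object instead, so a
// watching controller can notice that the sidecar restarted.
func (s *store) createOrUpdate(name string) error {
	err := s.create(name)
	if err == errAlreadyExists {
		s.objects[name]++ // update path: signal the restart
		return nil
	}
	return err
}

func main() {
	s := &store{objects: map[string]int{}}
	_ = s.createOrUpdate("node-1") // first start: create
	_ = s.createOrUpdate("node-1") // container restart: update
	fmt.Println(s.objects["node-1"])
}
```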
Let's take it up later, maybe based on future discussions?
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Pull request has been modified.
I would love to see this in combination with a proposal in csi-addons/spec; with that it would be much clearer how an interface between CSI-Addons and the CSI-driver looks.
Syncing latest changes from main for kubernetes-csi-addons
Closing this as stale; feel free to reopen when required.