docs: healer controller design proposal #94
Conversation
As discussed, I had opened ceph/ceph-csi#2794 for the …
* The side-car works as a relay: upon receiving a gRPC call from the healer controller, the CSI-Addons side-car checks the volume's mounting conditions collected via the CSI NodeGetVolumeStats RPC.
* If any volume condition is reported abnormal, the CSI-Addons side-car acts on it: it performs internal operations like fetching secrets and constructing the staging path, sends a NodeStageVolume request to the CSI driver, and finally returns the response to the controller.
Just send a heal call to the CSI driver and let it decide whether a node stage request is required or not.
I know it looks simple, but I think this needs more cycles for fetching secrets, formatting the staging path, and capturing/collecting more details needed to make the NodeStageVolume() call, for almost all the volumes even if they are healthy.
The sidecar is just a relay. It's better to have more API calls in a single request flow from controller to sidecar to CSI driver than to have back and forth between the sidecar and the CSI driver while the controller sits waiting.
I agree that this is a debatable topic on its own.
- For the restart-of-the-nodeplugin case, it will mostly need NodeStageVolume() for 100% of the PVs.
- For periodic checks, I'm optimistic that most of the time the health of the volume is in a good state; the only corner cases are process restarts (considering userspace mounters). Hence I think sending NodeStageVolume() all the time is a bit costly.

So maybe a hybrid model is needed to satisfy both and save some jiffies here, like:
- For the restart case, send a NodeStageVolume().
- For periodic health check scenarios, send a NodeGetVolumeStats() first, and then send a NodeStageVolume() only if the status of the PV is set to abnormal.
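If such a hybrid model were adopted, the controller-side decision could be sketched roughly as below. Everything here (`Trigger`, `nextCall()`, the status values) is hypothetical illustration, not an existing API:

```go
package main

import "fmt"

// Trigger describes why the healer controller woke up (hypothetical).
type Trigger int

const (
	TriggerRestart  Trigger = iota // nodeplugin pod restarted
	TriggerPeriodic                // periodic health check tick
)

// VolumeStatus is a made-up result of a stats/health probe.
type VolumeStatus int

const (
	StatusNormal VolumeStatus = iota
	StatusAbnormal
)

// nextCall sketches the hybrid model: on restart, go straight to
// NodeStageVolume; on a periodic check, probe first and only stage
// the volume when its condition is reported abnormal.
func nextCall(t Trigger, probe func() VolumeStatus) string {
	if t == TriggerRestart {
		return "NodeStageVolume"
	}
	if probe() == StatusAbnormal {
		return "NodeStageVolume"
	}
	return "skip"
}

func main() {
	fmt.Println(nextCall(TriggerRestart, nil))
	fmt.Println(nextCall(TriggerPeriodic, func() VolumeStatus { return StatusNormal }))
	fmt.Println(nextCall(TriggerPeriodic, func() VolumeStatus { return StatusAbnormal }))
}
```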
CSI-Addons cannot send CSI commands; those are reserved for the CO.
CSI-drivers can decide on their own implementation of healing; maybe they want to call their own NodeStageVolume procedure. Don't mix standard CSI procedures with CSI-Addons procedures.
There is a difference between regular health checks and an informational message "you restarted, and you should make sure the volume is still available". The last part is expected when a CSIAddonsNode is detected; the first is for https://github.com/kubernetes-csi/external-health-monitor/
I was thinking we could implement a function like HealVolume() that can be called from the sidecar, which does NodeGetVolumeStats() and Heal() in one go, as @Rakshith-R pointed out above.
A function like `HealVolume()` would need an implementation in the CSI-driver (called from the sidecar). CSI-drivers need to handle healing differently, I expect.
`NodeGetVolumeStats()` is a CSI call, so it can not be called from CSI-Addons operations. You need to design an operation that the CSI-Addons sidecar can call on the CSI-driver. The designs for a `Heal()` and a `HealthCheck()` function need to go to csi-addons/spec.
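As a rough illustration of what such csi-addons/spec operations might look like on the driver side, here is a sketch with a stub implementation. All names (`NodeHealer`, `HealthCheck`, `Heal`, the state strings) are hypothetical, pending the actual spec:

```go
package main

import "fmt"

// HealthState is a hypothetical result type for a CSI-Addons health probe.
type HealthState string

const (
	HealthOK  HealthState = "HEALTH_OK"
	HealthBad HealthState = "HEALTH_BAD"
)

// NodeHealer sketches the operations the CSI-Addons sidecar could call
// on the CSI-driver's CSI-Addons server (not on the CSI server).
type NodeHealer interface {
	HealthCheck(volumeID string) (HealthState, error)
	Heal(volumeID string) error
}

// fakeDriver is a stub implementation, purely for illustration.
type fakeDriver struct {
	healed map[string]bool
}

func (d *fakeDriver) HealthCheck(volumeID string) (HealthState, error) {
	if d.healed[volumeID] {
		return HealthOK, nil
	}
	return HealthBad, nil
}

func (d *fakeDriver) Heal(volumeID string) error {
	d.healed[volumeID] = true
	return nil
}

func main() {
	var h NodeHealer = &fakeDriver{healed: map[string]bool{}}
	s, _ := h.HealthCheck("pvc-1")
	fmt.Println(s) // bad before healing
	_ = h.Heal("pvc-1")
	s, _ = h.HealthCheck("pvc-1")
	fmt.Println(s) // ok after healing
}
```

Keeping these as a separate interface makes it explicit that the sidecar never touches the CSI-spec server.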
A function like HealVolume() would need an implementation in the CSI-driver (called from the sidecar). CSI-drivers need to handle healing differently, I expect.
Yes
NodeGetVolumeStats() is a CSI call, so it can not be called from CSI-Addons operations. You need to design an operation that the CSI-Addons sidecar can call on the CSI-driver. The designs for Heal() and a HealthCheck() function need to go to csi-addons/spec.
@nixpanic
Oh! Sorry if that confused you. I now understand why we do not want to use the NodeGetVolumeStats() served by the CSI driver, which is for the CSI spec. I meant a NodeGetVolumeStats() RPC that is also served by the CSI driver, but for the CSI-Addons sidecar. Do we see any reason why we would not want to use the same RPC signature, logically, but served by a second server? Do you think it just creates some additional confusion? If so, we can just name it NodeGetVolumeStatus().
Or maybe I will just add the new RPC, naming it NodeGetVolumeStatus(), which is mostly a copy of NodeGetVolumeStats().
The design for Heal() and a HealthCheck() function need to go to csi-addons/spec.
@nixpanic I expect I will have more clarity on the Heal() and HealthCheck() functions while I work on this RFE, so I hope to start on the draft implementation first and then send a PR to csi-addons/spec; only after the spec gets merged will I send the actual controller PR.
Hope that won't hurt.
We still need to have a description of the procedures that a CSI-driver should implement for this functionality. That description needs to get reviewed and merged in the csi-addons/spec repository before a Kubernetes implementation can be proposed.
I had already opened ceph/ceph-csi#2794 and will update the implementation details there soon. I will also send the spec changes after some more clarity on the implementation-side details (more on the internal structures), maybe in parallel to the implementation PR. This is just the overall design idea, for everyone's reference.
I'm a bit surprised to hear this @nixpanic; is it restricted anywhere, or at least documented?
Yes, definitely CSI-Addons can implement its own RPCs; I just wanted to avoid another copy-paste.
Drivers (at least Ceph-CSI) offer different gRPC servers for different protocols, so it is technically prevented. Mixing protocols does not look like good practice, or very clean. If both the CO and the CSI-Addons controller send CSI operations, we break the spec: https://github.com/container-storage-interface/spec/blob/master/spec.md#concurrency A CSI-driver will need to take care that healing does not interfere with actions the CO performs. The only practical way to do so is by using different operations for CSI and CSI-Addons that can coordinate (block/abort/..) concurrent access.
@nixpanic IIUC, this means even sidecars calling NodeGetVolumeStats is bad?
It is expected that the CSI-driver will run two different servers, one for the CSI spec and one for the CSI-Addons spec.
I think, from the CSI-Addons sidecar perspective, we should only call CSI-Addons spec RPCs and not mix the two.
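The two-server split can be illustrated with plain listeners. The socket paths below are made up for the sketch; real drivers would run gRPC servers on such endpoints:

```go
package main

import (
	"fmt"
	"net"
	"os"
	"path/filepath"
)

// listenBoth creates the two endpoints a driver pod would expose:
// one unix socket for the CSI-spec server and a separate one for the
// CSI-Addons server. The sidecar only ever dials the CSI-Addons one.
func listenBoth() (csi, addons net.Listener, cleanup func(), err error) {
	dir, err := os.MkdirTemp("", "endpoints")
	if err != nil {
		return nil, nil, nil, err
	}
	csi, err = net.Listen("unix", filepath.Join(dir, "csi.sock"))
	if err != nil {
		os.RemoveAll(dir)
		return nil, nil, nil, err
	}
	addons, err = net.Listen("unix", filepath.Join(dir, "csi-addons.sock"))
	if err != nil {
		csi.Close()
		os.RemoveAll(dir)
		return nil, nil, nil, err
	}
	cleanup = func() { csi.Close(); addons.Close(); os.RemoveAll(dir) }
	return csi, addons, cleanup, nil
}

func main() {
	csiLis, addonsLis, cleanup, err := listenBoth()
	if err != nil {
		panic(err)
	}
	defer cleanup()
	// Distinct endpoints keep CSI and CSI-Addons traffic separated.
	fmt.Println(csiLis.Addr().String() != addonsLis.Addr().String())
}
```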
* Watch for the `CSIAddonsNode` object; if the object is created/modified by the side-car, do the below operations
* Fetch the full `VolumeAttachment` list
* Filter the `VolumeAttachment` list by matching driver name, node name, and attached status
* For each `VolumeAttachment`, get the respective PersistentVolume information and check the criteria of the PersistentVolume being Bound (we can also have configuration options like mounter type)
Send the request irrespective of any particular PV config option?
Or
let the driver include more information in its capabilities response, so this decision can be made with the help of the extra info provided by the driver itself?
My idea is similar to the latter one: somehow get the choice of which mounter types need which services.
Example:
```
{
  rbd-nbd: healing
  krbd: monitoring
  xyz: disable
}
```
I would like to leave this more for an implementation-specific detail.
CSI-Addons capabilities from a CSI-driver should include whether it supports healing a volume. If the CSI-driver does not need healing, it can omit the capability.
In case the CSI-driver needs to perform heal checks on a volume (depending on the volume options?), it can return things like `HEALTH_OK` (to recheck later), `HEALTH_UNSUPPORTED` (no need to check, this volume does not support healing), `HEALTH_BAD` (needs healing), and so on. But this is a discussion for csi-addons/spec.
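On the controller side, handling such states could reduce to a small dispatch. The state strings mirror the hypothetical values discussed above; the action names are invented for the sketch:

```go
package main

import "fmt"

// action maps a reported volume state to what the healer controller
// would do next. States and actions are illustrative only.
func action(state string) string {
	switch state {
	case "HEALTH_OK":
		return "recheck-later"
	case "HEALTH_UNSUPPORTED":
		return "stop-checking" // volume does not support healing
	case "HEALTH_BAD":
		return "heal"
	default:
		return "unknown"
	}
}

func main() {
	for _, s := range []string{"HEALTH_OK", "HEALTH_UNSUPPORTED", "HEALTH_BAD"} {
		fmt.Println(s, "->", action(s))
	}
}
```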
CSI-Addons should not need to know about different mounter types, that would require extending the list of types and the implementation for each CSI-driver. Better have a generic approach that works for all CSI-drivers, so that any CSI-driver can opt-in without changing CSI-Addons itself.
This sounds reasonable.
Are `HEALTH_OK`, `HEALTH_UNSUPPORTED`, `HEALTH_BAD` some kind of volume attributes that the CSI driver sets on the persistent volume?
In my previous comment, I meant something like:
- the capability response returns some string that holds
```
{
  Mounter1: NEED_HEALING
  Mounter2: NEED_MONITORING
  Mounter3: UNKNOWN
}
```
- If the PV attribute mounter matches `Mounter1`, attempt to heal it.
- If it is of type `Mounter2`, only check the health.
- If it is of type `Mounter3`, do not attempt anything from the healer controller.

Isn't it generic enough? The CSI-Addons controller doesn't have to know what `Mounter1` is; it just reads the capability options and matches them with the PV attributes, so we don't hardcode the mounter types in the addons.
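The matching described above is just a lookup. A minimal sketch, reusing the made-up mounter names and service values from the example:

```go
package main

import "fmt"

// capabilities is the (hypothetical) capability response:
// mounter type -> service wanted for volumes using that mounter.
var capabilities = map[string]string{
	"Mounter1": "NEED_HEALING",
	"Mounter2": "NEED_MONITORING",
	"Mounter3": "UNKNOWN",
}

// decide maps a PV's "mounter" attribute to the controller's action
// without the controller knowing what the mounter actually is.
func decide(pvMounter string) string {
	switch capabilities[pvMounter] {
	case "NEED_HEALING":
		return "heal"
	case "NEED_MONITORING":
		return "monitor"
	default:
		return "ignore"
	}
}

func main() {
	fmt.Println(decide("Mounter1"))   // heal
	fmt.Println(decide("Mounter2"))   // monitor
	fmt.Println(decide("Mounter3"))   // ignore
	fmt.Println(decide("not-listed")) // ignore
}
```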
Interactions between CSI-Addons and the CSI-driver need to be documented in a csi-addons/spec. The protocol then describes what a CSI-driver needs to implement for supporting the healer operations. This will probably include a description for capabilities of the CSI-driver. Certain operations on a volume can return attributes that healing is not supported for the volume, but other volumes from the same driver might support more features.
I do not think CSI-Addons should know about the internals of the CSI-driver, so mounters would not be needed.
@Madhu-1 @nixpanic @Rakshith-R Can we finalize this, please? Let me know if we are missing any major clarifications in the design. If not, I hope we can discuss and address any nitty-gritty details in the implementation PR. Thanks!
Yes. CSI-Addons should not mix in calling CSI operations. A CSI-driver will need to provide dedicated CSI-Addons operations that can be called instead. The CSI-driver is then in charge of making sure no conflicting calls from the CO are being processed.
Just one more question, and I think we can do without the gRPC description in the doc?
Everything else looks good to me.
* Log the final response returned from the side-car

## CSI-Addons side-car
* When a nodeplugin pod is restarted, the side-car will create a [CSIAddonsNode](https://github.com/csi-addons/kubernetes-csi-addons/blob/main/api/v1alpha1/csiaddonsnode_types.go) object.
Will create or update the object, as per kubernetes-csi-addons/sidecar/internal/csiaddonsnode/csiaddonsnode.go, lines 135 to 138 in 57ed4f6:
```go
if err != nil && !apierrors.IsAlreadyExists(err) {
	return fmt.Errorf("failed to create csiaddonsnode object: %w", err)
}
```
I'm convinced by a comment from Niels earlier:
I think you are concerned that the CSI-driver container in the pod restarted, and not that the pod was re-created.
Only in that case would the CSIAddonsNode object exist. It is not clear to me if this is something that can happen in reality.
If the object already exists, there should be an update operation to let the controller know that the sidecar restarted. Am I missing this point?
@Madhu-1
As far as I understand from Niels' previous comments, the object will exist only when the container is restarted, and it will be absent when the whole pod is restarted. If that is the case, when do we practically expect the container to restart? Hence, is it worth mentioning it here?
Yes, correct, I don't have a practical answer for this one either. I was thinking an object update might help, which is not present currently.
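The create-or-update behavior being discussed can be sketched against a toy in-memory store (this is not the real client-go API; the types and the generation counter are invented to show the idea):

```go
package main

import "fmt"

// store is a toy stand-in for the API server's CSIAddonsNode objects.
type store struct {
	objects map[string]int // name -> generation, bumped on update
}

var errAlreadyExists = fmt.Errorf("already exists")

func (s *store) create(name string) error {
	if _, ok := s.objects[name]; ok {
		return errAlreadyExists
	}
	s.objects[name] = 1
	return nil
}

// createOrUpdate creates the object; if it already exists (container
// restart, pod not re-created), it bumps the object instead, so a
// watching controller can notice that the sidecar restarted.
func (s *store) createOrUpdate(name string) error {
	err := s.create(name)
	if err == errAlreadyExists {
		s.objects[name]++ // update path: signal the restart
		return nil
	}
	return err
}

func main() {
	s := &store{objects: map[string]int{}}
	_ = s.createOrUpdate("node-1") // first start: create
	_ = s.createOrUpdate("node-1") // container restart: update
	fmt.Println(s.objects["node-1"])
}
```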
Let's take it up later, maybe based on future discussions?
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Pull request has been modified.
I would love to see this in combination with a proposal in csi-addons/spec; with that it would be much clearer how an interface between CSI-Addons and the CSI-driver looks.
Syncing latest changes from main for kubernetes-csi-addons
Closing this as stale; feel free to reopen when required.