Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new Healer operation #40

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Conversation

pkalever
Copy link
Contributor

@pkalever pkalever commented Feb 8, 2022

Healer Operation is used both for healing the volume and checking for
health of the volume, it is up to the CSI driver whether to implement the
healer mechanism or just live with health checks.

Fixes: #20
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>

NOTE: I will add the proto auto gen files once the review is finalized

@pkalever
Copy link
Contributor Author

pkalever commented Feb 8, 2022

@nixpanic @Madhu-1
Oh! is it mandatory to add the proto file before the review? says the protoc test.

@pkalever pkalever force-pushed the healer branch 3 times, most recently from 39765f5 to e70074f Compare February 8, 2022 11:59
@nixpanic
Copy link
Contributor

nixpanic commented Feb 8, 2022

@nixpanic @Madhu-1 Oh! is it mandatory to add the proto file before the review? says the protoc test.

Yes. The .protoc file should get automatically generated from the protobuf section of your new spec. Generate it with help from the makefile that you created, and add the protoc file in a separate commit (see other new feature PRs for examples).

@pkalever
Copy link
Contributor Author

pkalever commented Feb 9, 2022

@nixpanic @Madhu-1 Oh! is it mandatory to add the proto file before the review? says the protoc test.

Yes. The .protoc file should get automatically generated from the protobuf section of your new spec. Generate it with help from the makefile that you created, and add the protoc file in a separate commit (see other new feature PRs for examples).

Thanks @nixpanic, a few minutes later I had realized that I need to add the proto file too, so pushed the proto file too, PTAL

Copy link
Contributor

@nixpanic nixpanic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// active (staged/published) volume.
service HealerNode {
// NodeHealer is a procedure that gets called on the CSI NodePlugin.
rpc NodeHealer (NodeHealerRequest)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call this HealVolume?

I would expect a way to check if a volume needs healing. The current NodeHealer call seems to do that, but also is expected to do the healing?

It would probably be useful to have a check, and only perform healing if the check returns that it is needed. This then can provide better feedback and tracking of actions that were performed. It might even be useful to fallback to some other recovery/healing mechanism in case volume-healing failed (extension for HealNode which might do a reboot or inform medik8s or similar?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nixpanic
NodeHealer() is called on CSI NodePlugin, so it's up to SP to implement/perform healing or not, right? if so why do we need a flag/check?

I'm probably missing something? Is this flag set as part of the request or response?

  • If it is part of request then who will be setting it because we don't have any CRDs here.
  • If it is part of the response, may be based on the response CSI addon might want to do a reboot or similar?

Could you please make it more clear?
Also cannot we detect healing failed from the message string which is part of the response?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would consider two procedures:

  1. GetHealth() returning the state of the volume, possibly with a selection of recovery options (VolumeHealing or NodeHealing required)
  2. HealVolume() doing the actual healing, if it can heal the volume at all
  3. the CSI-Addons implementation for the CO can then implement a HealNode function (with Kubernetes medik8s might be an option)

This makes it easier to follow the actions, and improve the reporting to users (Kubernetes events maybe?).

CRDs are Kubernetes specific, so such definitions should go in the kubernetes-csi-addons repository instead. There probably should be a way for the CO to configure the checking interval, maybe even per volume. When using Kubernetes, an annotation on the PVC might be suitable for that (depending on the number of options).

@nixpanic
Copy link
Contributor

nixpanic commented Feb 9, 2022

Thanks @nixpanic, a few minutes later I had realized that I need to add the proto file too, so pushed the proto file too, PTAL

Not only the .proto file, also generated files under lib/. Run make all check-changes locally to see if everything has been committed. The output of git status is expected to be empty.

@mergify mergify bot added the design Adds or updates an operation or service label Mar 14, 2022
Prasanna Kumar Kalever added 3 commits March 15, 2022 17:05
Healer Operation is used both for healing the volume and checking for
health of the volume, it is upto the CSI driver whether to implement the
healer mechanism or just live with health checks.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Change to the build scripts

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Fixes: csi-addons#20
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design Adds or updates an operation or service
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Accommodate Volume Healer
2 participants