Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove disk feature #53

Merged
merged 7 commits into from
May 4, 2023
Merged

Remove disk feature #53

merged 7 commits into from
May 4, 2023

Conversation

alex-necula
Copy link

@alex-necula alex-necula commented Mar 27, 2023

Q A
Bug fix? no
New feature? yes
API breaks? no
Deprecations? no
Related tickets internal
License Apache 2.0

What's in this PR?

This PR adds the disk removal functionality. It is based on Cruise Control's new Disk Removal endpoint: adobe/cruise-control#12

The following logic is followed, in a successful scenario:

  • koperator compares the list of existing mount paths with the desired mount paths from the spec
  • if a removal is detected, it will mark the disk for removal
  • a CC task is created that will drain the data from the removed disks
  • KafkaCluster will have "ClusterReconciling" state until the disk is drained successfully, no rolling upgrades will be done until it completes
  • if the operation completes, the Rolling Upgrade is performed, PVC is removed

The following logic is followed, in a failure scenario:

  • cluster is stuck in ClusterReconciling state because the disk removal failed
  • the user should put back all the disks he removed (completely rollback the disk removal, otherwise the cluster will still be stuck in ClusterReconciling, since CC returns error for the whole disk removal operation)
  • the readded disks will be marked for rebalance and a disk rebalance will be performed
  • no rolling upgrades will be done

Why?

Previously, disk removal was blocked because there was a risk of data loss. Therefore, disk downscale was not possible, incurring additional costs for unused disks, that were upscaled in the past.

Additional context

This PR was tested locally using kind.

During the time between the disk removal starts and the moment the rolling upgrade is performed, Kafka may assign new topics to the removed disks. This presents a potential data loss risk, but I am not aware of methods that would mitigate this risk. During the downscale, users should be informed not to create new topics.

cc @ilievladiulian

Checklist

  • Implementation tested
  • Error handling code meets the guideline
  • Logging code meets the guideline
  • User guide and development docs updated (if needed)

To Do

@alex-necula alex-necula added the enhancement New feature or request label Mar 27, 2023
@alex-necula alex-necula self-assigned this Mar 27, 2023
pkg/scale/scale.go Outdated Show resolved Hide resolved
pkg/scale/scale.go Outdated Show resolved Hide resolved
@alex-necula alex-necula marked this pull request as ready for review March 27, 2023 17:08
@alex-necula alex-necula requested a review from a team March 27, 2023 17:19
@alex-necula alex-necula force-pushed the remove-disk-feature branch from e4e415b to 760aeaf Compare March 27, 2023 17:54
@alex-necula
Copy link
Author

PR waiting for successful upstream rebase

@alex-necula alex-necula marked this pull request as draft March 28, 2023 09:29
@alex-necula alex-necula changed the base branch from master to master_rebase_backup March 28, 2023 09:31
@amuraru
Copy link

amuraru commented Mar 28, 2023

PR waiting for successful upstream rebase

what do you mean?

@alex-necula
Copy link
Author

alex-necula commented Mar 28, 2023

what do you mean?

@aguzovatii performed a rebase on master branch with the latest upstream changes but it was not successful. Waiting for him to fix the conflicts. Until then, this PR has the backup branch as base

See: #54

@alex-necula alex-necula force-pushed the remove-disk-feature branch from 760aeaf to 4b05922 Compare March 28, 2023 12:27
@alex-necula alex-necula changed the base branch from master_rebase_backup to master March 28, 2023 12:27
@alex-necula alex-necula marked this pull request as ready for review March 28, 2023 12:27
@alex-necula
Copy link
Author

Performed a rebase using the latest upstream changes.

Tests will pass once #54 is merged

@alex-necula
Copy link
Author

alex-necula commented Mar 28, 2023

Update the PR with the almost final form, once the go-cruise-control lib will have an adobe fork. Assuming all the tests will pass, the only thing that has to be changed is the go.mod version for the lib.

Until the fork is done, I made the following PR in my personal repo (the code is based on these changes): alex-necula/go-cruise-control#1

This PR has the following blockers:

@alex-necula alex-necula force-pushed the remove-disk-feature branch from a8152ca to 971cd92 Compare April 19, 2023 15:00
@alex-necula
Copy link
Author

Rebased the branch and added local replacement instead of renaming the go-cruise-control lib

@alex-necula
Copy link
Author

PR is fully tested with adobe/cruise-control#12 and ready for review.
The only thing that has to be changed is the local replacement once the adobe fork is created

go.mod Outdated Show resolved Hide resolved
@alex-necula alex-necula force-pushed the remove-disk-feature branch from 049aee3 to 53dcc48 Compare May 4, 2023 11:12
@alex-necula alex-necula merged commit 083004e into master May 4, 2023
amuraru added a commit that referenced this pull request May 16, 2023
* Implement disk removal feature

---------

Co-authored-by: Alex Necula <anecula@adobe.com>
ctrlaltluc pushed a commit that referenced this pull request Jun 8, 2023
* Implement disk removal feature

---------

Co-authored-by: Alex Necula <anecula@adobe.com>
ctrlaltluc pushed a commit that referenced this pull request Jun 8, 2023
* Implement disk removal feature

---------

Co-authored-by: Alex Necula <anecula@adobe.com>
amuraru added a commit that referenced this pull request Jun 10, 2023
* Implement disk removal feature

---------

Co-authored-by: Alex Necula <anecula@adobe.com>
amuraru added a commit that referenced this pull request Dec 12, 2023
* Implement disk removal feature

---------

Co-authored-by: Alex Necula <anecula@adobe.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants