[forwardport v0.10][SURE-8550] drift detection is generating secrets without cleaning #2518

Closed
rancherbot opened this issue Jun 14, 2024 · 2 comments

@rancherbot
Collaborator

This is a forwardport issue for #2515, automatically created via GitHub Actions workflow initiated by @aruiz14

Original issue body:

SURE-8550

Issue description:

When Self Healing (drift detection) is enabled, Fleet generates a new secret every time drift is detected, to the point where it might exhaust Rancher.
Fleet 0.9.4

Business impact:

For the customer, Rancher went down because too many secrets were being cached.

Troubleshooting steps:

Disabling self healing will clean the secrets

Repro steps:

  • create a git repo
  • enable self healing
  • scale the deployment up (or do anything else that triggers a drift correction)
  • a new secret will be created in the deployment's namespace
    • This can also be checked by running helm history in the target namespace with the Helm release name.
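
The repro steps above can be approximated from the command line. The following is a minimal sketch, not part of the original report: `count_release_secrets` is a hypothetical helper, and the `<deployment>` placeholder in the commented usage must be filled in by the reader. It assumes Helm 3's naming of release secrets, `sh.helm.release.v1.<release>.v<revision>`, as seen in the outputs in this issue.

```shell
# count_release_secrets RELEASE
# Hypothetical helper: reads secret names (one per line) from stdin and prints
# how many of them are Helm release secrets belonging to RELEASE.
# Note: RELEASE is used inside a regular expression, so a release name that
# contains regex metacharacters (such as dots) may match too loosely.
count_release_secrets() {
  release="$1"
  # grep -c exits non-zero when nothing matches; "|| true" keeps the printed 0.
  grep -c "^sh\.helm\.release\.v1\.${release}\.v[0-9][0-9]*$" || true
}

# Usage against a live cluster (names taken from the output in this issue,
# <deployment> is a placeholder):
#   kubectl scale deployment <deployment> -n hello --replicas=2
#   kubectl get secrets -n hello -o name \
#     | sed 's|^secret/||' \
#     | count_release_secrets test-fastweb-hello-world
```

If the count keeps growing after each drift correction, the cluster is likely affected by the behavior described in this issue.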

Workaround:

Is a workaround available and implemented? yes
What is the workaround: disable self healing (disabling self healing also removes all the secrets)

Actual behavior:

self healing is not cleaning up the secrets

Expected behavior:

self-healing should not create so many secrets

Files, logs, traces:

Additional notes:

helm history  test-fastweb-hello-world -n hello
REVISION	UPDATED                 	STATUS    	CHART                   	APP VERSION	DESCRIPTION
164     	Wed Jun 12 15:06:08 2024	superseded	nginx-rancherhello-0.0.1	0.0.0      	Upgrade complete
165     	Wed Jun 12 15:06:09 2024	superseded	nginx-rancherhello-0.0.1	0.0.0      	Upgrade complete
166     	Wed Jun 12 15:06:24 2024	superseded	nginx-rancherhello-0.0.1	0.0.0      	Rollback to 165
167     	Wed Jun 12 15:06:31 2024	deployed  	nginx-rancherhello-0.0.1	0.0.0      	Rollback to 166
@rancherbot rancherbot added this to the v2.9.0 milestone Jun 14, 2024
@rancherbot rancherbot added this to Fleet Jun 14, 2024
@github-project-automation github-project-automation bot moved this to 🆕 New in Fleet Jun 14, 2024
@aruiz14 aruiz14 changed the title [forwardport v2.9] [0.9] [SURE-8550] drift detection is generating secrets without cleaning [forwardport v0.10][SURE-8550] drift detection is generating secrets without cleaning Jun 14, 2024
@aruiz14 aruiz14 moved this from 🆕 New to 👀 In review in Fleet Jun 14, 2024
@weyfonk
Contributor

weyfonk commented Jun 17, 2024

Additional QA

Problem

Correcting drift on Fleet-deployed resources would create a new Helm release, and a new sh.helm.<ID> secret every time, leading to an expanding set of stored secrets and Helm history items. This could lead to performance issues.

Solution

Helm Rollback operations, used internally by Fleet to correct drift, now obey Fleet's global limit on Helm history, restricting the number of kept history items to 2.
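
To illustrate what a history limit of 2 means for a list of revisions, here is a small sketch. It is not Fleet's or Helm's actual pruning code: `revisions_to_prune` is a hypothetical helper that only mimics the "keep the newest N revisions" rule described above.

```shell
# revisions_to_prune MAX
# Hypothetical helper: reads revision numbers (one per line) from stdin and
# prints, in ascending order, those that fall outside the newest MAX revisions.
revisions_to_prune() {
  max="$1"
  sort -n | awk -v max="$max" '
    { rev[NR] = $1 }
    END {
      # Everything except the last MAX entries would be dropped.
      for (i = 1; i <= NR - max; i++) print rev[i]
    }'
}

# Example with the revisions from the helm history output in the issue body:
#   printf '164\n165\n166\n167\n' | revisions_to_prune 2
# prints 164 and 165, leaving 166 and 167 as the two kept history items.
```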

Testing

(See repro steps above)

  1. Create a GitRepo with drift correction enabled, either via the above example, or as follows:
kind: GitRepo
apiVersion: fleet.cattle.io/v1alpha1
metadata:
  name: test-drift-secrets
spec:
  repo: https://github.com/rancher/fleet-test-data
  paths:
  - simple-chart
  correctDrift:
    enabled: true
    force: true
  2. Edit the deployment. In this simple-chart example, this could consist of editing the ConfigMap created from this GitRepo.

  3. Check that even after Fleet restores the deployment to its specified state (undoing manual changes), the Helm history for the corresponding release still contains only 2 elements.
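
The final check above can be scripted. This is a sketch under assumptions: `history_within_limit` is a hypothetical helper, and it expects the default table output of `helm history` (one header line followed by one line per revision), as shown elsewhere in this issue. The `<release>` and `<namespace>` values in the commented usage are placeholders.

```shell
# history_within_limit LIMIT
# Hypothetical helper: reads `helm history` table output from stdin and
# succeeds only if the number of revisions listed is at most LIMIT.
history_within_limit() {
  limit="$1"
  # Skip the header line and any blank lines, then count what remains.
  count=$(awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }')
  [ "$count" -le "$limit" ]
}

# Usage against a live cluster (placeholders must be filled in):
#   helm history <release> -n <namespace> | history_within_limit 2 \
#     && echo "history is within the limit" \
#     || echo "history exceeds the limit"
```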

@weyfonk weyfonk moved this from 👀 In review to Needs QA review in Fleet Jun 17, 2024
@sbulage sbulage self-assigned this Jul 1, 2024
@sbulage
Contributor

sbulage commented Jul 4, 2024

| System Information | Before Upgrade | After Upgrade |
|---|---|---|
| Rancher Version | 2.8.5 | 2.9.0-alpha7 |
| Fleet Version | 0.9.5 | 0.10.0-rc.18 |

Steps performed:

  1. Created a GitRepo with correctDrift enabled.
  2. Waited for the Nginx application to be installed.
  3. Updated the deployment's replica count from 1 to 2.
  4. Saw that correctDrift restored it back to 1.
  5. Repeated step 3 at least 5 times.
  6. Every time, the replica count was restored to 1 as expected.
  7. Saw the number of secrets increase every time the deployment was changed.
  8. Later upgraded Rancher from 2.8.5 to 2.9.0-alpha7.
  9. Waited for the upgrade to finish.
  10. Again changed the replica count from 1 to 2.
  11. Verified that the secret count was lowered.
  12. Also checked the helm history command, which shows only 2 entries.

Outputs:

Secrets Before Upgrade
satya@opensuse15:~> kubectl get secrets -n nginx  -w
NAME                                     TYPE                 DATA   AGE
sh.helm.release.v1.test-drift-nginx.v1   helm.sh/release.v1   1      6m58s
sh.helm.release.v1.test-drift-nginx.v2   helm.sh/release.v1   1      6m58s
sh.helm.release.v1.test-drift-nginx.v3   helm.sh/release.v1   1      3m43s
sh.helm.release.v1.test-drift-nginx.v4   helm.sh/release.v1   1      3m43s
sh.helm.release.v1.test-drift-nginx.v5   helm.sh/release.v1   1      3m27s
sh.helm.release.v1.test-drift-nginx.v6   helm.sh/release.v1   1      3m30s
sh.helm.release.v1.test-drift-nginx.v7   helm.sh/release.v1   1      11s
sh.helm.release.v1.test-drift-nginx.v8   helm.sh/release.v1   1      15s
sh.helm.release.v1.test-drift-nginx.v9   helm.sh/release.v1   1      0s
sh.helm.release.v1.test-drift-nginx.v10   helm.sh/release.v1   1      0s
Secrets After Upgrade
satya@opensuse15:~> kubectl get secrets -n nginx 
NAME                                      TYPE                 DATA   AGE
sh.helm.release.v1.test-drift-nginx.v9    helm.sh/release.v1   1      3m20s
sh.helm.release.v1.test-drift-nginx.v10   helm.sh/release.v1   1      3m10s
Helm history after upgrade
satya@opensuse15:~> helm history -n nginx test-drift-nginx 
REVISION	UPDATED                 	STATUS    	CHART                                   	APP VERSION	DESCRIPTION  
9       	Thu Jul  4 13:56:30 2024	superseded	test-drift-nginx-v0.0.0+git-b2abfd0bfdc3	           	Rollback to 8
10      	Thu Jul  4 13:56:40 2024	deployed  	test-drift-nginx-v0.0.0+git-b2abfd0bfdc3	           	Rollback to 9

@sbulage sbulage moved this from Needs QA review to ✅ Done in Fleet Jul 4, 2024
@sbulage sbulage closed this as completed Jul 4, 2024
@kkaempf kkaempf added the JIRA Must shout label Aug 21, 2024