Avoid watching all secrets in the cluster #1274

Merged May 9, 2022 (6 commits)

Conversation

@guillaumerose guillaumerose commented Nov 26, 2021

Changes

Replace the secret informer with a secret getter.
This means that each webhook issues an API call against the
k8s API to get the secret.
Previously, all secrets in the cluster were held in the interceptor's memory.

Before this change:
crictl stats reports 79.18MB on a kind cluster with around 5k secrets of 2.5kB each.

After this change:
crictl stats reports 8.221MB

Proposition related to #1268

The other way to solve this would be to add a specific label on all secrets watched by the interceptor.
But it would introduce a breaking change between 2 releases.

Submitter Checklist

These are the criteria that every PR should meet, please check them off as you
review them:

  • Includes tests (if functionality changed/added)
  • Includes docs (if user facing)
  • Commit messages follow commit message best practices
  • Release notes block has been filled in or deleted (only if no user facing changes)

See the contribution guide for more details.

Release Notes

Reduce memory usage of the core-interceptors container

@tekton-robot tekton-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Nov 26, 2021
@tekton-robot tekton-robot requested review from dibyom and khrm November 26, 2021 08:43
@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 26, 2021
@guillaumerose
Contributor Author

I just saw that the Lister was introduced to avoid throttling :/
What is the good solution? Happy to code it!

@dibyom
Member

dibyom commented Nov 29, 2021

What is the good solution? Happy to code it!

Yeah this one will require a bit of thinking -

  1. Could we use getters plus a custom cache that caches the get requests, instead of an informer cache that caches all secrets?
  2. Could we add a label to all secrets that the EL/Interceptors need?

Option 2 would simplify the implementation, but it is a breaking change for users, so I'm leaning towards option 1.

@guillaumerose
Contributor Author

Yes, I agree. I think the best approach is to have a single place where the code calls Secrets(ns).Get(secretName) (#1278); then we can work on a cache (maybe https://pkg.go.dev/k8s.io/apimachinery/pkg/util/cache#LRUExpireCache ?)

@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 30, 2021
@dibyom
Member

dibyom commented Nov 30, 2021

@guillaumerose Nice, yeah! An LRU cache should be good enough for our use case.

@tekton-robot tekton-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 12, 2022
@tekton-robot

The following is the coverage report on the affected files.
Say /test pull-tekton-triggers-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/interceptors/interceptors.go 88.7% 70.7% -18.0

@tekton-robot

The following is the coverage report on the affected files.
Say /test pull-tekton-triggers-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/interceptors/interceptors.go 88.7% 81.0% -7.8

@tekton-robot

The following is the coverage report on the affected files.
Say /test pull-tekton-triggers-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/interceptors/interceptors.go 88.7% 88.1% -0.6

@khrm khrm added this to the Triggers v0.19 milestone Jan 12, 2022
@guillaumerose guillaumerose force-pushed the secretsgetter branch 2 times, most recently from 227a773 to f268c7d Compare January 13, 2022 08:12
@guillaumerose
Contributor Author

I added the LRU cache. It stores up to 1024 secrets for at most 5s. That way, if the listener receives a batch of notifications from the same repo, it calls the apiserver only once.

I don't cache errors. If the secret is incorrect or absent, each lookup still issues a call to the apiserver.

WDYT?

@dibyom
Member

dibyom commented Jan 20, 2022

Looks good. I think 5s is reasonable to start with, though we might make this configurable later.

Did we do some testing with a bunch of concurrent requests? One question I have is how effective this will be for a bunch of requests (e.g. for the same event, multiple triggers) that arrive at the same time. Since each incoming request is processed in its own goroutine, I wonder if this will result in all (or a majority of) cache misses.

See #594 (comment)

@savitaashture savitaashture removed this from the Triggers v0.19 milestone Feb 16, 2022
@khrm khrm self-assigned this Mar 25, 2022
@khrm khrm added this to the Triggers v0.20 milestone Mar 25, 2022
@savitaashture
Contributor

Looks good. I think 5s is reasonable to start with, though we might make this configurable later.

Did we do some testing with a bunch of concurrent requests? One question I have is how effective this will be for a bunch of requests (e.g. for the same event, multiple triggers) that arrive at the same time. Since each incoming request is processed in its own goroutine, I wonder if this will result in all (or a majority of) cache misses.

See #594 (comment)

@dibyom
As discussed, I added a print statement here and sent multiple requests
to an EventListener that had multiple triggers (I used 3 triggers):

while true; do
  curl -H 'X-GitHub-Event: pull_request' \
    -H 'X-Hub-Signature: sha1=ba0cdc263b3492a74b601d240c27efe81c4720cb' \
    -H 'Content-Type: application/json' \
    -d '{"action": "opened", "pull_request":{"head":{"sha": "28911bbb5a3e2ea034daf1f6be0a822d50e31e73"}},"repository":{"clone_url": "https://github.com/tektoncd/triggers.git"}}' \
    http://localhost:8080
done

The cache is working as configured for 5s: I see the cache hit reported as true 4-5 times, and then it resets.

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 29, 2022
@dibyom
Member

dibyom commented Apr 29, 2022

/lgtm

@khrm khrm force-pushed the secretsgetter branch 2 times, most recently from 459b497 to 2df53e9 Compare May 6, 2022 14:49
@tekton-robot

The following is the coverage report on the affected files.
Say /test pull-tekton-triggers-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
cmd/cel-eval/cmd/root.go 65.6% 67.7% 2.1
pkg/interceptors/interceptors.go 88.7% 88.1% -0.6

@khrm
Contributor

khrm commented May 6, 2022

/test pull-tekton-triggers-integration-tests

2 similar comments
@khrm
Contributor

khrm commented May 7, 2022

/test pull-tekton-triggers-integration-tests

@khrm
Contributor

khrm commented May 9, 2022

/test pull-tekton-triggers-integration-tests

Contributor

@savitaashture savitaashture left a comment

/lgtm

🎉

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label May 9, 2022
Contributor

@savitaashture savitaashture left a comment

/lgtm

@tekton-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dibyom, savitaashture

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [dibyom,savitaashture]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

@khrm khrm left a comment

/lgtm

@khrm khrm closed this May 9, 2022
@khrm khrm reopened this May 9, 2022
guillaumerose and others added 6 commits May 9, 2022 14:17

  • This cache was unused since the req parameter was always nil.
  • This will allow us to choose between an informer and a kubernetes client. The current implementation is using an informer.
  • Instead of comparing nil secret with the given secret, it will return a meaningful error message.
  • Instead of using an informer, it now uses a k8s client. It avoids loading all secrets of the cluster in controller memory. Tests are still using an informer for convenience.
  • It will reduce the load on the k8s apiserver when receiving a lot of webhooks coming from the same project.
@khrm khrm force-pushed the secretsgetter branch from 2df53e9 to 06b33d9 Compare May 9, 2022 08:54
@tekton-robot tekton-robot removed the lgtm Indicates that a PR is ready to be merged. label May 9, 2022
Contributor

@khrm khrm left a comment

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label May 9, 2022
@khrm
Contributor

khrm commented May 9, 2022

/test pull-tekton-triggers-integration-tests

@tekton-robot tekton-robot merged commit 2950c8b into tektoncd:main May 9, 2022
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
5 participants