Avoid watching all secrets in the cluster #1274

guillaumerose · 2021-11-26T08:43:20Z

Changes

Replace the secret informer with a secret getter.
As a result, each incoming webhook triggers an API call to the
k8s API server to fetch the secret.
Previously, every secret in the cluster was held in the interceptor's memory.
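
The difference between the two approaches can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in this PR: `Secret`, `SecretGetter`, `informerStore` and `apiGetter` are hypothetical names, and `fetch` is an assumed placeholder for a real Kubernetes client call such as `Secrets(ns).Get(secretName)`.

```go
package main

import (
	"errors"
	"fmt"
)

// Secret is a simplified stand-in for a Kubernetes Secret (hypothetical type).
type Secret struct {
	Namespace, Name string
	Data            map[string][]byte
}

// SecretGetter is a hypothetical seam through which interceptors read secrets.
type SecretGetter interface {
	Get(namespace, name string) (*Secret, error)
}

// informerStore mimics an informer cache: every secret in the cluster is
// held in memory, whether or not a trigger ever references it.
type informerStore struct {
	all map[string]*Secret // keyed by "namespace/name"
}

func (s *informerStore) Get(namespace, name string) (*Secret, error) {
	if sec, ok := s.all[namespace+"/"+name]; ok {
		return sec, nil
	}
	return nil, errors.New("secret not found: " + namespace + "/" + name)
}

// apiGetter keeps nothing in memory; each Get is one round trip to the
// API server. fetch is an assumed placeholder for the real client call.
type apiGetter struct {
	fetch func(namespace, name string) (*Secret, error)
	calls int // number of API calls issued, for illustration only
}

func (g *apiGetter) Get(namespace, name string) (*Secret, error) {
	g.calls++
	return g.fetch(namespace, name)
}

func main() {
	backend := func(ns, name string) (*Secret, error) {
		return &Secret{Namespace: ns, Name: name, Data: map[string][]byte{"token": []byte("s3cr3t")}}, nil
	}
	var getter SecretGetter = &apiGetter{fetch: backend}
	// Three webhooks for the same repository result in three API calls.
	for i := 0; i < 3; i++ {
		if _, err := getter.Get("default", "github-secret"); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println("API calls:", getter.(*apiGetter).calls)
}
```

The trade-off is visible in the two types: `informerStore` pays in memory for every secret in the cluster, while `apiGetter` pays one request per webhook.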

Before this change:
In a kind cluster with around 5k secrets of 2.5kB each, crictl stats reports 79.18MB.

After this change:
crictl stats reports 8.221MB

Proposal related to #1268

The other way to solve this would be to add a specific label to all secrets watched by the interceptor.
But that would introduce a breaking change between 2 releases.
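
To make the label-based alternative concrete, here is a minimal sketch, assuming a hypothetical label key `triggers.example.dev/interceptor-secret` (this PR does not define any such label). In a real informer the restriction would be expressed as a label selector on the list/watch request; the plain-Go filter below only emulates the effect.

```go
package main

import "fmt"

// watchLabel is a hypothetical label key; the PR does not define one.
const watchLabel = "triggers.example.dev/interceptor-secret"

// Secret is a simplified stand-in for a Kubernetes Secret.
type Secret struct {
	Name   string
	Labels map[string]string
}

// filterWatched returns only the secrets carrying watchLabel=true, i.e. the
// subset a label-restricted informer would keep in memory.
func filterWatched(all []Secret) []Secret {
	var kept []Secret
	for _, s := range all {
		if s.Labels[watchLabel] == "true" {
			kept = append(kept, s)
		}
	}
	return kept
}

func main() {
	cluster := []Secret{
		{Name: "github-secret", Labels: map[string]string{watchLabel: "true"}},
		{Name: "unrelated-db-password"},
		{Name: "legacy-webhook-secret"}, // unlabeled: no longer visible after an upgrade
	}
	for _, s := range filterWatched(cluster) {
		fmt.Println("cached:", s.Name)
	}
}
```

The third entry illustrates the breaking change: a secret that worked before the upgrade stops resolving until the user adds the label.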

Submitter Checklist

These are the criteria that every PR should meet, please check them off as you
review them:

Includes tests (if functionality changed/added)
Includes docs (if user facing)
Commit messages follow commit message best practices
Release notes block has been filled in or deleted (only if no user facing changes)

See the contribution guide for more details.

Release Notes

Reduce memory usage of the core-interceptors container

guillaumerose · 2021-11-26T09:19:53Z

I just saw that the Lister was introduced to avoid throttling :/
What is the good solution? Happy to code it!

dibyom · 2021-11-29T16:25:57Z

> What is the good solution? Happy to code it!

Yeah this one will require a bit of thinking -

1. Could we use getters + a custom cache to cache the get requests vs an informer cache that caches all secrets?
2. Could we add a label to all secrets that the EL/Interceptors need?

Option 2 will simplify the implementation but is a breaking change for users, so I'm leaning towards 1.

guillaumerose · 2021-11-30T09:59:16Z

Yes I agree. I think the best is to have a single place where the code calls Secrets(ns).Get(secretName) (#1278), then we can work on a cache (maybe https://pkg.go.dev/k8s.io/apimachinery/pkg/util/cache#LRUExpireCache ?)

dibyom · 2021-11-30T16:34:34Z

@guillaumerose Nice yeah! An LRU cache should be good enough for our use case

tekton-robot · 2022-01-12T09:51:16Z

The following is the coverage report on the affected files.
Say /test pull-tekton-triggers-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/interceptors/interceptors.go	88.7%	70.7%	-18.0

tekton-robot · 2022-01-12T11:45:17Z

The following is the coverage report on the affected files.
Say /test pull-tekton-triggers-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/interceptors/interceptors.go	88.7%	81.0%	-7.8

tekton-robot · 2022-01-12T13:18:38Z

The following is the coverage report on the affected files.
Say /test pull-tekton-triggers-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/interceptors/interceptors.go	88.7%	88.1%	-0.6

tekton-robot · 2022-01-13T08:12:15Z

The following is the coverage report on the affected files.
Say /test pull-tekton-triggers-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/interceptors/interceptors.go	88.7%	88.1%	-0.6

tekton-robot · 2022-01-13T08:14:34Z

The following is the coverage report on the affected files.
Say /test pull-tekton-triggers-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/interceptors/interceptors.go	88.7%	88.1%	-0.6

guillaumerose · 2022-01-13T08:46:29Z

I added the LRU cache. It stores up to 1024 secrets for at most 5s each. That way, if the listener receives a batch of notifications from the same repo, it calls the apiserver only once.

Errors are not cached: if the secret is incorrect or absent, each request still issues a call to the apiserver.

WDYT?

dibyom · 2022-01-20T16:34:41Z

Looks good. I think 5s might be reasonable to start with, though we might make this configurable later.

Did we do some testing with a bunch of concurrent requests? One question I have is how effective this will be for a bunch of requests (e.g. for the same event, multiple triggers) that arrive at the same time. Since each incoming request is processed in its own goroutine, I wonder if this will result in all (or a majority of) cache misses.

See #594 (comment)
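
One way to probe the concurrency question above is a small harness that fires many goroutines at a cache at once and counts how many reach the backend. This is a minimal sketch, not the PR's code: `simpleCache` is a hypothetical check-then-fetch cache without expiry, and the 10ms sleep is an assumed stand-in for API latency.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// simpleCache is a hypothetical check-then-fetch cache with no expiry.
type simpleCache struct {
	mu    sync.Mutex
	items map[string][]byte
	fetch func(key string) []byte
}

func (c *simpleCache) Get(key string) []byte {
	c.mu.Lock()
	v, ok := c.items[key]
	c.mu.Unlock()
	if ok {
		return v
	}
	v = c.fetch(key) // lock released: concurrent callers can all reach this line
	c.mu.Lock()
	c.items[key] = v
	c.mu.Unlock()
	return v
}

// burst sends n concurrent Gets for the same key and reports how many of
// them reached the backend.
func burst(n int) int64 {
	var backendCalls int64
	c := &simpleCache{
		items: map[string][]byte{},
		fetch: func(string) []byte {
			atomic.AddInt64(&backendCalls, 1)
			time.Sleep(10 * time.Millisecond) // simulated API latency
			return []byte("token")
		},
	}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Get("default/github-secret")
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&backendCalls)
}

func main() {
	fmt.Printf("backend calls for 50 concurrent requests: %d\n", burst(50))
}
```

The count varies from run to run between 1 and the number of requests. If it turns out to be close to the number of requests, deduplicating in-flight fetches (for example with the pattern offered by `golang.org/x/sync/singleflight`) would be one possible follow-up; requests arriving after the first fetch completes are served from the cache either way.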

savitaashture · 2022-04-08T17:44:58Z

> Looks good. I think 5s might be reasonable to start with, though we might make this configurable later.
>
> Did we do some testing with a bunch of concurrent requests? One question I have is how effective this will be for a bunch of requests (e.g. for the same event, multiple triggers) that arrive at the same time. Since each incoming request is processed in its own goroutine, I wonder if this will result in all (or a majority of) cache misses.
>
> See #594 (comment)

@dibyom
As discussed, I added a print statement here and sent multiple requests
to an EventListener that has multiple triggers (I used 3 triggers):

while true; do curl -H 'X-GitHub-Event: pull_request'    -H 'X-Hub-Signature: sha1=ba0cdc263b3492a74b601d240c27efe81c4720cb'    -H 'Content-Type: application/json'    -d '{"action": "opened", "pull_request":{"head":{"sha": "28911bbb5a3e2ea034daf1f6be0a822d50e31e73"}},"repository":{"clone_url": "https://github.com/tektoncd/triggers.git"}}'    http://localhost:8080; done;

The cache is working: since it is configured for 5s, the print statement reports a cache hit (true) 4-5 times in a row, and then the entry resets.

dibyom · 2022-04-29T18:51:31Z

/lgtm

tekton-robot · 2022-05-06T14:30:18Z

The following is the coverage report on the affected files.
Say /test pull-tekton-triggers-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/interceptors/interceptors.go	88.7%	88.1%	-0.6

tekton-robot · 2022-05-06T14:49:35Z

The following is the coverage report on the affected files.
Say /test pull-tekton-triggers-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/interceptors/interceptors.go	88.7%	88.1%	-0.6

tekton-robot · 2022-05-06T14:51:54Z

The following is the coverage report on the affected files.
Say /test pull-tekton-triggers-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
cmd/cel-eval/cmd/root.go	65.6%	67.7%	2.1
pkg/interceptors/interceptors.go	88.7%	88.1%	-0.6

khrm · 2022-05-06T19:34:38Z

/test pull-tekton-triggers-integration-tests

khrm · 2022-05-07T01:23:54Z

/test pull-tekton-triggers-integration-tests

khrm · 2022-05-09T05:02:45Z

/test pull-tekton-triggers-integration-tests

savitaashture

/lgtm

🎉

savitaashture

/lgtm

tekton-robot · 2022-05-09T05:55:43Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dibyom, savitaashture

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [dibyom,savitaashture]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

khrm

/lgtm

tekton-robot · 2022-05-09T08:23:03Z

The following is the coverage report on the affected files.
Say /test pull-tekton-triggers-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
cmd/cel-eval/cmd/root.go	65.6%	67.7%	2.1
pkg/interceptors/interceptors.go	88.7%	88.1%	-0.6

khrm

/lgtm

tekton-robot · 2022-05-09T08:57:51Z

The following is the coverage report on the affected files.
Say /test pull-tekton-triggers-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
cmd/cel-eval/cmd/root.go	65.6%	67.7%	2.1
pkg/interceptors/interceptors.go	88.7%	88.1%	-0.6

khrm · 2022-05-09T09:07:26Z

/test pull-tekton-triggers-integration-tests

tekton-robot added the release-note label Nov 26, 2021

tekton-robot requested review from dibyom and khrm November 26, 2021 08:43

tekton-robot added the size/L label Nov 26, 2021

tekton-robot added the needs-rebase label Nov 30, 2021

guillaumerose force-pushed the secretsgetter branch from d0b58b3 to 2661d93 Compare January 12, 2022 09:49

tekton-robot removed the needs-rebase label Jan 12, 2022

guillaumerose force-pushed the secretsgetter branch from 6f9185d to 5bb6e63 Compare January 12, 2022 13:17

khrm added this to the Triggers v0.19 milestone Jan 12, 2022

guillaumerose force-pushed the secretsgetter branch 2 times, most recently from 227a773 to f268c7d Compare January 13, 2022 08:12

savitaashture removed this from the Triggers v0.19 milestone Feb 16, 2022

khrm self-assigned this Mar 25, 2022

khrm added this to the Triggers v0.20 milestone Mar 25, 2022

dibyom approved these changes Apr 29, 2022

tekton-robot added the approved label Apr 29, 2022

tekton-robot assigned dibyom Apr 29, 2022

khrm force-pushed the secretsgetter branch 2 times, most recently from 459b497 to 2df53e9 Compare May 6, 2022 14:49

savitaashture reviewed May 9, 2022

tekton-robot assigned savitaashture May 9, 2022

tekton-robot added the lgtm label May 9, 2022

savitaashture approved these changes May 9, 2022

khrm reviewed May 9, 2022

khrm closed this May 9, 2022

khrm reopened this May 9, 2022

guillaumerose and others added 6 commits May 9, 2022 14:17

Remove unused cache mechanism when getting secret

d94fb8e

This cache was unused since the req parameter was always nil.

Use our own interface for getting secrets

dd6bd04

This will allow us to choose between an informer and a kubernetes client. The current implementation is using an informer.

Return an error if the key isn't in the secret

5dac3aa

Instead of comparing nil secret with the given secret, it will return a meaningful error message.

Use kubernetes client for getting secrets

ad7e21d

Instead of using an informer, it now uses a k8s client. It avoids loading all secrets of the cluster in controller memory. Tests are still using an informer for convenience.

Add LRU cache for secrets getter in interceptor

16dffe6

It will reduce the load on the k8s apiserver when receiving a lot of webhooks coming from the same project.

Fix cel-eval for interceptor's secretgetter

06b33d9
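
The commit "Return an error if the key isn't in the secret" (5dac3aa) in the list above can be illustrated with a short sketch. The `Secret` type and the `secretValue` helper are hypothetical, and the error text is an assumption rather than the message used in the PR.

```go
package main

import (
	"fmt"
)

// Secret is a simplified stand-in for a Kubernetes Secret.
type Secret struct {
	Name string
	Data map[string][]byte
}

// secretValue returns the value stored under key, or an error naming the
// missing key, instead of handing back a nil slice for the caller to compare.
func secretValue(s *Secret, key string) ([]byte, error) {
	v, ok := s.Data[key]
	if !ok {
		return nil, fmt.Errorf("key %q not found in secret %q", key, s.Name)
	}
	return v, nil
}

func main() {
	s := &Secret{Name: "github-secret", Data: map[string][]byte{"token": []byte("abc")}}
	if _, err := secretValue(s, "password"); err != nil {
		fmt.Println(err)
	}
}
```

With an explicit error, a misconfigured `secretKey` surfaces as a readable message rather than as a signature mismatch against an empty value.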

khrm force-pushed the secretsgetter branch from 2df53e9 to 06b33d9 Compare May 9, 2022 08:54

tekton-robot removed the lgtm label May 9, 2022

khrm reviewed May 9, 2022

tekton-robot added the lgtm label May 9, 2022

tekton-robot merged commit 2950c8b into tektoncd:main May 9, 2022
