Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TEP-0033 Tekton Feature Gates #280

Merged
merged 1 commit into from
Feb 8, 2021
Merged

Conversation

afrittoli
Copy link
Member

Co-authored-by: Vincent Demeester vdemeest@redhat.com
Signed-off-by: Andrea Frittoli andrea.frittoli@gmail.com

/kind tep

@tekton-robot tekton-robot added the kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). label Nov 23, 2020
@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 23, 2020
@vdemeester
Copy link
Member

/cc @tektoncd/core-maintainers @tektoncd/triggers-maintainers @tektoncd/operator-maintainers

Copy link
Contributor

@bobcatfish bobcatfish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think that we'll probably end up using feature flags/gates to solve this problem, but this problem statement seems to assume this is the solution we're going with - is there some way that the goals and non-goals could focus more on the problems we're solving without assuming we'll be going ahead with feature flags? (one requirement/goal that id be interested to see fleshed out is what the experience should be like if we add an "alpha" field and decide to remove it in the next release - what should this be like for the user?)

once we add the proposal I think it would be worth having an alternatives section that discusses other approaches too, so we can be sure this is the one we want

teps/0033-tekton-feature-gates.md Outdated Show resolved Hide resolved
@vdemeester
Copy link
Member

I also think that we'll probably end up using feature flags/gates to solve this problem, but this problem statement seems to assume this is the solution we're going with - is there some way that the goals and non-goals could focus more on the problems we're solving without assuming we'll be going ahead with feature flags? (one requirement/goal that id be interested to see fleshed out is what the experience should be like if we add an "alpha" field and decide to remove it in the next release - what should this be like for the user?)

I think we use the term "feature flag" in this document in a more general ways than an flag (as in a cli or configmap field sense). We could use feature "flip" term or maybe Feature toggle

@bobcatfish
Copy link
Contributor

I think we use the term "feature flag" in this document in a more general ways than an flag (as in a cli or configmap field sense). We could use feature "flip" term or maybe Feature toggle

I'm thinking something even broader that would let us explore completely different options, e.g.:

  1. Maintaining a different set of alpha CRDs (e.g. something like v1alpha2 or something for Pipelines - new features would get added there before adding to v1beta1
  2. Handling this with documentation only (e.g.: when a new feature is added, check the docs to see the stability level)
  3. Handling this with warnings (e.g. warn the user in logs when they use alpha features
  4. Name fields such that it is obvious they are alpha (e.g. alpha_when vs when) and rename when the fields graduate
  5. Add an alpha section to the spec definition
  6. Build multiple releases: one which only has beta fields in beta types, etc., and one that contains alpha fields as well
  7. <more?>

My understanding of the problem we are trying to solve is that its ultimately about communicating to users the stability level of new fields we add when their stability level doesn't match the level of the CRD we add them to.

Probably feature flags/gates/toggles will be the option we settle on but I think it's worth getting clarity on why we are doing this and what all our options are.

@vdemeester
Copy link
Member

My understanding of the problem we are trying to solve is that its ultimately about communicating to users the stability level of new fields we add when their stability level doesn't match the level of the CRD we add them to.

Yes

I'm thinking something even broader that would let us explore completely different options, e.g.:

  1. Maintaining a different set of alpha CRDs (e.g. something like v1alpha2 or something for Pipelines - new features would get added there before adding to v1beta1

This brings some complexity (more version to maintain) and might be confusing for users : the general idea in k8s ecosystem is that v1alpha1 -> v1beta1 -> v1 -> v2alpha1 and v1alpha2 -> v1beta1, so user will, most likely assume that v1beta1 is more recent or "should be used" instead of any v1alphaX. Also, usually, you go from an api version to another one (alpha to beta, v1 to v2 prefix, …) when the shape of the API changes (on existing fields) and/or when you want to graduate your API.

  1. Handling this with documentation only (e.g.: when a new feature is added, check the docs to see the stability level)

Depending on the stability of field, we need to require more or less "explicit" behavior from the user. Handling it solely with documentation can be a problem because, from the user point of view, he didn't have to do anything to get access to the feature, so he is in his rights to think it is supported (even if it's not).

  1. Handling this with warnings (e.g. warn the user in logs when they use alpha features

This is tricky to achieve in our current model (CRD + controllers) as, there is no way to add warnings as far as I know. You are either valid or you are not (when creating/updating the CR). Warning in the controller logs is similar to not do anything as user may not have access to it (RBAC rules, …) and even if they have, it's not explicit at all (you have to know what you are looking for to find the warning in the middle of the logs).

  1. Name fields such that it is obvious they are alpha (e.g. alpha_when vs when) and rename when the fields graduate

This makes the graduation process a little bit harder because we require a rename of a field (even if it had the good name from the start). This mean we will need to support both alpha_when and when at the same time for a certain amount of time. This will happen in any case, but this would generalize the behavior for absolutely all new "alpha" fields.

  1. Add an alpha section to the spec definition

What about field that would be added "inside" others ? For example, adding a field to the step list ? Should it be alpha.steps[0].myfield or steps[0].alpha.myfield ? I am not a fan of both approach 😅

  1. Build multiple releases: one which only has beta fields in beta types, etc., and one that contains alpha fields as well

This is very similar to using feature toggle but has no advantage over it. It is also a all-or-nothing approach (aka either you have all alpha feature, or none).

Also, this docs is based on a precedent, which is how kubernetes handles the same problem 👼.

@bobcatfish
Copy link
Contributor

I think those are all great points to highlight about each approach - what i'd really like is for the TEP to include all the alternatives (and explain why we don't want to pursue them). Which would mean stating the problem in such a way that it doesn't assume we'll solve it with feature flags.

Also, this docs is based on a precedent, which is how kubernetes handles the same problem 👼.

I think that's a good point in favor of the approach but also I want to make sure that we take a look at what our specific needs are and make sure this approach makes the most sense for us as well.

@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 1, 2020
@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Dec 4, 2020

CLA Signed

The committers are authorized under a signed CLA.

@afrittoli afrittoli force-pushed the feature_flags branch 2 times, most recently from 4d69ae8 to e6a4515 Compare December 4, 2020 10:27
@tekton-robot tekton-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 4, 2020
@afrittoli
Copy link
Member Author

Thanks for the feedback @bobcatfish - we've taken a step (or two) back, this version describes the issue more generally and it doesn't make assumptions on the solution.

@afrittoli afrittoli changed the title TEP-0033 Tekton Feature Flags TEP-0033 Tekton Feature Gates Dec 4, 2020
@ghost
Copy link

ghost commented Dec 4, 2020

Updates look good!

/lgtm

@tekton-robot tekton-robot assigned ghost Dec 4, 2020
@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 4, 2020
changes to the API, since any change made will have to be supported for at
least 9 months.

The tools we have today are deprecation notices in the release notes, a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another tool is using annotations to control a feature on a per resource (Task/Template etc.) basis.

e.g. in triggers we have a custom annotaiton for templates: https://github.com/tektoncd/triggers/blob/master/docs/triggertemplates.md#escaping-quoted-strings

and we proposed a custom annotation for tektoncd/pipeline#3257

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I think the annotation/label can be used "in addition to" a way to enable/disable globally.

@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 18, 2020
Co-authored-by: Vincent Demeester <vdemeest@redhat.com>
Signed-off-by: Andrea Frittoli <andrea.frittoli@gmail.com>
@tekton-robot tekton-robot removed lgtm Indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jan 7, 2021
Copy link
Contributor

@bobcatfish bobcatfish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally I am in favor! A couple comments about requirements

introduced the level of maturity of the associated feature
- Tools to test enabled/disabled feature flags as part of the test matrix
- It should be possible for a Tekton operator to restrict access to
feature flags (*not* on a one-by-one case)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the rest of this doc does a great job of not trying to specify a solution - but these last 3 requirements seem to assume we're going with feature flags. maybe we can reword them as something like:

  1. The maturity of each feature should be clearly documented and it should be clear how to opt into optional features
  2. Tools should be available to test all desired combinations of features at various maturity levels
  3. It should be possible for a Tekton operator to restrict access to alpha features

I'm not 100% sure about (3): if we go with feature flags this is possible, but i dont totally understand if we actually need it - do we want to be able to say that Tekton operators can restrict access to all alpha features, e.g. even alpha CRDS? (if yes then (3) makes sense to me)

a cluster admin role, while still preventing cluster owners from enabling
alpha features. This is not a requirement for Tekton, because even in case of
a managed Tekton, users can be prevented access to the `tekton-pipelines`
namespaces, where the feature flag config map is hosted.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we could include this in the proposal instead of the requirements?

@bobcatfish
Copy link
Contributor

I'm happy to merge this after just a couple tiny tweaks mentioned above.

I feel like we're already starting to use the functionality this TEP is about before we've resolved it, so I'd like to jump ahead and make a request regarding how we implement this:

Can we use one feature flag for all alpha features, vs. adding a flag per feature?

We seem to be adding a flag per feature, e.g. we already have separate flags for enabling custom tasks and enabling oci bundles.

I want to suggest one flag for all alpha features because:

  1. The more feature flags we add, the more bugs we aren't going to be able to catch - we'll be unable to test all the possible combinations of these flags <-- this is the main reason
  2. This makes it easy for folks to opt into alpha features without having to think about each one individually (i.e. more of a chance that folks will try these alpha features and give feedback - this assumes we have some kind of "cutting edge" user who wants to try all the shiny new things)
  3. I think this is the best representation of what we are trying to accomplish given the problem statement (i.e. we can't mark individual features in CRDs as alpha)

@afrittoli
Copy link
Member Author

I'm happy to merge this after just a couple tiny tweaks mentioned above.

I feel like we're already starting to use the functionality this TEP is about before we've resolved it, so I'd like to jump ahead and make a request regarding how we implement this:

Can we use one feature flag for all alpha features, vs. adding a flag per feature?

We seem to be adding a flag per feature, e.g. we already have separate flags for enabling custom tasks and enabling oci bundles.

I want to suggest one flag for all alpha features because:

  1. The more feature flags we add, the more bugs we aren't going to be able to catch - we'll be unable to test all the possible combinations of these flags <-- this is the main reason
  2. This makes it easy for folks to opt into alpha features without having to think about each one individually (i.e. more of a chance that folks will try these alpha features and give feedback - this assumes we have some kind of "cutting edge" user who wants to try all the shiny new things)
  3. I think this is the best representation of what we are trying to accomplish given the problem statement (i.e. we can't mark individual features in CRDs as alpha)

Thanks @bobcatfish, I think this is a good suggestion.

We might want to consider separate cases for API driven features vs other ones. I think we could have one single flag to control all alpha fields in the API, and dedicated flags for other features. One example from the TEP about finally timeout - as I mentioned in the comment there, if we had a flag to control the API field to specify the finally timeout, enabling that flag would imply that previously specified timeouts would not apply to entire pipelines anymore, but only to the non-finally section. If we had a single flag for all API fields, enabling that flag would then have side effects on existing pipelines, which should not be the case. With separate flags, it would be possible to enabled alpha API fields, while keeping the old behaviour for the overall timeout still valid.

@vdemeester
Copy link
Member

We might want to consider separate cases for API driven features vs other ones. I think we could have one single flag to control all alpha fields in the API, and dedicated flags for other features. One example from the TEP about finally timeout - as I mentioned in the comment there, if we had a flag to control the API field to specify the finally timeout, enabling that flag would imply that previously specified timeouts would not apply to entire pipelines anymore, but only to the non-finally section. If we had a single flag for all API fields, enabling that flag would then have side effects on existing pipelines, which should not be the case. With separate flags, it would be possible to enabled alpha API fields, while keeping the old behaviour for the overall timeout still valid.

+:100: on this.

@vdemeester
Copy link
Member

  • The more feature flags we add, the more bugs we aren't going to be able to catch - we'll be unable to test all the possible combinations of these flags <-- this is the main reason

Make sense indeed 👌🏼 I do agree on this.

  • This makes it easy for folks to opt into alpha features without having to think about each one individually (i.e. more of a chance that folks will try these alpha features and give feedback - this assumes we have some kind of "cutting edge" user who wants to try all the shiny new things)

Same here 👌🏼

@bobcatfish
Copy link
Contributor

@afrittoli sgtm - i think all we need to merge this PR and move forward with the proposal is to reword the requirements section? at this point i can also just go ahead as is in the interests of moving this forward

@pritidesai
Copy link
Member

assinging it to @pritidesai

@ghost
Copy link

ghost commented Feb 8, 2021

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 8, 2021
@bobcatfish
Copy link
Contributor

/approve

@tekton-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bobcatfish

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 8, 2021
@tekton-robot tekton-robot merged commit 7dc19b3 into tektoncd:main Feb 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants