feat: add ability to entirely opt-out of argo cd and rollouts integrations #1382

Merged
merged 10 commits into from
Jan 11, 2024

Conversation

krancour
Member

@krancour krancour commented Jan 9, 2024

Fixes #1379

Highlights:

  • Allows Argo CD integration to be explicitly disabled for the controller. In which case:
    • Application reconciler does not start
    • Argo CD-based promotion mechanisms will error
    • Argo CD Application state cannot be factored into Stage health
  • Allows Argo Rollouts integration to be explicitly disabled for the controller. In which case:
    • AnalysisRun reconciler does not start
    • Verifications will fail with an error since they're defined using AnalysisTemplates and executed as AnalysisRuns
    • VerificationInfo has been expanded with new fields for better surfacing such errors
  • Allows Argo Rollouts integration to be explicitly disabled for the API server, in which case:
    • API server cannot apply manifests for AnalysisTemplate resources

WIP because I am still addressing some of @jessesuen's feedback.


netlify bot commented Jan 9, 2024

Deploy Preview for docs-kargo-akuity-io ready!

Name Link
🔨 Latest commit 12bfb28
🔍 Latest deploy log https://app.netlify.com/sites/docs-kargo-akuity-io/deploys/659f314d61a8990008adb6d0
😎 Deploy Preview https://deploy-preview-1382.kargo.akuity.io


codecov bot commented Jan 9, 2024

Codecov Report

Attention: 29 lines in your changes are missing coverage. Please review.

Comparison is base (8b5a91f) 45.31% compared to head (12bfb28) 45.52%.

Files                                             Patch %   Lines
internal/controller/stages/stages.go              69.56%    7 Missing ⚠️
api/v1alpha1/stage_types.go                       0.00%     6 Missing ⚠️
api/v1alpha1/zz_generated.deepcopy.go             0.00%     6 Missing ⚠️
internal/controller/promotions/promotions.go      16.66%    5 Missing ⚠️
internal/credentials/credentials.go               50.00%    3 Missing ⚠️
internal/controller/applications/applications.go  0.00%     1 Missing ⚠️
internal/controller/promotion/argocd.go           93.75%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1382      +/-   ##
==========================================
+ Coverage   45.31%   45.52%   +0.21%     
==========================================
  Files         136      136              
  Lines       11761    11823      +62     
==========================================
+ Hits         5329     5382      +53     
- Misses       6240     6250      +10     
+ Partials      192      191       -1     

@krancour krancour force-pushed the krancour/optional-integrations branch 3 times, most recently from 53834d8 to 9df1f3a Compare January 9, 2024 23:18
@jessesuen
Member

Instead of (or in addition to) the ROLLOUTS_INTEGRATION_ENABLED variable to indicate if integration should explicitly be enabled, could we make this inferred by ability to list analysisruns/analysistemplates CRs (even if none are found) during startup?

The reason I think this is desired is because coordinating ROLLOUTS_INTEGRATION_ENABLED across many shards will be hard to maintain and it would be better to infer this with less required upfront configuration.

I'm okay with having the option of ROLLOUTS_INTEGRATION_ENABLED be explicit (to allow things to break when things aren't expected), but in the case where it is omitted, I feel we can rely on inference for a better UX.

@jessesuen
Member

jessesuen commented Jan 10, 2024

This is an example of how we avoid a configuration knob for Istio in Argo Rollouts by trying to detect if an Istio VirtualService is even present. If not, we don't attempt to start the informer.

https://github.com/argoproj/argo-rollouts/blob/master/utils/istio/istio.go#L16-L22

func DoesIstioExist(dynamicClient dynamic.Interface, namespace string) bool {
	_, err := dynamicClient.Resource(GetIstioVirtualServiceGVR()).Namespace(namespace).List(context.TODO(), metav1.ListOptions{Limit: 1})
	if err != nil {
		return false
	}
	return true
}

@krancour krancour force-pushed the krancour/optional-integrations branch from 9df1f3a to 75c7834 Compare January 10, 2024 20:11
@krancour krancour marked this pull request as draft January 10, 2024 20:29
@krancour krancour force-pushed the krancour/optional-integrations branch from 75c7834 to abc446c Compare January 10, 2024 20:34
Signed-off-by: Kent Rancourt <kent.rancourt@gmail.com>
Signed-off-by: Kent Rancourt <kent.rancourt@gmail.com>
Signed-off-by: Kent Rancourt <kent.rancourt@gmail.com>
@krancour krancour force-pushed the krancour/optional-integrations branch from abc446c to c18671d Compare January 10, 2024 20:48
Comment on lines +714 to +721
// Phase describes the current phase of the Verification process. Generally,
// this will be a reflection of the underlying AnalysisRun's phase, however,
// there are exceptions to this, such as in the case where an AnalysisRun
// cannot be launched successfully.
Phase VerificationPhase `json:"phase,omitempty"`
// Message may contain additional information about why the verification
// process is in its current phase.
Message string `json:"message,omitempty"`
Member Author

These are the main changes to the resources types.

It's always been possible that something could go wrong with a verification other than an AnalysisRun failing or erroring, so really, this is the way things should have been done from the start. Added Phase and Message fields for reporting such problems.

The specific problem that is more likely to occur, starting with this PR, is that verification is requested, but the controller doesn't have Rollouts integration enabled. This means an AnalysisRun cannot even be launched.

@krancour krancour force-pushed the krancour/optional-integrations branch from c18671d to fc5b97a Compare January 10, 2024 21:06
Comment on lines +79 to +86
{{- if .Values.api.rollouts.integrationEnabled }}
- apiGroups:
  - argoproj.io
  resources:
  - analysistemplates
  verbs:
  - "*"
{{- end }}
Member Author

This fixes #1379

@krancour krancour force-pushed the krancour/optional-integrations branch from fc5b97a to 05afd25 Compare January 10, 2024 21:23
Signed-off-by: Kent Rancourt <kent.rancourt@gmail.com>
Signed-off-by: Kent Rancourt <kent.rancourt@gmail.com>
Signed-off-by: Kent Rancourt <kent.rancourt@gmail.com>
Signed-off-by: Kent Rancourt <kent.rancourt@gmail.com>
Signed-off-by: Kent Rancourt <kent.rancourt@gmail.com>
@krancour krancour force-pushed the krancour/optional-integrations branch from 05afd25 to 50a496c Compare January 10, 2024 21:26
Comment on lines +82 to +88
if a.argocdClient == nil {
	return promo.Status.WithPhase(kargoapi.PromotionPhaseFailed), newFreight,
		errors.New(
			"Argo CD integration is disabled on this controller; cannot perform " +
				"promotion",
		)
}
Member Author

It isn't just the Application reconciler that we cannot start when Argo CD is disabled. We also cannot carry out Argo CD-based promotion mechanisms.

Comment on lines +27 to +34
if r.argocdClient == nil && len(argoCDAppUpdates) > 0 {
	h.Status = kargoapi.HealthStateUnknown
	h.Issues = []string{
		"Argo CD integration is disabled on this controller; cannot assess" +
			" the health or sync status of Argo CD Applications",
	}
	return &h
}
Member Author

Cannot factor Application state into Stage health if Argo CD integration is disabled.

@@ -70,12 +70,12 @@ type reconciler struct {
 	startVerificationFn func(
 		context.Context,
 		*kargoapi.Stage,
-	) (*kargoapi.VerificationInfo, error)
+	) *kargoapi.VerificationInfo
Member Author

@krancour krancour Jan 10, 2024

Function signature changed...

If something goes wrong, we report it through the VerificationInfo's Phase and Message fields. What we don't want is for a verification problem to present as a Stage problem.

Comment on lines 20 to +23
 func (r *reconciler) startVerification(
 	ctx context.Context,
 	stage *kargoapi.Stage,
-) (*kargoapi.VerificationInfo, error) {
+) *kargoapi.VerificationInfo {
Member Author

@krancour krancour Jan 10, 2024

Once again, if something goes wrong, we now report it through the VerificationInfo's Phase and Message fields. What we don't want is for a verification problem to look like a Stage problem.
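A minimal sketch of the error-free signature described above, using simplified stand-ins rather than the real Kargo types. The `reconciler` fields, the `createAnalysisRunFn` hook, the `errLaunch` value, and the phase values `Pending` and `Error` are all assumptions made for illustration; only the `Phase` and `Message` fields come from the diff in this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// VerificationPhase and VerificationInfo are simplified stand-ins for the
// Kargo API types; only the Phase and Message fields are modeled here.
type VerificationPhase string

// These phase values are assumptions made for this sketch.
const (
	VerificationPhasePending VerificationPhase = "Pending"
	VerificationPhaseError   VerificationPhase = "Error"
)

type VerificationInfo struct {
	Phase   VerificationPhase
	Message string
}

// errLaunch is a hypothetical failure used only to exercise the error path.
var errLaunch = errors.New("simulated launch failure")

// reconciler is a hypothetical, pared-down reconciler. createAnalysisRunFn is
// nil when Argo Rollouts integration is disabled.
type reconciler struct {
	createAnalysisRunFn func() error
}

// startVerification returns no error. Anything that goes wrong is recorded
// on the returned VerificationInfo, so it never surfaces as a Stage problem.
func (r *reconciler) startVerification() *VerificationInfo {
	if r.createAnalysisRunFn == nil {
		return &VerificationInfo{
			Phase: VerificationPhaseError,
			Message: "Rollouts integration is disabled on this controller; " +
				"cannot start verification",
		}
	}
	if err := r.createAnalysisRunFn(); err != nil {
		return &VerificationInfo{
			Phase:   VerificationPhaseError,
			Message: fmt.Sprintf("error creating AnalysisRun: %s", err),
		}
	}
	return &VerificationInfo{Phase: VerificationPhasePending}
}

func main() {
	disabled := &reconciler{}
	fmt.Println(disabled.startVerification().Phase)

	failing := &reconciler{
		createAnalysisRunFn: func() error { return errLaunch },
	}
	fmt.Println(failing.startVerification().Message)
}
```

With this shape, the caller can always store the returned value in the Stage's status and continue reconciling; it never has to choose between recording a verification result and returning an error.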

argoClient client.Client // nil if credential borrowing is not enabled
cfg KubernetesDatabaseConfig
kargoClient client.Client
argocdClient client.Client // nil if credential borrowing is not enabled
Member Author

You see changes like this throughout. "argoClient" was too ambiguous, because Rollouts is also an Argo project. argocdClient is more specific.

@@ -485,7 +523,7 @@ type FreightReference struct {
 	Charts []Chart `json:"charts,omitempty"`
 	// VerificationInfo is information about any verification process that was
 	// associated with this Freight for this Stage.
-	VerificationInfo *VerificationInfo `json:"verificationResult,omitempty"`
+	VerificationInfo *VerificationInfo `json:"verificationInfo,omitempty"`
Member Author

This was an oops to begin with.
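For context on why the tag matters: with `encoding/json`, the struct tag, not the Go field name, decides the key that appears in the serialized object. The sketch below uses pared-down stand-ins for the two types (the `string` type of `Phase` and the `encode` helper are assumptions for illustration) to show the key produced by the corrected tag.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Pared-down stand-ins for the API types; not the full Kargo definitions.
type VerificationInfo struct {
	Phase string `json:"phase,omitempty"`
}

type FreightReference struct {
	// The tag below, not the field name, determines the JSON key.
	VerificationInfo *VerificationInfo `json:"verificationInfo,omitempty"`
}

// encode is a hypothetical helper that returns the JSON form of f,
// or an empty string if marshaling fails.
func encode(f FreightReference) string {
	b, err := json.Marshal(f)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	f := FreightReference{VerificationInfo: &VerificationInfo{Phase: "Error"}}
	fmt.Println(encode(f)) // prints {"verificationInfo":{"phase":"Error"}}
}
```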

Signed-off-by: Kent Rancourt <kent.rancourt@gmail.com>
Signed-off-by: Kent Rancourt <kent.rancourt@gmail.com>
@krancour krancour force-pushed the krancour/optional-integrations branch from 36db30b to 12bfb28 Compare January 11, 2024 00:07
@krancour
Member Author

Instead of (or in addition to) the ROLLOUTS_INTEGRATION_ENABLED variable to indicate if integration should explicitly be enabled, could we make this inferred by ability to list analysisruns/analysistemplates CRs (even if none are found) during startup?

The reason I think this is desired is because coordinating ROLLOUTS_INTEGRATION_ENABLED across many shards will be hard to maintain and it would be better to infer this with less required upfront configuration.

I'm okay with having the option of ROLLOUTS_INTEGRATION_ENABLED be explicit (to allow things to break when things aren't expected), but in the case where it is omitted, I feel we can rely on inference for a better UX.

@jessesuen, I addressed this in new commit 12bfb28

I think there is a benefit to explicitly disabling Argo CD and/or Rollouts integrations if they are undesired because the controller is granted slightly fewer permissions that way.

But your point about configuration for many shards being difficult is a good one. And we've also seen instances of our own colleagues being surprised at the controller going into a crash loop when they forgot to install one of the dependencies.

So where I landed was:

  • If you explicitly disable an integration, it is truly disabled without further condition. If you, for instance, had Argo CD or Argo Rollouts on a cluster and really wanted not to enable those integrations, this is what you'd have to do.

  • If you don't explicitly opt out, there's a sanity check in case you rolled with the default by accident. So if you start up with Rollouts integration (for example) enabled, but Rollouts CRDs not installed, then you'll see a warning, but the controller will function without error, just as if that feature had been explicitly disabled.
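The two rules above could be sketched roughly as follows. This is not the code from commit 12bfb28: `integrationState`, `enabledFromEnv`, `resolveIntegration`, and the `crdsInstalled` probe are hypothetical names, and treating an unset or unparseable variable as "enabled" is an assumption about the default. Only the `ROLLOUTS_INTEGRATION_ENABLED` variable name comes from this thread.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// integrationState is a hypothetical summary of how an optional integration
// was resolved at startup. It is not a type from the Kargo codebase.
type integrationState struct {
	Enabled bool
	Warning string // set when the integration was enabled by config but unusable
}

// enabledFromEnv reads a boolean environment variable. An unset or
// unparseable value is treated as "enabled" (an assumed default).
func enabledFromEnv(name string) bool {
	v, ok := os.LookupEnv(name)
	if !ok {
		return true
	}
	enabled, err := strconv.ParseBool(v)
	if err != nil {
		return true
	}
	return enabled
}

// resolveIntegration applies the two rules. crdsInstalled stands in for
// whatever probe detects the integration's CRDs on the cluster.
func resolveIntegration(
	name string,
	enabled bool,
	crdsInstalled func() bool,
) integrationState {
	if !enabled {
		// Explicit opt-out: disabled without further condition, no probing.
		return integrationState{}
	}
	if !crdsInstalled() {
		// Not opted out, but unusable: warn and behave as if disabled.
		return integrationState{
			Warning: fmt.Sprintf(
				"%s integration was enabled, but its CRDs were not found; "+
					"proceeding without it",
				name,
			),
		}
	}
	return integrationState{Enabled: true}
}

func main() {
	state := resolveIntegration(
		"Argo Rollouts",
		enabledFromEnv("ROLLOUTS_INTEGRATION_ENABLED"),
		func() bool { return false }, // pretend the CRDs are missing
	)
	fmt.Printf("enabled=%t warning=%q\n", state.Enabled, state.Warning)
}
```

Skipping the probe entirely on explicit opt-out is what would let the controller run with fewer permissions in that case, since it never needs to list the integration's resources.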

Does all that seem ok?

Willing to tweak this further if you think I've missed the mark.

@krancour krancour marked this pull request as ready for review January 11, 2024 00:12
@jessesuen
Member

but Rollouts CRDs not installed, then you'll see a warning, but the controller will function without error, just as if that feature had been explicitly disabled.

Yep! That is effectively the same as what I was asking for.

@krancour krancour added this pull request to the merge queue Jan 11, 2024
Merged via the queue into akuity:main with commit bfc1fd4 Jan 11, 2024
15 checks passed
@krancour krancour deleted the krancour/optional-integrations branch January 11, 2024 01:47
Development

Successfully merging this pull request may close these issues.

rollouts scheme may need to be registered for api server
2 participants