
Convert fleet-agent to controller-runtime #1734

Closed

manno opened this issue Aug 23, 2023 · 2 comments

manno commented Aug 23, 2023

We discussed wrangler vs controller-runtime multiple times. How feasible is it to switch frameworks?

Fleet-agent is relatively limited in scope. Open questions when migrating to controller-runtime (a rough mapping sketch follows the list):

  • how easy is it to port the different "timers", i.e. time.After and EnqueueAfter?
  • are we missing something when we drop apply, e.g. cleanup and updates?
  • the agent sets up dynamic watches on deployed resources; how do we replicate that?
  • should we extract the existing "skip" logic into predicates/event filters first, or are wrangler handlers too different?
  • we will probably need a separate go.mod
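For reference, a minimal sketch (not fleet's actual code) of how wrangler-style EnqueueAfter timers and "skip" handlers could map to controller-runtime: a delayed re-check becomes a RequeueAfter in the Reconcile result, and skip logic becomes an event filter predicate. The AgentReconciler name, the interval and the Deployment placeholder type are illustrative assumptions.

```go
// Sketch only: mapping wrangler-style EnqueueAfter and "skip" handlers to
// controller-runtime. AgentReconciler, the interval and the placeholder
// object type are illustrative, not taken from fleet-agent.
package agent

import (
	"context"
	"time"

	appsv1 "k8s.io/api/apps/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/event"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

type AgentReconciler struct {
	client.Client
}

func (r *AgentReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// ... reconcile the object ...

	// Equivalent of wrangler's EnqueueAfter / time.After based re-checks:
	// ask the workqueue to call Reconcile again after a delay.
	return ctrl.Result{RequeueAfter: 15 * time.Minute}, nil
}

func (r *AgentReconciler) SetupWithManager(mgr ctrl.Manager) error {
	// Existing "skip" logic can become an event filter (predicate) so that
	// uninteresting updates never reach Reconcile.
	skipStatusOnly := predicate.Funcs{
		UpdateFunc: func(e event.UpdateEvent) bool {
			// Only reconcile when the generation changed, i.e. the spec changed.
			return e.ObjectOld.GetGeneration() != e.ObjectNew.GetGeneration()
		},
	}
	return ctrl.NewControllerManagedBy(mgr).
		For(&appsv1.Deployment{}). // placeholder object type
		WithEventFilter(skipStatusOnly).
		Complete(r)
}
```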

If it's not possible to move the "monitor" part of the agent because of its timers and internal/cmd/agent/trigger/watcher.go, does it help to split the agent so we can at least port the deployer?


The fleet-agent controller was split into three processes:

  • registration
  • cluster status updater
  • fleet-agent controller, watches bundledeployments

Only the last part is an actual controller: cluster status updates work on a timer, and registration is done when the agent starts up. A new function, registration.Get, was added to simply return the existing registration to the other processes.
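For illustration, a minimal sketch of how the non-controller parts could run next to a controller-runtime manager: registration happens once at startup, and the cluster status updater stays on a plain ticker added as a Runnable. loadRegistration and updateClusterStatus are hypothetical stand-ins, not the real fleet-agent functions.

```go
// Sketch only: running registration at startup and a ticker-based cluster
// status updater next to the controller-runtime manager. loadRegistration
// and updateClusterStatus are hypothetical stand-ins.
package main

import (
	"context"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

func main() {
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		panic(err)
	}

	// Registration happens once, before any controller starts.
	reg := loadRegistration()

	// The cluster status updater keeps working on a timer, not as a controller.
	statusUpdater := manager.RunnableFunc(func(ctx context.Context) error {
		ticker := time.NewTicker(15 * time.Second)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return nil
			case <-ticker.C:
				updateClusterStatus(ctx, reg)
			}
		}
	})
	if err := mgr.Add(statusUpdater); err != nil {
		panic(err)
	}

	// The bundledeployment controller would be registered here as well.
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}

// Hypothetical helpers, standing in for the real registration and status code.
func loadRegistration() string                          { return "agent-registration" }
func updateClusterStatus(ctx context.Context, _ string) {}
```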

The agent was converted to a StatefulSet to avoid leader election. The code of the handlers was spread between the "handler", the deploy "manager" and the Helm deployer; it has been consolidated into the deployer package.

"Trigger" was renamed to driftdetect as it uses the helm history to detect drift for deployments. Which is similar to themonitor, which updates the bundle deployments status.

After splitting the agent, the fleet-agent controller was converted to controller-runtime (a rough skeleton is sketched after this list):

  • new bundledeployment controller + reconciler
  • the agent reconciler only sets status when exiting Reconcile
  • when deployed resources change, re-trigger the bundledeployment via a status update
  • simplify bundle status condition handling
  • switch to zap logging
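A rough skeleton (not the actual implementation) of what these bullets amount to in controller-runtime terms: a BundleDeployment reconciler that writes status once on the way out, wired up with zap logging. The fleet API import path is an assumption and the scheme registration is elided.

```go
// Sketch only: shape of a controller-runtime bundledeployment reconciler.
// The fleet API import path and the status handling are assumptions.
package main

import (
	"context"

	fleetv1 "github.com/rancher/fleet/pkg/apis/fleet.cattle.io/v1alpha1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"
)

type BundleDeploymentReconciler struct {
	client.Client
}

func (r *BundleDeploymentReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	bd := &fleetv1.BundleDeployment{}
	if err := r.Get(ctx, req.NamespacedName, bd); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// ... deploy / monitor / drift detection would run here ...

	// Status is written once, when leaving Reconcile. A change to the status
	// of deployed resources re-triggers this reconciler via a status update.
	if err := r.Status().Update(ctx, bd); err != nil {
		return ctrl.Result{}, err
	}
	return ctrl.Result{}, nil
}

func main() {
	ctrl.SetLogger(zap.New()) // zap is controller-runtime's bundled logger

	// NOTE: fleet's API types would also need to be registered in the
	// manager's scheme; that wiring is omitted here.
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		panic(err)
	}
	if err := ctrl.NewControllerManagedBy(mgr).
		For(&fleetv1.BundleDeployment{}).
		Complete(&BundleDeploymentReconciler{Client: mgr.GetClient()}); err != nil {
		panic(err)
	}
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```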
@manno manno added this to Fleet Aug 23, 2023
@manno manno converted this from a draft issue Aug 23, 2023
@manno manno changed the title from "Draft: Spike Migrate fleet-agent to kubebuilder" to "Migrate fleet-agent to kubebuilder" Aug 24, 2023
@manno manno removed the kind/spike label Aug 24, 2023
@manno manno added this to the 2023-Q4-v2.8x milestone Aug 28, 2023
@manno manno self-assigned this Sep 4, 2023
@manno manno moved this from 📋 Backlog to 🏗 In progress in Fleet Sep 4, 2023

aruiz14 commented Sep 11, 2023

After chatting about this and how we are using leader election, I was wondering whether it is really necessary.
At the moment, it's used to prevent multiple instances of the fleet-agent (although its deployment specifies only 1 replica) from acting on the same BundleDeployment resource, which would end up with repeated work or, in the worst case, conflicts.

I believe we can use an alternative approach based on optimistic locking, thereby eliminating the restriction of having a single active instance (a rough sketch of the conflict check follows this list):

  1. The BundleDeployment status reflects that an instance is already processing it (e.g. a new "in progress" condition).
  2. When an instance of fleet-agent observes an actionable BundleDeployment, it tries to update the status sub-resource to "in progress".
    • If it was already updated by another instance, Kubernetes will reject the change due to the different resource version, and the instance should abort any further operation on that resource.
    • If it succeeds, it can continue with the deployment.
  3. In the event of a stale "in progress" BundleDeployment (e.g. the fleet-agent instance died in the middle of a deployment and was not able to recover its progress), the fleet-controller must apply some expiration period and decide either to mark it as an error or to unfreeze the object by removing the in-progress condition.
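To illustrate the conflict check, a minimal sketch under these assumptions: the "in progress" marker, the markInProgress helper and the claim function name are hypothetical. The point is only that a concurrent status update is rejected with a conflict error, which tells the losing instance to back off.

```go
// Sketch only: optimistic locking on the BundleDeployment status. The
// "in progress" marker and markInProgress are hypothetical; a concurrent
// status update fails with a conflict error, so this instance backs off.
package agent

import (
	"context"

	fleetv1 "github.com/rancher/fleet/pkg/apis/fleet.cattle.io/v1alpha1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func claim(ctx context.Context, c client.Client, req ctrl.Request) (bool, error) {
	bd := &fleetv1.BundleDeployment{}
	if err := c.Get(ctx, req.NamespacedName, bd); err != nil {
		return false, client.IgnoreNotFound(err)
	}

	markInProgress(bd) // hypothetical: set an "in progress" condition on the status

	// The update carries the resourceVersion read above. If another agent
	// instance already claimed the object, the API server rejects this write.
	if err := c.Status().Update(ctx, bd); err != nil {
		if apierrors.IsConflict(err) {
			return false, nil // someone else is working on it; abort quietly
		}
		return false, err
	}
	return true, nil // claim succeeded, safe to continue with the deployment
}

// markInProgress is a stand-in for setting a real status condition.
func markInProgress(bd *fleetv1.BundleDeployment) {}
```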

This is actually not tied to the use of controller-runtime, but I thought I should bring it up here since it may be relevant to your current work.


manno commented Sep 11, 2023

@manno manno modified the milestones: v2.8.0, 2024-Q1-2.8x Oct 5, 2023
@kkaempf kkaempf moved this from 🏗 In progress to Blocked in Fleet Oct 9, 2023
@manno manno moved this from Blocked to 🏗 In progress in Fleet Nov 13, 2023
@manno manno modified the milestones: 2024-Q1-2.8x, v2.9.0 Nov 27, 2023
@manno manno moved this from 🏗 In progress to 👀 In review in Fleet Nov 27, 2023
@manno manno changed the title from "Migrate fleet-agent to kubebuilder" to "Conver fleet-agent to kubebuilder" Jan 11, 2024
@manno manno changed the title from "Conver fleet-agent to kubebuilder" to "Convert fleet-agent to kubebuilder" Jan 11, 2024
@manno manno changed the title from "Convert fleet-agent to kubebuilder" to "Convert fleet-agent to controller-runtime" Jan 11, 2024
@manno manno moved this from 👀 In review to ✅ Done in Fleet Jan 11, 2024
@weyfonk weyfonk closed this as completed Feb 14, 2024