
Convert fleet-agent to controller-runtime #1734

Closed

manno opened this issue Aug 23, 2023 · 2 comments

manno commented Aug 23, 2023

We discussed wrangler vs controller-runtime multiple times. How feasible is it to switch frameworks?

Fleet-agent is relatively limited in scope. Open questions when migrating to controller-runtime (a rough mapping sketch follows the list):

  • how easy is it to port the different "timers", i.e. time.After and EnqueueAfter?
  • are we missing something when we drop apply, e.g. cleanup and updates?
  • the agent sets up dynamic watches on deployed resources; how do we replicate that?
  • should we extract the existing "skip" logic into predicates/event filters first, or are wrangler handlers too different?
  • we will probably need a separate go.mod
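For reference, a minimal sketch (not fleet's actual code) of how wrangler-style EnqueueAfter timers and "skip" handlers could map to controller-runtime: a delayed re-check becomes a RequeueAfter in the Reconcile result, and skip logic becomes an event filter predicate. The AgentReconciler name, the interval and the Deployment placeholder type are illustrative assumptions.

```go
// Sketch only: mapping wrangler-style EnqueueAfter and "skip" handlers to
// controller-runtime. AgentReconciler, the interval and the placeholder
// object type are illustrative, not taken from fleet-agent.
package agent

import (
	"context"
	"time"

	appsv1 "k8s.io/api/apps/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/event"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

type AgentReconciler struct {
	client.Client
}

func (r *AgentReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// ... reconcile the object ...

	// Equivalent of wrangler's EnqueueAfter / time.After based re-checks:
	// ask the workqueue to call Reconcile again after a delay.
	return ctrl.Result{RequeueAfter: 15 * time.Minute}, nil
}

func (r *AgentReconciler) SetupWithManager(mgr ctrl.Manager) error {
	// Existing "skip" logic can become an event filter (predicate) so that
	// uninteresting updates never reach Reconcile.
	skipStatusOnly := predicate.Funcs{
		UpdateFunc: func(e event.UpdateEvent) bool {
			// Only reconcile when the generation changed, i.e. the spec changed.
			return e.ObjectOld.GetGeneration() != e.ObjectNew.GetGeneration()
		},
	}
	return ctrl.NewControllerManagedBy(mgr).
		For(&appsv1.Deployment{}). // placeholder object type
		WithEventFilter(skipStatusOnly).
		Complete(r)
}
```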

If it's not possible to move the "monitor" part of the agent because of its timers and internal/cmd/agent/trigger/watcher.go, does it help to split the agent so we can at least port the deployer?


The fleet-agent controller was split into three processes:

  • registration
  • cluster status updater
  • fleet-agent controller, watches bundledeployments

Only the last part is an actual controller: cluster status updates work on a timer, and registration is done when the agent starts up. A new function, registration.Get, was added to simply return the existing registration to the other processes.
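For illustration, a minimal sketch of how the non-controller parts could run next to a controller-runtime manager: registration happens once at startup, and the cluster status updater stays on a plain ticker added as a Runnable. loadRegistration and updateClusterStatus are hypothetical stand-ins, not the real fleet-agent functions.

```go
// Sketch only: running registration at startup and a ticker-based cluster
// status updater next to the controller-runtime manager. loadRegistration
// and updateClusterStatus are hypothetical stand-ins.
package main

import (
	"context"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

func main() {
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		panic(err)
	}

	// Registration happens once, before any controller starts.
	reg := loadRegistration()

	// The cluster status updater keeps working on a timer, not as a controller.
	statusUpdater := manager.RunnableFunc(func(ctx context.Context) error {
		ticker := time.NewTicker(15 * time.Second)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return nil
			case <-ticker.C:
				updateClusterStatus(ctx, reg)
			}
		}
	})
	if err := mgr.Add(statusUpdater); err != nil {
		panic(err)
	}

	// The bundledeployment controller would be registered here as well.
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}

// Hypothetical helpers, standing in for the real registration and status code.
func loadRegistration() string                          { return "agent-registration" }
func updateClusterStatus(ctx context.Context, _ string) {}
```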

The agent was converted to a StatefulSet to avoid leader election. The code of the handlers was spread between the "handler", the deploy "manager" and the Helm deployer; it has been consolidated into the deployer package.

"Trigger" was renamed to driftdetect as it uses the helm history to detect drift for deployments. Which is similar to themonitor, which updates the bundle deployments status.

After splitting the agent, the fleet-agent controller was converted to controller-runtime (a rough skeleton is sketched after this list):

  • new bundledeployment controller + reconciler
  • the agent reconciler only sets status when exiting Reconcile
  • when deployed resources change, re-trigger the bundledeployment via a status update
  • simplify bundle status condition handling
  • switch to zap logging
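A rough skeleton (not the actual implementation) of what these bullets amount to in controller-runtime terms: a BundleDeployment reconciler that writes status once on the way out, wired up with zap logging. The fleet API import path is an assumption and the scheme registration is elided.

```go
// Sketch only: shape of a controller-runtime bundledeployment reconciler.
// The fleet API import path and the status handling are assumptions.
package main

import (
	"context"

	fleetv1 "github.com/rancher/fleet/pkg/apis/fleet.cattle.io/v1alpha1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"
)

type BundleDeploymentReconciler struct {
	client.Client
}

func (r *BundleDeploymentReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	bd := &fleetv1.BundleDeployment{}
	if err := r.Get(ctx, req.NamespacedName, bd); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// ... deploy / monitor / drift detection would run here ...

	// Status is written once, when leaving Reconcile. A change to the status
	// of deployed resources re-triggers this reconciler via a status update.
	if err := r.Status().Update(ctx, bd); err != nil {
		return ctrl.Result{}, err
	}
	return ctrl.Result{}, nil
}

func main() {
	ctrl.SetLogger(zap.New()) // zap is controller-runtime's bundled logger

	// NOTE: fleet's API types would also need to be registered in the
	// manager's scheme; that wiring is omitted here.
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		panic(err)
	}
	if err := ctrl.NewControllerManagedBy(mgr).
		For(&fleetv1.BundleDeployment{}).
		Complete(&BundleDeploymentReconciler{Client: mgr.GetClient()}); err != nil {
		panic(err)
	}
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```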
@manno manno added this to Fleet Aug 23, 2023
@manno manno converted this from a draft issue Aug 23, 2023
@manno manno changed the title from "Draft: Spike Migrate fleet-agent to kubebuilder" to "Migrate fleet-agent to kubebuilder" Aug 24, 2023
@manno manno removed the kind/spike label Aug 24, 2023
@manno manno added this to the 2023-Q4-v2.8x milestone Aug 28, 2023
@manno manno self-assigned this Sep 4, 2023
@manno manno moved this from 📋 Backlog to 🏗 In progress in Fleet Sep 4, 2023

aruiz14 commented Sep 11, 2023

After chatting about this and how we are using leader election, I was wondering whether it is really necessary.
At the moment, it's used to prevent multiple instances of the fleet-agent (although its deployment specifies only 1 replica) from acting on the same BundleDeployment resource, which would end up with repeated work or, in the worst case, conflicts.

I believe we can use an alternative approach based on optimistic locking, thereby eliminating the restriction of having a single active instance (a rough sketch of the conflict check follows this list):

  1. The BundleDeployment status reflects that an instance is already processing it (e.g. a new "in progress" condition).
  2. When an instance of fleet-agent observes an actionable BundleDeployment, it tries to update the status sub-resource to "in progress".
    • If it was already updated by another instance, Kubernetes will reject the change due to the different resource version, and the instance should abort any further operation on that resource.
    • If it succeeds, it can continue with the deployment.
  3. In the event of a stale "in progress" BundleDeployment (e.g. the fleet-agent instance died in the middle of a deployment and was not able to recover its progress), the fleet-controller must apply some expiration period and decide either to mark it as an error or to unfreeze the object by removing the in-progress condition.
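To illustrate the conflict check, a minimal sketch under these assumptions: the "in progress" marker, the markInProgress helper and the claim function name are hypothetical. The point is only that a concurrent status update is rejected with a conflict error, which tells the losing instance to back off.

```go
// Sketch only: optimistic locking on the BundleDeployment status. The
// "in progress" marker and markInProgress are hypothetical; a concurrent
// status update fails with a conflict error, so this instance backs off.
package agent

import (
	"context"

	fleetv1 "github.com/rancher/fleet/pkg/apis/fleet.cattle.io/v1alpha1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func claim(ctx context.Context, c client.Client, req ctrl.Request) (bool, error) {
	bd := &fleetv1.BundleDeployment{}
	if err := c.Get(ctx, req.NamespacedName, bd); err != nil {
		return false, client.IgnoreNotFound(err)
	}

	markInProgress(bd) // hypothetical: set an "in progress" condition on the status

	// The update carries the resourceVersion read above. If another agent
	// instance already claimed the object, the API server rejects this write.
	if err := c.Status().Update(ctx, bd); err != nil {
		if apierrors.IsConflict(err) {
			return false, nil // someone else is working on it; abort quietly
		}
		return false, err
	}
	return true, nil // claim succeeded, safe to continue with the deployment
}

// markInProgress is a stand-in for setting a real status condition.
func markInProgress(bd *fleetv1.BundleDeployment) {}
```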

This is actually not tied to the use of controller-runtime, but I thought I should bring it up here since it may be relevant to your current work.


manno commented Sep 11, 2023

@manno manno modified the milestones: v2.8.0, 2024-Q1-2.8x Oct 5, 2023
@kkaempf kkaempf moved this from 🏗 In progress to Blocked in Fleet Oct 9, 2023
@manno manno moved this from Blocked to 🏗 In progress in Fleet Nov 13, 2023
@manno manno modified the milestones: 2024-Q1-2.8x, v2.9.0 Nov 27, 2023
@manno manno moved this from 🏗 In progress to 👀 In review in Fleet Nov 27, 2023
@manno manno changed the title from "Migrate fleet-agent to kubebuilder" to "Conver fleet-agent to kubebuilder" Jan 11, 2024
@manno manno changed the title from "Conver fleet-agent to kubebuilder" to "Convert fleet-agent to kubebuilder" Jan 11, 2024
@manno manno changed the title from "Convert fleet-agent to kubebuilder" to "Convert fleet-agent to controller-runtime" Jan 11, 2024
@manno manno moved this from 👀 In review to ✅ Done in Fleet Jan 11, 2024
@weyfonk weyfonk closed this as completed Feb 14, 2024