elemental-system-agent vs. rancher-system-agent -- there's only place for one #86
Comments
And where does the install come from? Is it part of the bootstrap plan? Is there a way to modify that bootstrap to not install the agent and just reuse what is on the system? |
Yeah, good questions: I think answering them could be part of this card. For the last question ("Is there a way to modify that bootstrap to not install the agent and just reuse what is on the system?"), from my early investigations I think we can keep our own installation through a configuration drop-in file (it seems to be enough that rancher-system-agent is pointed to our elemental-system-agent configuration). Anyway, this is something that should be re-checked more thoroughly 😅 |
I had a look the other day at this and saw:
So the problem here is that we need a rancher-system-agent to apply the bootstrap scripts, which install rancher-system-agent. Maybe a quick solution here is to avoid using the bootstrap on the operator and, as part of the installed system cloud-config, download and execute the bootstrap.sh script from rancher so we end up with the proper rancher-system-agent service and files installed. I.e. stop using the elemental-system-agent completely and rely on the rancher-system-agent completely? |
There is also something to take into account: the agent version in the rancher settings does not have to match what we have:
So we may have 0.2.9 in our image but the bootstrap script may install a different version :O |
I don't think this is actually being used. From what I've seen, it's version |
This depends on the rancher version so we should be careful when bumping it. I.e. 2.6.5 comes with 0.2.7 (https://github.com/rancher/rancher/blob/v2.6.5/package/Dockerfile#L61) while 2.6.6 comes with 0.2.8! |
And yeah, it's being used: the assets for that version are bundled in the rancher image, so that is where the bootstrap picks them up. I was just using an old rancher version locally :D |
Yeah, this makes sense to me. Since rancher-system-agent should be kept in sync with the running Rancher Manager instance, it seems better to me to just rely on the script. |
After having a look, I am starting to change my mind a bit: having two agents running is probably the intended procedure. I see the bootstrap script ensures the rancher-system-agent.service is stopped before actually starting to install the rancher-system-agent.service. So I have the feeling that the rancher-system-agent is somehow expected to be reinstalled using this script; this is also a way to sync versions with the rancher manager. So my take is that we probably need to ensure the first plan kills any previous installation, but still use the on-image installation to run the very first plan. I think I'll quickly try a couple of options:
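To make the version-skew concern concrete, here is a minimal sketch of a check that could run when bumping Rancher. The table only contains the two pairs quoted in this thread (2.6.5 with 0.2.7, 2.6.6 with 0.2.8); the names `BUNDLED_AGENT` and `agent_skew` are made up for illustration and are not part of any Rancher or Elemental tooling.

```python
# Rancher version -> bundled system-agent version, as quoted in this thread.
# Extend this table by hand when bumping Rancher; it is not fetched from anywhere.
BUNDLED_AGENT = {
    "2.6.5": "0.2.7",
    "2.6.6": "0.2.8",
}


def agent_skew(rancher_version, image_agent_version):
    """Return (bundled, image) when the two versions differ, else None."""
    bundled = BUNDLED_AGENT.get(rancher_version)
    if bundled is None:
        raise KeyError(f"no bundled agent version recorded for Rancher {rancher_version}")
    if bundled != image_agent_version:
        return (bundled, image_agent_version)
    return None


# The case from the thread: 0.2.9 in the image against Rancher 2.6.6.
print(agent_skew("2.6.6", "0.2.9"))
```

A non-None result means the bootstrap script would install a different agent version than the one shipped in the image.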
@Itxaka @fgiudici have you thought or tried something like that? |
I had tried the second option (having only rancher-system-agent on the machine) but it didn't make any difference. |
Nope! Maybe @kkaempf tested the second option and that is why he had to rename the agent, so they didn't collide? |
There are also reports from Andrew (in Slack :-/ ) |
elemental-system-agent also has a "Condition" in its |
I am also testing the second option now and I have something incomplete. Service runs the |
Ohhh boy. It's a bit complicated. The rancher-system-agent has nothing to do with that; it only bootstraps the node with whatever plan you have. The k3s, rke, rke2 stuff is done as part of creating a cluster kind on k8s that has a selector (machineinventoryselector) that matches the node we want to bootstrap.
I hope this was useful. I may also be a bit wrong in some places, don't take my word for it :D |
If you already got the rancher-system-agent running and connected to k8s waiting for plans, you got it working :) |
Definitely, this was the easy part.
What I found so far is that this plan is not executed by To me it feels like there are two agents listening to different end points. I am wondering now if OS upgrades are also managed by some sort of plan... |
That is very weird because on my system it is the other way around! elemental-system-agent:
Then it just keeps checking and seeing the same plan, so it does nothing. rancher-system-agent:
Then it keeps going with the install and so on until everything is installed, and goes back to checking the plan. I wonder if it's a race condition... I guess they are both watching the same namespaces for secrets??? |
As far as I understood they both use the same namespace but look at different secrets. I just verified I can deploy a cluster without running the |
Yep, you are right! That's another mystery solved! rancher:
elemental:
Which makes sense, first we watch the namespace for the machine only and then for the plans. |
My question now is: do we need both all the time? Is the elemental system agent endpoint required after the bootstrap phase? Or is the rancher system agent going to be used at all? It is unclear to me who is feeding these plans. I guess this is some rancher manager magic, but I am not sure how much the elemental-operator controller is involved in there. That's why my concern now is how elementalOS upgrades are expected to happen; I'd expect them to be executed by plan, a plan pulled from somewhere. |
I don't think we need both. In fact, the only reason the "elemental" one was introduced was the bootstrap issues that @kkaempf encountered; otherwise the normal agent should be running at all times.
Isn't there a system-upgrade-controller that should take care of that already? @mudler any idea? I know you love your k8s upgrades :P |
Hmm, maybe both are useful. One for per-cluster plans (rancher-system-agent) and the other for per-machine-plans (elemental-system-agent) ?! 🤯 |
Exactly this is my main question now. Are both looking for different plans with a different scope? or the default setup of the rancher-system-agent is just looking at the wrong place? In any case the option I tried as
won't ever be functional as this stops the former agent service (before the provisioning plan is created|evaluated), overwrites the configuration of the agent with different values and restarts the agent. The running agent at this point will not fetch the provisioning plan anymore... So, for now, I am more in the mood of keeping both or just the elemental-system-agent. |
Ok, finally I do have a better picture of how these two relate. I realized I ran a wrong test last week, in fact, as @Itxaka said, So Also I saw that until the machineInventorySelector is in ready state the regular rancher provisioning plan does not kick in. In the end I could not see where the exact link is; however, this is a requirement. I faced that while trying to merge the two system-agent processes, i.e. trying to make one update the other. This won't work, as the bootstrap script stops any running system agent instance and system-agent does not stop gracefully, so any ongoing plan will not be marked as applied, hence the machineInventorySelector will not reach the ready state. For the same reason adding a simple IMHO in order to fade out In addition I am confident we need the I also could not see a way to configure system-agent to poll multiple secrets, so multiple plan pools require multiple system agent processes. So all in all I think we just need to agree if we want to keep |
I think this was done on purpose to be able to inject anything to the machine even if it's not part of a cluster yet. And based on that we take advantage and do the bootstrap part of installing the rancher-system-agent. From Jacob's patch:
I would say not. But again, how can we stop it properly so that the bootstrap plan gets marked as applied? Because I think that unless rancher-system-agent is the one to stop it, we cannot stop it from the plan itself, because it will kill itself and never mark the plan as applied. A oneshot service that only runs once? Does the sentinel mode even finish after executing one plan? |
Maybe we are able to abuse local plans to stop the service from the rancher-system-agent? If we set a local plan that just stops the elemental-system-agent and put it into the rancher-system-agent dir, instead of the elemental one, once rancher-system-agent starts it will apply that plan and stop the elemental-system-agent gracefully. By that time elemental-system-agent should have finished applying the plan and marked it as applied? |
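The ordering argument above can be checked with a toy model. This is not system-agent code: the `Agent` class and both scenario functions are hypothetical, and the model only encodes the one assumption taken from this thread, namely that a plan is marked as applied after its last step and only if the agent is still running at that point.

```python
class Agent:
    """Toy agent: runs plan steps in order, then records the plan as applied."""

    def __init__(self, name):
        self.name = name
        self.running = True
        self.applied = []

    def stop(self):
        self.running = False

    def apply(self, plan_name, steps):
        for step in steps:
            if not self.running:
                return False  # stopped mid-plan, nothing gets recorded
            step()
        if not self.running:
            return False  # stopped by its own last step, still not recorded
        self.applied.append(plan_name)
        return True


def self_stop():
    # The bootstrap plan itself stops the agent that is applying it.
    elemental = Agent("elemental-system-agent")
    ok = elemental.apply("bootstrap", [lambda: None, elemental.stop])
    return ok, elemental.applied


def stop_from_other():
    # The bootstrap plan finishes first; a second agent stops the first one later.
    elemental = Agent("elemental-system-agent")
    rancher = Agent("rancher-system-agent")
    ok = elemental.apply("bootstrap", [lambda: None])
    rancher.apply("stop-elemental", [elemental.stop])
    return ok, elemental.applied, elemental.running


print(self_stop())        # (False, [])
print(stop_from_other())  # (True, ['bootstrap'], False)
```

In the first scenario the bootstrap plan is never recorded; in the second it is recorded before the agent goes away, which is the behaviour the local-plan idea relies on.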
Interestingly enough a local plan works as expected:
But the problem is that we have no control over the agent config that is installed via the bootstrap script, and that has the localPlans disabled by default. Might be nice to be able to override it somehow to enable the localPlans.... |
oh nice |
rancher-system-agent is configured by default without local plans enabled... I'd leave it as is I think |
Yes, to me it feels that the |
FYI, a simple env var change in the operator can enable local plans. Plus, with a very simple two-line plan on elemental, it all works. elemental starts, bootstraps, marks the plan as applied and launches the rancher agent, which then executes the plan; the plan is just a systemctl stop, and that shuts down the elemental agent. So if we want to fully disable it after bootstrap that is the way to go, all really "simple", to call it somehow :D |
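For reference, a sketch of what such a stop-only plan could look like, generated from Python. The plan itself was not posted in this thread, so everything here is an assumption: the `instructions` / `name` / `command` / `args` layout is my recollection of the system-agent plan format and should be checked against the agent version in use, the unit name `elemental-system-agent` is assumed from the agent's name, and `stop_service_plan` is a made-up helper.

```python
import json


def stop_service_plan(unit):
    """Build a plan dict with a single instruction that stops a systemd unit.

    ASSUMPTION: the field names below follow the system-agent plan layout
    as remembered; verify them against the agent version actually deployed.
    """
    return {
        "instructions": [
            {
                "name": f"stop-{unit}",
                "command": "systemctl",
                "args": ["stop", unit],
            }
        ]
    }


# ASSUMPTION: the systemd unit is named after the agent.
plan = stop_service_plan("elemental-system-agent")
print(json.dumps(plan, indent=2))
```

The resulting JSON would go into whatever directory rancher-system-agent reads local plans from, which only has an effect once local plans are enabled in its config.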
elemental-operator register writes config for elemental-system-agent (built from github.com/rancher/system-agent).
The only thing elemental-system-agent does is to download and install rancher-system-agent (built, you guessed it, from github.com/rancher/system-agent) 🤦♂
We should understand what kind of configs elemental-operator writes, change them to be rancher-system-agent compatible and only start rancher-system-agent. See also #60
Background info: The "official" (?) way to download and start rancher-system-agent is a shell script, provided by the management cluster:
curl -fL https://<rancher-url>/system-agent-install.sh
elemental-operator register
do?