[Elastic Agent] Investigate running all beats through the system process manager #18362
Comments
Just added the beta1 label; if we go this route, I believe it's the right timing.
Yes, I have been looking into this. On each OS it would be a little different, so we can create an interface that works for each build of the OS. Something like below would work.
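A minimal Go sketch of what such a per-OS interface might look like. This is an illustration only, not the snippet from the original comment: `ServiceManager`, `State`, `memoryManager`, and `backendFor` are hypothetical names, the install paths are placeholders, and the in-memory implementation merely stands in for real systemd, launchd, or Windows service backends.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
)

// State is the coarse lifecycle state of a managed service.
type State int

const (
	Stopped State = iota
	Running
)

// ServiceManager is a hypothetical interface; each OS build would supply
// its own implementation behind it.
type ServiceManager interface {
	Install(name, binaryPath string) error
	Start(name string) error
	Stop(name string) error
	Status(name string) (State, error)
}

// ErrNotInstalled is returned when a service name is unknown to the manager.
var ErrNotInstalled = errors.New("service not installed")

// memoryManager is an in-memory stand-in used only to exercise the interface.
type memoryManager struct {
	services map[string]State
}

func newMemoryManager() *memoryManager {
	return &memoryManager{services: make(map[string]State)}
}

func (m *memoryManager) Install(name, binaryPath string) error {
	if name == "" || binaryPath == "" {
		return errors.New("name and binary path are required")
	}
	m.services[name] = Stopped
	return nil
}

// set changes the state of an installed service.
func (m *memoryManager) set(name string, s State) error {
	if _, ok := m.services[name]; !ok {
		return fmt.Errorf("%s: %w", name, ErrNotInstalled)
	}
	m.services[name] = s
	return nil
}

func (m *memoryManager) Start(name string) error { return m.set(name, Running) }
func (m *memoryManager) Stop(name string) error  { return m.set(name, Stopped) }

func (m *memoryManager) Status(name string) (State, error) {
	s, ok := m.services[name]
	if !ok {
		return Stopped, fmt.Errorf("%s: %w", name, ErrNotInstalled)
	}
	return s, nil
}

// backendFor names the native process manager for a GOOS value.
// Assumption: Linux hosts use systemd, which is not true of every distribution.
func backendFor(goos string) (string, error) {
	switch goos {
	case "linux":
		return "systemd", nil
	case "darwin":
		return "launchd", nil
	case "windows":
		return "Windows Service Control Manager", nil
	default:
		return "", fmt.Errorf("no process manager mapping for %q", goos)
	}
}

func main() {
	var mgr ServiceManager = newMemoryManager()
	for _, name := range []string{"filebeat", "metricbeat", "endpoint"} {
		// The path is a placeholder, not a real install location.
		if err := mgr.Install(name, "/hypothetical/path/"+name); err != nil {
			fmt.Println("install failed:", err)
			continue
		}
		if err := mgr.Start(name); err != nil {
			fmt.Println("start failed:", err)
		}
	}
	backend, err := backendFor(runtime.GOOS)
	fmt.Println(backend, err)
}
```

With this shape, the Agent code would only depend on `ServiceManager`, and the choice of backend would be made once per platform at build or start time.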
Another option could be to have an agent container talk through OCI, Docker, or Kubernetes to spawn the containers on the actual platform. For example, Elastic Agent could run in a Pod and then spawn another Pod to run filebeat and another Pod to run metricbeat inside Kubernetes. Then it would be Kubernetes' job to ensure that each Pod stays running, not Agent's. If you think about it, that could be a great way to start monitoring a Kubernetes cluster: deploy a single Agent, and it will grow the filebeats and metricbeats as needed.
@blakerouse This indeed looks like a good plan. Concerning the k8s / Docker scenario as proposed, we might not need it now. I am worried about the management of mount points for the pods. The problem I see with this is that we start multiple instances of Metricbeat/Filebeat at the moment, and I don't think that would fit well with systemd. Would those be different installed services? Including @ruflin and @michalpristas.
@ruflin Would this align with what you had in mind for autodiscovery?
I like the overall idea, especially as it means that all processes are run the same way. Will the above mean it will not be possible to run more than one instance of the agent on one OS? I wonder if it is a feature or a bug that the subprocesses keep running if the agent dies. Will they reconnect as soon as the agent is available again? For Docker / K8s: I don't think it is related to how autodiscovery will work, but I am probably missing something here. +1 on focusing first on all the other OS / deployment models. Will this fully replace the "process" model, or can users choose to still use the process model if they do not want to use systemd, for example?
We were discussing with Blake a slow onboarding to services, in a way that, e.g., endpoint can be a service while beats stay processes. It should be just a different implementation of the same interface, and with this in mind we can even make it configurable, as you say. Also, with services we don't need to keep track of processes: we know the state of the service and can start/stop it when needed. With the gRPC flow flipped, the process will query for configuration periodically, so it should be much simpler regarding the flow.
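A rough Go sketch of the flipped flow described above, in which the managed process pulls its configuration instead of having it pushed. Everything here is assumed for illustration: `Config`, `ConfigSource`, `Poller`, and `staticSource` are hypothetical names, and a plain method call stands in for the gRPC request.

```go
package main

import "fmt"

// Config is a hypothetical configuration snapshot handed out by the Agent.
type Config struct {
	Version int
	Body    string
}

// ConfigSource abstracts the call to the Agent; in a real process this
// would be a gRPC client rather than a local value.
type ConfigSource interface {
	Fetch() (Config, error)
}

// staticSource returns whatever configuration it currently holds.
type staticSource struct {
	cfg Config
}

func (s *staticSource) Fetch() (Config, error) { return s.cfg, nil }

// Poller remembers the last applied configuration and applies newer ones.
type Poller struct {
	source  ConfigSource
	current Config
	apply   func(Config)
}

func NewPoller(source ConfigSource, apply func(Config)) *Poller {
	return &Poller{source: source, apply: apply}
}

// PollOnce fetches the configuration and applies it only when the version
// differs from the last applied one. It reports whether a change was applied.
func (p *Poller) PollOnce() (bool, error) {
	cfg, err := p.source.Fetch()
	if err != nil {
		return false, err
	}
	if cfg.Version == p.current.Version {
		return false, nil
	}
	p.current = cfg
	p.apply(cfg)
	return true, nil
}

func main() {
	src := &staticSource{cfg: Config{Version: 1, Body: "initial"}}
	p := NewPoller(src, func(c Config) {
		fmt.Println("applying config version", c.Version)
	})

	// A long-running process would call PollOnce from a timer loop;
	// three direct calls keep the example deterministic.
	for i := 0; i < 3; i++ {
		if i == 2 {
			src.cfg = Config{Version: 2, Body: "updated"}
		}
		changed, err := p.PollOnce()
		fmt.Println("poll", i, "changed:", changed, "err:", err)
	}
}
```

Comparing versions keeps repeated polls cheap: an unchanged configuration is fetched but never re-applied.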
@michalpristas @blakerouse Overall that would simplify the problem. I am assuming we will have some bookkeeping to do on the agent side to detect missing processes? I am assuming that we will be monitoring the output of [...]. Another thing: we could also better restrict users on the process based on that strategy.
Well, when the Agent manages the processes and gets killed, there is a ripple effect and the dependent processes should be cleaned up. If we delegate that to the system, this means that we could have zombie processes sending data to Elasticsearch. This could probably be solved by having a grace period: if I cannot reach the Agent after X amount of time, I should suspend myself and wait for the Agent to come back.
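The grace-period idea can be sketched in Go as a small watchdog inside each managed process. This is only an illustration under assumptions: `AgentWatchdog` and its methods are hypothetical names, the 30-second grace period is arbitrary, and how a process actually checks that the Agent is reachable is left out.

```go
package main

import (
	"fmt"
	"time"
)

// AgentWatchdog tracks when the Agent was last reachable and decides
// whether the process should suspend itself. Hypothetical type.
type AgentWatchdog struct {
	grace    time.Duration
	lastSeen time.Time
}

// NewAgentWatchdog starts the grace window at the given time.
func NewAgentWatchdog(grace time.Duration, now time.Time) *AgentWatchdog {
	return &AgentWatchdog{grace: grace, lastSeen: now}
}

// MarkReachable records a successful contact with the Agent.
func (w *AgentWatchdog) MarkReachable(now time.Time) {
	w.lastSeen = now
}

// ShouldSuspend reports whether the Agent has been unreachable for longer
// than the grace period.
func (w *AgentWatchdog) ShouldSuspend(now time.Time) bool {
	return now.Sub(w.lastSeen) > w.grace
}

func main() {
	start := time.Now()
	w := NewAgentWatchdog(30*time.Second, start)

	// Times are passed in explicitly so the behavior is easy to follow
	// without actually waiting.
	fmt.Println(w.ShouldSuspend(start.Add(10 * time.Second))) // within grace
	fmt.Println(w.ShouldSuspend(start.Add(45 * time.Second))) // grace exceeded

	// The Agent comes back: the window restarts and the process resumes.
	w.MarkReachable(start.Add(45 * time.Second))
	fmt.Println(w.ShouldSuspend(start.Add(50 * time.Second)))
}
```

Passing the current time as an argument, rather than calling `time.Now()` inside the methods, keeps the decision logic testable without sleeping.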
After some investigation, it seems that, as uniform as this would be for beats and endpoint, it would open a security issue around getting the certificates and unique token to the application. Staying with subprocesses and having endpoint be unique is the best scenario for now. Closing this issue.
Currently Elastic Agent runs filebeat and metricbeat as subprocesses, but with the addition of endpoint, agent needs to manage that through the system process manager (systemd, services.msc, launchctl).

Instead of making endpoint a one-off case, I propose we look at running them all through the system process manager. There are benefits to doing this: filebeat, metricbeat, endpoint, and more in the future will keep on running, so no events will be lost.

In the case of the Elastic Agent container, the first process can be systemd, which spawns Elastic Agent, and then all the beats will be managed by that systemd init process.