osquery won't install when deployed via Elastic Agent integrations on k8s #1540

Closed
ty-elastic opened this issue Oct 11, 2022 · 11 comments · Fixed by #4925

@ty-elastic commented Oct 11, 2022

Hi, I deployed Endpoint and Agent 8.4.x into the daemonset on my self-managed k8s cluster per this yaml, and then deployed osquery via Fleet Integrations.

Upon deployment, I see this error in the agent logs:

06:25:26.900
elastic_agent.osquerybeat
[elastic_agent.osquerybeat][error] Failed to run osquery:W1005 11:24:26.929394  7382 extensions.cpp:426] Will not autoload extension with unsafe directory permissions: /usr/share/elastic-agent/data/elastic-agent-d3eb3e/install/osquerybeat-8.4.2-linux-x86_64/osquery-extension.ext
E1005 11:24:26.953727  7382 shutdown.cpp:79] Cannot activate osq_config config plugin: Unknown registry plugin: osq_config: exit status 78
06:25:26.900
elastic_agent.osquerybeat
[elastic_agent.osquerybeat][info] osquerybeat context cancelled, exiting

This suggests that osquery extensions must not be writable by non-privileged accounts.

Yet after install, if I shell into the agent container in the daemonset, I see this:

# pwd
/usr/share/elastic-agent/data/elastic-agent-d3eb3e/install/osquerybeat-8.4.2-linux-x86_64
# ls -l
total 431832
-rw-r--r-- 1 elastic-agent elastic-agent     13675 Sep 13 21:23 LICENSE.txt
-rw-r--r-- 1 elastic-agent elastic-agent   2571228 Sep 13 21:23 NOTICE.txt
-rw-r--r-- 1 elastic-agent elastic-agent       828 Sep 13 21:58 README.md
drwxr-xr-x 2 elastic-agent elastic-agent        23 Sep 14 22:38 certs
-rw-r--r-- 1 root          root             389399 Sep 13 21:50 fields.yml
drwxr-x--- 3 root          root                 88 Oct  5 19:30 osquery
-rwxr-xr-x 1 elastic-agent elastic-agent   6173182 Sep 13 21:58 osquery-extension.ext
-rwxr-xr-x 1 elastic-agent elastic-agent 219834144 Sep 13 21:57 osquerybeat
-rw-r--r-- 1 root          root              43600 Sep 13 21:50 osquerybeat.reference.yml
-rw-r--r-- 1 root          root               6504 Sep 13 21:50 osquerybeat.yml
-rwxr-x--- 1 elastic-agent elastic-agent 213141464 Sep 13 21:50 osqueryd

The elastic-agent user has write privileges on osquery-extension.ext, which is what triggers that error.

If I chown root:root osquery-extension.ext in the elastic agent container in the daemonset, osquery works as expected.

It seems like osquery-extension.ext needs to somehow end up owned by root when installed into a k8s daemonset via Agent Integrations?
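
For anyone wanting to apply that manual fix from outside the container, a minimal sketch (the kube-system namespace, the placeholder pod name, and the path glob are assumptions and will differ per deployment):

# Pod name is a placeholder; list yours with: kubectl get pods -n kube-system
kubectl exec -n kube-system elastic-agent-xxxxx -- \
  sh -c 'chown root:root /usr/share/elastic-agent/data/*/install/osquerybeat-*/osquery-extension.ext'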

botelastic bot removed the needs_team label Oct 11, 2022
cmacknz added the 8.7-candidate and bug (Something isn't working) labels Oct 12, 2022
@cmacknz (Member) commented Oct 17, 2022

From @michalpristas's initial look at this problem:

UID and GID should be configurable in the agent spec, but assuming nobody is using that, it gets the effective UID and GID from the agent.
When the agent is unpacked, all ownership is set to 0,0, so everything should be owned by root. It may be that, when the agent runs under the elastic-agent user, the permissions of the unpacked (Beats) files end up inconsistent, because the owner of a file and the user it runs as are not aligned.
Changing the owner of unpacked installations to 0,0 seems like the proper thing to do.

@botelastic bot commented Oct 17, 2022

This issue doesn't have a Team:<team> label.

cmacknz transferred this issue from elastic/beats Oct 17, 2022
cmacknz added the Team:Elastic-Agent-Control-Plane and 8.6-candidate labels and removed the 8.7-candidate label Oct 17, 2022
@cmacknz (Member) commented Oct 17, 2022

We have another report of something similar happening with the files used for Beat lightweight modules when run on k8s, producing log messages like:

{"log.level":"error","@timestamp":"2022-10-14T17:03:00.903Z","log.logger":"registry.lightmodules","log.origin":{"file.name":"mb/lightmodules.go","file.line":147},"message":"Failed to list light metricsets for module uwsgi: getting metricsets for module 'uwsgi': loading light module 'uwsgi' definition: loading module configuration from '/usr/share/elastic-agent/data/elastic-agent-d3eb3e/install/metricbeat-8.4.2-linux-x86_64/module/uwsgi/module.yml': config file (\"/usr/share/elastic-agent/data/elastic-agent-d3eb3e/install/metricbeat-8.4.2-linux-x86_64/module/uwsgi/module.yml\") must be owned by the user identifier (uid=0) or root","service.name":"metricbeat","ecs.version":"1.6.0"}

@cmacknz (Member) commented Oct 17, 2022

Bumping this from 8.7 to 8.6

@jmbass commented Nov 10, 2022

I worked around this by adding a postStart hook to my daemonset and chowning the osquerybeat files.

@gizas (Contributor) commented Nov 23, 2022

@jmbass can you please provide the postStart sample you used? We might need to provide this workaround to a customer of ours.

@jmbass commented Nov 23, 2022

@gizas On the daemonset/deployment template spec:

...
containers:
- name: elastic-agent
  image: docker.elastic.co/beats/elastic-agent:8.4.3
  lifecycle:
    postStart:
      exec:
        command: ["/bin/bash", "-c", "chown -R root:root /usr/share/elastic-agent/data/*/install/osquerybeat-8.4.3-linux-x86_64/"]
etc...

Be careful to use the osquerybeat version that matches your Elastic Agent version in the command.

@ty-elastic (Author) commented

For the postStart hook, you could wildcard the path (presumably there would only be one version in that directory?). Notably, this does require a restart of the elastic-agent container after the osquerybeat install. I guess if you wanted to be really hacky, you could have this run periodically regardless of whether osquerybeat is installed.
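
A wildcarded variant of the chown from the earlier postStart example might look like this (a sketch: it assumes only one osquerybeat install directory matches the glob, and it would sit inside the same /bin/bash -c wrapper shown above):

chown -R root:root /usr/share/elastic-agent/data/*/install/osquerybeat-*-linux-x86_64/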

@ThorbenJ commented Jul 9, 2024

Just doing a bit of our own DIY sleuthing:

  • The elastic-agent container image is built with the assumption that elastic-agent runs as user elastic-agent / UID:1000, so all files have that owner.
  • The k8s manifest generated/provided by Kibana's "Add agent" wizard contains a "runAsUser: 0" clause. This is needed by various integrations (such as the security ones), so running elastic-agent as root is standard practice.
  • As elastic-agent is UID:0, so are its children, i.e. the integrations it manages.
  • Therefore osquery manager starts as UID:0, sees its config files owned by UID:1000, and recognises the potential privilege escalation (a UID:1000 user could change the config of a UID:0 service); a quick check for this mismatch is sketched just after this list.
  • Since osquery doesn't know or care that it's in a container, it rightly quits to prevent any potential for abuse.
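
A quick check for that UID mismatch, run from outside the pod (the namespace, pod name, and path glob are placeholders):

kubectl exec -n kube-system elastic-agent-xxxxx -- \
  sh -c 'echo "running as uid=$(id -u)"; stat -c "owner uid=%u %n" /usr/share/elastic-agent/data/*/install/osquerybeat-*/osquery-extension.ext'
# If the two UIDs differ (e.g. running as 0, extension owned by 1000), osquery refuses to autoload it.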

Using postStart has no timing guarantees: the agent will already have started, and osquery may also have started and died before the postStart hook is scheduled to run.

A more robust workaround would be to add a new startup/entrypoint script via a ConfigMap, to correct the file ownership before the agent has a chance to start:

ConfigMap:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: elastic-agent-k8s-scripts
data:
  pre-entrypoint.sh: |
    #!/bin/sh
    chown -R root:root /usr/share/elastic-agent
    exec /usr/local/bin/docker-entrypoint "$@"

Now edit the agent's volumeMounts to include the ConfigMap:

          ...
          volumeMounts:
           ...
            - name: extra-scripts
              mountPath: /var/local/pre-entrypoint.sh
              subPath: pre-entrypoint.sh
              readOnly: true
      volumes:
       ...
        - name: extra-scripts
          configMap:
            name: elastic-agent-k8s-scripts
            defaultMode: 0754
       ...

Then change the container start command:

     ...
     containers:
        - name: elastic-agent-k8s
          image: docker.elastic.co/beats/elastic-agent:8.12.2
          command: ['/var/local/pre-entrypoint.sh']
      ...

The long-term fix, I think, might be for the container image to be built with files owned by UID:0, to match how elastic-agent will actually run.

@aleksmaus (Contributor) commented

Just to document some of the DM conversations:

It looks like there is some miscommunication between the teams.
Some assume that the agent always runs under an unprivileged user in k8s; others, namely Security, require the agent to run as root.
The current agent files have a mix of owners: elastic-agent and root. The *.yml files are specifically set to be owned by root.
https://github.com/elastic/elastic-agent/blob/main/dev-tools/packaging/templates/docker/Dockerfile.elastic-agent.tmpl#L148

It looks like we need a consistent approach and story for our users. And if we are to support both scenarios, this should be documented, since, as of now, the instructions in Kibana will lead to a broken osquery integration.


@cmacknz (Member) commented Jul 17, 2024

This will be resolved by #4925
