[docker][agentbeat]: add cap_sys_ptrace and cap_dac_override in permitted set #5271

Closed

@VihasMakwana (Contributor) commented Aug 8, 2024

What does this PR do?

Grant cap_sys_ptrace and cap_dac_override to agentbeat's permitted capability set.
Both capabilities are required for the agent to operate in unprivileged mode.

This PR also updates the k8s standalone tests, adds the required permissions, and disables the long-running test case (#5279).
All of these changes are in the same PR because they relate to the same problem.
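To illustrate what "permitted set" means here: each capability name passed to setcap corresponds to one bit in the kernel's 64-bit capability masks (bit numbers from linux/capability.h). The Go sketch below is a minimal illustration, not code from elastic-agent; the helper permittedMask and the capBits table are hypothetical and only cover the capabilities discussed in this PR. It computes the permitted mask agentbeat's setcap line yields before and after this change.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical table: capability bit numbers as defined in linux/capability.h.
// Only the capabilities discussed in this PR are listed.
var capBits = map[string]uint{
	"cap_dac_override":    1,
	"cap_dac_read_search": 2,
	"cap_setuid":          7,
	"cap_net_raw":         13,
	"cap_sys_ptrace":      19,
}

// permittedMask (hypothetical helper) converts a comma-separated capability
// list, as passed to setcap, into a 64-bit capability mask.
// It returns an error for names missing from capBits.
func permittedMask(list string) (uint64, error) {
	var mask uint64
	for _, name := range strings.Split(list, ",") {
		name = strings.ToLower(strings.TrimSpace(name))
		bit, ok := capBits[name]
		if !ok {
			return 0, fmt.Errorf("unknown capability %q", name)
		}
		mask |= 1 << bit
	}
	return mask, nil
}

func main() {
	before, _ := permittedMask("cap_net_raw,cap_setuid")
	after, _ := permittedMask("cap_net_raw,cap_setuid,cap_sys_ptrace,cap_dac_override")
	// Print as 16 hex digits, the width used for the Cap* fields in /proc/<pid>/status.
	fmt.Printf("before: %016x\n", before) // before: 0000000000002080
	fmt.Printf("after:  %016x\n", after)  // after:  0000000000082082
}
```

The only difference between the two masks is bit 1 (cap_dac_override) and bit 19 (cap_sys_ptrace).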

Why is it important?

  • Without these capabilities, the agent always remains in a DEGRADED state inside the container.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

How to test this PR locally

  • Run PACKAGES=docker mage package on this branch to build the Docker image.
  • Start a root shell in the image and check the file capabilities of agentbeat:
    docker run --entrypoint bash --user root --rm -it docker.elastic.co/beats/elastic-agent:8.16.0-SNAPSHOT
    getcap data/elastic-agent-a1fd2c/components/agentbeat
  • Expected output:
    data/elastic-agent-a1fd2c/components/agentbeat = cap_dac_override,cap_setuid,cap_net_raw,cap_sys_ptrace+p
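The getcap check above can also be scripted. A minimal Go sketch, assuming the "<path> = <caps>+<flags>" output form shown in this PR; some libcap versions print "<path> <caps>=<flags>" instead, which this sketch does not parse. parseGetcap and missingPermitted are hypothetical helpers, not part of elastic-agent or libcap.

```go
package main

import (
	"fmt"
	"strings"
)

// parseGetcap (hypothetical) splits one getcap line of the form
// "<path> = <cap>,<cap>,...+<flags>" into path, capability names and flags.
// The "<path> <caps>=<flags>" form printed by some libcap versions is not handled.
func parseGetcap(line string) (string, []string, string, error) {
	path, rest, ok := strings.Cut(strings.TrimSpace(line), " = ")
	if !ok {
		return "", nil, "", fmt.Errorf("unexpected getcap line: %q", line)
	}
	capList, flags, ok := strings.Cut(rest, "+")
	if !ok {
		return "", nil, "", fmt.Errorf("no +flags suffix in %q", rest)
	}
	return path, strings.Split(capList, ","), flags, nil
}

// missingPermitted (hypothetical) reports which required capabilities are
// absent from the permitted ("p") set described by a getcap line.
func missingPermitted(line string, required []string) ([]string, error) {
	_, caps, flags, err := parseGetcap(line)
	if err != nil {
		return nil, err
	}
	if !strings.Contains(flags, "p") {
		// No permitted flag at all: every required capability is missing.
		return required, nil
	}
	have := make(map[string]bool, len(caps))
	for _, c := range caps {
		have[c] = true
	}
	var missing []string
	for _, r := range required {
		if !have[r] {
			missing = append(missing, r)
		}
	}
	return missing, nil
}

func main() {
	line := "data/elastic-agent-a1fd2c/components/agentbeat = cap_dac_override,cap_setuid,cap_net_raw,cap_sys_ptrace+p"
	missing, err := missingPermitted(line, []string{"cap_sys_ptrace", "cap_dac_override"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("missing:", missing) // missing: []
}
```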

Related issues

@VihasMakwana added the bug and Team:Elastic-Agent-Control-Plane labels Aug 8, 2024
@VihasMakwana self-assigned this Aug 8, 2024
@VihasMakwana requested a review from a team as a code owner August 8, 2024 20:39
@elasticmachine (Contributor) commented:

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@VihasMakwana requested a review from blakerouse August 8, 2024 20:39

mergify bot commented Aug 8, 2024

This pull request does not have a backport label. Could you fix it @VihasMakwana? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 8./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@mergify bot added the backport-skip label Aug 8, 2024
@VihasMakwana added the backport-8.15 label and removed the backport-skip label Aug 8, 2024

@VihasMakwana (Contributor, Author) commented:

The CI is expected to fail until elastic/beats#40466 gets merged and released as a snapshot.

@ycombinator (Contributor) commented Aug 9, 2024

Hi @VihasMakwana, just to confirm, are the changes in this PR intended to fix the following k8s integration test currently failing in CI?

=== FAIL: testing/integration TestKubernetesAgentStandalone/drop_ALL_add_CHOWN,_SETPCAP_capabilities_-_rootless_agent (99.69s)
    kubernetes_agent_standalone_test.go:277: elastic-agent never reported healthy: command terminated with exit code 1
    kubernetes_agent_standalone_test.go:278: stdout: ┌─ fleet
    │  └─ status: (STOPPED) Not enrolled into Fleet
    └─ elastic-agent
       ├─ status: (DEGRADED) 1 or more components/units in a degraded state
       └─ system/metrics-default
          ├─ status: (HEALTHY) Healthy: communicating with pid '81'
          └─ system/metrics-default-system-metrics
             └─ status: (DEGRADED) Error fetching data for metricset system.process: Not enough privileges to fetch information: Not enough privileges to fetch information: non fatal error fetching PID some info for 13, metrics are valid, but partial: non-fatal error fetching PID metrics for 13, metrics are valid, but partial: Not enough privileges to fetch information: /io unavailable; if running inside a container, use SYS_PTRACE: error fetching IO metrics: open /proc/13/io: permission denied
    kubernetes_agent_standalone_test.go:279: stderr:
    kubernetes_agent_standalone_test.go:347: Wrote container elastic-agent-standalone of pod elastic-agent-standalone-xmt88 logs to /opt/buildkite-agent/builds/bk-agent-prod-gcp-1723161759733813737/elastic/elastic-agent/build/k8s-logs-v1.29.4-kubernetes-amd64--kubernetes.integration/TestKubernetesAgentStandalone/drop_ALL_add_CHOWN,_SETPCAP_capabilities_-_rootless_agent-elastic-agent-standalone-xmt88-elastic-agent-standalone.log
    kubernetes_agent_standalone_test.go:347: Wrote container kube-state-metrics of pod kube-state-metrics-5464659c6f-869q8 logs to /opt/buildkite-agent/builds/bk-agent-prod-gcp-1723161759733813737/elastic/elastic-agent/build/k8s-logs-v1.29.4-kubernetes-amd64--kubernetes.integration/TestKubernetesAgentStandalone/drop_ALL_add_CHOWN,_SETPCAP_capabilities_-_rootless_agent-kube-state-metrics-5464659c6f-869q8-kube-state-metrics.log

[EDIT] Never mind, I see now that you need this PR and elastic/beats#40466 both to fix these tests. Created #5275 to track the test failure and mentioned both PRs in there.
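The DEGRADED status in this log comes from the system.process metricset failing to open /proc/13/io. Per proc(5), access to /proc/[pid]/io is governed by a ptrace access mode check (PTRACE_MODE_READ_FSCREDS), which is why the error message suggests SYS_PTRACE. The Go sketch below is a hypothetical illustration, not the metricset's implementation: readProcIO and parseProcIO are assumed helper names. It reads the file, parses its "key: value" lines, and separates a permission failure from other errors using errors.Is with fs.ErrPermission.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"strconv"
	"strings"
)

// parseProcIO (hypothetical) parses the "key: value" lines of /proc/<pid>/io,
// e.g. rchar, wchar, syscr, syscw, read_bytes, write_bytes.
func parseProcIO(content string) (map[string]uint64, error) {
	stats := make(map[string]uint64)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		key, val, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue // skip lines without a "key: value" shape
		}
		n, err := strconv.ParseUint(strings.TrimSpace(val), 10, 64)
		if err != nil {
			return nil, fmt.Errorf("parsing %s: %w", key, err)
		}
		stats[strings.TrimSpace(key)] = n
	}
	return stats, sc.Err()
}

// readProcIO (hypothetical) reads /proc/<pid>/io. A permission error is
// wrapped with a hint, mirroring the CI failure where SYS_PTRACE was missing.
func readProcIO(pid int) (map[string]uint64, error) {
	data, err := os.ReadFile(fmt.Sprintf("/proc/%d/io", pid))
	if errors.Is(err, fs.ErrPermission) {
		return nil, fmt.Errorf("reading io stats for pid %d (missing CAP_SYS_PTRACE?): %w", pid, err)
	}
	if err != nil {
		return nil, err
	}
	return parseProcIO(string(data))
}

func main() {
	// Reading our own process is normally allowed; other PIDs may not be.
	stats, err := readProcIO(os.Getpid())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("rchar:", stats["rchar"])
}
```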

@pierrehilbert (Contributor) commented:

/test

@@ -531,7 +531,7 @@
 }
 },
 {
- "enabled": true,
+ "enabled": false,

Contributor (review comment):

Why are these being disabled?

Contributor (Author) reply:

@michel-laterman I actually forgot to update the description, my bad.

Please take a look now.

@VihasMakwana (Contributor, Author) commented:

/test

@pierrehilbert (Contributor) commented:

The test failure is caused by #4215.

@@ -57,7 +57,7 @@ RUN true && \

 # Keep this after any chown command, chown resets any applied capabilities
 RUN setcap =p {{ $beatHome }}/data/elastic-agent-{{ commit_short }}/elastic-agent
-RUN setcap cap_net_raw,cap_setuid+p {{ $beatHome }}/data/elastic-agent-{{ commit_short }}/components/agentbeat && \
+RUN setcap cap_net_raw,cap_setuid,cap_sys_ptrace,cap_dac_override+p {{ $beatHome }}/data/elastic-agent-{{ commit_short }}/components/agentbeat && \
@pkoutsovasilis (Contributor) commented Aug 13, 2024:

dac_override is way too open; why is dac_read_search not adequate?
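For context on the reviewer's question: capabilities(7) describes CAP_DAC_OVERRIDE as bypassing file read, write, and execute permission checks, while CAP_DAC_READ_SEARCH bypasses only file read checks and directory read and execute checks. To confirm which variant a running agentbeat process actually holds, its permitted set can be read from the CapPrm hex mask in /proc/<pid>/status. A minimal Go sketch, assuming bit numbers from linux/capability.h; decodeCaps, permittedCaps and the capNames table are hypothetical helpers, not part of elastic-agent.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// Hypothetical table: names for the capability bits discussed in this PR
// (bit numbers from linux/capability.h).
var capNames = map[uint]string{
	1:  "cap_dac_override",
	2:  "cap_dac_read_search",
	7:  "cap_setuid",
	13: "cap_net_raw",
	19: "cap_sys_ptrace",
}

// decodeCaps (hypothetical) turns a hex mask such as "0000000000082082" into
// capability names in ascending bit order; unknown bits become "cap_<n>".
func decodeCaps(hexMask string) ([]string, error) {
	mask, err := strconv.ParseUint(strings.TrimSpace(hexMask), 16, 64)
	if err != nil {
		return nil, err
	}
	var names []string
	for bit := uint(0); bit < 64; bit++ {
		if mask&(1<<bit) == 0 {
			continue
		}
		if name, ok := capNames[bit]; ok {
			names = append(names, name)
		} else {
			names = append(names, fmt.Sprintf("cap_%d", bit))
		}
	}
	return names, nil
}

// permittedCaps (hypothetical) decodes the CapPrm line of /proc/<pid>/status.
func permittedCaps(pid int) ([]string, error) {
	f, err := os.Open(fmt.Sprintf("/proc/%d/status", pid))
	if err != nil {
		return nil, err
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "CapPrm:") {
			return decodeCaps(strings.TrimPrefix(line, "CapPrm:"))
		}
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	return nil, fmt.Errorf("CapPrm not found for pid %d", pid)
}

func main() {
	withOverride, _ := decodeCaps("0000000000082082")
	fmt.Println(withOverride) // [cap_dac_override cap_setuid cap_net_raw cap_sys_ptrace]
	withReadSearch, _ := decodeCaps("0000000000082084")
	fmt.Println(withReadSearch) // [cap_dac_read_search cap_setuid cap_net_raw cap_sys_ptrace]
	if own, err := permittedCaps(os.Getpid()); err == nil {
		fmt.Println("this process:", own)
	}
}
```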

@pkoutsovasilis (Contributor) commented Aug 13, 2024

Is this PR really needed, given there is this one?

@cmacknz (Member) commented Aug 13, 2024

Vihas is out on PTO. I am going to preemptively close this, assuming we don't need it; we can revisit later if needed.

Labels
backport-8.15, bug, Team:Elastic-Agent-Control-Plane

8 participants