[Agent] Elastic-Agent-Packaging-Linux is failed (there for e2e-testing pr support) which blocks merges - seems to be a Heartbeat container build problem #28570
Comments
Pinging @elastic/elastic-agent (Team:Elastic-Agent) |
@andresrc @jlind23 @KseniaElastic @v1v what can we do to help resolve this quickly? I don't know the Beats packaging. |
@andrewvc seems that this is on Synthetics side. |
I'm currently bisecting this issue with the e2e-testing framework. For that, I'm basically passing the commit to the tests with this command:
For master (current released artifacts):
TAGS="heartbeat" TIMEOUT_FACTOR=3 LOG_LEVEL=TRACE DEVELOPER_MODE=true make -C e2e/_suites/kubernetes-autodiscover functional-test
Starting the bisect with the latest commit on heartbeat:
TAGS="heartbeat" TIMEOUT_FACTOR=3 LOG_LEVEL=TRACE DEVELOPER_MODE=true GITHUB_CHECK_SHA1=af602c2b0df38bfc3fb5cfcfabcab1145b558022 ELASTIC_APM_ACTIVE=false make -C e2e/_suites/kubernetes-autodiscover functional-test
Will post results here. |
OK, taking commit 99ebf3e as the starting point, the result is GOOD ✅
kind delete clusters --all && \
TAGS="heartbeat" TIMEOUT_FACTOR=3 LOG_LEVEL=TRACE DEVELOPER_MODE=true \
GITHUB_CHECK_SHA1=99ebf3e4375c4dbee0ee281889e804b13a62a463 \
ELASTIC_APM_ACTIVE=false make -C e2e/_suites/kubernetes-autodiscover functional-test
List of commits, starting from GOOD:
|
Trying with 298d786:
kind delete clusters --all && \
TAGS="heartbeat" TIMEOUT_FACTOR=3 LOG_LEVEL=TRACE DEVELOPER_MODE=true \
GITHUB_CHECK_SHA1=298d786fc67301f429c0fe619fa06787093a6751 \
ELASTIC_APM_ACTIVE=false make -C e2e/_suites/kubernetes-autodiscover functional-test
And the result is BAD 🔴 |
Trying with 53a618b: docs: link to new APM book
kind delete clusters --all && \
TAGS="heartbeat" TIMEOUT_FACTOR=3 LOG_LEVEL=TRACE DEVELOPER_MODE=true \
GITHUB_CHECK_SHA1=53a618b36135db5c2940e1df48ffad164349b28c \
ELASTIC_APM_ACTIVE=false make -C e2e/_suites/kubernetes-autodiscover functional-test
And the result is BAD 🔴 |
Trying with 0a24250: [pre-commit] for linting merge-conflict, pipelines and JJBB
kind delete clusters --all && \
TAGS="heartbeat" TIMEOUT_FACTOR=3 LOG_LEVEL=TRACE DEVELOPER_MODE=true \
GITHUB_CHECK_SHA1=0a2425021bfb488cf21927443bbfafc0ec450bb7 \
ELASTIC_APM_ACTIVE=false make -C e2e/_suites/kubernetes-autodiscover functional-test
And the result is GOOD ✅ |
The only commit left in this bisect is 81c38fc: [Heartbeat][Agent] Seccomp / synthetics bugfix improvements, which should fail the tests if it is the culprit commit 🤞
kind delete clusters --all && \
TAGS="heartbeat" TIMEOUT_FACTOR=3 LOG_LEVEL=TRACE DEVELOPER_MODE=true \
GITHUB_CHECK_SHA1=81c38fc4c009348d57c92ae85920aed35297a89e \
ELASTIC_APM_ACTIVE=false make -C e2e/_suites/kubernetes-autodiscover functional-test
And indeed, the result is BAD 🔴, which means that we found the commit that introduced the issue. |
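For anyone repeating this kind of search, the manual runs above could be scripted roughly as follows. This is only a sketch: the function names `run_suite` and `first_bad_commit` are hypothetical helpers, not part of the e2e-testing framework, and the loop is a plain oldest-to-newest scan rather than a true binary bisect. It assumes the candidate commits are passed oldest first, and it counts any failure (including a failing `kind delete`) as BAD.

```shell
#!/usr/bin/env bash
# Sketch only: run_suite and first_bad_commit are hypothetical helper names.

# Run the heartbeat scenario against one Beats commit, using the same
# environment variables as the commands posted in this thread.
run_suite() {
  local sha="$1"
  kind delete clusters --all &&
    TAGS="heartbeat" TIMEOUT_FACTOR=3 LOG_LEVEL=TRACE DEVELOPER_MODE=true \
    GITHUB_CHECK_SHA1="$sha" ELASTIC_APM_ACTIVE=false \
    make -C e2e/_suites/kubernetes-autodiscover functional-test
}

# Walk the given commits in order (assumed oldest first).
# Prints a GOOD/BAD line per commit on stderr and the first failing
# commit on stdout. Returns 1 if every commit passes.
first_bad_commit() {
  local sha
  for sha in "$@"; do
    if run_suite "$sha" >/dev/null 2>&1; then
      echo "GOOD $sha" >&2
    else
      echo "BAD $sha" >&2
      echo "$sha"
      return 0
    fi
  done
  return 1
}
```

Sourcing the file and calling `first_bad_commit` with the full SHAs of the candidate commits prints the first one that fails. With only a handful of commits, as here, the linear scan costs at most a couple of extra runs compared with a real bisect.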
Let me explain what that test does: it uses the k8s-autodiscover test suite (@jsoriano and @ChrsMark can provide more context):
Scenario: Monitor pod availability using hints with named ports
Given "heartbeat" is running with "hints enabled for pods"
When "redis" is deployed with "monitor annotations with named port"
Then "heartbeat" collects events with "kubernetes.pod.name:redis"
And "heartbeat" collects events with "monitor.status:up"
The scenario always fails in the same step, the
To understand the internals, the scenario creates a Kubernetes cluster with Kind, and starts a pod from Heartbeat in the version specified by the
As described in the
Where kubectl returns |
I'm checking that the only logs we are storing are kind logs, but not the cluster logs. In my local execution, meanwhile we provide the test framework to extract kind's logs (using
|
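As a rough illustration of collecting both kinds of logs in one place, something like the following could be run after a failing scenario. It is a sketch under assumptions: `collect_logs` is a hypothetical helper name, and the cluster name, namespace and output directory are placeholders the caller has to supply. It relies only on `kind export logs`, `kubectl get pods` and `kubectl logs`.

```shell
#!/usr/bin/env bash
# Sketch only: collect_logs is a hypothetical helper, not part of the
# e2e-testing framework. Cluster name, namespace and output directory
# are supplied by the caller.
collect_logs() {
  local cluster="$1" namespace="$2" out_dir="$3"
  mkdir -p "$out_dir/pods"

  # Node-level logs gathered by kind itself.
  kind export logs "$out_dir/kind" --name "$cluster"

  # Pod-level logs, one file per pod in the namespace.
  # "kubectl get pods --output name" prints entries such as "pod/<name>".
  local pod
  for pod in $(kubectl get pods --namespace "$namespace" --output name); do
    kubectl logs --namespace "$namespace" --all-containers "$pod" \
      > "$out_dir/pods/${pod#pod/}.log" 2>&1 || true
  done
}
```

A failed `kubectl logs` call for one pod is tolerated so the remaining pods are still captured. The error text ends up in that pod's log file.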
After debugging the issue with the team:
After testing it locally adding that variable in tests, for the same culprit commit:
@EricDavisX I'll postpone the resolution until the Beats team tells us that applying that variable at test time is safe |
resolved and no complaints so let's close it |
This is from a dev via slack:
PR 'tests' are failing...
"The E2E agent linux tests are super flaky at the moment and preventing us from making imports. Is that something currently being looked at?"
citing this PR as example:
#28517
Manu helped bisect the pr failures and seemed to point towards the Heartbeat container not being available, so maybe a build problem there.
Victor noted that the e2e-testing support had been turned on (requiring that packaging) somewhat recently. PM me for slack convo link or team thread on the packaging change
image from slack: