[Flaky Test] TestFleetManagedUpgrade fails waiting for the watcher to start #3760
Pinging @elastic/elastic-agent (Team:Elastic-Agent)
Looking at the upgrade details in the state.yaml file, we can see the upgrade is stuck in the UPG_DOWNLOADING stage:

upgrade_details:
  action_id: ace1e80d-5a9f-4156-ae40-84db7763c6bf
  metadata: {}
  state: UPG_DOWNLOADING
  target_version: 8.12.0-SNAPSHOT

It looks like the upgrade watcher isn't starting because the download is failing, so we never get to the point where the watcher is launched:

{"log.level":"info","@timestamp":"2023-11-13T20:49:52.133Z","log.origin":{"file.name":"upgrade/step_download.go","file.line":250},"message":"download attempt 7","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2023-11-13T20:49:52.134Z","log.origin":{"file.name":"upgrade/step_download.go","file.line":260},"message":"unable to download package: 3 errors occurred:\n\t* package '/opt/Elastic/Agent/data/elastic-agent-6645eb/downloads/elastic-agent-8.12.0-SNAPSHOT-linux-arm64.tar.gz' not found: open /opt/Elastic/Agent/data/elastic-agent-6645eb/downloads/elastic-agent-8.12.0-SNAPSHOT-linux-arm64.tar.gz: no such file or directory\n\t* fetching package failed: Get \"http://127.0.0.1:35295/downloads/beats/elastic-agent/beats/elastic-agent/elastic-agent-8.12.0-SNAPSHOT-linux-arm64.tar.gz\": dial tcp 127.0.0.1:35295: connect: connection refused\n\t* fetching package failed: Get \"http://127.0.0.1:35295/downloads/beats/elastic-agent/beats/elastic-agent/elastic-agent-8.12.0-SNAPSHOT-linux-arm64.tar.gz\": dial tcp 127.0.0.1:35295: connect: connection refused\n\n; retrying (will be retry 7) in 1m25.185118863s.","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}

From what I can see, it is trying to download the artifact from localhost.
Looking at the agent's pre-config.yaml, this is exactly what it was configured to do:

agent:
  download:
    sourceURI: http://127.0.0.1:35295/downloads/beats/elastic-agent/

On main, the test itself could probably watch the upgrade details instead of looking for the watcher to start; that would give us a much better error message. This block could instead use the status command to poll for the UPG_WATCHING state: elastic-agent/testing/integration/upgrade_fleet_test.go, lines 143 to 147 in f7dcbd7.
I also suspect there is a bug in the upgrade details here: the download failing should have put us in the UPG_FAILED state. Maybe we just needed to wait longer for that to happen, though; I will open a separate issue if I can confirm that is a problem.
We retry the download step indefinitely. As such, we never enter the […] But perhaps we should enter the […]
Yeah, the default timeout is 2 hours, so I think we would have gotten to UPG_FAILED eventually:
elastic-agent/internal/pkg/agent/application/upgrade/artifact/config.go, lines 158 to 161 in 1050aee
elastic-agent/internal/pkg/agent/application/upgrade/step_download.go, lines 238 to 243 in 1050aee
I think in real life this is fine. I think we should keep UPG_FAILED as a terminal state meaning the upgrade won't succeed unless the user retries it. If we reset from UPG_FAILED to UPG_DOWNLOADING, it would make this inconsistent. The user should still see the download error while the upgrade is in UPG_DOWNLOADING, even though it isn't in the UPG_FAILED state. I do think UPG_DOWNLOADING needs a place to store the error message from the last unsuccessful retry, though; otherwise users won't see the actual progress. Looking only at the following wasn't enough to determine the error; I had to look at the logs, and I assume it will be similar in the UI unless something is missing from state.yaml specifically:

upgrade_details:
  action_id: ace1e80d-5a9f-4156-ae40-84db7763c6bf
  metadata: {}
  state: UPG_DOWNLOADING
  target_version: 8.12.0-SNAPSHOT
Agreed. I have closed #3769 unmerged.
We could add a […]
Closed; it was caused by #3724.
I like […] We could just use […]