Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Flaky Test]: TestLogIngestionFleetManaged – Condition never satisfied #5631

Closed
blakerouse opened this issue Sep 30, 2024 · 2 comments · Fixed by #5648
Closed

[Flaky Test]: TestLogIngestionFleetManaged – Condition never satisfied #5631

blakerouse opened this issue Sep 30, 2024 · 2 comments · Fixed by #5648
Assignees
Labels
flaky-test Unstable or unreliable test cases. Team:Elastic-Agent Label for the Agent team

Comments

@blakerouse
Copy link
Contributor

Failing test case

TestLogIngestionFleetManaged

Error message

Condition never satisfied

Build

https://buildkite.com/elastic/elastic-agent-extended-testing/builds/3139

OS

Linux

Stacktrace and notes

=== RUN   TestLogIngestionFleetManaged/Monitoring_logs_are_shipped
    logs_ingestion_test.go:138: Making sure metricbeat logs are populated
    logs_ingestion_test.go:268: 
        	Error Trace:	C:/Users/windows/agent/testing/integration/logs_ingestion_test.go:268
        	            				C:/Users/windows/agent/testing/integration/logs_ingestion_test.go:139
        	            				C:/Users/windows/agent/testing/integration/logs_ingestion_test.go:104
        	Error:      	Condition never satisfied
        	Test:       	TestLogIngestionFleetManaged/Monitoring_logs_are_shipped
--- FAIL: TestLogIngestionFleetManaged/Monitoring_logs_are_shipped (180.02s)
@blakerouse blakerouse added flaky-test Unstable or unreliable test cases. Team:Elastic-Agent Label for the Agent team labels Sep 30, 2024
@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@cmacknz
Copy link
Member

cmacknz commented Oct 1, 2024

            input-beat/metrics-monitoring-metrics-monitoring-beats:
                message: 'Error fetching data for metricset beat.stats: error making http request: Get "http://npipe/stats": open \\.\pipe\PGwsYWcynGUYZEjD872Gs-npqbv-30jS.sock: The system cannot find the file specified.'
                payload:
                    streams:
                        metrics-monitoring-filebeat:
                            error: 'Error fetching data for metricset beat.stats: error making http request: Get "http://npipe/stats": open \\.\pipe\PGwsYWcynGUYZEjD872Gs-npqbv-30jS.sock: The system cannot find the file specified.'
                            status: DEGRADED
                        metrics-monitoring-metricbeat:
                            error: ""
                            status: HEALTHY
{"log.level":"info","@timestamp":"2024-09-30T17:29:24.859Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":663},"message":"Unit state changed http/metrics-monitoring-metrics-monitoring-agent (HEALTHY->CONFIGURING): Configuring","log":{"source":"elastic-agent"},"component":{"id":"http/metrics-monitoring","state":"HEALTHY"},"unit":{"id":"http/metrics-monitoring-metrics-monitoring-agent","type":"input","state":"CONFIGURING","old_state":"HEALTHY"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-09-30T17:29:24.859Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":625},"message":"Spawned new component log-default: Starting: spawned pid '3868'","log":{"source":"elastic-agent"},"component":{"id":"log-default","state":"STARTING"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-09-30T17:29:24.859Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":632},"message":"Spawned new unit log-default-logfile-logs-4fea2301-0dd4-4bc5-a1ca-b49781d79ed3: Starting: spawned pid '3868'","log":{"source":"elastic-agent"},"component":{"id":"log-default","state":"STARTING"},"unit":{"id":"log-default-logfile-logs-4fea2301-0dd4-4bc5-a1ca-b49781d79ed3","type":"input","state":"STARTING"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-09-30T17:29:24.859Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":632},"message":"Spawned new unit log-default: Starting: spawned pid '3868'","log":{"source":"elastic-agent"},"component":{"id":"log-default","state":"STARTING"},"unit":{"id":"log-default","type":"output","state":"STARTING"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-09-30T17:29:24.859Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":663},"message":"Unit state changed beat/metrics-monitoring-metrics-monitoring-beats (CONFIGURING->HEALTHY): Healthy","log":{"source":"elastic-agent"},"component":{"id":"beat/metrics-monitoring","state":"HEALTHY"},"unit":{"id":"beat/metrics-monitoring-metrics-monitoring-beats","type":"input","state":"HEALTHY","old_state":"CONFIGURING"},"ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2024-09-30T17:29:24.859Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":663},"message":"Unit state changed beat/metrics-monitoring-metrics-monitoring-beats (HEALTHY->DEGRADED): Error fetching data for metricset beat.stats: error making http request: Get \"http://npipe/stats\": open \\\\.\\pipe\\PGwsYWcynGUYZEjD872Gs-npqbv-30jS.sock: The system cannot find the file specified.","log":{"source":"elastic-agent"},"component":{"id":"beat/metrics-monitoring","state":"HEALTHY"},"unit":{"id":"beat/metrics-monitoring-metrics-monitoring-beats","type":"input","state":"DEGRADED","old_state":"HEALTHY"},"ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2024-09-30T17:29:24.859Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":663},"message":"Unit state changed http/metrics-monitoring-metrics-monitoring-agent (CONFIGURING->DEGRADED): Error fetching data for metricset http.json: error making http request: Get \"http://npipe/stats\": open \\\\.\\pipe\\PGwsYWcynGUYZEjD872Gs-npqbv-30jS.sock: The system cannot find the file specified.","log":{"source":"elastic-agent"},"component":{"id":"http/metrics-monitoring","state":"HEALTHY"},"unit":{"id":"http/metrics-monitoring-metrics-monitoring-agent","type":"input","state":"DEGRADED","old_state":"CONFIGURING"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-09-30T17:29:24.860Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":663},"message":"Unit state changed http/metrics-monitoring-metrics-monitoring-agent (DEGRADED->HEALTHY): Healthy","log":{"source":"elastic-agent"},"component":{"id":"http/metrics-monitoring","state":"HEALTHY"},"unit":{"id":"http/metrics-monitoring-metrics-monitoring-agent","type":"input","state":"HEALTHY","old_state":"DEGRADED"},"ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2024-09-30T17:29:24.861Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":663},"message":"Unit state changed http/metrics-monitoring-metrics-monitoring-agent (HEALTHY->DEGRADED): Error fetching data for metricset http.json: error making http request: Get \"http://npipe/inputs\": open \\\\.\\pipe\\PGwsYWcynGUYZEjD872Gs-npqbv-30jS.sock: The system cannot find the file specified.","log":{"source":"elastic-agent"},"component":{"id":"http/metrics-monitoring","state":"HEALTHY"},"unit":{"id":"http/metrics-monitoring-metrics-monitoring-agent","type":"input","state":"DEGRADED","old_state":"HEALTHY"},"ecs.version":"1.6.0"}

The test ran for 3 minutes or 180s which is the timeout in this block:

require.Eventually(
t,
func() bool {
var err error
docs, err = findFn()
if err != nil {
t.Logf("got an error querying ES, retrying. Error: %s", err)
}
return err == nil
},
3*time.Minute,
15*time.Second,
)

The DEGRADED status is reported at 2024-09-30T17:29:24.859Z and the agent logs end at 2024-09-30T17:29:40.255Z which is an elapsed time of 40-29=11s. The metrics collection interval is 60s so if metricbeat isn't ready right away for some reason we get this.

This is probably a timing issue, we spend most of the polling timeout just waiting for things to startup for 3 minutes, and then the metricbeat named pipe isn't ready by the first metrics polling interval. 2024-09-30T17:25:54 is the start of the agent logs and the input becomes degraded at 2024-09-30T17:29:24.859Z so we spent almost all of the 3 minute timeout just waiting to start up.

Faster windows instances would help, but I bet increasing the metrics collection interval for testing would help a lot. It's at 60s to keep data volume down but we can speed it up in tests.

There is a metrics_period we can use to change this:

// MonitoringConfig describes a configuration of a monitoring
type MonitoringConfig struct {
Enabled bool `yaml:"enabled" config:"enabled"`
MonitorLogs bool `yaml:"logs" config:"logs"`
MonitorMetrics bool `yaml:"metrics" config:"metrics"`
MetricsPeriod string `yaml:"metrics_period" config:"metrics_period"`

We can set it much faster, something like 1s would be fine here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flaky-test Unstable or unreliable test cases. Team:Elastic-Agent Label for the Agent team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants