Flaky test TestReadRotatingFiles/MoveCreateTimestamped #1382

Closed
jpkrohling opened this issue Oct 27, 2020 · 5 comments · Fixed by #1535
Labels: bug, flaky test, priority:p2, spec:logs

Comments

@jpkrohling
Member

Occurred for #1348 here:

https://app.circleci.com/pipelines/github/open-telemetry/opentelemetry-collector-contrib/5514/workflows/72a3a765-14bf-447a-a710-31de95350478/jobs/41996

--- FAIL: TestReadRotatingFiles (0.00s)
    --- FAIL: TestReadRotatingFiles/MoveCreateTimestamped (2.59s)
        e2e_test.go:220: Temp Dir: C:\Users\circleci\AppData\Local\Temp\003111337
        logger.go:130: 2020-10-27T09:23:38.081Z	INFO	Starting stanza receiver
        logger.go:130: 2020-10-27T09:23:38.081Z	DEBUG	Starting operator
        logger.go:130: 2020-10-27T09:23:38.081Z	DEBUG	Started operator
        logger.go:130: 2020-10-27T09:23:38.081Z	DEBUG	Starting operator	{"operator_id": "$.regex_parser", "operator_type": "regex_parser"}
        logger.go:130: 2020-10-27T09:23:38.081Z	DEBUG	Started operator	{"operator_id": "$.regex_parser", "operator_type": "regex_parser"}
        logger.go:130: 2020-10-27T09:23:38.081Z	DEBUG	Starting operator	{"operator_id": "$.file_input", "operator_type": "file_input"}
        logger.go:130: 2020-10-27T09:23:38.081Z	DEBUG	Started operator	{"operator_id": "$.file_input", "operator_type": "file_input"}
        e2e_test.go:196: 
            	Error Trace:	e2e_test.go:196
            	Error:      	Condition never satisfied
            	Test:       	TestReadRotatingFiles/MoveCreateTimestamped
FAIL
FAIL	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/stanzareceiver	2.836s
FAIL

cc @djaglowski

@tigrannajaryan
Member

@djaglowski also failing TestReadRotatingFiles/CopyTruncateSequential and TestReadRotatingFiles/CopyTruncateTimestamped, see https://app.circleci.com/pipelines/github/open-telemetry/opentelemetry-collector-contrib/5531/workflows/0f13b749-d13c-4d69-9e25-1716c98cf061/jobs/42181

--- FAIL: TestReadRotatingFiles (0.00s)
    --- FAIL: TestReadRotatingFiles/CopyTruncateSequential (2.58s)
        e2e_test.go:220: Temp Dir: C:\Users\circleci\AppData\Local\Temp\348215778
        logger.go:130: 2020-10-27T15:19:06.520Z	INFO	Starting stanza receiver
        logger.go:130: 2020-10-27T15:19:06.521Z	DEBUG	Starting operator
        logger.go:130: 2020-10-27T15:19:06.521Z	DEBUG	Started operator
        logger.go:130: 2020-10-27T15:19:06.521Z	DEBUG	Starting operator	{"operator_id": "$.regex_parser", "operator_type": "regex_parser"}
        logger.go:130: 2020-10-27T15:19:06.521Z	DEBUG	Started operator	{"operator_id": "$.regex_parser", "operator_type": "regex_parser"}
        logger.go:130: 2020-10-27T15:19:06.521Z	DEBUG	Starting operator	{"operator_id": "$.file_input", "operator_type": "file_input"}
        logger.go:130: 2020-10-27T15:19:06.521Z	DEBUG	Started operator	{"operator_id": "$.file_input", "operator_type": "file_input"}
        e2e_test.go:196: 
            	Error Trace:	e2e_test.go:196
            	Error:      	Condition never satisfied
            	Test:       	TestReadRotatingFiles/CopyTruncateSequential
    --- FAIL: TestReadRotatingFiles/CopyTruncateTimestamped (2.58s)
        e2e_test.go:220: Temp Dir: C:\Users\circleci\AppData\Local\Temp\001070799
        logger.go:130: 2020-10-27T15:19:06.520Z	INFO	Starting stanza receiver
        logger.go:130: 2020-10-27T15:19:06.521Z	DEBUG	Starting operator
        logger.go:130: 2020-10-27T15:19:06.521Z	DEBUG	Started operator
        logger.go:130: 2020-10-27T15:19:06.521Z	DEBUG	Starting operator	{"operator_id": "$.regex_parser", "operator_type": "regex_parser"}
        logger.go:130: 2020-10-27T15:19:06.521Z	DEBUG	Started operator	{"operator_id": "$.regex_parser", "operator_type": "regex_parser"}
        logger.go:130: 2020-10-27T15:19:06.521Z	DEBUG	Starting operator	{"operator_id": "$.file_input", "operator_type": "file_input"}
        logger.go:130: 2020-10-27T15:19:06.521Z	DEBUG	Started operator	{"operator_id": "$.file_input", "operator_type": "file_input"}
        e2e_test.go:196: 
            	Error Trace:	e2e_test.go:196
            	Error:      	Condition never satisfied
            	Test:       	TestReadRotatingFiles/CopyTruncateTimestamped
FAIL
FAIL	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/stanzareceiver	2.830s

@djaglowski
Member

#1384 should address this, but it is waiting on #1380

@tigrannajaryan
Member

@djaglowski It failed again: https://app.circleci.com/pipelines/github/open-telemetry/opentelemetry-collector-contrib/5663/workflows/da080a18-0c5e-43de-9c71-e78b9563f0c8/jobs/43566

Can you have a look? We need a robust solution that is not timing or machine performance dependent.
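
For illustration, "Condition never satisfied" is the message testify's `Eventually` assertions report when a polled condition does not become true within the allotted time, so the failures above hinge on how fast the machine is. One way to reduce that dependence is to block on the data itself and treat the deadline only as a safety net. The following is a minimal sketch under that assumption; `collectLines`, its parameters, and the channel-based design are hypothetical and are not taken from the stanza receiver or from the eventual fix.

```go
package main

import "fmt"

// collectLines blocks until `want` lines have arrived on `lines`, or until
// `giveUp` is closed. The test passes as soon as the data is there, no matter
// how long that takes; giveUp exists only so a broken pipeline cannot hang
// the test forever. (Hypothetical helper, not part of the receiver.)
func collectLines(lines <-chan string, want int, giveUp <-chan struct{}) ([]string, error) {
	got := make([]string, 0, want)
	for len(got) < want {
		select {
		case line := <-lines:
			got = append(got, line)
		case <-giveUp:
			// Report how far we got so the failure is easier to diagnose.
			return got, fmt.Errorf("gave up after %d of %d lines", len(got), want)
		}
	}
	return got, nil
}
```

In a test, `giveUp` could be the `Done()` channel of a `context.WithTimeout` with a generous limit, so a slow CI machine only makes the test slower rather than red.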

@djaglowski
Member

@tigrannajaryan I'll have another PR shortly.

This test is something of a performance test, so I think that perhaps it does not really belong here.

However, there are correctness aspects that I think can be preserved by substantially slowing down file rotation and reducing the number of lines written. This should still demonstrate that file rotation is handled correctly, while being orders of magnitude less sensitive to blips in underlying file system performance. Ultimately, I don't see a way to make this perfectly deterministic, but I think these changes will put it in line with any other test that has a fundamental dependency on disk IO within a limited timeframe.

The performance aspect of this can be duplicated later in the testbed, where longer file lifespans along with perhaps a trivial but non-zero margin of error would allow for acceptable stability.
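
As a rough illustration of the idea above (few lines, slow rotation), a move/create writer for such a test might look like the sketch below. This is not the code from the linked PRs; `writeMoveCreate`, the `active.log` naming scheme, and all parameters are assumptions made for the example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// writeMoveCreate writes totalLines lines to <workDir>/active.log and rotates
// in move/create style: after every linesPerFile lines the active file is
// renamed to active.log.<n> and a fresh active.log is created. It sleeps for
// `pause` before each rotation so that a reader has time to drain the old
// file. It returns the working directory and the rotated file paths.
// (Hypothetical helper; names and layout are assumptions.)
func writeMoveCreate(parent string, totalLines, linesPerFile int, pause time.Duration) (string, []string, error) {
	if linesPerFile <= 0 {
		return "", nil, fmt.Errorf("linesPerFile must be positive, got %d", linesPerFile)
	}
	// An empty parent makes MkdirTemp use the default temp directory.
	workDir, err := os.MkdirTemp(parent, "rotation-")
	if err != nil {
		return "", nil, err
	}
	active := filepath.Join(workDir, "active.log")
	f, err := os.Create(active)
	if err != nil {
		return workDir, nil, err
	}
	var rotated []string
	for i := 0; i < totalLines; i++ {
		if i > 0 && i%linesPerFile == 0 {
			time.Sleep(pause) // slow rotation: the key knob for stability
			if err := f.Close(); err != nil {
				return workDir, rotated, err
			}
			dst := fmt.Sprintf("%s.%d", active, len(rotated)+1)
			if err := os.Rename(active, dst); err != nil {
				return workDir, rotated, err
			}
			rotated = append(rotated, dst)
			if f, err = os.Create(active); err != nil {
				return workDir, rotated, err
			}
		}
		if _, err := fmt.Fprintf(f, "line %d\n", i); err != nil {
			f.Close()
			return workDir, rotated, err
		}
	}
	return workDir, rotated, f.Close()
}
```

Calling it with, say, 6 total lines, 2 lines per file, and a pause of a few hundred milliseconds gives two rotations with plenty of slack between them. The caller is responsible for removing the returned directory afterwards.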

@djaglowski
Member

@tigrannajaryan My understanding of the root cause was incorrect. I will have a closer look at this tomorrow.

@mx-psi added the flaky test label on Nov 5, 2021
5 participants