[internal/otelarrow] Flaky test disabled: TestIntegrationMemoryLimited #34719

Closed
pjanotti opened this issue Aug 16, 2024 · 10 comments · Fixed by #34889 or #36034

Comments

@pjanotti
Contributor

Component(s)

internal/otelarrow

Describe the issue you're reporting

Hit on #34358; see https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/10203882356/job/28231140032?pr=34358#step:6:518

=== FAIL: test TestIntegrationSelfTracing (11.03s)
    e2e_test.go:369: 
        	Error Trace:	D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/internal/otelarrow/test/e2e_test.go:369
        	            				D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/internal/otelarrow/test/e2e_test.go:205
        	            				C:/hostedtoolcache/windows/go/1.21.12/x64/src/runtime/asm_amd64.s:1650
        	Error:      	Received unexpected error:
        	            	rpc error: code = Canceled desc = send wait: context deadline exceeded
        	Test:       	TestIntegrationSelfTracing
    e2e_test.go:369: 
        	Error Trace:	D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/internal/otelarrow/test/e2e_test.go:369
        	            				D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/internal/otelarrow/test/e2e_test.go:205
        	            				C:/hostedtoolcache/windows/go/1.21.12/x64/src/runtime/asm_amd64.s:1650
        	Error:      	Received unexpected error:
        	            	rpc error: code = Canceled desc = send wait: context deadline exceeded
        	Test:       	TestIntegrationSelfTracing
    e2e_test.go:369: 
        	Error Trace:	D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/internal/otelarrow/test/e2e_test.go:369
        	            				D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/internal/otelarrow/test/e2e_test.go:205
        	            				C:/hostedtoolcache/windows/go/1.21.12/x64/src/runtime/asm_amd64.s:1650
        	Error:      	Received unexpected error:
        	            	rpc error: code = Canceled desc = send wait: context deadline exceeded
        	Test:       	TestIntegrationSelfTracing
    e2e_test.go:369: 
        	Error Trace:	D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/internal/otelarrow/test/e2e_test.go:369
        	            				D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/internal/otelarrow/test/e2e_test.go:205
        	            				C:/hostedtoolcache/windows/go/1.21.12/x64/src/runtime/asm_amd64.s:1650
        	Error:      	Received unexpected error:
        	            	rpc error: code = Canceled desc = send wait: context deadline exceeded
        	Test:       	TestIntegrationSelfTracing
    e2e_test.go:369: 
        	Error Trace:	D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/internal/otelarrow/test/e2e_test.go:369
        	            				D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/internal/otelarrow/test/e2e_test.go:205
        	            				C:/hostedtoolcache/windows/go/1.21.12/x64/src/runtime/asm_amd64.s:1650
        	Error:      	Received unexpected error:
        	            	rpc error: code = Canceled desc = send wait: context deadline exceeded
        	Test:       	TestIntegrationSelfTracing
    e2e_test.go:369: 
        	Error Trace:	D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/internal/otelarrow/test/e2e_test.go:369
        	            				D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/internal/otelarrow/test/e2e_test.go:205
        	            				C:/hostedtoolcache/windows/go/1.21.12/x64/src/runtime/asm_amd64.s:1650
        	Error:      	Received unexpected error:
        	            	rpc error: code = Canceled desc = send wait: context deadline exceeded
        	Test:       	TestIntegrationSelfTracing
    e2e_test.go:369: 
        	Error Trace:	D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/internal/otelarrow/test/e2e_test.go:369
        	            				D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/internal/otelarrow/test/e2e_test.go:205
        	            				C:/hostedtoolcache/windows/go/1.21.12/x64/src/runtime/asm_amd64.s:1650
        	Error:      	Received unexpected error:
        	            	rpc error: code = Canceled desc = send wait: context deadline exceeded
        	Test:       	TestIntegrationSelfTracing
    e2e_test.go:369: 
        	Error Trace:	D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/internal/otelarrow/test/e2e_test.go:369
        	            				D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/internal/otelarrow/test/e2e_test.go:205
        	            				C:/hostedtoolcache/windows/go/1.21.12/x64/src/runtime/asm_amd64.s:1650
        	Error:      	Received unexpected error:
        	            	rpc error: code = Canceled desc = send wait: context deadline exceeded
        	Test:       	TestIntegrationSelfTracing
    e2e_test.go:369: 
        	Error Trace:	D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/internal/otelarrow/test/e2e_test.go:369
        	            				D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/internal/otelarrow/test/e2e_test.go:205
        	            				C:/hostedtoolcache/windows/go/1.21.12/x64/src/runtime/asm_amd64.s:1650
        	Error:      	Received unexpected error:
        	            	rpc error: code = Canceled desc = send wait: context deadline exceeded
        	Test:       	TestIntegrationSelfTracing
    e2e_test.go:369: 
        	Error Trace:	D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/internal/otelarrow/test/e2e_test.go:369
        	            				D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/internal/otelarrow/test/e2e_test.go:205
        	            				C:/hostedtoolcache/windows/go/1.21.12/x64/src/runtime/asm_amd64.s:1650
        	Error:      	Received unexpected error:
        	            	rpc error: code = Canceled desc = send wait: context deadline exceeded
        	Test:       	TestIntegrationSelfTracing
    e2e_test.go:272: 
        	Error Trace:	D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/internal/otelarrow/test/e2e_test.go:272
        	            				D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/internal/otelarrow/test/e2e_test.go:418
        	            				D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/internal/otelarrow/test/e2e_test.go:220
        	            				D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/internal/otelarrow/test/e2e_test.go:476
        	Error:      	Not equal: 
        	            	expected: 10000
        	            	actual  : 4664
        	Test:       	TestIntegrationSelfTracing
pjanotti added the "needs triage" label on Aug 16, 2024
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@pjanotti
Contributor Author

Since both hits are on Windows /label os:windows

@crobert-1
Member

> Since both hits are on Windows /label os:windows

FYI: To add a label using automation, the /label message has to be at the beginning of the comment. Source

@jmacd
Contributor

jmacd commented Aug 21, 2024

Will take a look.

@jmacd
Contributor

jmacd commented Aug 21, 2024

I would like to recommend #34794, and if that fails I'll be glad to disable the test on Windows. Without trying to fix this, I'm not sure how we'd ever resolve it.

@pjanotti
Contributor Author

Many test failures on Windows are due to its scheduling and default timer tick resolution being different from *nix. The sleep added in #34794 seems a reasonable thing to try.
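
To illustrate the point about timer granularity: a test that sleeps for a fixed interval and then asserts is sensitive to how coarsely the OS wakes the goroutine, whereas a test that polls for a condition up to a deadline is not. The following is a minimal sketch of the polling approach, not the code from #34794; `waitFor` and `demoWaitFor` are hypothetical names introduced only for this example.

```go
package main

import (
	"errors"
	"sync/atomic"
	"time"
)

// waitFor polls cond every tick until it returns true or timeout elapses.
// It reports whether cond became true in time. (Hypothetical helper.)
func waitFor(timeout, tick time.Duration, cond func() bool) bool {
	deadline := time.Now().Add(timeout)
	for {
		if cond() {
			return true
		}
		if !time.Now().Before(deadline) {
			return false
		}
		time.Sleep(tick)
	}
}

// demoWaitFor exercises waitFor with a flag that a background goroutine
// sets after a short delay whose exact length does not matter.
func demoWaitFor() error {
	var ready atomic.Bool
	go func() {
		time.Sleep(20 * time.Millisecond) // stand-in for real work
		ready.Store(true)
	}()
	// The timeout is only a safety net; the call returns as soon as the
	// flag is observed, however late the scheduler wakes us up.
	if !waitFor(5*time.Second, 5*time.Millisecond, ready.Load) {
		return errors.New("condition not met before the deadline")
	}
	// A condition that never holds must give up once the timeout passes.
	if waitFor(30*time.Millisecond, 5*time.Millisecond, func() bool { return false }) {
		return errors.New("waitFor reported success for a false condition")
	}
	return nil
}
```

The generous timeout only bounds the worst case, so a slow or coarse-grained runner makes the test slower rather than making it fail.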

@jmacd
Contributor

jmacd commented Aug 22, 2024

I have added one Skip to this test, will leave this issue open.

jmacd changed the title from "[internal/otelarrow] Flaky test: TestIntegrationSelfTracing" to "[internal/otelarrow] Flaky test disabled: TestIntegrationMemoryLimited" on Aug 22, 2024
jpkrohling added a commit that referenced this issue Aug 27, 2024
…34794)

**Description:** Fixes the causes of flakiness in most cases by using a
callback to terminate the test without resorting to sleep statements.
One flaky test still fails for reasons not yet understood.
Fortunately, it fails in a repeatable way, and I will debug it as
part of #34719.

**Link to tracking Issue:**
#34719

---------

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
Co-authored-by: Juraci Paixão Kröhling <juraci@kroehling.de>
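
The callback-based termination described in the commit message above can be sketched roughly as follows. This is an illustration under assumptions, not the code from #34794: `countingSink`, `runUntilDone`, and `demoCallback` are hypothetical names, and the count of 10000 simply mirrors the expected value shown in the failure log at the top of this issue.

```go
package main

import (
	"context"
	"errors"
	"sync/atomic"
	"time"
)

// countingSink is a hypothetical consumer that invokes onDone exactly once,
// when the expected number of items has arrived.
type countingSink struct {
	expected int64
	received atomic.Int64
	onDone   func()
}

// consume records one item and fires the callback on the final one.
func (s *countingSink) consume() {
	if s.received.Add(1) == s.expected {
		s.onDone()
	}
}

// runUntilDone sends n items from a goroutine and blocks until the sink's
// callback fires or ctx expires, with no fixed sleep in between.
func runUntilDone(ctx context.Context, n int64) (int64, error) {
	done := make(chan struct{})
	sink := &countingSink{expected: n, onDone: func() { close(done) }}

	go func() {
		for i := int64(0); i < n; i++ {
			sink.consume()
		}
	}()

	select {
	case <-done:
		return sink.received.Load(), nil
	case <-ctx.Done():
		return sink.received.Load(), errors.New("timed out before all items arrived")
	}
}

// demoCallback runs the sketch with a generous safety-net timeout.
func demoCallback() (int64, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	return runUntilDone(ctx, 10000)
}
```

Because the test ends when the callback fires, its duration tracks the actual work instead of a guessed sleep interval; the context timeout exists only to keep a broken run from hanging.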
codeboten added a commit that referenced this issue Sep 6, 2024
**Description:** Restore a skipped test, after understanding the nature
of the problem.

The problem was mostly addressed in
#34794,
which left the test disabled. The test had been flaky because, while
testing for an out-of-memory condition, it could fail due to a timeout
or for some other reason. To make the test more reliable, this now waits until at
least one ArrowTraces span has been received by both components. After
one span is available, it checks that the expected log messages are
present on both sides.

**Link to tracking Issue:** 
Fixes #34719.

**Testing:** ✅

---------

Co-authored-by: Curtis Robert <crobert@splunk.com>
Co-authored-by: Alex Boten <223565+codeboten@users.noreply.github.com>
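
The approach described in the commit message above (wait until both components have seen at least one span, then check the logs) might look something like the sketch below. It is an assumption-laden illustration rather than the actual test code: the `side` type, its methods, and the log text are all placeholders invented for this example.

```go
package main

import (
	"context"
	"errors"
	"strings"
	"sync"
	"time"
)

// side is a hypothetical stand-in for one component (exporter or receiver).
type side struct {
	once      sync.Once
	firstSpan chan struct{} // closed when the first span is observed

	mu   sync.Mutex
	logs []string
}

func newSide() *side { return &side{firstSpan: make(chan struct{})} }

// onSpan is called for each observed span; only the first call signals.
func (s *side) onSpan() { s.once.Do(func() { close(s.firstSpan) }) }

// log records one log line.
func (s *side) log(msg string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.logs = append(s.logs, msg)
}

// hasLog reports whether any recorded line contains substr.
func (s *side) hasLog(substr string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, l := range s.logs {
		if strings.Contains(l, substr) {
			return true
		}
	}
	return false
}

// awaitBoth blocks until both sides have seen a span, or ctx expires.
func awaitBoth(ctx context.Context, a, b *side) error {
	for _, s := range []*side{a, b} {
		select {
		case <-s.firstSpan:
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return nil
}

// demoAwaitBoth simulates two components that each log a message and then
// report a span; the log assertions run only after both spans arrive.
func demoAwaitBoth() error {
	exporter, receiver := newSide(), newSide()
	for _, s := range []*side{exporter, receiver} {
		go func(s *side) {
			s.log("placeholder: limit reached") // not the real log text
			s.onSpan()
		}(s)
	}

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	if err := awaitBoth(ctx, exporter, receiver); err != nil {
		return err
	}
	for _, s := range []*side{exporter, receiver} {
		if !s.hasLog("limit reached") {
			return errors.New("expected log message missing")
		}
	}
	return nil
}
```

Gating the log checks on an observed span means the assertions never race against data that has not yet arrived on a slow runner.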
@songy23
Member

songy23 commented Oct 16, 2024

This is still happening in Windows CI, though with different messages: https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/11370646308/job/31630914777

=== Failed
=== FAIL: test TestIntegrationMemoryLimited (37.46s)
make[2]: *** [../../Makefile.Common:131: test] Error 1
    e2e_test.go:100: 
        	Error Trace:	D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/internal/otelarrow/test/e2e_test.go:100
make[1]: *** [Makefile:200: internal/otelarrow] Error 2
        	            				D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/receiver/otelarrowreceiver/internal/arrow/arrow.go:892
make[1]: *** Waiting for unfinished jobs....
        	            				D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/receiver/otelarrowreceiver/internal/arrow/arrow.go:702
        	            				D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/receiver/otelarrowreceiver/internal/arrow/arrow.go:682
        	            				C:/hostedtoolcache/windows/go/1.22.8/x64/src/runtime/asm_amd64.s:1695
        	Error:      	"4.166666666s" is not less than "4.1333912s"
        	Test:       	TestIntegrationMemoryLimited

=== FAIL: test TestIntegrationMemoryLimited (re-run 1) (44.79s)
    e2e_test.go:100: 
        	Error Trace:	D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/internal/otelarrow/test/e2e_test.go:100
        	            				D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/receiver/otelarrowreceiver/internal/arrow/arrow.go:892
        	            				D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/receiver/otelarrowreceiver/internal/arrow/arrow.go:702
        	            				D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/receiver/otelarrowreceiver/internal/arrow/arrow.go:682
        	            				C:/hostedtoolcache/windows/go/1.22.8/x64/src/runtime/asm_amd64.s:1695
        	Error:      	"4.166666666s" is not less than "3.9117612s"
        	Test:       	TestIntegrationMemoryLimited

DONE 2 runs, 50 tests, 2 failures in 178.112s
✓  . (1.129s)

@songy23 songy23 reopened this Oct 16, 2024
mx-psi pushed a commit that referenced this issue Oct 29, 2024
#### Description

Remove a flaky portion of the internal/otelarrow/test e2e test.

#### Link to tracking issue

Fixes #34719.

#### Testing

There was a time-based test that has proven unreliable.

#### Documentation

n/a
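
For a sense of how narrow the margin was in the time-based check, the two failures quoted earlier in this thread can be worked through numerically. The sketch below assumes that the first value in each message (4.166666666s) is a computed bound and the second a measured duration; the log does not say so explicitly. `shortfall` and `reportedShortfalls` are hypothetical helpers written for this example.

```go
package main

import (
	"fmt"
	"time"
)

// shortfall returns bound minus measured. A positive result means a strict
// "bound < measured" assertion fails, by that amount.
func shortfall(bound, measured string) (time.Duration, error) {
	b, err := time.ParseDuration(bound)
	if err != nil {
		return 0, fmt.Errorf("parse bound: %w", err)
	}
	m, err := time.ParseDuration(measured)
	if err != nil {
		return 0, fmt.Errorf("parse measured: %w", err)
	}
	return b - m, nil
}

// reportedShortfalls applies shortfall to the two failures quoted in the
// thread (the first run and re-run 1).
func reportedShortfalls() ([]time.Duration, error) {
	const bound = "4.166666666s"
	var out []time.Duration
	for _, measured := range []string{"4.1333912s", "3.9117612s"} {
		d, err := shortfall(bound, measured)
		if err != nil {
			return nil, err
		}
		out = append(out, d)
	}
	return out, nil
}
```

Under that assumption, the first run missed the bound by about 33 ms and the re-run by about 255 ms, which is the kind of margin that timer and scheduling differences can plausibly account for.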
5 participants