Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix flaky retry tests #10887

Merged
merged 7 commits into from
Feb 5, 2024
Merged

Fix flaky retry tests #10887

merged 7 commits into from
Feb 5, 2024

Conversation

larry-safran
Copy link
Contributor

@larry-safran larry-safran commented Feb 1, 2024

Fix retry tests that became flaky. Fixes #10873

@larry-safran larry-safran marked this pull request as ready for review February 2, 2024 21:38
@larry-safran larry-safran changed the title improve debugging Fix flaky retry tests Feb 2, 2024
@@ -103,8 +105,11 @@ public class RetryTest {
@Rule
public final GrpcCleanupRule cleanupRule = new GrpcCleanupRule();
private final FakeClock fakeClock = new FakeClock();
@Mock
private ClientCall.Listener<Integer> mockCallListener;
private TestListener realMockCallListener = new TestListener();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't use the word "mock" since it's not a mock object. Just testCallListener would do I think, or people ofthen use the term "fake" with these kinds of classes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to testCallListener

Comment on lines 177 to 179
@Before
public void setUp() throws Exception {
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be removed, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed.

if (getTransportTracer() != null) {
getTransportTracer().reportStreamClosed(status.isOk());
}
listener().closed(status, rpcProgress, trailers);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You switched the order of the the listener and transportTracer.
The original tests is about verifying census stats(in statsTraceCtx), how does that related to transportTracer?

@larry-safran
Copy link
Contributor Author

larry-safran commented Feb 2, 2024 via email

@YifeiZhuang
Copy link
Contributor

when the closed method is called before the tracer report is generated there is a window where the report isn't available, which is what caused the failures.

hmm, I had a theory about why the call closed method is called before those reports I put on the issue.

@larry-safran
Copy link
Contributor Author

larry-safran commented Feb 2, 2024 via email

Copy link
Contributor

@temawi temawi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have enough understanding of the original problem to know if @YifeiZhuang's concern here is warranted, but if she is ok with the change it LGTM.

@YifeiZhuang
Copy link
Contributor

I agree that the test setup caused the flakiness, not a real problem. One thing to keep in mind: when canceling the retryable stream returns and close and master listener, all the substreams should have been closed already. Because notifying master listener is the last thing we should do in retryable stream. Otherwise there will be resource leak, e.g. RejectedExecutionException.
But I don't see the fix address that problem. In the current fix, the master listener may still happen before the substream listener is closed. It might mitigate the flakiness, but I'm not yet convinced this is the appropriate fix.
cc. @ejona86

@larry-safran larry-safran merged commit 374dbe9 into grpc:master Feb 5, 2024
14 checks passed
@larry-safran larry-safran deleted the is_10873 branch February 5, 2024 18:55
larry-safran added a commit to larry-safran/grpc-java that referenced this pull request Feb 5, 2024
* Reorder tracing and actually closing listener to eliminate test flakiness
* Use real value rather than mock for flaky test
larry-safran added a commit to larry-safran/grpc-java that referenced this pull request Feb 5, 2024
* Reorder tracing and actually closing listener to eliminate test flakiness
* Use real value rather than mock for flaky test
larry-safran added a commit to larry-safran/grpc-java that referenced this pull request Feb 5, 2024
* Reorder tracing and actually closing listener to eliminate test flakiness
* Use real value rather than mock for flaky test
larry-safran added a commit to larry-safran/grpc-java that referenced this pull request Feb 5, 2024
* Reorder tracing and actually closing listener to eliminate test flakiness
* Use real value rather than mock for flaky test
larry-safran added a commit to larry-safran/grpc-java that referenced this pull request Feb 5, 2024
* Reorder tracing and actually closing listener to eliminate test flakiness
* Use real value rather than mock for flaky test
larry-safran added a commit to larry-safran/grpc-java that referenced this pull request Feb 5, 2024
* Reorder tracing and actually closing listener to eliminate test flakiness
* Use real value rather than mock for flaky test
larry-safran added a commit to larry-safran/grpc-java that referenced this pull request Feb 5, 2024
* Reorder tracing and actually closing listener to eliminate test flakiness
* Use real value rather than mock for flaky test
larry-safran added a commit to larry-safran/grpc-java that referenced this pull request Feb 5, 2024
* Reorder tracing and actually closing listener to eliminate test flakiness
* Use real value rather than mock for flaky test
larry-safran added a commit to larry-safran/grpc-java that referenced this pull request Feb 5, 2024
* Reorder tracing and actually closing listener to eliminate test flakiness
* Use real value rather than mock for flaky test
larry-safran added a commit to larry-safran/grpc-java that referenced this pull request Feb 5, 2024
* Reorder tracing and actually closing listener to eliminate test flakiness
* Use real value rather than mock for flaky test
larry-safran added a commit to larry-safran/grpc-java that referenced this pull request Feb 5, 2024
* Reorder tracing and actually closing listener to eliminate test flakiness
* Use real value rather than mock for flaky test
larry-safran added a commit that referenced this pull request Feb 5, 2024
* Fix retries that timeout hanging forever. (#10855)

Fixes #10336

* Fix flaky retry tests (#10887)

* Reorder tracing and actually closing listener to eliminate test flakiness
* Use real value rather than mock for flaky test
larry-safran added a commit that referenced this pull request Feb 6, 2024
* Fix retries that timeout hanging forever. (#10855)

Fixes #10336

* Fix flaky retry tests (#10887)
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 6, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

interop-test, core: retryTest is flaky
3 participants