[Flaky Test] Fix Flaky Test SearchTimeoutIT.testSimpleTimeout #16828
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##              main   #16828      +/-   ##
============================================
- Coverage    72.21%   72.05%   -0.16%
+ Complexity   65335    65231     -104
============================================
  Files         5318     5318
  Lines       304081   304081
  Branches     43995    43995
============================================
- Hits        219578   219103     -475
- Misses       66541    66984     +443
- Partials     17962    17994      +32
❌ Gradle check result for d1fc8be: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
SpecificClusterManagerNodesIT.testElectOnlyBetweenClusterManagerNodes #15944
Thanks @kkewwei, this was a great find!
Thanks @kkewwei, I am actually a bit surprised by your findings. The query phase has to be terminated early by the timeout, right? So it should not take much longer than the timeout itself?
❕ Gradle check result for d1fc8be: UNSTABLE
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
@reta Yes, the query phase has to be terminated early by the timeout, but it may take much longer than the timeout itself; see OpenSearch/server/src/main/java/org/opensearch/search/internal/CancellableBulkScorer.java Line 68 in 57a6605.
In addition, should we decrease the upper interval (100m)?
Thanks @kkewwei, I think this is the problem, not the test, right? If the timeout does not terminate the query early within a reasonable time margin, it is not very useful.
@reta To avoid excessive timeouts, maybe we should decrease the upper interval (100m) to something like 100k?
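To illustrate why a timeout can overshoot, here is a minimal Java sketch of interval-based cancellation checks. It is a simplified model, not the actual CancellableBulkScorer code: it assumes the timeout is checked only before each scoring window, and that the window size starts at an initial interval and doubles up to a cap. The class name, method names and interval values are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of interval-based timeout checks; not the real CancellableBulkScorer.
final class WindowedScoringSketch {

    /** Returns the exclusive end doc of each scoring window over [0, maxDoc). */
    static List<Integer> windows(int maxDoc, int initialInterval, int maxInterval) {
        List<Integer> ends = new ArrayList<>();
        int min = 0;
        int interval = initialInterval;
        while (min < maxDoc) {
            // In this model the timeout is checked here, only before each window starts.
            int newMax = (int) Math.min((long) min + interval, maxDoc);
            ends.add(newMax);
            min = newMax;
            // Grow the window exponentially, but never past the cap.
            interval = Math.min(interval << 1, maxInterval);
        }
        return ends;
    }

    /** Upper bound on scoring time after the timeout fires: one full largest window. */
    static long worstCaseOvershootMillis(int maxDoc, int initialInterval, int maxInterval, long perDocMillis) {
        int largestWindow = 0;
        int start = 0;
        for (int end : windows(maxDoc, initialInterval, maxInterval)) {
            largestWindow = Math.max(largestWindow, end - start);
            start = end;
        }
        return largestWindow * perDocMillis;
    }

    public static void main(String[] args) {
        int cap = 1 << 20; // hypothetical cap
        System.out.println(windows(1000, 2048, cap));                       // [1000]
        System.out.println(worstCaseOvershootMillis(1000, 2048, cap, 500)); // 500000
        System.out.println(windows(1000, 100, cap));                        // [100, 300, 700, 1000]
        System.out.println(windows(1000, 100, 200));                        // [100, 300, 500, 700, 900, 1000]
    }
}
```

In this model, once the timeout fires, scoring can continue for up to one full window, so the overshoot is bounded by the largest window times the per-doc cost. Lowering the cap only helps once windows have grown to it; for a small index such as the 1000 docs in this test, the initial interval decides how often the check runs.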
@kkewwei, thanks for staying with me. I only briefly looked at the overall implementation, and it looks like we may have lost some … Opened up #16882.
@reta Of course, please feel free to go ahead.
server/src/internalClusterTest/java/org/opensearch/search/SearchTimeoutIT.java
Signed-off-by: kkewwei <kkewwei@163.com>
❕ Gradle check result for 3a34c9d: UNSTABLE
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
Signed-off-by: kkewwei <kkewwei@163.com> (cherry picked from commit 7050ecf) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
(cherry picked from commit 7050ecf) Signed-off-by: kkewwei <kkewwei@163.com> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…16828) Signed-off-by: kkewwei <kkewwei@163.com>
Description
When numDocs=1000, testSimpleTimeout takes several minutes: scoring each doc costs 500ms, so iterating over all the docs in the query phase takes a long time (see OpenSearch/server/src/internalClusterTest/java/org/opensearch/search/SearchTimeoutIT.java Line 138 in 5aa6509).
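The figures above can be checked with simple arithmetic: 1000 docs at 500ms each take 500 s, a little over eight minutes, if the query phase is never cut short. The small Java sketch below just multiplies those two figures; it assumes a constant per-doc cost and no early termination, and the class and method names are hypothetical.

```java
import java.time.Duration;

// Back-of-the-envelope estimate: assumes a constant per-doc cost and no early termination.
final class QueryPhaseEstimate {

    /** Time to score every doc when each doc costs perDoc. */
    static Duration fullScan(int numDocs, Duration perDoc) {
        return perDoc.multipliedBy(numDocs);
    }

    public static void main(String[] args) {
        Duration total = fullScan(1000, Duration.ofMillis(500));
        System.out.println(total);              // PT8M20S
        System.out.println(total.getSeconds()); // 500
    }
}
```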
ReaderContext is created before executing the query phase and released after the fetch phase. The ReaderContext keep-alive is 1min (determined by search.keep_alive_interval). If the queryPhase takes too long, the ReaderContext may be released before the fetch phase, so the fetch/id request fails, which is the failure this test hits.
Related Issues
Resolves #16056 #9401
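The ReaderContext expiry described above can be modelled in a few lines of Java. This is a simplified sketch under stated assumptions, not OpenSearch's actual ReaderContext: the class and method names are hypothetical, the context is assumed to expire once the 1min keep-alive from the description has elapsed since creation, and the query phase is assumed not to refresh it.

```java
import java.time.Duration;

// Simplified, hypothetical model of the expiry described in this PR; not OpenSearch's ReaderContext.
final class ReaderContextExpirySketch {

    static final class Context {
        private final long lastAccessMillis;
        private final long keepAliveMillis;

        Context(long createdAtMillis, Duration keepAlive) {
            this.lastAccessMillis = createdAtMillis;
            this.keepAliveMillis = keepAlive.toMillis();
        }

        /** Assumption: the query phase does not refresh lastAccessMillis. */
        boolean isExpired(long nowMillis) {
            return nowMillis - lastAccessMillis > keepAliveMillis;
        }
    }

    /** The context is created before the query phase; fetch starts as soon as the query phase ends. */
    static boolean fetchSucceeds(Duration keepAlive, Duration queryPhase) {
        Context context = new Context(0L, keepAlive);
        long fetchStartMillis = queryPhase.toMillis();
        return !context.isExpired(fetchStartMillis);
    }

    public static void main(String[] args) {
        Duration keepAlive = Duration.ofMinutes(1);
        System.out.println(fetchSucceeds(keepAlive, Duration.ofSeconds(10)));                   // true
        System.out.println(fetchSucceeds(keepAlive, Duration.ofMillis(500).multipliedBy(1000))); // false
    }
}
```

With a 500 s query phase (1000 docs at 500ms each), the modelled context has already expired when the fetch phase starts, which corresponds to the fetch failure the description attributes to this test.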
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.