[CI] TimeoutCheckerTests.testWatchdog failing regularly #48861

Closed
droberts195 opened this issue Nov 5, 2019 · 2 comments · Fixed by #62391
Labels
:ml Machine learning >test-failure Triaged test failures from CI

Comments

@droberts195
Contributor

TimeoutCheckerTests.testWatchdog has been failing regularly on master and 7.x for the last few days.

master example: https://gradle-enterprise.elastic.co/s/vcaxvky7vonu6
7.x example: https://gradle-enterprise.elastic.co/s/d5bttww4yctky

The error is:

04:39:58 org.elasticsearch.xpack.ml.filestructurefinder.TimeoutCheckerTests > testWatchdog FAILED
04:39:58     Wanted but not invoked:
04:39:58     matcher.interrupt();
04:39:58     -> at org.elasticsearch.xpack.ml.filestructurefinder.TimeoutCheckerTests.lambda$testWatchdog$2(TimeoutCheckerTests.java:72)
04:39:58     Actually, there were zero interactions with this mock.
04:39:58         at __randomizedtesting.SeedInfo.seed([4BA94B8F714382E0:E9C403A9D4A4ABA6]:0)
04:39:58         at org.elasticsearch.xpack.ml.filestructurefinder.TimeoutCheckerTests.lambda$testWatchdog$2(TimeoutCheckerTests.java:72)
04:39:58         at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:867)
04:39:58         at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:840)
04:39:58         at org.elasticsearch.xpack.ml.filestructurefinder.TimeoutCheckerTests.testWatchdog(TimeoutCheckerTests.java:71)
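The failure is reported from inside ESTestCase.assertBusy (stack lines 43 to 45). That helper re-runs the `verify(matcher).interrupt()` check until it passes or a maximum wait elapses, and only then surfaces an assertion failure. A minimal, hypothetical sketch of that retry pattern follows. `AssertBusySketch`, its parameters and its backoff values are illustrative assumptions, not the Elasticsearch implementation.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical, simplified retry-until-timeout assertion helper, loosely modelled
 * on the behaviour of ESTestCase.assertBusy. Not the Elasticsearch implementation.
 */
final class AssertBusySketch {

    private AssertBusySketch() {}

    /**
     * Re-runs {@code check} until it stops throwing AssertionError or {@code maxWait}
     * elapses. On timeout, the most recent AssertionError is rethrown.
     */
    static void assertBusy(Runnable check, long maxWait, TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(maxWait);
        long sleepMillis = 1;
        while (true) {
            try {
                check.run();
                return; // the assertion passed, stop polling
            } catch (AssertionError e) {
                // overflow-safe deadline comparison for System.nanoTime values
                if (System.nanoTime() - deadline >= 0) {
                    throw e; // out of time: surface the last failure, as in the CI log
                }
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                throw new AssertionError("interrupted while waiting", ie);
            }
            sleepMillis = Math.min(sleepMillis * 2, 100); // exponential backoff, capped at 100 ms
        }
    }
}
```

In this failure mode every attempt sees zero interactions with the mock. The helper therefore eventually gives up and rethrows Mockito's "Wanted but not invoked" error instead of failing on the first check.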

This didn't reproduce locally on master using:

./gradlew ':x-pack:plugin:ml:test' --tests "org.elasticsearch.xpack.ml.filestructurefinder.TimeoutCheckerTests.testWatchdog" \
  -Dtests.seed=4BA94B8F714382E0 \
  -Dtests.security.manager=true \
  -Dtests.locale=es-DO \
  -Dtests.timezone=ECT \
  -Dcompiler.java=12 \
  -Druntime.java=11

It is probably a side effect of #48346.

droberts195 added the >test-failure and :ml labels Nov 5, 2019
@elasticmachine
Collaborator

Pinging @elastic/ml-core (:ml)

droberts195 added a commit that referenced this issue Nov 5, 2019
droberts195 added a commit that referenced this issue Nov 5, 2019
@droberts195
Contributor Author

I muted the test:

master: 4a66cfc
7.x: c03f7ba

benwtrent self-assigned this Sep 15, 2020
benwtrent added a commit that referenced this issue Sep 16, 2020
…out (#62391)

Constructing the timeout checker first and only then registering the matcher with the watchdog leaves the test open to a race condition.

The timeout could be reached before the matcher is registered, in which case the matcher was never interrupted. To prevent this, a new timedOut flag is added to the watchdog's entry for the thread. When a new matcher is registered for a thread that has already timed out, the matcher is interrupted immediately.

closes #48861
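The fix described in this commit message can be illustrated with a minimal sketch of a per-thread watchdog. Everything here is hypothetical: `WatchdogSketch`, `SketchMatcher`, `Entry` and the `add`/`timeout`/`register`/`unregister`/`remove` methods are illustrative names, and the real TimeoutChecker watchdog differs in detail. The idea taken from the commit message is that the thread's entry records a `timedOut` flag when the timeout fires. A matcher registered afterwards is then interrupted immediately instead of waiting for a timeout that has already passed.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical, simplified per-thread watchdog illustrating the "remember the timeout"
 * fix. Not the actual Elasticsearch TimeoutChecker code.
 */
final class WatchdogSketch {

    /** Stand-in for an interruptible regex matcher (hypothetical). */
    static final class SketchMatcher {
        volatile boolean interrupted;

        void interrupt() {
            interrupted = true;
        }
    }

    /** Per-thread state: registered matchers plus whether the timeout already fired. */
    static final class Entry {
        final Set<SketchMatcher> matchers = new HashSet<>();
        boolean timedOut; // guarded by the entry's monitor
    }

    private final Map<Thread, Entry> registry = new ConcurrentHashMap<>();

    /** Called when a timeout checker starts watching a thread. */
    void add(Thread thread) {
        registry.put(thread, new Entry());
    }

    /** Called by the scheduled timeout task: interrupt every matcher registered so far. */
    void timeout(Thread thread) {
        Entry entry = registry.get(thread);
        if (entry == null) {
            return;
        }
        synchronized (entry) {
            entry.timedOut = true; // remember the timeout for matchers registered later
            entry.matchers.forEach(SketchMatcher::interrupt);
        }
    }

    /** Called when a matcher starts running on the current thread. */
    void register(SketchMatcher matcher) {
        Entry entry = registry.get(Thread.currentThread());
        if (entry == null) {
            return;
        }
        synchronized (entry) {
            entry.matchers.add(matcher);
            if (entry.timedOut) {
                matcher.interrupt(); // the timeout already fired: interrupt straight away
            }
        }
    }

    /** Called when a matcher finishes on the current thread. */
    void unregister(SketchMatcher matcher) {
        Entry entry = registry.get(Thread.currentThread());
        if (entry == null) {
            return;
        }
        synchronized (entry) {
            entry.matchers.remove(matcher);
        }
    }

    /** Called when the timeout checker for a thread is closed. */
    void remove(Thread thread) {
        registry.remove(thread);
    }
}
```

Calling `timeout(thread)` before `register(matcher)` reproduces the ordering from the race. With the flag, the late matcher is still interrupted. A version that only interrupted matchers present at timeout time would leave it untouched, matching the "zero interactions with this mock" failure in the CI log. The entry's monitor guards both the flag and the matcher set, so a timeout and a registration running on different threads cannot interleave in a way that loses the interrupt.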
benwtrent added a commit to benwtrent/elasticsearch that referenced this issue Sep 16, 2020
…out (elastic#62391)

benwtrent added a commit that referenced this issue Sep 16, 2020
…out (#62391) (#62447)


3 participants