BasicDistributedJobsIT#testDedicatedMlNode fails #63980
Comments
Pinging @elastic/ml-core (:ml)
I think the interesting bit of the log is:
The test ends when the job's persistent task is assigned to a node. However, after this happens (and hence after the test code has finished and the cleanup code has started running), the autodetect process manager will be running the autodetect process (the "blackhole" one rather than the native one in this case, since it's an internal cluster test). So it seems there is a window of opportunity: if a job is closed after being assigned to a node but before the autodetect process is running, that close request is ignored. (This may possibly only affect close requests that use
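To make the suspected window concrete, here is a minimal sketch of it in plain Java. It is not Elasticsearch code: the class `JobLifecycleSketch`, its states, and the `rememberEarlyClose` flag are all hypothetical names invented for illustration. With the flag off, a close that arrives while the job is assigned but its process is not yet running is dropped. With the flag on, the close is recorded and honoured once the process has started, which is one possible way to close such a window and not necessarily what the real fix does.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical model of the job lifecycle; not Elasticsearch's actual classes. */
final class JobLifecycleSketch {

    enum State { ASSIGNED, RUNNING, CLOSE_PENDING, CLOSED }

    // The job starts out assigned to a node, with its process not yet running.
    private final AtomicReference<State> state = new AtomicReference<>(State.ASSIGNED);

    // Assumption for illustration: whether an early close is remembered or dropped.
    private final boolean rememberEarlyClose;

    JobLifecycleSketch(boolean rememberEarlyClose) {
        this.rememberEarlyClose = rememberEarlyClose;
    }

    /** Handles a close request; returns true if it took effect or was recorded. */
    boolean requestClose() {
        if (state.compareAndSet(State.RUNNING, State.CLOSED)) {
            return true; // normal case: the process is running, so close it
        }
        if (rememberEarlyClose && state.compareAndSet(State.ASSIGNED, State.CLOSE_PENDING)) {
            return true; // close arrived inside the window, so remember it
        }
        return false; // close arrived inside the window and was dropped
    }

    /** Called when the process has finished starting. */
    void processStarted() {
        if (state.compareAndSet(State.ASSIGNED, State.RUNNING)) {
            return; // no close arrived while the process was starting
        }
        // A close was recorded during the window, so honour it now.
        state.compareAndSet(State.CLOSE_PENDING, State.CLOSED);
    }

    State state() {
        return state.get();
    }
}
```

Calling `requestClose()` and then `processStarted()` on `new JobLifecycleSketch(false)` leaves the job in `RUNNING`, which mirrors the behaviour described above where cleanup believes the job was closed but it is not. The same sequence on `new JobLifecycleSketch(true)` ends in `CLOSED`.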
Another failure on master (PR): https://gradle-enterprise.elastic.co/s/obdm3spo7telc. It does not reproduce locally.
This has been failing more recently on [...]. As this has started failing more in the past couple of days, I'm going to mute this test.
See elastic#63980 for details
…c#69198) Previous work fixing a hard-to-detect `_close` race condition (elastic#69136) may have fixed the test failures indicated in elastic#63980. This adds some logging and unmutes the previously flaky test to see whether the issue has been addressed. Closes: elastic#63980
There has been trace logging in BasicDistributedJobsIT for elastic#63980 since February. It was added in elastic#69198 on the basis that elastic#69136 had probably fixed the problem, but that if it had not, the trace logging would be useful. Since the tests have not failed since they were re-enabled, this logging can now be removed.
Build scan: https://gradle-enterprise.elastic.co/s/jji6ljcth352u/console-log?task=:x-pack:plugin:ml:internalClusterTest
Repro line: ./gradlew ':x-pack:plugin:ml:internalClusterTest' --tests "org.elasticsearch.xpack.ml.integration.BasicDistributedJobsIT.testDedicatedMlNode" -Dtests.seed=B52557EBFF0CB0F6 -Dtests.security.manager=true -Dtests.locale=ar-SA -Dtests.timezone=Africa/Cairo -Druntime.java=8
Reproduces locally?: no
Applicable branches: failed on 7.x and 7.10 so far
Failure history: It started failing only a few days ago in this way and failed only three times in total with this specific error.
Failure excerpt:
I found errors like the following in the logs; not sure if they are related: