[CI] MLModelDeploymentsUpgradeIT testTrainedModelDeployment failing #87959

Closed
andreidan opened this issue Jun 23, 2022 · 10 comments · Fixed by #88289
Labels
:ml Machine learning Team:ML Meta label for the ML team >test-failure Triaged test failures from CI

Comments

@andreidan
Contributor

Build scan:
https://gradle-enterprise.elastic.co/s/rrbldaxcklpvk/tests/:x-pack:qa:rolling-upgrade:v8.2.4%23twoThirdsUpgradedTest/org.elasticsearch.upgrades.MLModelDeploymentsUpgradeIT/testTrainedModelDeployment

Reproduction line:
./gradlew ':x-pack:qa:rolling-upgrade:v8.2.4#twoThirdsUpgradedTest' -Dtests.class="org.elasticsearch.upgrades.MLModelDeploymentsUpgradeIT" -Dtests.method="testTrainedModelDeployment" -Dtests.seed=B48E616E1A60493F -Dtests.bwc=true -Dtests.locale=no-NO -Dtests.timezone=America/Kralendijk -Druntime.java=17

Applicable branches:
master

Reproduces locally?:
Didn't try

Failure history:
https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.upgrades.MLModelDeploymentsUpgradeIT&tests.test=testTrainedModelDeployment

Failure excerpt:

org.elasticsearch.client.ResponseException: method [POST], host [http://[::1]:35509], URI [/_ml/trained_models/upgrade-deployment-test/deployment/_infer], status line [HTTP/1.1 404 Not Found]
{"error":{"root_cause":[{"type":"status_exception","reason":"[upgrade-deployment-test] unable to find deployment task for inference please stop and start the deployment or try again momentarily"}],"type":"status_exception","reason":"[upgrade-deployment-test] unable to find deployment task for inference please stop and start the deployment or try again momentarily"},"status":404}

  at __randomizedtesting.SeedInfo.seed([B48E616E1A60493F:4F5627313BA0F3A8]:0)
  at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:347)
  at org.elasticsearch.client.RestClient.performRequest(RestClient.java:313)
  at org.elasticsearch.client.RestClient.performRequest(RestClient.java:288)
  at org.elasticsearch.upgrades.MLModelDeploymentsUpgradeIT.infer(MLModelDeploymentsUpgradeIT.java:249)
  at org.elasticsearch.upgrades.MLModelDeploymentsUpgradeIT.assertInfer(MLModelDeploymentsUpgradeIT.java:149)
  at org.elasticsearch.upgrades.MLModelDeploymentsUpgradeIT.testTrainedModelDeployment(MLModelDeploymentsUpgradeIT.java:82)
  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-2)
  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
  at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:568)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:375)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:824)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:475)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:375)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:831)
  at java.lang.Thread.run(Thread.java:833)

@andreidan andreidan added :ml Machine learning >test-failure Triaged test failures from CI labels Jun 23, 2022
@elasticmachine elasticmachine added the Team:ML Meta label for the ML team label Jun 23, 2022
@elasticmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@pgomulka
Contributor

@dimitris-athanasiou
Contributor

Unfortunately, this ^^ failure happened before the extra logging was enabled, so we have to wait for another failure.

@dakrone
Member

dakrone commented Jun 24, 2022

I had another failure today: https://gradle-enterprise.elastic.co/s/rhoh2merwpz5o

@benwtrent
Member

Here are rolling upgrade logs from a recent failure: https://gradle-enterprise.elastic.co/s/iubebkebq7vsi

rolling-upgrade.zip

It has debug and trace lines for inference/action, etc. in 8.

Note that the test is an upgrade from 8.2.4 -> 8.4.0 (multiple bwc tests are run in this suite).

I am muting the tests.

@benwtrent
Member

@dimitris-athanasiou ^

benwtrent added a commit to benwtrent/elasticsearch that referenced this issue Jun 27, 2022
dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this issue Jun 28, 2022
Unmuting to collect logs after adding more logging to
version 8.2.

Relates elastic#87959
dimitris-athanasiou added a commit that referenced this issue Jun 28, 2022
@fcofdez
Contributor

fcofdez commented Jun 30, 2022

Another instance of this failure (https://gradle-enterprise.elastic.co/s/b3ndm5tccf23w)

@valeriy42
Contributor

Another failure here https://gradle-enterprise.elastic.co/s/lfba2yqci5eww

It failed 42 times in the past 7 days.

benwtrent added a commit that referenced this issue Jul 6, 2022
When the internal objects were renamed, an inappropriate bwc version restriction was put in place.

This commit fixes this by allowing trained model assignment metadata updates to be serialized to nodes > 8.0.0.

This is OK as the object serialization handles its BWC conditions when serializing over the wire.

This closes: #87959
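
For readers following the thread, the reasoning in this commit message can be illustrated with a rough, self-contained sketch. It is not the actual Elasticsearch code: the `Version` class, the `ASSUMED_RENAME_VERSION` constant (8.3.0 is only a placeholder for whichever version introduced the rename), and the `recipients` and `writeTo` methods are all hypothetical names made up for illustration. The idea is that an action-level gate decides which nodes receive the metadata update at all, while the object itself chooses a wire format per destination node, so the gate does not need to be as strict as the rename version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BwcGateSketch {

    /** Hypothetical, simplified stand-in for a node version. */
    static final class Version implements Comparable<Version> {
        final int major, minor, patch;

        Version(int major, int minor, int patch) {
            this.major = major;
            this.minor = minor;
            this.patch = patch;
        }

        @Override
        public int compareTo(Version other) {
            if (major != other.major) return Integer.compare(major, other.major);
            if (minor != other.minor) return Integer.compare(minor, other.minor);
            return Integer.compare(patch, other.patch);
        }

        boolean after(Version other) {
            return compareTo(other) > 0;
        }

        @Override
        public String toString() {
            return major + "." + minor + "." + patch;
        }
    }

    static final Version V_8_0_0 = new Version(8, 0, 0);
    // ASSUMPTION: placeholder for whichever version introduced the rename.
    static final Version ASSUMED_RENAME_VERSION = new Version(8, 3, 0);

    /** Action-level gate: only nodes newer than minExclusive receive the update. */
    static List<Version> recipients(List<Version> nodes, Version minExclusive) {
        List<Version> result = new ArrayList<>();
        for (Version node : nodes) {
            if (node.after(minExclusive)) {
                result.add(node);
            }
        }
        return result;
    }

    /** Object-level BWC: the payload picks its wire format per destination node. */
    static String writeTo(Version destination, String modelId) {
        boolean understandsNewName = destination.compareTo(ASSUMED_RENAME_VERSION) >= 0;
        return (understandsNewName ? "new-name:" : "old-name:") + modelId;
    }

    public static void main(String[] args) {
        // Mixed cluster resembling the 8.2.4 -> 8.4.0 rolling upgrade in this test.
        List<Version> cluster = Arrays.asList(new Version(8, 2, 4), new Version(8, 4, 0));
        System.out.println("strict gate:  " + recipients(cluster, ASSUMED_RENAME_VERSION));
        System.out.println("relaxed gate: " + recipients(cluster, V_8_0_0));
        for (Version node : recipients(cluster, V_8_0_0)) {
            System.out.println(node + " <- " + writeTo(node, "upgrade-deployment-test"));
        }
    }
}
```

In this toy cluster, the strict gate never sends the update to the 8.2.4 node, which is the kind of gap that could leave an old node without the information it needs during a rolling upgrade. The relaxed gate reaches both nodes, and the per-destination format choice in `writeTo` keeps the payload readable for each of them.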
benwtrent added a commit to benwtrent/elasticsearch that referenced this issue Jul 6, 2022
benwtrent added a commit that referenced this issue Jul 8, 2022
@pgomulka
Contributor

The test has failed in the last 3 days. Possibly a backport is needed? https://gradle-enterprise.elastic.co/s/xiemsmv6cxbz6

@pgomulka pgomulka reopened this Jul 21, 2022
@droberts195
Contributor

> Possibly a backport is needed?

The fix is now present on all active branches.


9 participants