[SPARK-31746][YARN][TESTS] Show the actual error message in LocalityPlacementStrategySuite #28566
cc @cloud-fan. The flakiness was found at #28463. Can you take a quick look?
I am going to merge this, see #28463 (comment).
Merged to master and branch-3.0.
…lacementStrategySuite

### What changes were proposed in this pull request?

This PR proposes to show the actual traceback when "handle large number of containers and tasks (SPARK-18750)" test fails in `LocalityPlacementStrategySuite`.

**It does not fully resolve the JIRA SPARK-31746 yet**. I tried to reproduce in my local by controlling the factors in the tests but I couldn't. I double checked the changes in SPARK-18750 are still valid.

### Why are the changes needed?

This test is flaky for an unknown reason (see https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122768/testReport/org.apache.spark.deploy.yarn/LocalityPlacementStrategySuite/handle_large_number_of_containers_and_tasks__SPARK_18750_/):

```
sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: java.lang.StackOverflowError did not equal null
	at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
	at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
	at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
	at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503)
```

After this PR, it will help to investigate the root cause:

**Before**:

```
[info] - handle large number of containers and tasks (SPARK-18750) *** FAILED *** (824 milliseconds)
[info]   java.lang.StackOverflowError did not equal null (LocalityPlacementStrategySuite.scala:49)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
[info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
[info]   at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
[info]   at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503)
[info]   at org.apache.spark.deploy.yarn.LocalityPlacementStrategySuite.$anonfun$new$1(LocalityPlacementStrategySuite.scala:49)
[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
[info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:157)
[info]   at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184)
[info]   at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286)
[info]   at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196)
[info]   at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178)
...
```

**After**:

```
[info] - handle large number of containers and tasks (SPARK-18750) *** FAILED *** (825 milliseconds)
[info]   StackOverflowError should not be thrown; however, got:
[info]
[info]   java.lang.StackOverflowError
[info]   at scala.collection.TraversableLike.$anonfun$filterImpl$1(TraversableLike.scala:256)
[info]   at scala.collection.MapLike$MappedValues.$anonfun$foreach$3(MapLike.scala:256)
[info]   at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
[info]   at scala.collection.MapLike$MappedValues.$anonfun$foreach$3(MapLike.scala:256)
[info]   at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
[info]   at scala.collection.MapLike$MappedValues.$anonfun$foreach$3(MapLike.scala:256)
[info]   at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
[info]   at scala.collection.MapLike$MappedValues.$anonfun$foreach$3(MapLike.scala:256)
[info]   at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
[info]   at scala.collection.MapLike$MappedValues.$anonfun$foreach$3(MapLike.scala:256)
[info]   at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
[info]   at scala.collection.MapLike$MappedValues.$anonfun$foreach$3(MapLike.scala:256)
[info]   at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
[info]   at scala.collection.MapLike$MappedValues.$anonfun$foreach$3(MapLike.scala:256)
[info]   at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
[info]   at scala.collection.MapLike$MappedValues.$anonfun$foreach$3(MapLike.scala:256)
...
```

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Manually tested by reverting 76db394 locally.

Closes #28566 from HyukjinKwon/SPARK-31746.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(cherry picked from commit 3bf7bf9)
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
@HyukjinKwon It seems branch-3.0 is broken?
@@ -57,7 +62,6 @@ class LocalityPlacementStrategySuite extends SparkFunSuite {
    // goal is to create enough requests for localized containers (so there should be many
    // tasks on several hosts that have no allocated containers).

    val resource = Resource.newInstance(8 * 1024, 4)
ah, yeah. Seems it's used in the branch-3.0. I will fix.
should be fixed now. thanks for pointing out quickly.
Test build #122782 has finished for PR 28566 at commit
What changes were proposed in this pull request?
This PR proposes to show the actual traceback when the "handle large number of containers and tasks (SPARK-18750)" test fails in `LocalityPlacementStrategySuite`. It does not fully resolve the JIRA SPARK-31746 yet. I tried to reproduce the failure locally by controlling the factors in the tests, but I couldn't. I double-checked that the changes in SPARK-18750 are still valid.
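The idea of reporting the captured error's own stack trace, rather than only asserting that it is null, can be sketched roughly as below. This is a minimal, self-contained illustration and not the suite's actual code: the object `StackTraceReporting`, the helpers `runOnSmallStack` and `render`, the recursive `depth` function, and the 32 KiB stack size are all assumptions made for the example.

```scala
import java.io.{PrintWriter, StringWriter}
import java.util.concurrent.atomic.AtomicReference

// Hypothetical helper object, for illustration only.
object StackTraceReporting {

  /** Runs `body` on a thread with the requested stack size and returns any Throwable it raised. */
  def runOnSmallStack(stackSizeBytes: Long)(body: => Unit): Option[Throwable] = {
    val error = new AtomicReference[Throwable](null)
    val runnable = new Runnable {
      override def run(): Unit =
        try body
        catch { case t: Throwable => error.set(t) }
    }
    // The stack size is only a hint to the JVM; some platforms may ignore it.
    val thread = new Thread(null, runnable, "small-stack-thread", stackSizeBytes)
    thread.start()
    thread.join()
    Option(error.get())
  }

  /** Renders the full stack trace of `error` as a string. */
  def render(error: Throwable): String = {
    val out = new StringWriter()
    val writer = new PrintWriter(out)
    error.printStackTrace(writer)
    writer.flush()
    out.toString
  }

  def main(args: Array[String]): Unit = {
    // Deep, non-tail recursion that is expected to exhaust a small stack.
    def depth(n: Int): Int = if (n == 0) 0 else 1 + depth(n - 1)

    runOnSmallStack(32 * 1024) { depth(1000000); () } match {
      case Some(e) =>
        // A test would pass this message to its failure call instead of printing it.
        println(s"StackOverflowError should not be thrown; however, got:\n\n${render(e)}")
      case None =>
        println("No error was thrown.")
    }
  }
}
```

With the rendered trace in the failure message, the report shows the frames where the overflow occurred instead of the frames of the assertion machinery.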
Why are the changes needed?
This test is flaky for an unknown reason (see https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122768/testReport/org.apache.spark.deploy.yarn/LocalityPlacementStrategySuite/handle_large_number_of_containers_and_tasks__SPARK_18750_/):
After this PR, it will help to investigate the root cause:
Before:
After:
Does this PR introduce any user-facing change?
No, dev-only.
How was this patch tested?
Manually tested by reverting 76db394 locally.