Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-32916][SHUFFLE][test-maven][test-hadoop2.7] Remove the newly added YarnShuffleServiceSuite.java #30349

Closed
wants to merge 1 commit into from

Conversation

otterc
Copy link
Contributor

@otterc otterc commented Nov 12, 2020

What changes were proposed in this pull request?

This is a follow-up fix for the failing tests in YarnShuffleServiceSuite.java. This java class was introduced in #30062. The tests in the class fail when run with hadoop-2.7 profile:

[ERROR] testCreateDefaultMergedShuffleFileManagerInstance(org.apache.spark.network.yarn.YarnShuffleServiceSuite)  Time elapsed: 0.627 s  <<< ERROR!
java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
	at org.apache.spark.network.yarn.YarnShuffleServiceSuite.testCreateDefaultMergedShuffleFileManagerInstance(YarnShuffleServiceSuite.java:37)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory
	at org.apache.spark.network.yarn.YarnShuffleServiceSuite.testCreateDefaultMergedShuffleFileManagerInstance(YarnShuffleServiceSuite.java:37)

[ERROR] testCreateRemoteBlockPushResolverInstance(org.apache.spark.network.yarn.YarnShuffleServiceSuite)  Time elapsed: 0 s  <<< ERROR!
java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.network.yarn.YarnShuffleService
	at org.apache.spark.network.yarn.YarnShuffleServiceSuite.testCreateRemoteBlockPushResolverInstance(YarnShuffleServiceSuite.java:47)

[ERROR] testInvalidClassNameOfMergeManagerWillUseNoOpInstance(org.apache.spark.network.yarn.YarnShuffleServiceSuite)  Time elapsed: 0.001 s  <<< ERROR!
java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.network.yarn.YarnShuffleService
	at org.apache.spark.network.yarn.YarnShuffleServiceSuite.testInvalidClassNameOfMergeManagerWillUseNoOpInstance(YarnShuffleServiceSuite.java:57)

A test suit for YarnShuffleService did exist here:
resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceSuite.scala
I missed this when I created common/network-yarn/src/test/java/org/apache/spark/network/yarn/YarnShuffleServiceSuite.java. Moving all the new tests to the earlier test suite fixes the failures with hadoop-2.7 even though why this happened is not clear.

Why are the changes needed?

The newly added tests are failing when run with hadoop profile 2.7

Does this PR introduce any user-facing change?

No

How was this patch tested?

Ran the unit tests with the default profile as well as hadoop 2.7 profile.
build/mvn test -Dtest=none -DwildcardSuites=org.apache.spark.network.yarn.YarnShuffleServiceSuite -Phadoop-2.7 -Pyarn

Run starting. Expected test count is: 11
YarnShuffleServiceSuite:
- executor state kept across NM restart
- removed applications should not be in registered executor file
- shuffle service should be robust to corrupt registered executor file
- get correct recovery path
- moving recovery file from NM local dir to recovery path
- service throws error if cannot start
- recovery db should not be created if NM recovery is not enabled
- SPARK-31646: metrics should be registered into Node Manager's metrics system
- create default merged shuffle file manager instance
- create remote block push resolver instance
- invalid class name of merge manager will use noop instance
Run completed in 2 seconds, 572 milliseconds.
Total number of tests run: 11
Suites: completed 2, aborted 0
Tests: succeeded 11, failed 0, canceled 0, ignored 0, pending 0
All tests passed.

@otterc
Copy link
Contributor Author

otterc commented Nov 12, 2020

@mridulm Please take a look

@github-actions github-actions bot added the YARN label Nov 12, 2020
@mridulm
Copy link
Contributor

mridulm commented Nov 12, 2020

ok to test

@SparkQA
Copy link

SparkQA commented Nov 12, 2020

Test build #130985 has finished for PR 30349 at commit c11f583.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • test(\"invalid class name of merge manager will use noop instance\")

@SparkQA
Copy link

SparkQA commented Nov 12, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35591/

@SparkQA
Copy link

SparkQA commented Nov 12, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35591/

@tgravescs
Copy link
Contributor

it would be nice to add description as to why they need to be moved. or why moving them fixes whatever the failure is.

@HyukjinKwon
Copy link
Member

Big +1 for @tgravescs advice.

@otterc
Copy link
Contributor Author

otterc commented Nov 12, 2020

@tgravescs @HyukjinKwon I am not entirely sure why these tests would fail with the errors below only for hadoop-2.7 profile.
I see that the network-yarn module is build differently. The target jar of this module is a spark-{version}-yarn-shuffle.jar. However, that doesn't explain why this will fail only with hadoop-2.7 profile and not for hadoop 3.2. It could also be what @mridulm pointed out that some jar is missing from hadoop-2.7 profile. However, I didn't add any additional dependencies. I didn't create a new logger instance. Just using the existing logger instance in YarnShuffleService.

The reason I am deleting this common/network-yarn/src/test/java/org/apache/spark/network/yarn/YarnShuffleServiceSuite.java was that I added this with #30062 because I missed that resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceSuite.scala existed. I reckon there is no need to have 2 test classes for same class. However, it would have been better to have the test class in the same module.

I can dig deeper into why this happened but that would require more time. In the meantime just wanted to fix the failures with hadoop-2.7 build.

[ERROR] testCreateDefaultMergedShuffleFileManagerInstance(org.apache.spark.network.yarn.YarnShuffleServiceSuite)  Time elapsed: 0.421 s  <<< ERROR!
java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
	at org.apache.spark.network.yarn.YarnShuffleServiceSuite.testCreateDefaultMergedShuffleFileManagerInstance(YarnShuffleServiceSuite.java:37)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory
	at org.apache.spark.network.yarn.YarnShuffleServiceSuite.testCreateDefaultMergedShuffleFileManagerInstance(YarnShuffleServiceSuite.java:37)

[ERROR] testCreateRemoteBlockPushResolverInstance(org.apache.spark.network.yarn.YarnShuffleServiceSuite)  Time elapsed: 0 s  <<< ERROR!
java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.network.yarn.YarnShuffleService
	at org.apache.spark.network.yarn.YarnShuffleServiceSuite.testCreateRemoteBlockPushResolverInstance(YarnShuffleServiceSuite.java:47)

[ERROR] testInvalidClassNameOfMergeManagerWillUseNoOpInstance(org.apache.spark.network.yarn.YarnShuffleServiceSuite)  Time elapsed: 0.001 s  <<< ERROR!
java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.network.yarn.YarnShuffleService
	at org.apache.spark.network.yarn.YarnShuffleServiceSuite.testInvalidClassNameOfMergeManagerWillUseNoOpInstance(YarnShuffleServiceSuite.java:57)

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-32916][SHUFFLE] Remove the newly added YarnShuffleServiceSuite.java [SPARK-32916][SHUFFLE][test-maven][test-hadoop2.7] Remove the newly added YarnShuffleServiceSuite.java Nov 12, 2020
@dongjoon-hyun
Copy link
Member

Retest this please

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Nov 12, 2020

Let's see if this recovers Hadoop 2.7 on Maven build.

@SparkQA
Copy link

SparkQA commented Nov 12, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35625/

@SparkQA
Copy link

SparkQA commented Nov 12, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35625/

@SparkQA
Copy link

SparkQA commented Nov 13, 2020

Test build #131019 has finished for PR 30349 at commit c11f583.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • test(\"invalid class name of merge manager will use noop instance\")

@dongjoon-hyun
Copy link
Member

Hi, @otterc , @mridulm , @tgravescs , @HyukjinKwon .
Thank you for making an effort to recover master branch quickly.
I'll leave this PR to you guys and hope the recovery occurs soon.

@tgravescs
Copy link
Contributor

I'm fine to get something checked in quickly to fix the build but would like to understand it as well. I kicked the build again as failure didn't look related.

@tgravescs
Copy link
Contributor

yeah I didn't realize that Suite was in different module, it makes sense it put it all in one place.
it looks like ./common/network-yarn doesn't have dependencies for yarn (only Hadoop client) whereas ./resource-managers/yarn does. the tests aren't doing much with yarn it just calling your static function for newMergedShuffleFileManagerInstance and it builds with just the client so it seems like that should be enough. If tests pass I think its fine to merge as it makes sense just to combine into single file.

@tgravescs
Copy link
Contributor

test this please

@SparkQA
Copy link

SparkQA commented Nov 13, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35675/

@SparkQA
Copy link

SparkQA commented Nov 13, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35675/

@tgravescs
Copy link
Contributor

failures look unrelated, I'm honestly not sure what they are:
2020-11-13T15:55:21.9268572Z [error] running /home/runner/work/spark/spark/build/sbt -Phadoop-3.2 -Phive-2.3 -Dtest.include.tags=org.apache.spark.tags.ExtendedSQLTest sql/test ; received return code 1
2020-11-13T15:55:21.9487619Z ##[error]Process completed with exit code 18.
2020-11-13T15:55:22.0775315Z ##[group]Run actions/upload-artifact@v2

@dongjoon-hyun have you been this before? Maybe I'm missing something in the logs

@SparkQA
Copy link

SparkQA commented Nov 13, 2020

Test build #131071 has finished for PR 30349 at commit c11f583.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • test(\"invalid class name of merge manager will use noop instance\")

@tgravescs
Copy link
Contributor

I'm not sure what is going on with GitHub Action tests, but based on the Jenkins ones passing, I'm going to merge this. If we find it makes things worse we can revert it.

@asfgit asfgit closed this in 423ba5a Nov 13, 2020
@tgravescs
Copy link
Contributor

merged to master

@dongjoon-hyun
Copy link
Member

Thank you, @tgravescs and all!

@mridulm
Copy link
Contributor

mridulm commented Nov 14, 2020

Thanks for merging it @tgravescs.
Thanks for fixing this @otterc, and thanks for reporting it @dongjoon-hyun !

@otterc otterc deleted the SPARK-32916-followup branch November 18, 2020 04:03
wangyum pushed a commit that referenced this pull request May 26, 2023
…dded YarnShuffleServiceSuite.java

This is a follow-up fix for the failing tests in `YarnShuffleServiceSuite.java`. This java class was introduced in #30062. The tests in the class fail when run with hadoop-2.7 profile:
```
[ERROR] testCreateDefaultMergedShuffleFileManagerInstance(org.apache.spark.network.yarn.YarnShuffleServiceSuite)  Time elapsed: 0.627 s  <<< ERROR!
java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
	at org.apache.spark.network.yarn.YarnShuffleServiceSuite.testCreateDefaultMergedShuffleFileManagerInstance(YarnShuffleServiceSuite.java:37)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory
	at org.apache.spark.network.yarn.YarnShuffleServiceSuite.testCreateDefaultMergedShuffleFileManagerInstance(YarnShuffleServiceSuite.java:37)

[ERROR] testCreateRemoteBlockPushResolverInstance(org.apache.spark.network.yarn.YarnShuffleServiceSuite)  Time elapsed: 0 s  <<< ERROR!
java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.network.yarn.YarnShuffleService
	at org.apache.spark.network.yarn.YarnShuffleServiceSuite.testCreateRemoteBlockPushResolverInstance(YarnShuffleServiceSuite.java:47)

[ERROR] testInvalidClassNameOfMergeManagerWillUseNoOpInstance(org.apache.spark.network.yarn.YarnShuffleServiceSuite)  Time elapsed: 0.001 s  <<< ERROR!
java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.network.yarn.YarnShuffleService
	at org.apache.spark.network.yarn.YarnShuffleServiceSuite.testInvalidClassNameOfMergeManagerWillUseNoOpInstance(YarnShuffleServiceSuite.java:57)
```
A test suit for `YarnShuffleService` did exist here:
`resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceSuite.scala`
I missed this when I created `common/network-yarn/src/test/java/org/apache/spark/network/yarn/YarnShuffleServiceSuite.java`. Moving all the new tests to the earlier test suite fixes the failures with hadoop-2.7 even though why this happened is not clear.

The newly added tests are failing when run with hadoop profile 2.7

No

Ran the unit tests with the default profile as well as hadoop 2.7 profile.
`build/mvn test -Dtest=none -DwildcardSuites=org.apache.spark.network.yarn.YarnShuffleServiceSuite -Phadoop-2.7 -Pyarn`
```
Run starting. Expected test count is: 11
YarnShuffleServiceSuite:
- executor state kept across NM restart
- removed applications should not be in registered executor file
- shuffle service should be robust to corrupt registered executor file
- get correct recovery path
- moving recovery file from NM local dir to recovery path
- service throws error if cannot start
- recovery db should not be created if NM recovery is not enabled
- SPARK-31646: metrics should be registered into Node Manager's metrics system
- create default merged shuffle file manager instance
- create remote block push resolver instance
- invalid class name of merge manager will use noop instance
Run completed in 2 seconds, 572 milliseconds.
Total number of tests run: 11
Suites: completed 2, aborted 0
Tests: succeeded 11, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```

Closes #30349 from otterc/SPARK-32916-followup.

Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants