Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-3971][SQL] Backport #2843 to branch-1.1 #3113

Closed
wants to merge 7 commits into from

Conversation

liancheng
Copy link
Contributor

This PR backports #2843 to branch-1.1. The key difference is that this one doesn't support Hive 0.13.1 and thus always returns 0.12.0 when spark.sql.hive.version is queried.

6 other commits on which #2843 depends were also backported, they are:

@SparkQA
Copy link

SparkQA commented Nov 5, 2014

QA tests have started for PR 3113 at commit 72bffb1.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 5, 2014

QA tests have finished for PR 3113 at commit 72bffb1.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22940/
Test FAILed.

@liancheng
Copy link
Contributor Author

Couldn't reproduce the test failure above, doubt whether it's sensitive to execution order.

@liancheng
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Nov 6, 2014

QA tests have started for PR 3113 at commit 72bffb1.

  • This patch merges cleanly.

liancheng and others added 4 commits November 6, 2014 15:01
…riftServer2

`HiveThriftServer2` creates a global singleton `SessionState` instance and overrides `HiveContext` to inject the `SessionState` object. This messes up `SessionState` initialization and causes problems.

This PR replaces the global `SessionState` with `HiveContext.sessionState` to avoid the initialization conflict. Also `HiveContext` reuses existing started `SessionState` if any (this is required by `SparkSQLCLIDriver`, which uses specialized `CliSessionState`).

Author: Cheng Lian <lian@databricks.com>

Closes apache#2887 from liancheng/spark-4037 and squashes the following commits:

8446675 [Cheng Lian] Removes redundant Driver initialization
a28fef5 [Cheng Lian] Avoid starting HiveContext.sessionState multiple times
49b1c5b [Cheng Lian] Reuses existing started SessionState if any
3cd6fab [Cheng Lian] Fixes SPARK-4037

Conflicts:
	sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnv.scala
	sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
As scwf pointed out, `HiveThriftServer2Suite` isn't effective anymore after the Thrift server was made a daemon. On the other hand, these test suites were known flaky, PR apache#2214 tried to fix them but failed because of unknown Jenkins build error. This PR fixes both sets of issues.

In this PR, instead of watching `start-thriftserver.sh` output, the test code start a `tail` process to watch the log file. A `Thread.sleep` has to be introduced because the `kill` command used in `stop-thriftserver.sh` is not synchronous.

As for the root cause of the mysterious Jenkins build failure. Please refer to [this comment](apache#2675 (comment)) below for details.

----

(Copied from PR description of apache#2214)

This PR fixes two issues of `HiveThriftServer2Suite` and brings 1 enhancement:

1. Although metastore, warehouse directories and listening port are randomly chosen, all test cases share the same configuration. Due to parallel test execution, one of the two test case is doomed to fail
2. We caught any exceptions thrown from a test case and print diagnosis information, but forgot to re-throw the exception...
3. When the forked server process ends prematurely (e.g., fails to start), the `serverRunning` promise is completed with a failure, preventing the test code to keep waiting until timeout.

So, embarrassingly, this test suite was failing continuously for several days but no one had ever noticed it... Fortunately no bugs in the production code were covered under the hood.

Author: Cheng Lian <lian.cs.zju@gmail.com>
Author: wangfei <wangfei1@huawei.com>

Closes apache#2675 from liancheng/fix-thriftserver-tests and squashes the following commits:

1c384b7 [Cheng Lian] Minor code cleanup, restore the logging level hack in TestHive.scala
7805c33 [wangfei]  reset SPARK_TESTING to avoid loading Log4J configurations in testing class paths
af2b5a9 [Cheng Lian] Removes log level hacks from TestHiveContext
d116405 [wangfei] make sure that log4j level is INFO
ee92a82 [Cheng Lian] Relaxes timeout
7fd6757 [Cheng Lian] Fixes test suites in hive-thriftserver

Conflicts:
	sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala
The removed `Future` was used to end the test case as soon as the Spark SQL CLI process exits. When the process exits prematurely, this mechanism prevents the test case to wait until timeout. But it also creates a race condition: when `foundAllExpectedAnswers.tryFailure` is called, there are chances that the last expected output line of the CLI process hasn't been caught by the main logics of the test code, thus fails the test case.

Removing this `Future` doesn't affect correctness.

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes apache#2823 from liancheng/clean-clisuite and squashes the following commits:

489a97c [Cheng Lian] Fixes the race condition that may cause test failure
`CliSuite` has been flaky for a while, this PR tries to improve this situation by fixing a race condition in `CliSuite`. The `captureOutput` function is used to capture both stdout and stderr output of the forked external process in two background threads and search for expected strings, but wasn't been properly synchronized before.

Author: Cheng Lian <lian@databricks.com>

Closes apache#3060 from liancheng/fix-cli-suite and squashes the following commits:

a70569c [Cheng Lian] Fixes race condition in CliSuite
@SparkQA
Copy link

SparkQA commented Nov 6, 2014

QA tests have finished for PR 3113 at commit 72bffb1.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22988/
Test FAILed.

@marmbrus
Copy link
Contributor

marmbrus commented Nov 7, 2014

@liancheng ping

@scwf
Copy link
Contributor

scwf commented Nov 7, 2014

@liancheng and I locally test is ok for this. retest this again?

@SparkQA
Copy link

SparkQA commented Nov 7, 2014

QA tests have started for PR 3113 at commit 72bffb1.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 7, 2014

QA tests have finished for PR 3113 at commit 72bffb1.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@scwf
Copy link
Contributor

scwf commented Nov 8, 2014

@liancheng, please refer to #2164, I think there is some relevance

@liancheng
Copy link
Contributor Author

@scwf Thanks for the clue!

@marmbrus Unfortunately I'm traveling today, will continue working on this ASAP tomorrow morning.

marmbrus and others added 3 commits November 9, 2014 16:36
Author: Michael Armbrust <michael@databricks.com>

Closes apache#2164 from marmbrus/shufflePartitions and squashes the following commits:

0da1e8c [Michael Armbrust] test hax
ef2d985 [Michael Armbrust] more test hacks.
2dabae3 [Michael Armbrust] more test fixes
0bdbf21 [Michael Armbrust] Make parquet tests less order dependent
b42eeab [Michael Armbrust] increase test parallelism
80453d5 [Michael Armbrust] Decrease partitions when testing
…text is created.

This will allow us to take advantage of things like the spark.defaults file.

Author: Michael Armbrust <michael@databricks.com>

Closes apache#2493 from marmbrus/copySparkConf and squashes the following commits:

0bd1377 [Michael Armbrust] Copy SQL configuration from SparkConf when a SQLContext is created.

Conflicts:
	sql/core/src/main/scala/org/apache/spark/sql/test/TestSQLContext.scala

Conflicts:
	sql/core/src/main/scala/org/apache/spark/sql/test/TestSQLContext.scala
@SparkQA
Copy link

SparkQA commented Nov 9, 2014

QA tests have started for PR 3113 at commit d354161.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 9, 2014

QA tests have finished for PR 3113 at commit d354161.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23117/
Test PASSed.

@liancheng
Copy link
Contributor Author

@marmbrus Backported #2164 to fix the Jenkins build failure (ParquetQuerySuite). Should be ready to go.

@liancheng
Copy link
Contributor Author

@marmbrus However, I didn't quite get why #2164 fixes those Parquet tests. Especially why did you say the original test cases are "order dependent"?

@@ -1551,7 +1550,6 @@ private[spark] object Utils extends Logging {
"%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n")
PropertyConfigurator.configure(pro)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this file touched only for whitespace?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, these changes were introduced while resolving merge conflicts. Will revert them.

@marmbrus
Copy link
Contributor

@liancheng the original test cases did not have an order by and yet expected the answers to come out in some order. This broke when we switched the TestSQLContext to local[2].

I'm going to merge this and try to manually roll back the whitespace changed file in core.

marmbrus pushed a commit to marmbrus/spark that referenced this pull request Nov 11, 2014
This PR backports apache#2843 to branch-1.1. The key difference is that this one doesn't support Hive 0.13.1 and thus always returns `0.12.0` when `spark.sql.hive.version` is queried.

6 other commits on which apache#2843 depends were also backported, they are:

- apache#2887 for `SessionState` lifecycle control
- apache#2675, apache#2823 & apache#3060 for major test suite refactoring and bug fixes
- apache#2164, for Parquet test suites updates
- apache#2493, for reading `spark.sql.*` configurations

Author: Cheng Lian <lian@databricks.com>
Author: Cheng Lian <lian.cs.zju@gmail.com>
Author: Michael Armbrust <michael@databricks.com>

Closes apache#3113 from liancheng/get-info-for-1.1 and squashes the following commits:

d354161 [Cheng Lian] Provides Spark and Hive version in HiveThriftServer2 for branch-1.1
0c2a244 [Michael Armbrust] [SPARK-3646][SQL] Copy SQL configuration from SparkConf when a SQLContext is created.
3202a36 [Michael Armbrust] [SQL] Decrease partitions when testing
7f395b7 [Cheng Lian] [SQL] Fixes race condition in CliSuite
0dd28ec [Cheng Lian] [SQL] Fixes the race condition that may cause test failure
5928b39 [Cheng Lian] [SPARK-3809][SQL] Fixes test suites in hive-thriftserver
faeca62 [Cheng Lian] [SPARK-4037][SQL] Removes the SessionState instance created in HiveThriftServer2
@marmbrus
Copy link
Contributor

This has been merged, can you close it?

@liancheng liancheng closed this Nov 13, 2014
@liancheng liancheng deleted the get-info-for-1.1 branch November 13, 2014 23:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants