-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-8106] [SQL] Set derby.system.durability=test to speed up Hive compatibility tests #6651
Conversation
By the way, here's how I found that this was a bottleneck: I created a new test suite which takes a short-running Hive test and runs it 100 times back-to-back, then profiled the suite using YourKit; RandomAccessFile writes dominated the hot spots chart. package org.apache.spark.sql.hive.execution
import org.scalatest.Outcome
class ResetProfilingSuite extends HiveCompatibilitySuite {
override def withFixture(test: NoArgTest): Outcome = {
(1 to 100).map { _ =>
super.withFixture(test)
}.last
}
override def whiteList = Seq("compute_stats_empty_table")
} |
e6011f4
to
b7a08a2
Compare
Test build #34207 has finished for PR 6651 at commit
|
I think that's a spurious MiMa failure, since the first build triggered ran fine and the second build just removed the dummy comment that I added to trigger the SQL tests. |
Wow, weird: it looks like the MiMa tests are causing some SQL test code to be run:
|
Filed https://issues.apache.org/jira/browse/SPARK-8109 to fix the MiMa problem. |
a6198eb
to
b7a08a2
Compare
Test build #34209 has finished for PR 6651 at commit
|
Test build #34214 has finished for PR 6651 at commit
|
Test build #882 has finished for PR 6651 at commit
|
Test build #34213 has finished for PR 6651 at commit
|
Thanks. I've merged this. |
…compatibility tests Derby has a `derby.system.durability` configuration property that can be used to disable I/O synchronization calls for writes. This sacrifices durability but can result in large performance gains, which is appropriate for tests. We should enable this in our test system properties in order to speed up the Hive compatibility tests. I saw 2-3x speedups locally with this change. See https://db.apache.org/derby/docs/10.8/ref/rrefproperdurability.html for more documentation of this property. Author: Josh Rosen <joshrosen@databricks.com> Closes apache#6651 from JoshRosen/hive-compat-suite-speedup and squashes the following commits: b7a08a2 [Josh Rosen] Set derby.system.durability=test in our unit tests.
…compatibility tests Derby has a `derby.system.durability` configuration property that can be used to disable I/O synchronization calls for writes. This sacrifices durability but can result in large performance gains, which is appropriate for tests. We should enable this in our test system properties in order to speed up the Hive compatibility tests. I saw 2-3x speedups locally with this change. See https://db.apache.org/derby/docs/10.8/ref/rrefproperdurability.html for more documentation of this property. Author: Josh Rosen <joshrosen@databricks.com> Closes apache#6651 from JoshRosen/hive-compat-suite-speedup and squashes the following commits: b7a08a2 [Josh Rosen] Set derby.system.durability=test in our unit tests.
…d SparkSubmit tests This patch aims to reduce the test time and flakiness of HiveSparkSubmitSuite, SparkSubmitSuite, and CliSuite. Key changes: - Disable IO synchronization calls for Derby writes, since durability doesn't matter for tests. This was done for HiveCompatibilitySuite in #6651 and resulted in huge test speedups. - Add a few missing `--conf`s to disable various Spark UIs. The CliSuite, in particular, never disabled these UIs, leaving it prone to port-contention-related flakiness. - Fix two instances where tests defined `beforeAll()` methods which were never called because the appropriate traits were not mixed in. I updated these tests suites to extend `BeforeAndAfterEach` so that they play nicely with our `ResetSystemProperties` trait. Author: Josh Rosen <joshrosen@databricks.com> Closes #9623 from JoshRosen/SPARK-11647. (cherry picked from commit 2d76e44) Signed-off-by: Reynold Xin <rxin@databricks.com>
…d SparkSubmit tests This patch aims to reduce the test time and flakiness of HiveSparkSubmitSuite, SparkSubmitSuite, and CliSuite. Key changes: - Disable IO synchronization calls for Derby writes, since durability doesn't matter for tests. This was done for HiveCompatibilitySuite in #6651 and resulted in huge test speedups. - Add a few missing `--conf`s to disable various Spark UIs. The CliSuite, in particular, never disabled these UIs, leaving it prone to port-contention-related flakiness. - Fix two instances where tests defined `beforeAll()` methods which were never called because the appropriate traits were not mixed in. I updated these tests suites to extend `BeforeAndAfterEach` so that they play nicely with our `ResetSystemProperties` trait. Author: Josh Rosen <joshrosen@databricks.com> Closes #9623 from JoshRosen/SPARK-11647.
…d SparkSubmit tests This patch aims to reduce the test time and flakiness of HiveSparkSubmitSuite, SparkSubmitSuite, and CliSuite. Key changes: - Disable IO synchronization calls for Derby writes, since durability doesn't matter for tests. This was done for HiveCompatibilitySuite in apache#6651 and resulted in huge test speedups. - Add a few missing `--conf`s to disable various Spark UIs. The CliSuite, in particular, never disabled these UIs, leaving it prone to port-contention-related flakiness. - Fix two instances where tests defined `beforeAll()` methods which were never called because the appropriate traits were not mixed in. I updated these tests suites to extend `BeforeAndAfterEach` so that they play nicely with our `ResetSystemProperties` trait. Author: Josh Rosen <joshrosen@databricks.com> Closes apache#9623 from JoshRosen/SPARK-11647.
Derby has a
derby.system.durability
configuration property that can be used to disable I/O synchronization calls for writes. This sacrifices durability but can result in large performance gains, which is appropriate for tests.We should enable this in our test system properties in order to speed up the Hive compatibility tests. I saw 2-3x speedups locally with this change.
See https://db.apache.org/derby/docs/10.8/ref/rrefproperdurability.html for more documentation of this property.