[SPARK-8106] [SQL] Set derby.system.durability=test to speed up Hive compatibility tests #6651

Closed
wants to merge 1 commit into from


JoshRosen
Contributor

Derby has a derby.system.durability configuration property that can be used to disable I/O synchronization calls for writes. This sacrifices durability but can result in large performance gains, which is appropriate for tests.

We should enable this in our test system properties in order to speed up the Hive compatibility tests. I saw 2-3x speedups locally with this change.

See https://db.apache.org/derby/docs/10.8/ref/rrefproperdurability.html for more documentation of this property.
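A minimal sketch of enabling the property for the current test JVM. It assumes the test code can set it before embedded Derby boots, because the Derby reference lists this property as static and system-wide, so a change made while the engine is running should not take effect until it is rebooted. `DerbyTestSettings`, `enableTestDurability`, and `restore` are hypothetical names used for illustration. Only the property name and the value `test` come from the Derby documentation.

```scala
// Hypothetical helper (not part of Spark): toggles Derby's durability mode
// for the current JVM. Derby reads derby.system.durability when the engine
// boots, so call this before the first embedded-Derby connection is opened
// (for example, before a Hive metastore backed by Derby is first used).
object DerbyTestSettings {
  val DurabilityKey = "derby.system.durability"

  /** Sets durability=test and returns the previous value so it can be restored. */
  def enableTestDurability(): Option[String] = {
    val previous = Option(System.getProperty(DurabilityKey))
    System.setProperty(DurabilityKey, "test")
    previous
  }

  /** Restores whatever value enableTestDurability() replaced. */
  def restore(previous: Option[String]): Unit = previous match {
    case Some(value) => System.setProperty(DurabilityKey, value)
    case None        => System.clearProperty(DurabilityKey)
  }
}
```

In a build that forks test JVMs, the same effect could presumably be achieved by adding `-Dderby.system.durability=test` to the test JVM options. In sbt, for example, you could write `javaOptions in Test += "-Dderby.system.durability=test"`. This is an illustrative setting, not necessarily the exact line this PR changed.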

@JoshRosen
Contributor Author

By the way, here's how I found this bottleneck: I created a test suite that runs one short Hive test 100 times back-to-back, then profiled it with YourKit. `RandomAccessFile` writes dominated the hot-spots chart.

package org.apache.spark.sql.hive.execution

import org.scalatest.Outcome

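// Profiling-only suite: repeats one short Hive compatibility test 100 times
// so that per-test overhead dominates the profiler snapshot.
class ResetProfilingSuite extends HiveCompatibilitySuite {

  // Runs each test 100 times back-to-back. Only the last Outcome is reported,
  // so failures in earlier iterations are not surfaced.
  override def withFixture(test: NoArgTest): Outcome = {
    (1 to 100).map { _ =>
      super.withFixture(test)
    }.last
  }

  // Restrict the suite to a single, fast query file.
  override def whiteList = Seq("compute_stats_empty_table")
}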
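Profiling shows where the time goes. To check the end-to-end effect of a change like this one, a rough wall-clock comparison of the same repeated workload before and after the change is often enough. The sketch below applies the repeat-100-times idea from `ResetProfilingSuite` outside ScalaTest. `RepeatTiming` and `timeRepeated` are hypothetical names, not Spark or ScalaTest APIs. JIT warm-up and GC noise make the numbers indicative only.

```scala
import java.util.concurrent.TimeUnit

// Hypothetical timing helper (not a Spark or ScalaTest API).
object RepeatTiming {
  /**
   * Evaluates `body` `times` times back-to-back and returns the mean
   * wall-clock time per run in milliseconds, together with the last result.
   */
  def timeRepeated[A](times: Int)(body: => A): (Double, A) = {
    require(times > 0, "times must be positive")
    var last: Option[A] = None
    val start = System.nanoTime()
    var i = 0
    while (i < times) {
      last = Some(body) // by-name parameter: re-evaluated on every iteration
      i += 1
    }
    val elapsedNanos = System.nanoTime() - start
    val meanMillis = elapsedNanos.toDouble / times / TimeUnit.MILLISECONDS.toNanos(1)
    (meanMillis, last.get)
  }
}
```

For example, suppose you have a hypothetical `runQuery()`. You could call `RepeatTiming.timeRepeated(100)(runQuery())` once with the Derby property set and once without it, then compare the two means.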

JoshRosen force-pushed the hive-compat-suite-speedup branch from e6011f4 to b7a08a2 on June 4, 2015 20:13
@SparkQA

SparkQA commented Jun 4, 2015

Test build #34207 has finished for PR 6651 at commit e6011f4.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public final class UnsafeRow extends BaseMutableRow
    • public abstract class BaseMutableRow extends BaseRow implements MutableRow
    • public abstract class BaseRow implements Row
    • protected class CodeGenContext
    • abstract class BaseMutableProjection extends MutableProjection
    • class SpecificProjection extends $
    • class BaseOrdering extends Ordering[Row]
    • class SpecificOrdering extends $
    • abstract class Predicate
    • class SpecificPredicate extends $
    • abstract class BaseProject extends Projection
    • class SpecificProjection extends $
    • final class SpecificRow extends $

@JoshRosen
Contributor Author

I think that's a spurious MiMa failure, since the first build triggered ran fine and the second build just removed the dummy comment that I added to trigger the SQL tests.

@JoshRosen
Contributor Author

Wow, weird: it looks like the MiMa tests are causing some SQL test code to be run:

[WARN] Unable to detect inner functions for class:org.apache.spark.repl.SparkMemberHandlers.MemberDefHandler
[WARN] Unable to detect inner functions for class:org.apache.spark.sql.catalyst.CatalystTypeConverters.BigDecimalConverter
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/06/04 13:16:09 INFO SparkContext: Running Spark version 1.5.0-SNAPSHOT
15/06/04 13:16:09 WARN SparkConf: 
SPARK_JAVA_OPTS was detected (set to '-XX:MaxPermSize=1g -Xmx2g').
This is deprecated in Spark 1.0+.

Please instead use:
 - ./spark-submit with conf/spark-defaults.conf to set defaults for an application
 - ./spark-submit with --driver-java-options to set -X options for a driver
 - spark.executor.extraJavaOptions to set -X options for executors
 - SPARK_DAEMON_JAVA_OPTS to set java options for standalone daemons (master or worker)

15/06/04 13:16:09 WARN SparkConf: Setting 'spark.executor.extraJavaOptions' to '-XX:MaxPermSize=1g -Xmx2g' as a work-around.
15/06/04 13:16:09 WARN SparkConf: Setting 'spark.driver.extraJavaOptions' to '-XX:MaxPermSize=1g -Xmx2g' as a work-around.
15/06/04 13:16:09 INFO SecurityManager: Changing view acls to: jenkins
15/06/04 13:16:09 INFO SecurityManager: Changing modify acls to: jenkins
15/06/04 13:16:09 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jenkins); users with modify permissions: Set(jenkins)
15/06/04 13:16:09 INFO Slf4jLogger: Slf4jLogger started
15/06/04 13:16:09 INFO Remoting: Starting remoting
15/06/04 13:16:10 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.10.28:39213]
15/06/04 13:16:10 INFO Utils: Successfully started service 'sparkDriver' on port 39213.
15/06/04 13:16:10 INFO SparkEnv: Registering MapOutputTracker
15/06/04 13:16:10 INFO SparkEnv: Registering BlockManagerMaster
15/06/04 13:16:10 INFO DiskBlockManager: Created local directory at /tmp/spark-eb0dce41-f8ab-4a1f-8d7a-03df6b182cdb/blockmgr-1ef79039-3cb9-463d-ab68-ac60ef5d865e
15/06/04 13:16:10 INFO MemoryStore: MemoryStore started with capacity 246.0 MB
15/06/04 13:16:10 INFO HttpFileServer: HTTP File server directory is /tmp/spark-eb0dce41-f8ab-4a1f-8d7a-03df6b182cdb/httpd-99bc8592-e56a-4e04-a7fb-7fdc36cacb73
15/06/04 13:16:10 INFO HttpServer: Starting HTTP Server
15/06/04 13:16:10 INFO Server: jetty-8.y.z-SNAPSHOT
15/06/04 13:16:10 INFO AbstractConnector: Started SocketConnector@0.0.0.0:53289
15/06/04 13:16:10 INFO Utils: Successfully started service 'HTTP file server' on port 53289.
15/06/04 13:16:10 INFO SparkEnv: Registering OutputCommitCoordinator
15/06/04 13:16:10 INFO Server: jetty-8.y.z-SNAPSHOT
15/06/04 13:16:10 WARN AbstractLifeCycle: FAILED SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address already in use
java.net.BindException: Address already in use
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:444)
[...]

@JoshRosen
Contributor Author

Filed https://issues.apache.org/jira/browse/SPARK-8109 to fix the MiMa problem.

JoshRosen force-pushed the hive-compat-suite-speedup branch from a6198eb to b7a08a2 on June 4, 2015 21:16
@SparkQA

SparkQA commented Jun 4, 2015

Test build #34209 has finished for PR 6651 at commit b7a08a2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the same public classes (experimental) as build #34207 above.

@SparkQA

SparkQA commented Jun 4, 2015

Test build #34214 has finished for PR 6651 at commit b7a08a2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the same public classes (experimental) as build #34207 above.

@SparkQA

SparkQA commented Jun 4, 2015

Test build #882 has finished for PR 6651 at commit a6198eb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 4, 2015

Test build #34213 has finished for PR 6651 at commit a6198eb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class ElementwiseProduct(val scalingVec: Vector) extends VectorTransformer
    • public final class UnsafeRow extends BaseMutableRow
    • public abstract class BaseMutableRow extends BaseRow implements MutableRow
    • public abstract class BaseRow implements Row
    • trait TypeCheckResult
    • case class TypeCheckFailure(message: String) extends TypeCheckResult
    • abstract class UnaryArithmetic extends UnaryExpression
    • case class UnaryMinus(child: Expression) extends UnaryArithmetic
    • case class Sqrt(child: Expression) extends UnaryArithmetic
    • case class Abs(child: Expression) extends UnaryArithmetic
    • case class BitwiseNot(child: Expression) extends UnaryArithmetic
    • case class MaxOf(left: Expression, right: Expression) extends BinaryArithmetic
    • case class MinOf(left: Expression, right: Expression) extends BinaryArithmetic
    • protected class CodeGenContext
    • abstract class BaseMutableProjection extends MutableProjection
    • class SpecificProjection extends $
    • class BaseOrdering extends Ordering[Row]
    • class SpecificOrdering extends $
    • abstract class Predicate
    • class SpecificPredicate extends $
    • abstract class BaseProject extends Projection
    • class SpecificProjection extends $
    • final class SpecificRow extends $
    • case class Atan2(left: Expression, right: Expression)
    • case class Hypot(left: Expression, right: Expression)
    • case class EqualTo(left: Expression, right: Expression) extends BinaryComparison

@JoshRosen
Contributor Author

Looks like this shaved 6+ minutes off our total test times:

Before:

[Screenshot not preserved]

After:

[Screenshot: Jenkins test results for build #34213, package org.apache.spark.sql.hive.execution]

@rxin
Contributor

rxin commented Jun 5, 2015

Thanks. I've merged this.

asfgit closed this in 74dc2a9 on Jun 5, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
…compatibility tests

Derby has a `derby.system.durability` configuration property that can be used to disable I/O synchronization calls for writes. This sacrifices durability but can result in large performance gains, which is appropriate for tests.

We should enable this in our test system properties in order to speed up the Hive compatibility tests. I saw 2-3x speedups locally with this change.

See https://db.apache.org/derby/docs/10.8/ref/rrefproperdurability.html for more documentation of this property.

Author: Josh Rosen <joshrosen@databricks.com>

Closes apache#6651 from JoshRosen/hive-compat-suite-speedup and squashes the following commits:

b7a08a2 [Josh Rosen] Set derby.system.durability=test in our unit tests.
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
…compatibility tests

asfgit pushed a commit that referenced this pull request Nov 11, 2015
…d SparkSubmit tests

This patch aims to reduce the test time and flakiness of HiveSparkSubmitSuite, SparkSubmitSuite, and CliSuite.

Key changes:

- Disable IO synchronization calls for Derby writes, since durability doesn't matter for tests. This was done for HiveCompatibilitySuite in #6651 and resulted in huge test speedups.
- Add a few missing `--conf`s to disable various Spark UIs. The CliSuite, in particular, never disabled these UIs, leaving it prone to port-contention-related flakiness.
- Fix two instances where tests defined `beforeAll()` methods that were never called because the appropriate traits were not mixed in. I updated these test suites to extend `BeforeAndAfterEach` so that they play nicely with our `ResetSystemProperties` trait.
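The first two changes above amount to extra arguments on the `spark-submit` command lines that these suites build. The sketch below is hypothetical: `TestSubmitArgs` and `withTestDefaults` are illustrative names. Passing the Derby setting through `--driver-java-options` is an assumption about how a child driver JVM could receive it, not a quote of the actual patch. `--conf`, `--driver-java-options`, and `spark.ui.enabled` are real `spark-submit`/Spark options.

```scala
// Hypothetical helper (not from the Spark code base) for test-only spark-submit options.
object TestSubmitArgs {
  // Disables the Spark web UI so that test JVMs do not bind (and retry) UI ports such as 4040.
  val DisableUi: Seq[String] = Seq("--conf", "spark.ui.enabled=false")

  // Passes Derby's test durability mode to the driver JVM launched by spark-submit.
  val DerbyTestDurability: Seq[String] =
    Seq("--driver-java-options", "-Dderby.system.durability=test")

  /**
   * Prepends the test-only options to a suite's own arguments. spark-submit treats
   * everything after the primary resource as application arguments, so options
   * must come before it.
   */
  def withTestDefaults(args: Seq[String]): Seq[String] =
    DisableUi ++ DerbyTestDurability ++ args
}
```

If a suite already passes its own `--driver-java-options`, the two values would need to be merged into one string rather than passed twice.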

Author: Josh Rosen <joshrosen@databricks.com>

Closes #9623 from JoshRosen/SPARK-11647.

(cherry picked from commit 2d76e44)
Signed-off-by: Reynold Xin <rxin@databricks.com>
asfgit pushed a commit that referenced this pull request Nov 11, 2015
…d SparkSubmit tests

dskrvk pushed a commit to dskrvk/spark that referenced this pull request Nov 13, 2015
…d SparkSubmit tests
