[SPARK-20331][SQL] Enhanced Hive partition pruning predicate pushdown #17633
Conversation
Test build #75783 has finished for PR 17633 at commit
Does this work for non-Hive tables?

This is geared towards Hive partitioned tables. If there's another system that prunes table partitions based on a stringified pruning predicate, I'm unaware of it. Do you have one in mind?

Then it should work.
s"""${quoteStringLiteral(v.toString)} ${op.symbol} ${a.name}""" | ||
}.mkString(" and ") | ||
def isFoldable(expr: Expression): Boolean = | ||
(expr.dataType.isInstanceOf[IntegralType] || expr.dataType.isInstanceOf[StringType]) && |
Can this support all `AtomicType`s? From my understanding these are partition columns and can support other types besides int and string.
`IntegralType` encompasses all "integral" types, including `IntegerType`, `ByteType`, `ShortType`, etc. I'm trying to be somewhat conservative in what we support here to ensure compatibility. Is there a particular type you'd like to see supported?
@cloud-fan @ericl Hi guys. Care to review?
s"(${convert(expr1)} or ${convert(expr2)})" | ||
} | ||
|
||
filters.flatMap(f => Try(convert(f)).toOption).mkString(" and ") |
Why do we need a `Try` here?
The `convert` method throws a `MatchError` if an expression is not a compatible Hive partition filter. Otherwise it returns a compatible partition pruning string. While using exception handling in this way can certainly be considered an anti-pattern, it makes for a much simpler implementation and type signature of the `convert` method.
Shall we just make `convert` a `PartialFunction`?
Seems reasonable. Will do.
Done.
So we can get rid of `Try` now?
With a partial function, we can now do `filters.flatMap(f => convert.lift(f))`.
Or `filters.collect(convert)`. I haven't tried it, but it should work.
Ah, that's better.
Both `filters.flatMap(f => convert.lift(f))` and `filters.collect(convert)` throw a `MatchError` on the input `ds=(20170101 + 1) and h=0`.

Now, I know that in practice foldable expressions such as `20170101 + 1` are converted to literals, but the point I'm making here is that the proposed alternatives will throw a `MatchError` if the filter expression is parsed into a tree with a leaf that does not match a pattern defined by the `convert` partial function. Hence the use of `filters.flatMap(f => Try(convert(f)).toOption).mkString(" and ")`.
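To make the failure mode concrete, here is a minimal, self-contained sketch; the `Expr` hierarchy and `Unsupported` case are invented for illustration and are not Spark's actual expression tree:

```scala
import scala.util.Try

sealed trait Expr
case class Lit(v: Int) extends Expr
case class Neg(e: Expr) extends Expr   // handled recursively by convert
case class Unsupported() extends Expr  // no pattern handles this

val convert: PartialFunction[Expr, String] = {
  case Lit(v) => v.toString
  case Neg(e) => s"-(${convert(e)})"
}

val filters: Seq[Expr] = Seq(Lit(1), Neg(Unsupported()))

// isDefinedAt only tests the outer pattern, so Neg(Unsupported()) looks
// convertible; both of these then throw scala.MatchError from the nested call:
//   filters.collect(convert)
//   filters.flatMap(f => convert.lift(f))

// Wrapping the whole application in Try turns the nested failure into None:
filters.flatMap(f => Try(convert(f)).toOption).mkString(" and ")  // "1"
```

In other words, `lift` and `collect` only guard the top level of the pattern match, while the `Try` also absorbs failures from recursive calls inside a matched case.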
```scala
  expr.foldable &&
  expr.deterministic

def convertFoldable(expr: Expression): String = expr.dataType match {
```
The foldable expressions should be converted in the Optimizer, right?
Yea, to make the code simpler, we can just match literals.
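A sketch of what that simplification might look like; `convertLiteral` is a hypothetical helper name, not code from this PR, and the string quoting here is deliberately naive where the real code escapes via `quoteStringLiteral`:

```scala
import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}
import org.apache.spark.sql.types.{IntegralType, StringType}

// Hypothetical helper. It assumes the optimizer's ConstantFolding rule has
// already collapsed foldable expressions such as (20170101 + 1) into
// Literals, so only Literals of supported types need to be matched here.
// Note: IntegralType is package-private to org.apache.spark.sql, so like the
// PR's HiveShim code this sketch would need to live inside Spark's source tree.
def convertLiteral(expr: Expression): Option[String] = expr match {
  case Literal(value, _: IntegralType) => Some(value.toString)
  case Literal(value, _: StringType)   => Some("\"" + value.toString + "\"")
  case _                               => None
}
```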
```scala
def convert(filter: Expression): String =
  filter match {
    case In(a: Attribute, exprs) if exprs.forall(isFoldable) =>
      val or = exprs.map(expr => s"${a.name} = ${convertFoldable(expr)}").reduce(_ + " or " + _)
```
No need to quote the column names?
Good point. You mean with backticks, like ``s"`${a.name}` = ${convertFoldable(expr)}"``?
See `AttributeReference.sql`; we should use `quoteIdentifier`.
In testing this in a modified form of `HiveClientSuite`, Hive is complaining that it can't parse the predicate. Specifically, the error message in the exception it's throwing is

```
Error parsing partition filter : line 1:3 no viable alternative at character '`'
```

The filter string it's trying to parse is `` `ds` = 20170101 ``.
Could you check whether there are any limits on the predicates we can pass to Hive? Are they the same for all our supported Hive metastore versions?
```scala
  assert(filteredPartitions.size == testPartitionCount)
}

test("getPartitionsByFilter: ds=20170101") {
```
Will these tests be executed on all supported Hive versions?
> Will these tests be executed on all supported Hive versions?

No. That's something I'll look into.
They will now. See `HiveClientSuites.scala` in this PR.
Hi guys. Sorry for the lack of updates on this. I've been held up with other responsibilities this past week. I'm planning to push a new commit today or tomorrow.
Force-pushed from 1a7663d to a4cdfb0.
I've pushed a new commit removing the logic for handling "foldables", since these are evaluated earlier in planning. I've also removed the modifications I made to
There are, and a while back I found something in the way of documentation or a grammar that specifies the accepted predicates. I'll dig that up.
I don't know yet. I have cross-version compatibility checking on my to-do list.
Test build #76706 has finished for PR 17633 at commit
```scala
  }.mkString(" and ")

def isExtractable(expr: Expression): Boolean =
  expr match {
    case Literal(_, _: IntegralType) | Literal(_, _: StringType) => true
```
What about float/double/decimal?
I'm going to look into support for that. FWIW, it's not supported in the current codebase, so omitting it wouldn't be a regression.
I suggest we omit support for this at this time to reduce the necessary testing and reviewing footprint. We can add support for other data types as part of a new Jira issue and PR. Okay?
Hey guys, just a quick update. I made good progress on implementing multi-version testing today, but it's not quite ready. I'm going to be on leave from tomorrow through the rest of next week, so I'm doubtful I'll push anything new until May 22nd.
You can add the test cases to
Force-pushed from a4cdfb0 to 4f802a5.
Test build #77199 has started for PR 17633 at commit
@gatorsmile, I've refactored Hive version-specific testing to make it easier to add new Hive test suites which test functionality for a collection of Hive versions. I did not feel adding these partition filtering tests to
```scala
// Should this be in a beforeAll() method instead?
```
Should this be in a `beforeAll()` method instead? This actually initializes `client` for use by other tests. Putting that in a `beforeAll()` method feels more appropriate.
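For reference, a minimal sketch of the `beforeAll()` alternative under discussion; the suite name, `HiveClientStandIn`, and `buildClient` are invented stand-ins, not the actual test code:

```scala
import org.scalatest.{BeforeAndAfterAll, FunSuite}

class HivePartitionFilteringSuiteSketch extends FunSuite with BeforeAndAfterAll {
  // Stand-ins for the real version-specific HiveClient construction.
  private class HiveClientStandIn
  private def buildClient(): HiveClientStandIn = new HiveClientStandIn

  private var client: HiveClientStandIn = _

  override def beforeAll(): Unit = {
    super.beforeAll()
    client = buildClient()  // shared setup runs once, before any test
  }

  test("getPartitionsByFilter: ds=20170101") {
    assert(client != null)  // every test can rely on client being initialized
  }
}
```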
I think this build was aborted because of the emergency Jenkins restart, as reported on the Spark dev mailing list. Retest, please?
retest this please
Test build #77242 has finished for PR 17633 at commit
Force-pushed from 4f802a5 to 0c4040c.
Rebased to resolve merge conflicts.
Test build #77262 has finished for PR 17633 at commit
```scala
def convert(filter: Expression): String =
  filter match {
    case In(a: Attribute, exprs) if exprs.forall(isExtractable) =>
```
Shall we check `varcharKeys` here?
Good catch. Will do.
- function and add a guard when testing In expressions against varchar attributes
- add another test case
Force-pushed from d88b8ab to a087a0f.
```scala
object ExtractableLiterals {
  def unapply(exprs: Seq[Expression]): Option[Seq[String]] = {
    exprs.map(ExtractableLiteral.unapply).foldLeft(Option(Seq.empty[String])) {
```
I'd like it to be more Java style:

```scala
val extracted = exprs.map(ExtractableLiteral.unapply)
if (extracted.exists(_.isEmpty)) {
  None
} else {
  Some(extracted.map(_.get))
}
```
Is there something wrong with the way it is now?
`foldLeft` may not be friendly to some Spark developers, but it's not a big deal.
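For readers unfamiliar with the idiom, the `foldLeft` in question is a hand-rolled "sequence": it collapses a `Seq[Option[String]]` into an `Option[Seq[String]]` that is `None` if any element is `None`. A standalone sketch:

```scala
def sequence(opts: Seq[Option[String]]): Option[Seq[String]] =
  opts.foldLeft(Option(Seq.empty[String])) {
    case (Some(acc), Some(s)) => Some(acc :+ s)  // keep accumulating
    case _                    => None            // any None poisons the result
  }

sequence(Seq(Some("a"), Some("b")))  // Some(List(a, b))
sequence(Seq(Some("a"), None))       // None
```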
```scala
}

def unapply(values: Set[Any]): Option[Seq[String]] = {
  values.toSeq.foldLeft(Option(Seq.empty[String])) {
```
ditto
```scala
      (convert.lift(expr1) ++ convert.lift(expr2)).mkString("(", " and ", ")")
    case op @ Or(expr1, expr2)
        if convert.isDefinedAt(expr1) && convert.isDefinedAt(expr2) =>
      (convert.lift(expr1) ++ convert.lift(expr2)).mkString("(", " or ", ")")
```
nit: `"(" + convert(expr1) + " or " + convert(expr2) + ")"`
Ok.
```scala
        if !varcharKeys.contains(a.name) =>
      s"""${a.name} ${op.symbol} ${quoteStringLiteral(v.toString)}"""
    case op @ BinaryComparison(Literal(v, _: StringType), a: Attribute)
      s"${a.name} ${op.symbol} $value"
```
Shall we add `()` for binary comparisons?
Is there a problem with leaving them out?
nvm, I realized that `and` and `or` have lower precedence than binary operators, so it should be fine.
Test build #79493 has finished for PR 17633 at commit
to a filter string, we can safely call convert(expr) on each directly
LGTM, pending Jenkins.
Test build #79501 has finished for PR 17633 at commit
Thanks, merging to master!
@cloud-fan Can you backport this PR to 2.1 and 2.2, please? I think the patch should apply cleanly.
@mallman We don't backport such risky changes to maintenance branches. Those branches typically go through much less testing.
…on pruning predicate pushdown. This is a follow-up PR of #17633 that adds a conf `spark.sql.hive.advancedPartitionPredicatePushdown.enabled`, which can be used to turn the enhancement off. Tested by adding a test case. Author: gatorsmile <gatorsmile@gmail.com>. Closes #19547 from gatorsmile/Spark20331FollowUp.
Hi @mallman. When I try to run this test in IntelliJ IDEA, I get the following error message. Do you have any idea how to run this test in IntelliJ?
(Link to Jira: https://issues.apache.org/jira/browse/SPARK-20331)
What changes were proposed in this pull request?
Spark 2.1 introduced scalable support for Hive tables with huge numbers of partitions. Key to leveraging this support is the ability to prune unnecessary table partitions to answer queries. Spark supports a subset of the class of partition pruning predicates that the Hive metastore supports. If a user writes a query with a partition pruning predicate that is not supported by Spark, Spark falls back to loading all partitions and pruning client-side. We want to broaden Spark's current partition pruning predicate pushdown capabilities.
One of the key missing capabilities is support for disjunctions. For example, for a table partitioned by date, writing a query with a predicate like `date = 20161011 or date = 2016101` will result in Spark fetching all partitions. For a table partitioned by date and hour, querying a range of hours across dates can be quite difficult to accomplish without fetching all partition metadata.
The current partition pruning support handles only comparisons against literals. We can expand that to foldable expressions by evaluating them at planning time.
We can also implement support for the "IN" comparison by expanding it to a sequence of "OR"s.
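As a rough sketch of that IN expansion over already-converted literal strings (`expandIn` is an invented helper; the real conversion pattern-matches Catalyst expressions):

```scala
// Expand an IN over literal operands into the disjunction that is pushed
// down to the Hive metastore as a filter string.
def expandIn(column: String, values: Seq[String]): String =
  values.map(v => s"$column = $v").mkString("(", " or ", ")")

expandIn("ds", Seq("20170101", "20170102"))
// "(ds = 20170101 or ds = 20170102)"
```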
How was this patch tested?
The `HiveClientSuite` and `VersionsSuite` were refactored and simplified to make Hive client-based, version-specific testing more modular and conceptually simpler. There are now two Hive test suites: `HiveClientSuite` and `HivePartitionFilteringSuite`. These test suites have a single-argument constructor taking a `version` parameter. As such, these test suites cannot be run by themselves. Instead, they have been bundled into "aggregation" test suites which run each suite for each Hive client version. These aggregation suites are called `HiveClientSuites` and `HivePartitionFilteringSuites`. The `VersionsSuite` and `HiveClientSuite` have been refactored into each of these aggregation suites, respectively.

`HiveClientSuite` and `HivePartitionFilteringSuite` subclass a new abstract class, `HiveVersionSuite`. `HiveVersionSuite` collects functionality related to testing a single Hive version and overrides relevant test suite methods to display version-specific information.

A new trait, `HiveClientVersions`, has been added with a sequence of Hive test versions.