[SPARK-18141][SQL] Fix to quote column names in the predicate clause of the JDBC RDD generated sql statement #15662

sureshthalamati · 2016-10-27T18:42:49Z

What changes were proposed in this pull request?

The SQL query generated for the JDBC data source does not quote column names in the predicate clause. When the source table has quoted column names, a Spark JDBC read fails with a spurious "column not found" error.

Error:
org.h2.jdbc.JdbcSQLException: Column "ID" not found;
Source SQL statement:
SELECT "Name","Id" FROM TEST."mixedCaseCols" WHERE (Id < 1)

This PR fixes the problem by quoting column names in the predicate clause of the generated SQL when filters are pushed down to the data source.

Source SQL statement after the fix:
SELECT "Name","Id" FROM TEST."mixedCaseCols" WHERE ("Id" < 1)
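The error names "ID" rather than "Id" because H2, by default, folds unquoted identifiers to upper case, while double-quoted identifiers keep their case. Below is a minimal sketch of that behaviour. It is a simplified model for illustration, not H2's actual resolution logic, and `IdentifierFolding`, `normalize`, and `resolves` are hypothetical names that do not appear in Spark or H2.

```scala
object IdentifierFolding {
  // Columns as stored by the database; the table was created with
  // quoted, mixed-case column names.
  val storedColumns: Set[String] = Set("Name", "Id")

  // Assumed, simplified model of identifier handling:
  // a double-quoted identifier is taken literally,
  // an unquoted identifier is folded to upper case.
  def normalize(identifier: String): String = {
    val quoted = identifier.length >= 2 &&
      identifier.startsWith("\"") && identifier.endsWith("\"")
    if (quoted) identifier.substring(1, identifier.length - 1)
    else identifier.toUpperCase
  }

  // True when the identifier, as written in the SQL text,
  // matches one of the stored column names.
  def resolves(identifier: String): Boolean =
    storedColumns.contains(normalize(identifier))
}
```

Under this model, `IdentifierFolding.resolves("Id")` is false because the name is looked up as `ID`, while `IdentifierFolding.resolves("\"Id\"")` is true, which matches the before and after statements above.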

How was this patch tested?

Added a new test case to JDBCSuite.

SparkQA · 2016-10-27T20:28:17Z

Test build #67662 has finished for PR 15662 at commit 0944e05.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

sureshthalamati · 2016-10-27T21:12:44Z

The failing test is org.apache.spark.sql.streaming.StreamingQuerySuite, which is unrelated to this change. It might have been fixed in commit 79fd0cc.

sureshthalamati · 2016-10-27T21:12:56Z

retest this please

SparkQA · 2016-10-27T23:17:54Z

Test build #67669 has finished for PR 15662 at commit 0944e05.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

sureshthalamati · 2016-11-01T19:25:26Z

@rxin @gatorsmile

gatorsmile · 2016-11-02T21:41:58Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala

    Option(f match {
-      case EqualTo(attr, value) => s"$attr = ${compileValue(value)}"
+      case EqualTo(attr, value) => s"${dialect.quoteIdentifier(attr)} = ${compileValue(value)}"


Add a nested function in compileFilter

def quote(colName: String): String = dialect.quoteIdentifier(colName)

Then, your code changes can look cleaner.
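A rough sketch of what the suggested nested helper could look like is shown here. The `Filter` case classes, the `Dialect` trait, and `compileValue` are simplified stand-ins so that the snippet compiles on its own. They are assumptions for illustration, not the actual Spark definitions, and only two filter cases are shown.

```scala
object CompileFilterSketch {
  // Stand-ins for Spark's data source filters, reduced to two cases.
  sealed trait Filter
  final case class EqualTo(attribute: String, value: Any) extends Filter
  final case class LessThan(attribute: String, value: Any) extends Filter

  // Stand-in for a JDBC dialect; assumes double-quote identifier quoting.
  trait Dialect {
    def quoteIdentifier(colName: String): String = "\"" + colName + "\""
  }
  object DefaultDialect extends Dialect

  // Simplified value rendering: strings are single-quoted with embedded
  // single quotes doubled; everything else uses toString.
  def compileValue(value: Any): String = value match {
    case s: String => "'" + s.replace("'", "''") + "'"
    case other => other.toString
  }

  def compileFilter(f: Filter, dialect: Dialect): String = {
    // The nested helper keeps each case short and routes every
    // column name through the dialect.
    def quote(colName: String): String = dialect.quoteIdentifier(colName)
    f match {
      case EqualTo(attr, value) => s"${quote(attr)} = ${compileValue(value)}"
      case LessThan(attr, value) => s"${quote(attr)} < ${compileValue(value)}"
    }
  }
}
```

With these stand-ins, compiling `LessThan("Id", 1)` yields `"Id" < 1`, so each case only has to call `quote(attr)` instead of repeating `dialect.quoteIdentifier(attr)`.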

sureshthalamati · 2016-11-03T03:39:58Z

Thank you very much for the feedback, @gatorsmile. Addressed the review comments.

gatorsmile · 2016-11-03T04:05:30Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala

 import org.apache.spark.sql.sources._
 import org.apache.spark.sql.types.StructType

+


remove this empty line

I will fix it.

gatorsmile · 2016-11-03T04:08:10Z

sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala

+    assert(sql("SELECT * FROM mixedCaseCols WHERE Id <=> 2").collect().size == 1)
+    assert(sql("SELECT * FROM mixedCaseCols WHERE Name LIKE 'fr%'").collect().size == 1)
+    assert(sql("SELECT * FROM mixedCaseCols WHERE NAME LIKE '%ed'").collect().size == 1)
+    assert(sql("SELECT * FROM mixedCaseCols WHERE NAME LIKE '%re%'").collect().size == 1)


What is the purpose of the above two statements?

Those two statements test the string StartsWith and Contains filters. They are pushed down to the JDBC data source and mapped to SQL LIKE expressions.

I will fix the inconsistent column name in the above two statements.
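For readers unfamiliar with this pushdown, the mapping from string filters to LIKE can be sketched roughly as follows. The case classes reuse the names of Spark's string filters but are local stand-ins, `quote` assumes double-quote identifier quoting, and the sketch deliberately does not escape `%`, `_`, or quotes inside the value. Treat it as an illustration of the idea rather than the code under review.

```scala
object LikeFilterSketch {
  // Local stand-ins named after Spark's string filters.
  sealed trait StringFilter
  final case class StringStartsWith(attribute: String, value: String) extends StringFilter
  final case class StringEndsWith(attribute: String, value: String) extends StringFilter
  final case class StringContains(attribute: String, value: String) extends StringFilter

  // Assumed quoting rule: wrap the column name in double quotes.
  def quote(colName: String): String = "\"" + colName + "\""

  // Prefix, suffix, and substring matches become LIKE patterns.
  // No escaping of wildcard characters is attempted in this sketch.
  def toLike(f: StringFilter): String = f match {
    case StringStartsWith(attr, v) => s"${quote(attr)} LIKE '${v}%'"
    case StringEndsWith(attr, v) => s"${quote(attr)} LIKE '%${v}'"
    case StringContains(attr, v) => s"${quote(attr)} LIKE '%${v}%'"
  }
}
```

The three patterns in the quoted test, `'fr%'`, `'%ed'`, and `'%re%'`, correspond to the prefix, suffix, and substring cases respectively.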

gatorsmile · 2016-11-03T04:17:56Z

This sounds like a correct and critical fix to me; otherwise, we are unable to resolve the columns in predicates against case-sensitive JDBC sources.

@sureshthalamati Could you post the following exception in your PR description?

org.h2.jdbc.JdbcSQLException: Column "ID" not found; SQL statement:
SELECT "Name","Id" FROM TEST."mixedCaseCols" WHERE (Id < 1) [42122-183]

cc @srowen Could you please check it? Any comment? Thanks!

SparkQA · 2016-11-03T05:41:17Z

Test build #68044 has finished for PR 15662 at commit 2afe990.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

sureshthalamati · 2016-11-03T07:36:56Z

Thank you for reviewing, @gatorsmile. Updated the PR description and addressed all the review comments.

SparkQA · 2016-11-03T10:05:51Z

Test build #68056 has finished for PR 15662 at commit 4e22e3c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

sureshthalamati · 2016-11-10T19:21:44Z

@gatorsmile I addressed all the review comments. Can you please take a look?

gatorsmile · 2016-11-11T23:40:05Z

@srowen Any comment on this?

gatorsmile · 2016-11-26T05:53:54Z

sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala

@@ -172,7 +172,7 @@ class JDBCSuite extends SparkFunSuite
      """.stripMargin.replaceAll("\n", " "))

    conn.prepareStatement(
-      "create table test.emp(name TEXT(32) NOT NULL," +
+      "create table test.emp(\"Name\" TEXT(32) NOT NULL," +


This is an unnecessary change, right?

gatorsmile · 2016-11-26T05:55:23Z

@sureshthalamati Could you resolve the conflict? Thanks!

SparkQA · 2016-11-29T01:41:05Z

Test build #69269 has finished for PR 15662 at commit 2178e3f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

sureshthalamati · 2016-11-29T06:53:05Z

Thanks, @gatorsmile. Resolved the conflicts and also added a test case for an empty IN clause with a mixed-case column name.

gatorsmile · 2016-11-30T18:41:14Z

sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala

@@ -855,6 +855,8 @@ class JDBCSuite extends SparkFunSuite
    assert(sql("SELECT * FROM mixedCaseCols WHERE Name LIKE '%re%'").collect().size == 1)
    assert(sql("SELECT * FROM mixedCaseCols WHERE Name IS NULL").collect().size == 1)
    assert(sql("SELECT * FROM mixedCaseCols WHERE Name IS NOT NULL").collect().size == 2)
+    assert(sql("SELECT * FROM mixedCaseCols")
+      .filter($"Name".isin(Array[String]() : _*)).collect().size == 0)


.filter($"Name".isin(Array[String]() : _*)).collect().size == 0)

->

.filter($"Name".isin()).collect().size == 0)

Thanks, @gatorsmile. Fixed it.
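The two spellings are equivalent because of how Scala varargs work: calling a varargs method with no arguments passes an empty sequence, just as splatting an empty array does. A small sketch, where `isin` is a hypothetical stand-in that only shares its parameter shape with `Column.isin(list: Any*)`:

```scala
object VarargsSketch {
  // Hypothetical stand-in: returns the arguments it received so the
  // two call styles can be compared directly.
  def isin(values: Any*): Seq[Any] = values.toSeq

  // Called with no arguments.
  val noArgs: Seq[Any] = isin()

  // Called with an empty array expanded into varargs.
  val splatEmpty: Seq[Any] = isin(Array[String](): _*)
}
```

Both `VarargsSketch.noArgs` and `VarargsSketch.splatEmpty` are empty sequences, so the shorter `isin()` form exercises the same empty-list case.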

gatorsmile · 2016-11-30T18:46:28Z

LGTM except a minor comment

cc @cloud-fan

SparkQA · 2016-11-30T22:10:54Z

Test build #69427 has finished for PR 15662 at commit f0d731f.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2016-11-30T23:01:00Z

retest this please

SparkQA · 2016-12-01T01:25:59Z

Test build #69434 has finished for PR 15662 at commit f0d731f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2016-12-02T03:05:06Z

LGTM

…of the JDBC RDD generated sql statement

## What changes were proposed in this pull request?

SQL query generated for the JDBC data source is not quoting columns in the predicate clause. When the source table has quoted column names, spark jdbc read fails with column not found error incorrectly.

Error:
org.h2.jdbc.JdbcSQLException: Column "ID" not found;
Source SQL statement:
SELECT "Name","Id" FROM TEST."mixedCaseCols" WHERE (Id < 1)

This PR fixes by quoting column names in the generated SQL for predicate clause when filters are pushed down to the data source.

Source SQL statement after the fix:
SELECT "Name","Id" FROM TEST."mixedCaseCols" WHERE ("Id" < 1)

## How was this patch tested?

Added new test case to the JdbcSuite

Author: sureshthalamati <suresh.thalamati@gmail.com>

Closes #15662 from sureshthalamati/filter_quoted_cols-SPARK-18141.

(cherry picked from commit 70c5549)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>

gatorsmile · 2016-12-02T03:14:48Z

Merging to master/2.1! Thanks!

sureshthalamati · 2016-12-03T00:10:49Z

Thank you , @gatorsmile @cloud-fan

gatorsmile reviewed Nov 2, 2016

sureshthalamati force-pushed the filter_quoted_cols-SPARK-18141 branch from 0944e05 to 2afe990 Compare November 3, 2016 03:35

gatorsmile reviewed Nov 3, 2016

gatorsmile reviewed Nov 26, 2016

sureshthalamati added 4 commits November 28, 2016 11:23

66f7999 [SPARK-18141][SQL] Fix to quote column names in the predicate clause of the JDBC RDD generated sql statement

5dd575e Addressed review comments. Simplified code using a nested function

1765332 Addressed review comments. Minor fix to test, and remove extra empty line

2178e3f Adding test case for in with empty value list filter using mixed case column

sureshthalamati force-pushed the filter_quoted_cols-SPARK-18141 branch from 4e22e3c to 2178e3f Compare November 28, 2016 23:00

gatorsmile reviewed Nov 30, 2016

f0d731f Addressing review comments. simplified isin test case

gatorsmile mentioned this pull request Nov 30, 2016

[SPARK-18593][SQL] JDBCRDD returns incorrect results for filters on CHAR of PostgreSQL #16021

Closed

asfgit closed this in 70c5549 Dec 2, 2016

cloud-fan mentioned this pull request May 25, 2017

[SPARK-14460] [SQL] properly handling of column name contains space #12252

Closed
