
[SPARK-34710][SQL] Add tableType column for SHOW TABLES to distinguish view and tables #31804

Closed · wants to merge 13 commits
@@ -514,7 +514,8 @@ object ShowTables {
def getOutputAttrs: Seq[Attribute] = Seq(
AttributeReference("namespace", StringType, nullable = false)(),
AttributeReference("tableName", StringType, nullable = false)(),
AttributeReference("isTemporary", BooleanType, nullable = false)())
AttributeReference("isTemporary", BooleanType, nullable = false)(),
AttributeReference("isView", BooleanType, nullable = false)())
Contributor

A new column isView: Boolean is more efficient, but I'm wondering if a new column tableType: String is more user-friendly. The value can be TABLE or VIEW.

Member Author

Most modern databases store the table type as a string column in INFORMATION_SCHEMA.TABLES.

On the reading side, they also return a string value for it.

FYI, https://docs.google.com/spreadsheets/d/1LeHYbGCDjgr-rYwQMlHBhJbeDxjOVuUatozgvKdxq6Y/edit#gid=0

Besides, since SHOW TABLES is not part of the ANSI standard, it might be good for us to follow the JDBC standard. Then we can make our command consistent with the JDBC metadata operation.
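As an aside illustrating that point (not part of this PR): many engines expose the object kind as a string in their catalog. SQLite, for instance, stores it in the type column of sqlite_master. A minimal sketch:

```python
import sqlite3

# In-memory database with one base table and one view.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (a INTEGER)")
conn.execute("CREATE VIEW v1 AS SELECT a FROM t1")

# SQLite's catalog stores the object kind as a string, much like
# INFORMATION_SCHEMA.TABLES.TABLE_TYPE in other engines.
rows = conn.execute(
    "SELECT name, type FROM sqlite_master ORDER BY name"
).fetchall()
print(rows)  # [('t1', 'table'), ('v1', 'view')]
```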

Contributor

The JDBC protocol also uses a table type string, right?

Member Author

Yes. Another benefit of using a string is that if we later decide to subdivide tables into something like SYSTEM TABLE/BASE TABLE, etc., we won't break the schema.
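To sketch the forward-compatibility argument (the values below are hypothetical, not Spark output): a consumer matching on the string keeps working when new kinds are added, whereas a boolean column could not represent them.

```python
def is_view(table_type: str) -> bool:
    # New kinds such as "SYSTEM TABLE" can appear later without
    # changing the output schema or breaking this check.
    return table_type == "VIEW"

rows = [("show_t1", "TABLE"), ("show_t3", "VIEW"), ("sys_cols", "SYSTEM TABLE")]
views = [name for name, kind in rows if is_view(kind)]
print(views)  # ['show_t3']
```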

}

/**
@@ -533,7 +534,8 @@ object ShowTableExtended {
AttributeReference("namespace", StringType, nullable = false)(),
AttributeReference("tableName", StringType, nullable = false)(),
AttributeReference("isTemporary", BooleanType, nullable = false)(),
AttributeReference("information", StringType, nullable = false)())
AttributeReference("information", StringType, nullable = false)(),
AttributeReference("isView", BooleanType, nullable = false)())
Member

Is it added at the end to reduce the impact of the breaking change?

Member Author

yes

}

/**
@@ -353,8 +353,8 @@ class ResolveSessionCatalog(val catalogManager: CatalogManager)

case ShowTables(DatabaseInSessionCatalog(db), pattern, output) =>
val newOutput = if (conf.getConf(SQLConf.LEGACY_KEEP_COMMAND_OUTPUT_SCHEMA)) {
assert(output.length == 3)
output.head.withName("database") +: output.tail
assert(output.length == 4)
output.head.withName("database") +: output.slice(1, 3)
} else {
output
}
@@ -366,13 +366,13 @@ class ResolveSessionCatalog(val catalogManager: CatalogManager)
partitionSpec @ (None | Some(UnresolvedPartitionSpec(_, _))),
output) =>
val newOutput = if (conf.getConf(SQLConf.LEGACY_KEEP_COMMAND_OUTPUT_SCHEMA)) {
assert(output.length == 4)
output.head.withName("database") +: output.tail
assert(output.length == 5)
Member

Well, this seems inconsistent with the doc. The current documentation says spark.sql.legacy.keepCommandOutputSchema restores the 3.1-or-earlier schema, doesn't it?

Member Author

Introducing a new legacy conf for this behavior change seems trivial and might add a cognitive burden for users. So the config is reused for now, and the doc will be updated if this is the right way to go.

Member

Technically, this is not reusing it. If we reused the existing conf, this should be output.length == 4, because that conf would disable both this PR and the previous commit simultaneously.

> So the config is reused for now

Member Author

> this should be output.length == 4

Indeed, that's true: output.head.withName("database") +: output.slice(1, 4) cuts the isView column off.
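For reference, Scala's Seq.slice(from, until) excludes until, just like Python's [1:4]; a quick illustration of how the legacy branch drops the trailing column (column names taken from this PR):

```python
# The five extended-output columns after this PR.
output = ["namespace", "tableName", "isTemporary", "information", "isView"]

# Mirrors output.head.withName("database") +: output.slice(1, 4):
# indices 1..3 are kept, so the trailing isView column is cut off.
legacy = ["database"] + output[1:4]
print(legacy)  # ['database', 'tableName', 'isTemporary', 'information']
```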

output.head.withName("database") +: output.slice(1, 4)
} else {
output
}
val tablePartitionSpec = partitionSpec.map(_.asInstanceOf[UnresolvedPartitionSpec].spec)
ShowTablesCommand(Some(db), Some(pattern), newOutput, true, tablePartitionSpec)
ShowTablesCommand(Some(db), Some(pattern), newOutput, isExtended = true, tablePartitionSpec)

// ANALYZE TABLE works on permanent views if the views are cached.
case AnalyzeTable(ResolvedV1TableOrViewIdentifier(ident), partitionSpec, noScan) =>
@@ -843,11 +843,22 @@ case class ShowTablesCommand(
val database = tableIdent.database.getOrElse("")
val tableName = tableIdent.table
val isTemp = catalog.isTempView(tableIdent)
val catalogTable = catalog.getTempViewOrPermanentTableMetadata(tableIdent)
val isView = catalogTable.tableType == CatalogTableType.VIEW
if (isExtended) {
val information = catalog.getTempViewOrPermanentTableMetadata(tableIdent).simpleString
Row(database, tableName, isTemp, s"$information\n")
val information = catalogTable.simpleString
if (output.size == 5) {
Row(database, tableName, isTemp, s"$information\n", isView)
} else {
Row(database, tableName, isTemp, s"$information\n")
}
} else {
Row(database, tableName, isTemp)
if (output.size == 4) {
Row(database, tableName, isTemp, isView)
} else {
Row(database, tableName, isTemp)
Member

Do we still have test coverage for this line?

Member Author

Yes, the new show-tables-legacy.sql imports the corresponding tests to cover it. I can add some cases in v1.ShowTablesSuite if show-tables-legacy.sql is unintuitive.

}

}
}
} else {
@@ -870,7 +881,12 @@ case class ShowTablesCommand(
val tableName = tableIdent.table
val isTemp = catalog.isTempView(tableIdent)
val information = partition.simpleString
Seq(Row(database, tableName, isTemp, s"$information\n"))
if (output.size == 5) {
val isView = table.tableType == CatalogTableType.VIEW
Seq(Row(database, tableName, isTemp, s"$information\n", isView))
} else {
Seq(Row(database, tableName, isTemp, s"$information\n"))
}
}
}
}
@@ -40,7 +40,7 @@ case class ShowTablesExec(
val tables = catalog.listTables(namespace.toArray)
tables.map { table =>
if (pattern.map(StringUtils.filterPattern(Seq(table.name()), _).nonEmpty).getOrElse(true)) {
rows += toCatalystRow(table.namespace().quoted, table.name(), false)
rows += toCatalystRow(table.namespace().quoted, table.name(), false, false)
}
}

@@ -0,0 +1,2 @@
--SET spark.sql.legacy.keepCommandOutputSchema=true
Member Author

This might be useful to verify the legacy schema

Member

Is this relevant to this PR's isView column addition?

Member Author

The code change is related: with spark.sql.legacy.keepCommandOutputSchema=true, the isView column is omitted.
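Roughly, the branching this config drives in ShowTablesCommand looks like the following sketch (Python for illustration only; the real code builds Catalyst Rows):

```python
def make_row(database, table_name, is_temp, is_view, legacy_schema):
    # With spark.sql.legacy.keepCommandOutputSchema=true the command
    # keeps the old three-column output, so isView is omitted.
    if legacy_schema:
        return (database, table_name, is_temp)
    return (database, table_name, is_temp, is_view)

print(make_row("showdb", "show_t3", True, True, legacy_schema=True))
print(make_row("showdb", "show_t3", True, True, legacy_schema=False))
```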


--IMPORT show-tables.sql
@@ -0,0 +1,286 @@
-- Automatically generated by SQLQueryTestSuite
-- Number of queries: 27


-- !query
CREATE DATABASE showdb
-- !query schema
struct<>
-- !query output



-- !query
USE showdb
-- !query schema
struct<>
-- !query output



-- !query
CREATE TABLE show_t1(a String, b Int, c String, d String) USING parquet PARTITIONED BY (c, d)
-- !query schema
struct<>
-- !query output



-- !query
ALTER TABLE show_t1 ADD PARTITION (c='Us', d=1)
-- !query schema
struct<>
-- !query output



-- !query
CREATE TABLE show_t2(b String, d Int) USING parquet
-- !query schema
struct<>
-- !query output



-- !query
CREATE TEMPORARY VIEW show_t3(e int) USING parquet
-- !query schema
struct<>
-- !query output



-- !query
CREATE GLOBAL TEMP VIEW show_t4 AS SELECT 1 as col1
-- !query schema
struct<>
-- !query output



-- !query
SHOW TABLES
-- !query schema
struct<database:string,tableName:string,isTemporary:boolean>
-- !query output
show_t1
show_t2
show_t3


-- !query
SHOW TABLES IN showdb
-- !query schema
struct<database:string,tableName:string,isTemporary:boolean>
-- !query output
show_t1
show_t2
show_t3


-- !query
SHOW TABLES 'show_t*'
-- !query schema
struct<database:string,tableName:string,isTemporary:boolean>
-- !query output
show_t1
show_t2
show_t3


-- !query
SHOW TABLES LIKE 'show_t1*|show_t2*'
-- !query schema
struct<database:string,tableName:string,isTemporary:boolean>
-- !query output
show_t1
show_t2


-- !query
SHOW TABLES IN showdb 'show_t*'
-- !query schema
struct<database:string,tableName:string,isTemporary:boolean>
-- !query output
show_t1
show_t2
show_t3


-- !query
SHOW TABLES IN showdb LIKE 'show_t*'
-- !query schema
struct<database:string,tableName:string,isTemporary:boolean>
-- !query output
show_t1
show_t2
show_t3


-- !query
SHOW TABLE EXTENDED LIKE 'show_t*'
-- !query schema
struct<database:string,tableName:string,isTemporary:boolean,information:string>
-- !query output
show_t3 true Table: show_t3
Created Time [not included in comparison]
Last Access [not included in comparison]
Created By [not included in comparison]
Type: VIEW
Schema: root
|-- e: integer (nullable = true)


showdb show_t1 false Database: showdb
Table: show_t1
Created Time [not included in comparison]
Last Access [not included in comparison]
Created By [not included in comparison]
Type: MANAGED
Provider: parquet
Location [not included in comparison]/{warehouse_dir}/showdb.db/show_t1
Partition Provider: Catalog
Partition Columns: [`c`, `d`]
Schema: root
|-- a: string (nullable = true)
|-- b: integer (nullable = true)
|-- c: string (nullable = true)
|-- d: string (nullable = true)


showdb show_t2 false Database: showdb
Table: show_t2
Created Time [not included in comparison]
Last Access [not included in comparison]
Created By [not included in comparison]
Type: MANAGED
Provider: parquet
Location [not included in comparison]/{warehouse_dir}/showdb.db/show_t2
Schema: root
|-- b: string (nullable = true)
|-- d: integer (nullable = true)


-- !query
SHOW TABLE EXTENDED
-- !query schema
struct<>
-- !query output
org.apache.spark.sql.catalyst.parser.ParseException

mismatched input '<EOF>' expecting {'FROM', 'IN', 'LIKE'}(line 1, pos 19)

== SQL ==
SHOW TABLE EXTENDED
-------------------^^^


-- !query
SHOW TABLE EXTENDED LIKE 'show_t1' PARTITION(c='Us', d=1)
-- !query schema
struct<database:string,tableName:string,isTemporary:boolean,information:string>
-- !query output
showdb show_t1 false Partition Values: [c=Us, d=1]
Location [not included in comparison]/{warehouse_dir}/showdb.db/show_t1/c=Us/d=1
Created Time [not included in comparison]
Last Access [not included in comparison]


-- !query
SHOW TABLE EXTENDED PARTITION(c='Us', d=1)
-- !query schema
struct<>
-- !query output
org.apache.spark.sql.catalyst.parser.ParseException

mismatched input 'PARTITION' expecting {'FROM', 'IN', 'LIKE'}(line 1, pos 20)

== SQL ==
SHOW TABLE EXTENDED PARTITION(c='Us', d=1)
--------------------^^^


-- !query
SHOW TABLE EXTENDED LIKE 'show_t*' PARTITION(c='Us', d=1)
-- !query schema
struct<>
-- !query output
org.apache.spark.sql.catalyst.analysis.NoSuchTableException
Table or view 'show_t*' not found in database 'showdb'


-- !query
SHOW TABLE EXTENDED LIKE 'show_t1' PARTITION(c='Us')
-- !query schema
struct<>
-- !query output
org.apache.spark.sql.AnalysisException
Partition spec is invalid. The spec (c) must match the partition spec (c, d) defined in table '`showdb`.`show_t1`'


-- !query
SHOW TABLE EXTENDED LIKE 'show_t1' PARTITION(a='Us', d=1)
-- !query schema
struct<>
-- !query output
org.apache.spark.sql.AnalysisException
a is not a valid partition column in table `showdb`.`show_t1`.


-- !query
SHOW TABLE EXTENDED LIKE 'show_t1' PARTITION(c='Ch', d=1)
-- !query schema
struct<>
-- !query output
org.apache.spark.sql.catalyst.analysis.NoSuchPartitionException
Partition not found in table 'show_t1' database 'showdb':
c -> Ch
d -> 1


-- !query
DROP TABLE show_t1
-- !query schema
struct<>
-- !query output



-- !query
DROP TABLE show_t2
-- !query schema
struct<>
-- !query output



-- !query
DROP VIEW show_t3
-- !query schema
struct<>
-- !query output



-- !query
DROP VIEW global_temp.show_t4
-- !query schema
struct<>
-- !query output



-- !query
USE default
-- !query schema
struct<>
-- !query output



-- !query
DROP DATABASE showdb
-- !query schema
struct<>
-- !query output
