
[SPARK-34710][SQL] Add tableType column for SHOW TABLES to distinguish view and tables #31804

Closed · wants to merge 13 commits
docs/sql-migration-guide.md (1 addition, 1 deletion)
@@ -45,7 +45,7 @@ license: |

- In Spark 3.2, the auto-generated `Cast` (such as those added by type coercion rules) will be stripped when generating column alias names. E.g., `sql("SELECT floor(1)").columns` will be `FLOOR(1)` instead of `FLOOR(CAST(1 AS DOUBLE))`.

-- In Spark 3.2, the output schema of `SHOW TABLES` becomes `namespace: string, tableName: string, isTemporary: boolean`. In Spark 3.1 or earlier, the `namespace` field was named `database` for the builtin catalog, and there is no `isTemporary` field for v2 catalogs. To restore the old schema with the builtin catalog, you can set `spark.sql.legacy.keepCommandOutputSchema` to `true`.
+- In Spark 3.2, the output schema of `SHOW TABLES` becomes `namespace: string, tableName: string, isTemporary: boolean, tableType: string`. In Spark 3.1 or earlier, the `namespace` field was named `database` for the builtin catalog, there was no `isTemporary` field for v2 catalogs, and there was no `tableType` field for either v1 or v2 catalogs. To restore the old schema with the builtin catalog, you can set `spark.sql.legacy.keepCommandOutputSchema` to `true`.

- In Spark 3.2, the output schema of `SHOW TABLE EXTENDED` becomes `namespace: string, tableName: string, isTemporary: boolean, information: string`. In Spark 3.1 or earlier, the `namespace` field was named `database` for the builtin catalog, and no change for the v2 catalogs. To restore the old schema with the builtin catalog, you can set `spark.sql.legacy.keepCommandOutputSchema` to `true`.

python/pyspark/sql/context.py (1 addition, 1 deletion)
@@ -478,7 +478,7 @@ def tables(self, dbName=None):
>>> sqlContext.registerDataFrameAsTable(df, "table1")
>>> df2 = sqlContext.tables()
>>> df2.filter("tableName = 'table1'").first()
-Row(namespace='', tableName='table1', isTemporary=True)
+Row(namespace='', tableName='table1', isTemporary=True, tableType='VIEW')
"""
if dbName is None:
return DataFrame(self._ssql_ctx.tables(), self)
@@ -742,6 +742,13 @@ object CatalogTableType {
val VIEW = new CatalogTableType("VIEW")

val tableTypes = Seq(EXTERNAL, MANAGED, VIEW)

+  def classicTableTypeString(tableType: CatalogTableType): String = tableType match {
+    case EXTERNAL | MANAGED => "TABLE"
+    case VIEW => "VIEW"
+    case t =>
+      throw new IllegalArgumentException(s"Unknown table type is found: $t")
+  }
}
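The new helper collapses the three catalog table types into the two strings surfaced by SHOW TABLES. As a minimal standalone sketch (plain Scala, modeling Spark's CatalogTableType as a bare String rather than the real class):

```scala
// Sketch only: CatalogTableType is modeled as a plain String here.
// EXTERNAL and MANAGED tables both display as "TABLE"; views as "VIEW".
def classicTableTypeString(tableType: String): String = tableType match {
  case "EXTERNAL" | "MANAGED" => "TABLE"
  case "VIEW" => "VIEW"
  case t => throw new IllegalArgumentException(s"Unknown table type is found: $t")
}
```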


@@ -515,7 +515,13 @@ object ShowTables {
def getOutputAttrs: Seq[Attribute] = Seq(
AttributeReference("namespace", StringType, nullable = false)(),
AttributeReference("tableName", StringType, nullable = false)(),
-    AttributeReference("isTemporary", BooleanType, nullable = false)())
+    AttributeReference("isTemporary", BooleanType, nullable = false)(),
+    AttributeReference("tableType", StringType, nullable = false)())
+
+  def getLegacyOutputAttrs: Seq[Attribute] = {
+    val output = getOutputAttrs
+    output.head.withName("database") +: output.slice(1, output.length - 1)
+  }
}

@@ -534,7 +540,13 @@ object ShowTableExtended {
AttributeReference("namespace", StringType, nullable = false)(),
AttributeReference("tableName", StringType, nullable = false)(),
AttributeReference("isTemporary", BooleanType, nullable = false)(),
-    AttributeReference("information", StringType, nullable = false)())
+    AttributeReference("information", StringType, nullable = false)(),
+    AttributeReference("tableType", StringType, nullable = false)())
+
+  def getLegacyOutputAttrs: Seq[Attribute] = {
+    val output = getOutputAttrs
+    output.head.withName("database") +: output.slice(1, output.length - 1)
+  }
}
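Both getLegacyOutputAttrs helpers derive the legacy schema the same way: rename the head attribute to database and drop the last (newly added tableType) attribute. A sketch over plain column-name lists, assuming attributes are modeled as strings instead of Spark's AttributeReference:

```scala
// Sketch: the legacy schema keeps everything between the head and the
// last column, renames the head to "database", and drops tableType.
def legacyOutput(output: Seq[String]): Seq[String] =
  "database" +: output.slice(1, output.length - 1)
```

For SHOW TABLES this turns namespace/tableName/isTemporary/tableType into database/tableName/isTemporary; for SHOW TABLE EXTENDED the information column survives because only the last element is dropped.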

@@ -3095,7 +3095,7 @@ object SQLConf {
buildConf("spark.sql.legacy.keepCommandOutputSchema")
.internal()
.doc("When true, Spark will keep the output schema of commands such as SHOW DATABASES " +
-      "unchanged, for v1 catalog and/or table.")
+      "the same as in Spark 3.0 and earlier, for v1 catalog and/or table.")
.version("3.0.2")
.booleanConf
.createWithDefault(false)
@@ -353,8 +353,8 @@ class ResolveSessionCatalog(val catalogManager: CatalogManager)

case ShowTables(DatabaseInSessionCatalog(db), pattern, output) =>
val newOutput = if (conf.getConf(SQLConf.LEGACY_KEEP_COMMAND_OUTPUT_SCHEMA)) {
-        assert(output.length == 3)
-        output.head.withName("database") +: output.tail
+        assert(output.length == 4)
+        ShowTables.getLegacyOutputAttrs
} else {
output
}
@@ -366,8 +366,8 @@
partitionSpec @ (None | Some(UnresolvedPartitionSpec(_, _))),
output) =>
val newOutput = if (conf.getConf(SQLConf.LEGACY_KEEP_COMMAND_OUTPUT_SCHEMA)) {
-        assert(output.length == 4)
-        output.head.withName("database") +: output.tail
+        assert(output.length == 5)
Member: Well, this seems inconsistent with the doc. The current document says that `spark.sql.legacy.keepCommandOutputSchema` restores the 3.1-or-earlier schema, doesn't it?

Member Author: Introducing a new legacy conf for this behavior change seems trivial and might add cognitive burden for users. So the config is reused for now, and the doc will be updated if this is the right way to go.

Member:
> So the config is reused for now

This is not reuse, technically. If we reused the existing conf, this should be `output.length == 4`, because the conf would disable both this PR's change and the previous commit's change simultaneously.

Member Author:
> this should be output.length == 4

Indeed, this is true. `output.head.withName("database") +: output.slice(1, 4)` would cut the isView column off.

+        ShowTableExtended.getLegacyOutputAttrs
} else {
output
}
@@ -843,11 +843,22 @@ case class ShowTablesCommand(
val database = tableIdent.database.getOrElse("")
val tableName = tableIdent.table
val isTemp = catalog.isTempView(tableIdent)
+        val catalogTable = catalog.getTempViewOrPermanentTableMetadata(tableIdent)
+        val tableType = classicTableTypeString(catalogTable.tableType)
if (isExtended) {
-          val information = catalog.getTempViewOrPermanentTableMetadata(tableIdent).simpleString
-          Row(database, tableName, isTemp, s"$information\n")
+          val information = catalogTable.simpleString
+          if (output.size == 5) {
+            Row(database, tableName, isTemp, s"$information\n", tableType)
+          } else {
+            Row(database, tableName, isTemp, s"$information\n")
+          }
} else {
-          Row(database, tableName, isTemp)
+          if (output.size == 4) {
+            Row(database, tableName, isTemp, tableType)
+          } else {
+            Row(database, tableName, isTemp)
Member: Do we still have test coverage for this line?

Member Author: Yes, the new show-tables-legacy.sql imports the corresponding tests to cover it. I can add some cases in v1.ShowTablesSuite if show-tables-legacy.sql is unintuitive.

}

}
}
} else {
@@ -870,7 +881,12 @@
val tableName = tableIdent.table
val isTemp = catalog.isTempView(tableIdent)
val information = partition.simpleString
-        Seq(Row(database, tableName, isTemp, s"$information\n"))
+        if (output.size == 5) {
+          val tableType = classicTableTypeString(table.tableType)
+          Seq(Row(database, tableName, isTemp, s"$information\n", tableType))
+        } else {
+          Seq(Row(database, tableName, isTemp, s"$information\n"))
+        }
}
}
}
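ShowTablesCommand picks the row shape at runtime from the arity of its resolved output schema: 4 columns (or 5 with EXTENDED) means the new schema including tableType, one fewer means the legacy schema. A hedged sketch of that dispatch, with rows as plain value sequences rather than Spark's Row:

```scala
// Sketch mirroring the arity check in ShowTablesCommand; not Spark's API.
def makeRow(outputSize: Int, database: String, tableName: String,
            isTemp: Boolean, tableType: String): Seq[Any] =
  if (outputSize == 4) Seq(database, tableName, isTemp, tableType) // new schema
  else Seq(database, tableName, isTemp)                            // legacy schema
```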
@@ -40,7 +40,7 @@ case class ShowTablesExec(
val tables = catalog.listTables(namespace.toArray)
tables.map { table =>
if (pattern.map(StringUtils.filterPattern(Seq(table.name()), _).nonEmpty).getOrElse(true)) {
-        rows += toCatalystRow(table.namespace().quoted, table.name(), false)
+        rows += toCatalystRow(table.namespace().quoted, table.name(), false, "TABLE")
}
}
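On the v2 path the exec node has no view metadata at this point, so every listed identifier is reported as a non-temporary "TABLE". A sketch of the pattern-filtered listing; the case class and the '*'-glob matching are illustrative simplifications (Spark's StringUtils.filterPattern also supports '|'):

```scala
// Sketch of v2 SHOW TABLES row building; identifiers and pattern
// matching are simplified stand-ins for the Spark catalog API.
final case class Ident(namespace: Seq[String], name: String)

def showTablesRows(
    tables: Seq[Ident],
    pattern: Option[String]): Seq[(String, String, Boolean, String)] =
  tables
    .filter(t => pattern.forall(p => t.name.matches(p.replace("*", ".*"))))
    .map(t => (t.namespace.mkString("."), t.name, false, "TABLE")) // never temp, always TABLE
```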

sql/core/src/test/resources/log4j.properties (1 addition, 1 deletion)
@@ -22,7 +22,7 @@ log4j.rootLogger=INFO, CA, FA
log4j.appender.CA=org.apache.log4j.ConsoleAppender
log4j.appender.CA.layout=org.apache.log4j.PatternLayout
log4j.appender.CA.layout.ConversionPattern=%d{HH:mm:ss.SSS} %p %c: %m%n
-log4j.appender.CA.Threshold = WARN
+log4j.appender.CA.Threshold = FATAL
Member Author: Oversized logs in the GA console output cause truncation of useful test errors, and the unit-tests.log is enough.

Member: If so, is it better to backport this change into the previous branches? cc @HyukjinKwon @dongjoon-hyun

Member Author (@yaooqinn, Mar 13, 2021): Yeah, I guess so. We can mute console output for most modules. We call o.s.Assertions.intercept frequently, which produces a lot of unnecessary error logs. When we mute them, the error stack traces for failed tests are still kept. I can make a separate PR to fix it if it makes sense to the CCers too.

Contributor: But the current threshold (WARN) can help catch a bug like the following early: #31273 (comment)

Member Author:
> But the current threshold (WARN) can help catch a bug like the following early: #31273 (comment)

These warning messages can still be found in unit-tests.log. I don't see much difference, as a warning message can still simply be ignored. But when we encounter test failures that get omitted by the CI, it is hard to locate them.

Member: I think it's okay as long as unit-tests.log contains them.

Member Author: Thanks, I'll send a PR then.

log4j.appender.CA.follow = true


@@ -0,0 +1,2 @@
+--SET spark.sql.legacy.keepCommandOutputSchema=true
Member Author: This might be useful to verify the legacy schema.

Member: Is it relevant to this PR's isView column addition?

Member Author: The code change is related: spark.sql.legacy.keepCommandOutputSchema=true will omit the isView column.

+--IMPORT show-tables.sql