[SPARK-23418][SQL] Fail DataSourceV2 reads when user schema is passed, but not supported.

DataSourceV2 initially allowed a user-supplied schema when a source did not implement `ReadSupportWithSchema`, as long as that schema was identical to the source's own schema. This behavior is confusing: a change to the underlying table can cause a previously working job to fail with an exception saying that user-supplied schemas are not allowed.
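The failure mode described above can be sketched outside Spark. The following is a minimal, self-contained model, not the actual DataSourceV2 API: `Schema`, `Source`, `inferSchema`, `schemaFor`, `TableSource`, `SchemaNotAllowed`, and the `before`/`after` methods are all hypothetical stand-ins, and the error messages in `after` are invented for illustration. Only the control flow mirrors the change in this commit.

```scala
import scala.util.Try

// Simplified stand-ins for illustration only; these are NOT the real Spark types.
final case class Schema(fields: Seq[String])

trait Source { def name: String }
// Models a source that can only report its own schema.
trait ReadSupport extends Source { def inferSchema(): Schema }
// Models a source that accepts a schema supplied by the user.
trait ReadSupportWithSchema extends Source { def schemaFor(user: Schema): Schema }

final class SchemaNotAllowed(msg: String) extends RuntimeException(msg)

object SchemaResolution {
  // Behavior before this commit: a user schema is tolerated on a source
  // without schema support, but only while it equals the source's schema.
  def before(source: Source, user: Option[Schema]): Schema = (source, user) match {
    case (s: ReadSupportWithSchema, Some(u)) => s.schemaFor(u)
    case (s: ReadSupport, Some(u)) =>
      val actual = s.inferSchema()
      if (actual != u) {
        throw new SchemaNotAllowed(s"${s.name} does not allow user-specified schemas.")
      }
      actual
    case (s: ReadSupport, None) => s.inferSchema()
    case (s, _) => throw new UnsupportedOperationException(s"${s.name} is not readable")
  }

  // Behavior after this commit: a user schema always fails on a source
  // without schema support, regardless of whether it happens to match.
  def after(source: Source, user: Option[Schema]): Schema = (source, user) match {
    case (s: ReadSupportWithSchema, Some(u)) => s.schemaFor(u)
    case (s, Some(_)) =>
      throw new SchemaNotAllowed(s"${s.name} does not support user-specified schemas.")
    case (s: ReadSupport, None) => s.inferSchema()
    case (s, None) => throw new UnsupportedOperationException(s"${s.name} is not readable")
  }
}

// A hypothetical table-backed source whose columns can change over time.
final class TableSource(var columns: Seq[String]) extends ReadSupport {
  val name = "table"
  def inferSchema(): Schema = Schema(columns)
}

object Demo {
  def main(args: Array[String]): Unit = {
    val table = new TableSource(Seq("id", "name"))
    val user = Some(Schema(Seq("id", "name")))

    // Old behavior: works while the schemas match...
    println(Try(SchemaResolution.before(table, user)))
    // ...then the same job fails once the table gains a column.
    table.columns = Seq("id", "name", "email")
    println(Try(SchemaResolution.before(table, user)))

    // New behavior: fails from the start, so the job never depends on a match.
    println(Try(SchemaResolution.after(new TableSource(Seq("id", "name")), user)))
  }
}
```

In this model the old path turns a schema change in the table into a misleading "user-specified schemas" error in an unchanged job, while the new path rejects the user schema up front.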

This reverts commit adcb25a0624, which was added to apache#20387 so that it could be removed in a separate JIRA issue and PR.

Existing tests.

Author: Ryan Blue <blue@apache.org>

Closes apache#20603 from rdblue/SPARK-23418-revert-adcb25a0624.

Ref: LIHADOOP-48531
rdblue authored and otterc committed Oct 15, 2019
1 parent 8c32682 commit 879f00f
Showing 1 changed file with 1 addition and 12 deletions.
@@ -174,13 +174,6 @@ object DataSourceV2Relation {
       v2Options: DataSourceOptions,
       userSchema: Option[StructType]): StructType = {
     val reader = userSchema match {
-      // TODO: remove this case because it is confusing for users
-      case Some(s) if !source.isInstanceOf[ReadSupportWithSchema] =>
-        val reader = source.asReadSupport.createReader(v2Options)
-        if (reader.readSchema() != s) {
-          throw new AnalysisException(s"${source.name} does not allow user-specified schemas.")
-        }
-        reader
       case Some(s) =>
         source.asReadSupportWithSchema.createReader(s, v2Options)
       case _ =>
@@ -195,11 +188,7 @@ object DataSourceV2Relation {
       filters: Option[Seq[Expression]] = None,
       userSpecifiedSchema: Option[StructType] = None): DataSourceV2Relation = {
     val projection = schema(source, makeV2Options(options), userSpecifiedSchema).toAttributes
-    DataSourceV2Relation(source, options, projection, filters,
-      // if the source does not implement ReadSupportWithSchema, then the userSpecifiedSchema must
-      // be equal to the reader's schema. the schema method enforces this. because the user schema
-      // and the reader's schema are identical, drop the user schema.
-      if (source.isInstanceOf[ReadSupportWithSchema]) userSpecifiedSchema else None)
+    DataSourceV2Relation(source, options, projection, filters, userSpecifiedSchema)
   }

   private def pushRequiredColumns(reader: DataSourceReader, struct: StructType): Unit = {