-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Spark] Reject type changes on columns referenced by constraints/generated columns #2881
[Spark] Reject type changes on columns referenced by constraints/generated columns #2881
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change looks good overall, just some smaller suggestions
spark/src/main/scala/org/apache/spark/sql/delta/commands/alterDeltaTableCommands.scala
Outdated
Show resolved
Hide resolved
spark/src/main/scala/org/apache/spark/sql/delta/schema/ImplicitMetadataOperation.scala
Outdated
Show resolved
Hide resolved
spark/src/main/scala/org/apache/spark/sql/delta/schema/ImplicitMetadataOperation.scala
Outdated
Show resolved
Hide resolved
spark/src/test/scala/org/apache/spark/sql/delta/DeltaTypeWideningSuite.scala
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Just one comment.
val columnPath = fieldPath :+ currentField.name | ||
// check if the field to change is referenced by check constraints | ||
val dependentConstraints = | ||
Constraints.findDependentConstraints(sparkSession, columnPath, metadata) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a bit inefficient. Won't this result in the constraints and generated columns being parsed over and over again? Probably not a big issue in practice though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True but it allows unifying the checks we do here during schema evolution and the checks in ALTER TABLE so that both use the same helpers.
We can improve in the future, in this change I'd rather focus on correctness and reuse helpers that have been field-tested already.
If a column is non-nullable would it be rejected as well? As NOT NULL is considered an invariant? #2006. |
That works, you can change the type of a non-nullable column. I added a test to cover this |
1a6ac90
to
bf5e77f
Compare
What changes were proposed in this pull request?
It is generally not safe to change the type of a column or field that is referenced by a CHECK constraint or a generated column. For example, some functions may produce different results depending on the input data type, e.g.
hash
.This change adds checks to fail when the type of a column or field that is referenced by a CHECK constraint or a generated column is changed:
ALTER TABLE t CHANGE COLUMN col TYPE type
.ImplicitMetadataOperation.mergeSchema()
.For the latter, a check for generated columns only was already in place in
SchemaMergingUtils.mergeSchemas
. That check is replaced in favor of the more generic check inImplicitMetadataOperation
which reuses existing logic already used to block column rename/drop inALTER TABLE
.How was this patch tested?
DeltaTypeWideningSuite
.GeneratedColumnSuite
are extended.DeltaErrorsSuite
This PR introduces the following user-facing changes
The type widening table feature isn't available publicly yet, this change isn't user-facing in that regard.
This change update the following error codes:
_LEGACY_ERROR_TEMP_DELTA_0004
->DELTA_CONSTRAINT_DEPENDENT_COLUMN_CHANGE
_LEGACY_ERROR_TEMP_DELTA_0005
->DELTA_GENERATED_COLUMNS_DEPENDENT_COLUMN_CHANGE
and introduced the following error code:
DELTA_CONSTRAINT_DATA_TYPE_MISMATCH