[Spark] Reject type changes on columns referenced by constraints/generated columns #2881

johanl-db
Collaborator

What changes were proposed in this pull request?

It is generally not safe to change the type of a column or field that is referenced by a CHECK constraint or a generated column. For example, some functions, such as hash, may produce different results depending on the input data type.
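
As a rough illustration of the problem in plain Scala (not Delta or Spark code), the sketch below uses `toString` as a hypothetical stand-in for a generation expression such as a cast to string. Widening the stored value from float to double is lossless, yet the derived value changes, so previously generated values would no longer match the expression.

```scala
object TypeChangeDemo {
  // Hypothetical stand-ins for a generation expression evaluated against
  // the column before and after its type is widened.
  def generatedFromFloat(value: Float): String = value.toString
  def generatedFromDouble(value: Double): String = value.toString

  def main(args: Array[String]): Unit = {
    val original: Float = 0.1f
    // Widening float -> double keeps the exact stored value.
    val widened: Double = original.toDouble

    println(generatedFromFloat(original))  // 0.1
    println(generatedFromDouble(widened))  // 0.10000000149011612
  }
}
```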

This change adds checks to fail when the type of a column or field that is referenced by a CHECK constraint or a generated column is changed:

  • using ALTER TABLE t CHANGE COLUMN col TYPE type.
  • using schema evolution, in ImplicitMetadataOperation.mergeSchema().

For the latter, a check covering only generated columns was already in place in SchemaMergingUtils.mergeSchemas. That check is replaced by the more generic check in ImplicitMetadataOperation, which reuses the logic already used to block column rename/drop in ALTER TABLE.
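
The shared check can be pictured roughly as follows. This is a simplified, self-contained sketch and not the actual Delta helpers: the `Dependent` case class, `findDependents`, and `checkTypeChange` are hypothetical names, and the sketch assumes an expression is dependent when it references the changed field or a field nested below it.

```scala
object DependentColumnCheck {
  // Hypothetical model: a CHECK constraint or generation expression, together
  // with the column paths it references (nested fields as multi-part paths).
  final case class Dependent(name: String, referencedPaths: Seq[Seq[String]])

  // Assumption: a reference is affected when the changed path is a prefix of it,
  // i.e. it points at the changed field itself or at one of its nested children.
  def findDependents(changed: Seq[String], all: Seq[Dependent]): Seq[Dependent] =
    all.filter(_.referencedPaths.exists(_.startsWith(changed)))

  // Reject the type change if any constraint or generated column depends on the field.
  def checkTypeChange(changed: Seq[String], all: Seq[Dependent]): Either[String, Unit] = {
    val dependents = findDependents(changed, all)
    if (dependents.isEmpty) Right(())
    else Left(
      s"Cannot change the type of ${changed.mkString(".")}: " +
        s"referenced by ${dependents.map(_.name).mkString(", ")}")
  }
}
```

Both entry points (ALTER TABLE and schema evolution) would call the same function with the path of the field whose type is changing, which is the reuse the description refers to.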

How was this patch tested?

  • Tests for rejecting type changes with CHECK constraints and generated columns added to DeltaTypeWideningSuite.
  • Existing tests for rejecting type changes in GeneratedColumnSuite are extended.
  • Tests covering the updated and newly added error classes are added to DeltaErrorsSuite.

This PR introduces the following user-facing changes

The type widening table feature isn't publicly available yet, so this change isn't user-facing in that regard.
This change updates the following error codes:

  • _LEGACY_ERROR_TEMP_DELTA_0004 -> DELTA_CONSTRAINT_DEPENDENT_COLUMN_CHANGE
  • _LEGACY_ERROR_TEMP_DELTA_0005 -> DELTA_GENERATED_COLUMNS_DEPENDENT_COLUMN_CHANGE

It also introduces the following error code:

  • DELTA_CONSTRAINT_DATA_TYPE_MISMATCH

@johanl-db johanl-db changed the title Reject type changes on columns referenced by constraints/generated columns [Spark] Reject type changes on columns referenced by constraints/generated columns Apr 16, 2024
Contributor

@olaky olaky left a comment

Change looks good overall, just some smaller suggestions.

Collaborator

@tomvanbussel tomvanbussel left a comment

Nice! Just one comment.

val columnPath = fieldPath :+ currentField.name
// Check if the field to change is referenced by CHECK constraints.
val dependentConstraints =
  Constraints.findDependentConstraints(sparkSession, columnPath, metadata)
Collaborator

This seems a bit inefficient. Won't this result in the constraints and generated columns being parsed over and over again? Probably not a big issue in practice though.

Collaborator Author

True, but it allows unifying the checks we do here during schema evolution with the checks in ALTER TABLE, so that both use the same helpers.

We can improve this in the future. In this change I'd rather focus on correctness and reuse helpers that have already been field-tested.
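
For reference, one possible shape for such a follow-up is to parse every expression once and answer per-field lookups from an index. This is only a sketch under stated assumptions: `parseReferences` is a hypothetical stand-in that treats every identifier-like token as a column reference (so it would also pick up function names), and `dependentsByColumn` is not an existing Delta helper.

```scala
object ParseOnceSketch {
  // Counts parser invocations so the effect of indexing is observable.
  var parseCalls = 0

  // Hypothetical stand-in for expression parsing: every identifier-like token
  // is treated as a referenced column name.
  def parseReferences(expr: String): Set[String] = {
    parseCalls += 1
    "[A-Za-z_][A-Za-z0-9_]*".r.findAllIn(expr).toSet
  }

  // Parse each named expression exactly once and build an index from
  // column name to the names of the expressions that reference it.
  def dependentsByColumn(exprs: Map[String, String]): Map[String, Set[String]] =
    exprs.toSeq
      .flatMap { case (name, expr) => parseReferences(expr).toSeq.map(col => col -> name) }
      .groupMap(_._1)(_._2)
      .view.mapValues(_.toSet).toMap
}
```

With an index like this, checking each field during a schema merge becomes a map lookup instead of a fresh parse of all constraints and generation expressions.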

@felipepessoto
Contributor

If a column is non-nullable would it be rejected as well? As NOT NULL is considered an invariant? #2006.

@johanl-db
Collaborator Author

If a column is non-nullable would it be rejected as well? As NOT NULL is considered an invariant? #2006.

That works: you can change the type of a non-nullable column. I added a test to cover this.
