Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

schemadiff: diffing two schemas generates a rich SchemaDiff object #12551

Merged

Conversation

shlomi-noach
Copy link
Contributor

@shlomi-noach shlomi-noach commented Mar 6, 2023

Description

This is a schemadiff enhancement. Previously, when diffing two schemas (via Schema.Diff(other)) we'd return a list of Diff objects, each of which would represent a single change, .e.g an ALTER TABLE, DROP VIEW, etc. The diffs order was partly heuristic (DROP VIEW -> DROP TABLE -> ALTER TABLE -> CREATE TABLE -> ALTER VIEW -> CREATE VIEW) and partly evaluated (order views by dependencies). However, the order was not necessarily applicable.

This PR introduces a new SchemaDiff function which returns a new SchemaDiff object:

func (s *Schema) SchemaDiff(other *Schema, hints *DiffHints) (*SchemaDiff, error)

SchemaDiff is a rich object which includes all diffs between the two schemas, but also:

  • Diffs are guaranteed to be sorted in lienarized, applicable order, if possible: you'd be able to apply the diffs one by one, and keep the schema consistent at all times while doing so, resolving any dependencies between the diffs. This includes VIEW and FOREIGN KEY dependencies.
  • There are scenarios where the declarative diffs between two schema have no single linearized sequence off diffs. This happens because we only attempt a single ALTER operation per table/view. In the future, we may allow a breakdown into granular diffs.
  • The diffs are arranged into equivalence classes, where diffs in the same equivalence class are said to depend on each other.
  • "depend on each other" has a formal definition of increasing severity or strictness. Two diffs may not have any dependency, or may have an unknown dependency (the current algorithm does not lead to any such result), an in-order completion dependency, or a strictly sequential dependency. This information is made available by SchemaDiff. This lets the user/automation that gets SchemaDiff to know whether some migrations may run --in-order, or if one migration must strictly wait for another to complete.
  • The algorithm follows MySQL's capabilities; it attempts the least strict dependency type between any two diffs, where possible. An in-order dependency type is known to be applicable in Online DDL's vitess strategy.

The implementation relies heavily on the various Validate() functionalities. The order of diffs begins with a heuristic order, which is likely to be the final order in most cases; but it may permute that order in between dependent diffs, if the heuristic order is not linearly valid.

Note: VIEW column dependencies support is provided in unmerged #12147 ; #12147 works, but we're looking to refactor it and reuse an internal implementation (UPDATE: reimplemented in #12565). The tests are prepared and commented out.

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on the CI
  • Documentation was added or is not required

Deployment Notes

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach
Copy link
Contributor Author

Good merger with #12565, and the tests we were waiting on exposed a bug, now fixed.

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach
Copy link
Contributor Author

Ping for review 🙏 @dbussink in particular

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diff here is due to gofmt -s -w .... the file is not formatted correctly in main.

Copy link
Contributor

@mattlord mattlord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I only had minor nits/comments/questions.

Thank you for all of the unit tests! ❤️

Comment on lines 68 to 69
r.classElementsMap[r.classCounter] = []string{element}
r.classCounter++
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I take it this is used to enforce order?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. As the equivalence relation goes, it's only important that any new element gets its own distinct class-number. So r.classCounter++ ensures the next element is added onto its own distinct class.

The users of the equivalence relation, though, are interested in the order, just so that we have kind of a stable sorting mechanism.

go/mathutil/equivalence_relation.go Outdated Show resolved Hide resolved
go/vt/schemadiff/schema.go Outdated Show resolved Hide resolved
go/vt/schemadiff/schema.go Outdated Show resolved Hide resolved
go/vt/schemadiff/schema_diff.go Outdated Show resolved Hide resolved
go/vt/schemadiff/schema_diff_test.go Outdated Show resolved Hide resolved
{
iteration := 0
allPerms := map[string]bool{}
allDiffs := schemaDiff.allDiffs()[:]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious why you use [:] here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious too! Removed.

go/vt/schemadiff/schema_diff_test.go Show resolved Hide resolved
}
for _, diff := range orderedDiffs {
s := diff.CanonicalStatementString()
// Internal sanity, while we're here: see that the equivalence relation has entries for all diffs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I think as part of the inclusive language effort we're supposed to avoid sanity-check related language.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to integrity

go/vt/schemadiff/table_test.go Show resolved Hide resolved
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@@ -119,7 +119,7 @@ var tounicode_dec8_swedish_ci = [...]uint16{
0x00f8, 0x00f9, 0x00fa, 0x00fb, 0x00fc, 0x00ff, 0x0000, 0x0000,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shlomi-noach Any reason why the collations data was updated in this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dbussink yes! Per #12551 (review), this is just a gofmt issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the only change is to fix the lint warnings, removing the redundant eightbit.UnicodeMapping{} type.

Copy link
Contributor Author

@shlomi-noach shlomi-noach Apr 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dbussink yes! Per #12551 (review), this is just a gofmt issue.

// SchemaDiff calulates a rich diff between this schema and the given schema. It is stronger than Diff().
// On top of returning the list of diffs that can take this schema into the given schema, this function also
// evaluates the dependencies between those diffs, if any.
func (s *Schema) SchemaDiff(other *Schema, hints *DiffHints) (*SchemaDiff, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to deprecate / remove Diff? I guess it uses it internally, but should we then rename Diff to diff to not export it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I intend to do that, but in a separate PR, to have a grace period of a couple weeks where we can use either.

@@ -231,6 +246,11 @@ func (d *RenameTableEntityDiff) IsEmpty() bool {
return d.Statement() == nil
}

// EntityName implements EntityDiff
func (d *RenameTableEntityDiff) EntityName() string {
return d.from.Name()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this deliberately the from name? Or should it be the to name? Or does it not matter here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this isn't strictly in-use right now, it is deliberately from name. This is because the diff operates on the from table -- so that table is the one being affected by the operation. This is in line with the rest of diff types (with the exception of CREATE where the affected table is to).

Copy link
Contributor

@rohit-nayak-ps rohit-nayak-ps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Online DDL Online DDL (vitess/native/gh-ost/pt-osc) Component: Query Serving Type: Enhancement Logical improvement (somewhere between a bug and feature)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants