Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introducing schemadiff, a declarative diff for table/view CREATE statements #9719

Merged
merged 13 commits into from
Feb 24, 2022

Conversation

shlomi-noach
Copy link
Contributor

@shlomi-noach shlomi-noach commented Feb 17, 2022

Description

go/vt/schemadiff is a new library whose functionality is to diff two CREATE TABLE statements, or two CREATE VIEW statements, and produce the SQL diff between them, which could be either:

  • CREATE TABLE / DROP TABLE / ALTER TABLE
  • CREATE VIEW / DROP VIEW / ALTER VIEW

The most interesting use case is getting a valid ALTER TABLE statement that will convert a table from one state to the other.

schemadiff relies on Vitess's go/vt/sqlparser functionality, and specifically:

  • on the parser's ability to parse queries
  • on the parser's ability to reconstruct queries

The main entry points for this functionality are:

func DiffCreateTablesQueries(query1 string, query2 string, hints *DiffHints) (EntityDiff, error)
func DiffTables(create1 *sqlparser.CreateTable, create2 *sqlparser.CreateTable, hints *DiffHints) (EntityDiff, error)
func DiffCreateViewsQueries(query1 string, query2 string, hints *DiffHints) (EntityDiff, error)
func DiffViews(create1 *sqlparser.CreateView, create2 *sqlparser.CreateView, hints *DiffHints) (EntityDiff, error)

As example:

query1 := "create table t (id int, primary key(id))"
query2 := "create table t (id int, c char(5), primary key(id))"
diff, err := schemadiff.DiffCreateTablesQueries(query1, query2, &schemadiff.DiffHints{})

The returned diff object can be nil, if no diff is found (ie two identical tables, or two identical views, or two nil statements). Otherwise, the diff can be queries for Statement() sqlparser.Statement

In the above example, we can expect sqlparser.String(diff.Statement()) == "alter table t add column c char(5)"

It is imperative that the two input queries, or the two Create(Table|View) statements are canonical. Ideally, the input should be a strict SHOW CREATE TABLE output (thus, made canonical by MySQL). Otherwise there are some scenarios where the parser can return a diff for semantically-identical statements,

Table diffing

Table diffing is the most complex of all diffs. schemadiff supports:

  • diffing columns & their types
  • diffing ordering of columns
  • diffing keys: at this time schemadiff is impartial to ordering of keys and does not generate a diff when keys are ordered differently between tables. there's an unimplemented hint flag to control that in the future.
  • diffing foreign keys
  • diffing partitions (up to current sqlparser's limitations -- right now subpartitioning unsupported)
  • diffing table options

schemadiff rejects statements that sqlparser cannot fully parse.

Testing

There are two forms of testing to schemadiff:

  • unit tests, found in go/vt/schemadiff. Those are self explanatory I believe.
  • endtoend testing. What we did was to piggyride the onlineddl/vrepl_suite tests: vrepl_suite is the main suite that validated vitess schema migration functionality, by presenting trivial, complex or niche ALTER cases with various data types, conversions and constraints. It is what builds our trust in vitess migrations.

schemadiff uses the same suite as follows:

  • it iterates all cases. For each case, it creates the original table, then applies (directly, no need for vitess Online DDL here) the alter statement in the test
  • it records the before/after table format
  • it evaluates the diff
  • it re-creates the "before" table
  • it applies its own diff
  • compares the result with recorded "after" format, expects exact same SHOW CREATE TABLE in MySQL -- this is the main validation
  • as an extra test, diffs its own generated "after" schema with the recorded "after" schema, expects an empty diff

I was able to moreover do a N*N comparison of all tables in the suite; this leads to some 5000 tests, they're all good. This part is committed but commented out. It takes 7 minutes to run, which is fine, but I'm afraid it will grow fast and too much, should we add more tests.

Using

At this time, the library is added but not used by any code path. In a followup PR, we will make Online DDL's -declarative migrations to use this library (technically this is already known to work, I just want to have separate PRs).

Checklist

  • Should this PR be backported?
  • Tests were added or are not required
  • Documentation was added or is not required

Many thanks to @GuptaManan100 for fleshing out parsing issues found while working on this PR.

…eclarative schemas using go/vt/sqlparser and without need for a backend database

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach shlomi-noach added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: Query Serving release notes none labels Feb 17, 2022
@shlomi-noach shlomi-noach requested a review from a team February 17, 2022 09:20
@shlomi-noach
Copy link
Contributor Author

Ahhh, there we go. I was testing with mysql80, but here we see a mysql57 error. Looking into!

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach
Copy link
Contributor Author

Woot, solved the failing test scenario.

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach
Copy link
Contributor Author

Good to review.

@@ -0,0 +1,427 @@
/*
Copyright 2021 The Vitess Authors.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: 2022

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

})
}

// func TestRandomSchemaChanges(t *testing.T) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you intentionally leave this around?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, yes. So I had this idea that on top of the existing 80 - 90 tests, I could also do a Cartesian product of all tests, a crossover between them all, to test any-schema-to-any-schema conversion. That's the commented code. The reason it's commented is that this leads to some 5K tests, which run well within 7 minutes -- a reasonable time -- but I'm afraid that if I add a bit more test, this will explode (runtime is n^2).

…face function that was never used

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
…lumnDefinition field

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach
Copy link
Contributor Author

Did a bit more refactoring/cleanup -- nothing substantial to affect the review. Good to go.

@shlomi-noach shlomi-noach requested a review from a team February 23, 2022 04:41
Copy link
Collaborator

@vmg vmg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me. It's very little code for a lot of functionality. I think we still have pending the abstraction of this package, and the sqlparser, so it can be reused more easily, but that shouldn't block this PR.

@shlomi-noach
Copy link
Contributor Author

I think we still have pending the abstraction of this package, and the sqlparser, so it can be reused more easily, but that shouldn't block this PR.

👍 ❤️

@shlomi-noach
Copy link
Contributor Author

Tracking issue: #10203

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Query Serving Type: Enhancement Logical improvement (somewhere between a bug and feature)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants