schemadiff: validate views' referenced columns #12147

shlomi-noach
Contributor

Description

Until now, schemadiff's view validation and analysis was limited to:

  • Validate the view statement itself
  • Infer tables (or views) referenced by a view
  • Validate those tables/views exist
  • Validate no dependency loop
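
As an illustration of the last check, here is a minimal sketch of loop detection over view dependencies. It is not the schemadiff implementation: `findViewCycle` and the `deps` map shape are hypothetical names chosen for this example, and the approach shown is a plain depth-first search with three visit states.

```go
package main

import (
	"fmt"
	"sort"
)

// findViewCycle reports an error if the dependency graph contains a loop.
// deps maps each view name to the tables/views it reads from. Both the
// function name and the map shape are assumptions made for this sketch.
func findViewCycle(deps map[string][]string) error {
	const (
		unvisited = iota
		inProgress
		done
	)
	state := make(map[string]int, len(deps))

	var visit func(name string) error
	visit = func(name string) error {
		switch state[name] {
		case inProgress:
			// We came back to an entity whose dependencies are still being explored.
			return fmt.Errorf("view dependency loop detected at %q", name)
		case done:
			return nil
		}
		state[name] = inProgress
		for _, dep := range deps[name] {
			if err := visit(dep); err != nil {
				return err
			}
		}
		state[name] = done
		return nil
	}

	// Sort names so that the reported entity is deterministic.
	names := make([]string, 0, len(deps))
	for name := range deps {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		if err := visit(name); err != nil {
			return err
		}
	}
	return nil
}
```

Plain tables never appear as keys in `deps`, so they have no outgoing edges and are marked done on first visit.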

As of this PR, schemadiff also validates columns used and returned by views:

  • Validate that columns in the SELECT and WHERE clauses do in fact exist in the tables/views referenced by the view. Error when an unknown column is encountered.
  • Allow column names that are not qualified by a table name where there is no ambiguity. Error on an unqualified name with ambiguity.
  • Analyze the view's column names, whether aliased or not.
  • Support cascaded views.
  • Support star expressions, either qualified (create view v as select t.* from t, t2 ...) or unqualified (create view v as select * from t1, t2); infer the list of columns; support cascaded views based on star-expression views.
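
The unknown-column and ambiguity rules above can be sketched roughly as follows. This is an assumption-laden illustration, not the code in this PR: `resolveColumn` and the `tables` map are hypothetical, and the comparison is an exact string match, whereas real MySQL column names are case-insensitive.

```go
package main

import (
	"fmt"
	"sort"
)

// resolveColumn returns the name of the table or view that provides column.
// tables maps each entity referenced by the view to its column names.
// qualifier is the table name written before the column, or "" if none.
// All names here are hypothetical and exist only for this sketch.
func resolveColumn(tables map[string][]string, qualifier, column string) (string, error) {
	if qualifier != "" {
		cols, ok := tables[qualifier]
		if !ok {
			return "", fmt.Errorf("unknown table or view %q", qualifier)
		}
		for _, c := range cols {
			if c == column {
				return qualifier, nil
			}
		}
		return "", fmt.Errorf("unknown column %q in %q", column, qualifier)
	}

	// Unqualified name: collect every entity that has this column.
	var owners []string
	for name, cols := range tables {
		for _, c := range cols {
			if c == column {
				owners = append(owners, name)
				break
			}
		}
	}
	sort.Strings(owners) // deterministic error text

	switch len(owners) {
	case 0:
		return "", fmt.Errorf("unknown column %q", column)
	case 1:
		return owners[0], nil
	default:
		return "", fmt.Errorf("ambiguous column %q, found in %v", column, owners)
	}
}
```

With `t1(id, name)` and `t2(id, price)`, an unqualified `name` resolves to `t1`, an unqualified `id` is rejected as ambiguous, and `t2.id` resolves to `t2`.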

Plenty of tests added to validate behavior.

Joint with @dbussink and based on his preliminary work.

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach
Contributor Author

In a followup PR: optimize so that we only analyze the columns of tables/views that are referenced by some view in the first place; there is no need to analyze e.g. a table that is never read by a view. This will allow us to save some memory and CPU.

@@ -667,6 +667,7 @@ func TestDiffSchemas(t *testing.T) {
// validate schema1 unaffected by Apply
assert.Equal(t, schema1SQL, schema1.ToSQL())

require.NotNil(t, schema2)
Contributor

Is this check useful? We already check earlier that there's no error returned when schema2 is created?

Contributor Author

removed

@@ -750,3 +759,258 @@ func (s *Schema) Apply(diffs []EntityDiff) (*Schema, error) {
}
return dup, nil
}

// TODO
Contributor

What does this TODO refer to?

Contributor Author

hehe, it refers to the fact that it is already done... removing...

Contributor Author

fixed

// TODO

func (s *Schema) ValidateViewReferences() error {
var allerrors error
Contributor

We usually use errs as the name for this kind of thing, to collect errors.

Contributor Author

fixed

for _, e := range s.Entities() {
entityColumns, err := s.getEntityColumnNames(e.Name(), availableColumns)
if err != nil {
return err
Contributor

Should this add to the error list instead? And then we continue through the rest of the entities? If this is a totally unexpected error, maybe good to add a comment about that and why it's safe to return immediately?

Contributor Author

fixed

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
go.mod Outdated
@@ -113,6 +113,7 @@ require (
github.com/kr/text v0.2.0
github.com/nsf/jsondiff v0.0.0-20210926074059-1e845ec5d249
github.com/openark/golib v0.0.0-20210531070646-355f37940af8
go.uber.org/multierr v1.8.0
Contributor

Let's also update to 1.9.0 here since that's the latest?

Contributor Author

done

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach shlomi-noach added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: Query Serving labels Jan 25, 2023
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
}
return true, nil
}, view.Select)
if err != nil {
Collaborator

FYI - sqlparser.Walk will only return the error you return from inside the visitor function, and since you don't return any errors from that, no errors will ever make it out.

Contributor Author

@systay ah, great! Thank you. I'll keep the check as it is, for safety, but good to know!

Contributor

@systay Right, but the errors are gathered here in errs? And those are returned at the end if the walker itself doesn't error (which is not expected).

Collaborator

@dbussink yeah, that was my point. It's not really necessary to catch the error returned from sqlparser.Walk, since that is not how we are dealing with the errors. When I know it will never return an error, I usually write:

	_ = sqlparser.Walk(...)

OTOH - it's probably good defensive programming to do as @shlomi-noach is doing here and catch and check the error anyway.
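
To make the point concrete, here is a small self-contained sketch of the pattern under discussion. It deliberately does not use the real sqlparser package: `node`, `visitFunc`, `walk` and `collectUnknown` are toy stand-ins invented for this example, shaped after the visitor style described above, where the walker only propagates errors that the visitor itself returns.

```go
package main

import "fmt"

// node is a toy syntax-tree node, standing in for a parsed SQL node.
type node struct {
	name     string
	children []*node
}

// visitFunc is a hypothetical visitor: return false to skip the node's
// children, or a non-nil error to abort the whole walk.
type visitFunc func(n *node) (kontinue bool, err error)

// walk visits n and its descendants depth-first. The only error it can
// return is one that the visitor returned.
func walk(visit visitFunc, n *node) error {
	if n == nil {
		return nil
	}
	kontinue, err := visit(n)
	if err != nil {
		return err
	}
	if !kontinue {
		return nil
	}
	for _, child := range n.children {
		if err := walk(visit, child); err != nil {
			return err
		}
	}
	return nil
}

// collectUnknown gathers one error per unknown name in a slice captured by
// the visitor closure, instead of aborting the walk on the first problem.
func collectUnknown(root *node, known map[string]bool) ([]error, error) {
	var errs []error
	err := walk(func(n *node) (bool, error) {
		if !known[n.name] {
			errs = append(errs, fmt.Errorf("unknown column %q", n.name))
		}
		return true, nil // the visitor never returns an error
	}, root)
	// err is always nil with this visitor; checking it is purely defensive.
	return errs, err
}
```

Because the visitor always returns a nil error, the value returned by `walk` can never be non-nil here, which is why both discarding it and checking it defensively are reasonable choices.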

Contributor

@systay Right, I guess that as someone not knowing too much about how Walk is implemented, I wouldn't know that it can't return an error and would still write it defensively 😄.

@shlomi-noach
Contributor Author

Looking for a 2nd review/approval

Contributor

@mattlord left a comment

I think we can/should avoid importing the external multierr package but it otherwise LGTM.

Not adding a formal approval as we should, IMO, get @systay or @harshit-gangal to review as they are in the query serving team and working on the overall Vitess View support work. This way we can ensure that we're moving forward together and all the pieces will fit together nicely.

@systay or @harshit-gangal, could you please review? Please let us know if you cannot, and I can add a formal approval here if needed. Thanks!

for _, e := range s.Entities() {
entityColumns, err := s.getEntityColumnNames(e.Name(), availableColumns)
if err != nil {
errs = multierr.Append(errs, err)
Contributor

Why not use the Vitess concurrency package here along with vterrors.Aggregate? You can see that used throughout the code base if you look for .AggrError(vterrors.Aggregate). Between the concurrency and vterrors packages we should have whatever related functionality you need here for error handling. For example:

func (ts *trafficSwitcher) ForAllSources(f func(source *workflow.MigrationSource) error) error {
	var wg sync.WaitGroup
	allErrors := &concurrency.AllErrorRecorder{}
	for _, source := range ts.sources {
		wg.Add(1)
		go func(source *workflow.MigrationSource) {
			defer wg.Done()
			if err := f(source); err != nil {
				allErrors.RecordError(err)
			}
		}(source)
	}
	wg.Wait()
	return allErrors.AggrError(vterrors.Aggregate)
}

Contributor

@dbussink, Jan 30, 2023

It looks like it does use locking which is unnecessary here. Dunno if we care about the overhead of that here? That logic seems designed for concurrent error gathering which isn't what we're doing here.

Using multierr seems simpler here and it's already an indirect dependency? Not a really strong opinion though, we can also use this but it seems a bit off for what it was designed for.

I think this logic here is temporary anyway, since once golang/go#53435 is available with Go 1.20 we probably want to switch to that anyway.

Contributor Author

heh, race condition

Contributor Author

How strongly do people feel about this? Looking at the two implementations, multierr does seem to be more fitting to our purpose, but I don't feel strongly.

Contributor

@mattlord, Jan 31, 2023

If you only need multiple errors you can use vterrors.Aggregate(), no need for the concurrency piece (as noted, geared toward N goroutines) like we do here e.g.:

var terrs []error
for !empty {
	select {
	case result := <-resultCh:
		switch result.state {
		case vcopierCopyTaskCancel:
			// A task cancelation probably indicates an expired context due
			// to a PlannedReparentShard or elapsed copy phase duration,
			// neither of which are error conditions.
		case vcopierCopyTaskComplete:
			// Get the latest lastpk, purely for logging purposes.
			lastpk = result.args.lastpk
		case vcopierCopyTaskFail:
			// Aggregate non-nil errors.
			terrs = append(terrs, result.err)
		}
	default:
		empty = true
	}
}
if len(terrs) > 0 {
	terr := vterrors.Aggregate(terrs)
	log.Warningf("task error in workflow %s: %v", vc.vr.WorkflowName, terr)
	return fmt.Errorf("task error: %v", terr)
}

I don't feel strongly about it though. Up to you.

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
…t views reading from DUAL

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach
Contributor Author

Incorporated #12189 into this PR, as it deals with view analysis

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach
Contributor Author

I've switched away from multierr to AllErrorRecorder. I do want to say that multierr seems to have the better design compared to AllErrorRecorder, and even compared to the new Go 1.20 error wrapping support.

@shlomi-noach
Contributor Author

Looking for another review/approval 🙏

@shlomi-noach
Contributor Author

ping @vitessio/query-serving for review

@shlomi-noach shlomi-noach requested a review from a team February 8, 2023 06:22
Member

@GuptaManan100 left a comment

Everything else LGTM!

go.mod Outdated
@@ -187,7 +187,7 @@ require (
github.com/tidwall/pretty v1.2.0 // indirect
go.opencensus.io v0.24.0 // indirect
go.uber.org/atomic v1.10.0 // indirect
go.uber.org/multierr v1.9.0 // indirect
Member

Is there a specific reason for this downgrade?

Contributor Author

It was upgraded in a previous PR, that was eventually not using multierr. I just ran go mod tidy and it auto-downgraded, so... I don't know?

Contributor

@shlomi-noach might be because the direct dependency was dropped? A go get -u go.uber.org/multierr probably would add this back correctly (it should be there, so good catch @GuptaManan100).

Contributor Author

This is fixed.

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach
Contributor Author

Stalling this PR a bit, as I'm going to look into the go/vt/vtgate/semantics implementation of view dependency management.

shlomi-noach added a commit to planetscale/vitess that referenced this pull request Mar 5, 2023
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach shlomi-noach marked this pull request as draft March 6, 2023 07:25
@shlomi-noach
Contributor Author

Replaced by #12565 ; will close shortly.

@shlomi-noach shlomi-noach deleted the schemadiff-validate-view-columns-deps branch March 21, 2023 03:57
shlomi-noach added a commit that referenced this pull request Apr 13, 2023
…12551)

* WIP: schemadiff, analyzing diff dependencies
* ongoing work; many tests added
* partitioning tests
* function refactor
* minor reordering of variables
* copyright year
* stages to be committed
* math: permutations
* mathutil
* sort table alter options on diff
* alter reordering: first drop foreign key, then drop key
* remove tags logic from EquivalenceRelation
* removed go/mathutil/permutations.go
* refactor/rename
* cleanup unused code
* OrderedDiffs
* added an impossible-order test
* clarify comment
* AllSequentialExecutionDeps()
* test: expected order of entities in ordered diffs
* function comments
* remove unused code
* code comments; privatize functions; remove unused code
* code comments. Privatize function
* test deferred till #12147 is merged
* code comments
* fix permutation logic, add tests
* unindent using early returns
* code comment
* reposition some functions in file (no actual changes)
* relocated the functions back because the diff became really confusing
* grammar
* code comments, minor optimization
* typo
* Added tests, minor edits per review. changed language
* handling empty diff slice; improved comments
* Clarify test reasoning
* grammar

---------

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Labels
Component: Query Serving Type: Enhancement Logical improvement (somewhere between a bug and feature)