Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

schemadiff: validate views' referenced columns via semantics #12565

Merged
Show file tree
Hide file tree
Changes from 30 commits
Commits
Show all changes
40 commits
Select commit Hold shift + click to select a range
c4c213b
validate table columns referenced by views
shlomi-noach Jan 24, 2023
3dac907
validate views' columns
shlomi-noach Jan 24, 2023
e88d426
support star expression
shlomi-noach Jan 25, 2023
61de6b0
removed 'TODO'
shlomi-noach Jan 25, 2023
389872c
rename: errs
shlomi-noach Jan 25, 2023
56c8421
use mutierr
shlomi-noach Jan 25, 2023
11d6379
excessive test, removed
shlomi-noach Jan 25, 2023
f4fd673
go mod tidy
shlomi-noach Jan 25, 2023
a4ae98d
update to latest mutierr
shlomi-noach Jan 25, 2023
ef16bdf
merge main, resolve conflict
shlomi-noach Jan 26, 2023
54fb73a
avoid setting entity columns in case of error
shlomi-noach Jan 26, 2023
2d3e70b
grammar
shlomi-noach Jan 30, 2023
b69ede6
schemadiff: fix scenario where no tables exist in schema and with jus…
shlomi-noach Jan 30, 2023
bc34369
dual, not dual2
shlomi-noach Jan 30, 2023
bc8799d
stripped irrelevant comments
shlomi-noach Jan 30, 2023
dd1bc82
merge main, resolve conflict
shlomi-noach Feb 5, 2023
7f6fd13
using AllErrorRecorder instead of multierr package
shlomi-noach Feb 6, 2023
4939d01
multierr update
shlomi-noach Feb 20, 2023
89d6cba
using FakeSI as table-column model
shlomi-noach Mar 7, 2023
2194291
rename variable
shlomi-noach Mar 7, 2023
38eab14
clone
shlomi-noach Mar 7, 2023
7eb83c1
add InvalidStarExprInViewError
shlomi-noach Mar 7, 2023
f98729b
Make 'args()' accessible
shlomi-noach Mar 7, 2023
36075ec
using 'go/vt/vtgate/semantics' to analyze view queries. Handling star…
shlomi-noach Mar 7, 2023
26e9577
more test cases
shlomi-noach Mar 7, 2023
5718b58
typo
shlomi-noach Mar 8, 2023
fe90b31
unexpected error
shlomi-noach Mar 8, 2023
db43534
use ColumnName()
shlomi-noach Mar 8, 2023
3bf836f
iterate columns instead of Walk
shlomi-noach Mar 8, 2023
95a963e
Merge branch 'main' into schemadiff-validate-view-columns-semantics
shlomi-noach Mar 9, 2023
ebf8a12
Merge branch 'main' into schemadiff-validate-view-columns-semantics
shlomi-noach Mar 9, 2023
1244c0b
simplify InvalidColumnReferencedInViewError
shlomi-noach Mar 20, 2023
8b07d21
do not use FakeSI, create my own implementation of semantics.SchemaIn…
shlomi-noach Mar 20, 2023
017e10d
Refactor: go/vt/vtgate/engine/opcode to reduce semantics package depe…
shlomi-noach Mar 20, 2023
eac2860
add new package
shlomi-noach Mar 20, 2023
4daa08b
copyright
shlomi-noach Mar 20, 2023
1699955
fix function comment
shlomi-noach Mar 20, 2023
c66b6e9
Merge branch 'resolve-semantics-engine-deps' into schemadiff-validate…
shlomi-noach Mar 20, 2023
1c54579
empty commit to kick CI
shlomi-noach Mar 20, 2023
dfd1cc5
Merge branch 'main' into schemadiff-validate-view-columns-semantics
shlomi-noach Mar 20, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
36 changes: 36 additions & 0 deletions go/vt/schemadiff/errors.go
Original file line number Diff line number Diff line change
Expand Up @@ -334,3 +334,39 @@ type ViewDependencyUnresolvedError struct {
func (e *ViewDependencyUnresolvedError) Error() string {
return fmt.Sprintf("view %s has unresolved/loop dependencies", sqlescape.EscapeID(e.View))
}

type InvalidColumnReferencedInViewError struct {
View string
Table string
Column string
Ambiguous bool
}

func (e *InvalidColumnReferencedInViewError) Error() string {
switch {
case e.Column == "":
return fmt.Sprintf("view %s references non-existent table %s", sqlescape.EscapeID(e.View), sqlescape.EscapeID(e.Table))
case e.Table != "":
return fmt.Sprintf("view %s references non-existent column %s.%s", sqlescape.EscapeID(e.View), sqlescape.EscapeID(e.Table), sqlescape.EscapeID(e.Column))
case e.Ambiguous:
return fmt.Sprintf("view %s references unqualified but non unique column %s", sqlescape.EscapeID(e.View), sqlescape.EscapeID(e.Column))
default:
return fmt.Sprintf("view %s references unqualified but non-existent column %s", sqlescape.EscapeID(e.View), sqlescape.EscapeID(e.Column))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the error message looks very fragile based on what struct fields are set.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we also create the message ahead of time, when the struct is created?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that’s actually the wrong pattern. The core is the structured error and the string format is just a detail here.

The structured error removes the issue of string being brittle because anyone wants more details doesn’t look at the string at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we also create the message ahead of time, when the struct is created?

Id' like to avoid that right now in the context of this PR, because that would require us to rewrite all error handling.

Also, I'm really not sure, because we want to be able to create errors using various combination of fields. We don't have a single "constructor" ; we'd need different New*() function for any combination of fields.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the error message looks very fragile based on what struct fields are set.

I can rewrite that to be more clear

Copy link
Contributor Author

@shlomi-noach shlomi-noach Mar 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to simplify InvalidColumnReferencedInViewError by removing unused Table field and subsequent error message generation cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@harshit-gangal allocating the string is also wasteful since when using structured errors it is never used. So I don’t think we should do that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the MySQL protocol level, error strings are transferred over the wire, so we might be generating them eventually.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@harshit-gangal this is never used for MySQL level errors. But even then, we’d allocate when needed and not always so then it’s imho still an improvement.

It also improved the ergonomics imho since it’s now easier to create a structured error then.

}
}

type InvalidStarExprInViewError struct {
View string
}

func (e *InvalidStarExprInViewError) Error() string {
return fmt.Sprintf("view %s has invalid star expression", sqlescape.EscapeID(e.View))
}

type EntityNotFoundError struct {
Name string
}

func (e *EntityNotFoundError) Error() string {
return fmt.Sprintf("entity %s not found", sqlescape.EscapeID(e.Name))
}
153 changes: 153 additions & 0 deletions go/vt/schemadiff/schema.go
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,11 @@ import (
"sort"
"strings"

"vitess.io/vitess/go/vt/concurrency"
"vitess.io/vitess/go/vt/proto/vtrpc"
"vitess.io/vitess/go/vt/sqlparser"
"vitess.io/vitess/go/vt/vterrors"
"vitess.io/vitess/go/vt/vtgate/semantics"
)

// Schema represents a database schema, which may contain entities such as tables and views.
Expand Down Expand Up @@ -318,6 +322,11 @@ func (s *Schema) normalize() error {
}
}

// Validate views' referenced columns: do these columns actually exist in referenced tables/views?
if err := s.ValidateViewReferences(); err != nil {
return err
}

// Validate table definitions
for _, t := range s.tables {
if err := t.validate(); err != nil {
Expand Down Expand Up @@ -761,3 +770,147 @@ func (s *Schema) Apply(diffs []EntityDiff) (*Schema, error) {
}
return dup, nil
}

func (s *Schema) ValidateViewReferences() error {
errs := &concurrency.AllErrorRecorder{}
schemaInformation := newDeclarativeSchemaInformation()

// Remember that s.Entities() is already ordered by dependency. ie. tables first, then views
// that only depend on those tables (or on dual), then 2nd tier views, etc.
// Thus, the order of iteration below is valid and sufficient, to build
for _, e := range s.Entities() {
entityColumns, err := s.getEntityColumnNames(e.Name(), schemaInformation)
if err != nil {
errs.RecordError(err)
continue
}
schemaInformation.addTable(e.Name())
for _, col := range entityColumns {
schemaInformation.addColumn(e.Name(), col.Lowered())
}
}

// Add dual table with no explicit columns for dual style expressions in views.
schemaInformation.addTable("dual")

for _, view := range s.Views() {
sel := sqlparser.CloneSelectStatement(view.CreateView.Select) // Analyze(), below, rewrites the select; we don't want to actually modify the schema
_, err := semantics.Analyze(sel, semanticKS.Name, schemaInformation)
extractColName := func(arg any) string {
switch arg := arg.(type) {
case string:
return arg
case *sqlparser.ColName:
return arg.Name.String()
default:
return "" // unexpected
}
}
formalizeErr := func(err error) error {
if err == nil {
return nil
}
semErr, ok := err.(*semantics.Error)
if !ok {
return err
}
if len(semErr.Args()) == 0 {
return err
}
switch semErr.Code {
case semantics.AmbiguousColumn:
if colName := extractColName(semErr.Args()[0]); colName != "" {
return &InvalidColumnReferencedInViewError{
View: view.Name(),
Column: colName,
Ambiguous: true,
}
}
case semantics.ColumnNotFound:
if colName := extractColName(semErr.Args()[0]); colName != "" {
return &InvalidColumnReferencedInViewError{
View: view.Name(),
Column: colName,
}
}
}
return err
}
errs.RecordError(formalizeErr(err))
}
return errs.AggrError(vterrors.Aggregate)
}

// getTableColumnNames returns the names of columns in given table.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// getTableColumnNames returns the names of columns in given table.
// getEntityColumnNames returns the names of columns in given table.

func (s *Schema) getEntityColumnNames(entityName string, schemaInformation *declarativeSchemaInformation) (
columnNames []*sqlparser.IdentifierCI,
err error,
) {
entity := s.Entity(entityName)
if entity == nil {
if strings.ToLower(entityName) == "dual" {
// this is fine. DUAL does not exist but is allowed
return nil, nil
}
return nil, &EntityNotFoundError{Name: entityName}
}
// The entity is either a table or a view
switch entity := entity.(type) {
case *CreateTableEntity:
return s.getTableColumnNames(entity), nil
case *CreateViewEntity:
return s.getViewColumnNames(entity, schemaInformation)
}
return nil, vterrors.Errorf(vtrpc.Code_INTERNAL, "unexpected entity type for %v", entityName)
}

// getTableColumnNames returns the names of columns in given table.
func (s *Schema) getTableColumnNames(t *CreateTableEntity) (columnNames []*sqlparser.IdentifierCI) {
for _, c := range t.TableSpec.Columns {
columnNames = append(columnNames, &c.Name)
}
return columnNames
}

// getViewColumnNames returns the names of aliased columns returned by a given view.
func (s *Schema) getViewColumnNames(v *CreateViewEntity, schemaInformation *declarativeSchemaInformation) (
columnNames []*sqlparser.IdentifierCI,
err error,
) {
for _, node := range v.Select.GetColumns() {
switch node := node.(type) {
case *sqlparser.StarExpr:
if tableName := node.TableName.Name.String(); tableName != "" {
for _, col := range schemaInformation.Tables[tableName].Columns {
name := sqlparser.CloneRefOfIdentifierCI(&col.Name)
columnNames = append(columnNames, name)
}
} else {
dependentNames, err := getViewDependentTableNames(v.CreateView)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI - information about used tables can be had from the semantic table that is the output of semantics.Analyze.

It's in the Tables []TableInfo on SemTable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but, this is chicken and egg: I'm running the above in order to build the FakeSI to feed semantics...

if err != nil {
return nil, err
}
// add all columns from all referenced tables and views
for _, entityName := range dependentNames {
if schemaInformation.Tables[entityName] != nil { // is nil for dual/DUAL
for _, col := range schemaInformation.Tables[entityName].Columns {
name := sqlparser.CloneRefOfIdentifierCI(&col.Name)
columnNames = append(columnNames, name)
}
}
}
}
if len(columnNames) == 0 {
return nil, &InvalidStarExprInViewError{View: v.Name()}
}
case *sqlparser.AliasedExpr:
ci := sqlparser.NewIdentifierCI(node.ColumnName())
columnNames = append(columnNames, &ci)
}
}

if err != nil {
return nil, err
}
return columnNames, nil
}
Loading