Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow source columns to be specified multiple times in a vreplication rule filter #9335

Merged
merged 4 commits into from
Jan 5, 2022

Conversation

rohit-nayak-ps
Copy link
Contributor

@rohit-nayak-ps rohit-nayak-ps commented Dec 7, 2021

Description

This PR fixes a bug where a column cannot be referred to multiple times in a Filter. There was an incorrect optimization in vstreamer where a column was being ignored while creating the source query if it was already seen.

So a materialize source_expression like select id, val, ts, dayofmonth(ts) as day from mat would fail since ts was mentioned twice in the select list.

In addition, a new test has been added that confirms that Materialize filters can contain user-defined functions: we can define a source_expression like select id, val, ts, customFunc1(id, val) as x from mat. This is more an observation rather than a new feature, since, to our query parser, a custom function looks like just a pre-defined mysql function. It is being stated explicitly since we had earlier assumed we did not yet support custom functions because a user had earlier reported failures with such a filter. It now looks as if that report was caused by this same bug with multiple columns.

Signed-off-by: Rohit Nayak rohit@planetscale.com

Related Issue(s)

#9314

Checklist

  • Should this PR be backported?
  • Tests were added or are not required
  • Documentation was added or is not required

…the rule's filter. Add tests for this as well as to demonstrate that this fix also allows custom functions to be written

Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
@rohit-nayak-ps rohit-nayak-ps changed the title WIP: Allow source columns to be specified multiple times in a filter Allow source columns to be specified multiple times in a vreplication rule filter Dec 9, 2021
@rohit-nayak-ps rohit-nayak-ps marked this pull request as ready for review December 9, 2021 09:18
@deepthi deepthi marked this pull request as draft December 16, 2021 02:53
@deepthi
Copy link
Member

deepthi commented Dec 16, 2021

There was some discussion offline. tablePlanBuilder is probably not the right place for this, we should be handling it on the other end (vcopier/vplayer).

@rohit-nayak-ps
Copy link
Contributor Author

There was some discussion offline. tablePlanBuilder is probably not the right place for this, we should be handling it on the other end (vcopier/vplayer).

Iiirc we had discussed this in Open Hour, but decided to go with the current implementation: moving it to vplayer/vcopier will require a bigger refactor. Currently the target expects that vstreamer is sending all the columns that the vplayer/vcopier need. Changing that will touch a few different parts of code. The current approach has minimal code impact and will fix the existing bug (albeit not optimally). We will address the optimization in a future PR.

vc.AddKeyspace(t, []*Cell{defaultCell}, sourceKs, "-", smMaterializeVSchemaSource, smMaterializeSchemaSource, defaultReplicas, defaultRdonly, 300)
vtgate = defaultCell.Vtgates[0]
require.NotNil(t, vtgate)
vtgate.WaitForStatusOfTabletInShard(fmt.Sprintf("%s.%s.primary", sourceKs, "0"), 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They seem to be synonyms but is there any reason to use "-" above and "0" here?

Copy link
Contributor Author

@rohit-nayak-ps rohit-nayak-ps Dec 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had a regression (a long while ago) where '-' and '0' were giving different results which wasn't caught by the tests at that time! So as a reflex I have used the two interchangeably at times, but it is not strictly necessary for this test: I changed it to use '0' everywhere.

require.NoError(t, err)

ks2Keyspace := vc.Cells[defaultCell.Name].Keyspaces[targetKs]
ks2Primary := ks2Keyspace.Shards["-"].Tablets["zone1-400"].Vttablet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can do this instead of relying on a hardcoded tablet UID:

ks2Primary := vc.getVttabletsInKeyspace(t, defaultCellName, targetKs, "primary")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getVttabletsInKeyspace returns a map with tablet name as the key. In this case we get a map with a single entry since we only have one shard and hence only one primary, but in general we could have multiple entries for each shard. I added another helper getPrimaryTablet(keyspace, shard) for this purpose.

@@ -464,7 +464,7 @@ func TestPlayerCopyTables(t *testing.T) {
filter := &binlogdatapb.Filter{
Rules: []*binlogdatapb.Rule{{
Match: "dst1",
Filter: "select * from src1",
Filter: "select id, val, val as val2 from src1",
Copy link
Contributor

@mattlord mattlord Dec 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention isn't entirely clear to me as there is also a column called val2, right? Maybe we just say val, val as valdup ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

val2 is an extra column in the target. source does not have it. The filter requests val twice, second time as val2 to test that the same column can be referred to multiple times. The sql generated on the target uses the field name as the column name so it needs to match the column name on the target.

… get primary instead of referring to it using a tablet id

Signed-off-by: Rohit Nayak <rohit@planetscale.com>
@rohit-nayak-ps rohit-nayak-ps marked this pull request as ready for review December 23, 2021 10:34
Copy link
Member

@deepthi deepthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
It seems like all of Matt's feedback has been addressed.

@@ -61,7 +61,7 @@ const smMaterializeSpec = `{"workflow": "wf1", "source_keyspace": "ks1", "target
const initDataQuery = `insert into ks1.tx(id, typ, val) values (1, 1, 'abc'), (2, 1, 'def'), (3, 2, 'def'), (4, 2, 'abc'), (5, 3, 'def'), (6, 3, 'abc')`

// TestShardedMaterialize tests a materialize from a sharded (single shard) using comparison filters
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: comment should be changed to reflect function name change.

RETURN id * length(val);
`

// TestMaterialize: details mentioned above
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment should be changed: Test -> test

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or can possibly be deleted altogether.

Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants