-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow source columns to be specified multiple times in a vreplication rule filter #9335
Conversation
…the rule's filter. Add tests for this as well as to demonstrate that this fix also allows custom functions to be written Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
There was some discussion offline. |
Iiirc we had discussed this in Open Hour, but decided to go with the current implementation: moving it to vplayer/vcopier will require a bigger refactor. Currently the target expects that vstreamer is sending all the columns that the vplayer/vcopier need. Changing that will touch a few different parts of code. The current approach has minimal code impact and will fix the existing bug (albeit not optimally). We will address the optimization in a future PR. |
vc.AddKeyspace(t, []*Cell{defaultCell}, sourceKs, "-", smMaterializeVSchemaSource, smMaterializeSchemaSource, defaultReplicas, defaultRdonly, 300) | ||
vtgate = defaultCell.Vtgates[0] | ||
require.NotNil(t, vtgate) | ||
vtgate.WaitForStatusOfTabletInShard(fmt.Sprintf("%s.%s.primary", sourceKs, "0"), 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They seem to be synonyms but is there any reason to use "-" above and "0" here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had a regression (a long while ago) where '-' and '0' were giving different results which wasn't caught by the tests at that time! So as a reflex I have used the two interchangeably at times, but it is not strictly necessary for this test: I changed it to use '0' everywhere.
require.NoError(t, err) | ||
|
||
ks2Keyspace := vc.Cells[defaultCell.Name].Keyspaces[targetKs] | ||
ks2Primary := ks2Keyspace.Shards["-"].Tablets["zone1-400"].Vttablet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can do this instead of relying on a hardcoded tablet UID:
ks2Primary := vc.getVttabletsInKeyspace(t, defaultCellName, targetKs, "primary")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getVttabletsInKeyspace
returns a map with tablet name as the key. In this case we get a map with a single entry since we only have one shard and hence only one primary, but in general we could have multiple entries for each shard. I added another helper getPrimaryTablet(keyspace, shard)
for this purpose.
@@ -464,7 +464,7 @@ func TestPlayerCopyTables(t *testing.T) { | |||
filter := &binlogdatapb.Filter{ | |||
Rules: []*binlogdatapb.Rule{{ | |||
Match: "dst1", | |||
Filter: "select * from src1", | |||
Filter: "select id, val, val as val2 from src1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intention isn't entirely clear to me as there is also a column called val2
, right? Maybe we just say val, val as valdup
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
val2
is an extra column in the target. source does not have it. The filter requests val
twice, second time as val2
to test that the same column can be referred to multiple times. The sql generated on the target uses the field name as the column name so it needs to match the column name on the target.
… get primary instead of referring to it using a tablet id Signed-off-by: Rohit Nayak <rohit@planetscale.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
It seems like all of Matt's feedback has been addressed.
@@ -61,7 +61,7 @@ const smMaterializeSpec = `{"workflow": "wf1", "source_keyspace": "ks1", "target | |||
const initDataQuery = `insert into ks1.tx(id, typ, val) values (1, 1, 'abc'), (2, 1, 'def'), (3, 2, 'def'), (4, 2, 'abc'), (5, 3, 'def'), (6, 3, 'abc')` | |||
|
|||
// TestShardedMaterialize tests a materialize from a sharded (single shard) using comparison filters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: comment should be changed to reflect function name change.
RETURN id * length(val); | ||
` | ||
|
||
// TestMaterialize: details mentioned above |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment should be changed: Test -> test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or can possibly be deleted altogether.
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Description
This PR fixes a bug where a column cannot be referred to multiple times in a Filter. There was an incorrect optimization in vstreamer where a column was being ignored while creating the source query if it was already seen.
So a materialize
source_expression
likeselect id, val, ts, dayofmonth(ts) as day from mat
would fail sincets
was mentioned twice in the select list.In addition, a new test has been added that confirms that Materialize filters can contain user-defined functions: we can define a
source_expression
likeselect id, val, ts, customFunc1(id, val) as x from mat
. This is more an observation rather than a new feature, since, to our query parser, a custom function looks like just a pre-defined mysql function. It is being stated explicitly since we had earlier assumed we did not yet support custom functions because a user had earlier reported failures with such a filter. It now looks as if that report was caused by this same bug with multiple columns.Signed-off-by: Rohit Nayak rohit@planetscale.com
Related Issue(s)
#9314
Checklist