-
Notifications
You must be signed in to change notification settings - Fork 155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(join): validate group key #4958
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Figuring out if we can elide the group key validation sounds like a tricky problem. I guess we have the same issue with map
though, I forget what we do there? I think we build the table and then slice it up into separate groups if needed. That's different than the situation here, where you need to produce an error.
If we ever want to vectorize the output function, we might considering deferring the check until after we've built an output chunk, and erroring slightly later.
Just thinking out loud here. Maybe the fix in this case would be to check the semantic graph and see if any group key columns are assigned a value other than |
The trouble with that approach, in my opinion, is that it is hard to know what's allowed via static analysis, since you could have a function like: emptyStringAppender = (str) => str + ""
(left, right) => {
k0 = left.k0
k0_ = emptyStringAppender(str: k0)
return {
k0: k0_, // the only group key column
// other fields here...
}
} A naive check would probably produce an error early even though this is actually fine. This is obviously a contrived example, but I'm not sure we can analyze a function such that we can be 100% sure the group key will change or not. If we get it wrong, the user gets an error for something they reasonably expect could succeed. I wonder if it would make sense just to produce the result with the new group key, the way that |
I considered it, but thought that might be surprising for users to see output records that don't match the record they defined in their |
Oh, I may have misunderstood your suggestion. Do you mean to regroup the results based on what the user sets as their new group key? I think the intention with the new join was to not allow users to modify the group key, just as |
Yes, that is exactly what I mean. I know that it is a radical suggestion. This is exactly what I don't think we should do anything for now, what you have here is totally reasonable for a new function. I'm not sure if we went back and created a |
Gotcha. I'll merge and maybe we can revisit this line of reasoning later. |
This is a more complete fix for #4909 which was previously closed by #4933.
The previous fix only returned an error if a group key column was missing from the joined output record. This fix also makes sure that the value does not change.