-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: new UDFs for set-like operations on Arrays #5548
feat: new UDFs for set-like operations on Arrays #5548
Conversation
import java.util.List; | ||
import org.junit.Test; | ||
|
||
public class ArrayExceptTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are missing a test case where the arrays contain null values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
argh, good catch - thanks! will add
final List<String> input1 = Arrays.asList(null, null); | ||
final List<String> input2 = Arrays.asList(null, null, null); | ||
final List<String> result = udf.union(input1, input2); | ||
assertThat(result, contains(nullValue())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit confusing behavior for me. Why does the test right above contain a null value in the resulting array but this test returns null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's a subtlety here - in this case the udf doesn't really return null, instead it returns an array containing a single null - which i think is what the function should do in this case, on the grounds that the only distinct value found in either of the input arrays is a null. This is different than the case that we pass in an actual null, as compared to array-of-nulls which is what we have here, and that other case is tested below.
Does that make sense, or am i misunderstanding the question ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry, I just noticed the contains
. I thought it was equals
that's why I asked. Your answer and the code totally make sense. Sorry about that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, this new matcher-naming throws me off regularly too - i think i liked "contains" much better as something like "isArrayContaining", especially when you're scanning through a whole set of tests :)
``` | ||
|
||
Returns an array of all the distinct values, including NULL if present, from the input array. | ||
The output array elements will be in order of their first occurrence in the input. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The output array elements will be in order of their first occurrence in the input. | |
The output array elements are in order of their first occurrence in the input. |
ARRAY_EXCEPT(array1, array2) | ||
``` | ||
|
||
Returns an array of all the distinct elements from an array except for those also present in a second array. The order of entries in the first array is preserved although any duplicates are removed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returns an array of all the distinct elements from an array except for those also present in a second array. The order of entries in the first array is preserved although any duplicates are removed. | |
Returns an array of all the distinct elements from an array, except for those also present in a second array. The order of entries in the first array is preserved, but duplicates are removed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, with a few tiny suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Four new UDFs, implementing set-like operations on Array fields,inspired by the similarly-named functions in Presto (and others):
array_distinct(array)
array_except(array1, array2)
array_intersect(array1, array2)
array_union(array1, array2)
This is a revised and updated version of these functions and their tests, which were originally proposed long ago in PR #1611 and blocked at that time by limited scope for handling collections of generics in the UDF invocation framework.
Review feedback from the original PR incorporated herein, along with QTTs and historical plans.