-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove map cast for element access #22059
Remove map cast for element access #22059
Conversation
REMOVE_MAP_CAST, | ||
"Remove map cast when possible", | ||
false, | ||
false), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might need to add this to features config to make this into a config property
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might need to add this to features config to make this into a config property
If this becomes a new feature property, please add documentation for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@feilong-liu can you add a configuration property for this as per @jaystarshot's suggestion? Session properties should always have a corresponding config property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@feilong-liu can you add a configuration property for this as per @jaystarshot's suggestion? Session properties should always have a corresponding config property.
We are trying to move away from that model for optimizations that we know are safe (famous last words lol) to reduce noise. So feature config is only for features and not for optimizations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a configuration doesn't seem especially noisy to me, but it does allow people to disable the feature for all queries if there ends up being a bug or regression in the optimizer (not everyone has infrastructure set up to easily tack session properties on to all user sessions if needed).
Type toKeyType = ((MapType) castExpression.getType()).getKeyType(); | ||
Type toValueType = ((MapType) castExpression.getType()).getValueType(); | ||
|
||
if (toValueType.equals(fromValueType) && fromKeyType.equals(INTEGER) && toKeyType.equals(BIGINT) && node.getArguments().get(1).getType().equals(BIGINT)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put it in a separate function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
assertQuery(enableOptimization, "select element_at(feature, key) from (values (map(array[cast(1 as integer), 2, 3, 4], array[0.3, 0.5, 0.9, 0.1]), cast(2 as bigint)), (map(array[cast(1 as integer), 2, 3, 4], array[0.3, 0.5, 0.9, 0.1]), 400000000000)) t(feature, key)", | ||
"values 0.5, null"); | ||
assertQueryFails(enableOptimization, "select feature[key] from (values (map(array[cast(1 as integer), 2, 3, 4], array[0.3, 0.5, 0.9, 0.1]), cast(2 as bigint)), (map(array[cast(1 as integer), 2, 3, 4], array[0.3, 0.5, 0.9, 0.1]), 400000000000)) t(feature, key)", | ||
".*Out of range for integer.*"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try some unrelated ones where this rule should not trigger - like varchar, map<varchar,float> etc
|
||
/** | ||
* Remove cast on map if possible. Currently it only supports subscript and element_at function, and only works when map key is of type integer and index is bigint. For example: | ||
* cast(feature as map<bigint, float>)[key] -> feature[cast(key as integer)], where feature is of type map<integer, float> and key is of type bigint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can you put the before and after on separate lines for readability? something like this:
For arguments feature: map<integer, float>, key: bigint
Input: cast(feature as map<bigint, float>)[key]
Output:
feature[cast(key as integer)]
/** | ||
* Remove cast on map if possible. Currently it only supports subscript and element_at function, and only works when map key is of type integer and index is bigint. For example: | ||
* cast(feature as map<bigint, float>)[key] -> feature[cast(key as integer)], where feature is of type map<integer, float> and key is of type bigint | ||
* element_at(cast(feature as map<bigint, float>), key) -> element_at(feature, try_cast(key as integer)), where feature is of type map<integer, float> and key is of type bigint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is the cast added in the first place? parser/analyzer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type promotion - ideally we should fix the type promo. For example:
element_at(map(int, real), bigint)
promotes map to map(bigint, real) instead we should coerce bigint to int? But I think our function resolution logic maybe inadequate for that.
So for now, we just decided to drop these unnecessary casts. This helps a lot in our current prod workloads as there is a lot of mixing of int and bigint in keys and map
Type toKeyType = ((MapType) castExpression.getType()).getKeyType(); | ||
Type toValueType = ((MapType) castExpression.getType()).getValueType(); | ||
|
||
if (toValueType.equals(fromValueType) && fromKeyType.equals(INTEGER) && toKeyType.equals(BIGINT) && node.getArguments().get(1).getType().equals(BIGINT)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we special case the int/bigint pair? shouldn't this work for any cast that is not lossy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-exact types as keys is dubious (and also not common) so we don't want to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good, if we extract this in a function as suggested above we can also extend this in the future to handle other pairs if needed
0983047
to
cf0b50c
Compare
Rewrite expression from cast(feature as map<bigint, float>)[key] -> feature[cast(key as integer)], where feature is of type map<integer, float> and key is of type bigint, and element_at(cast(feature as map<bigint, float>), key) -> element_at(feature, try_cast(key as integer)), where feature is of type map<integer, float> and key is of type bigint, so as to get rid of map cast function.
cf0b50c
to
91aa1b1
Compare
Rewrite expression from cast(feature as map<bigint, float>)[key] -> feature[cast(key as integer)], where feature is of type map<integer, float> and key is of type bigint, and element_at(cast(feature as map<bigint, float>), key) -> element_at(feature, try_cast(key as integer)), where feature is of type map<integer, float> and key is of type bigint, so as to get rid of map cast function.
Description
Map cast is expensive, this PR tries to get rid of map cast when possible, it currently targets the following cases:
Notice that here when it's accessing the map using subscript function, we use CAST function in index, and when it's element_at function, we use TRY_CAST function, so that
Motivation and Context
Get rid of the expensive map cast expression.
Impact
Get rid of map cast for cost for these expressions.
Test Plan
Unit tests
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.