Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove map cast for element access #22059

Merged
merged 1 commit into from
Mar 7, 2024

Conversation

feilong-liu
Copy link
Contributor

Rewrite expression from cast(feature as map<bigint, float>)[key] -> feature[cast(key as integer)], where feature is of type map<integer, float> and key is of type bigint, and element_at(cast(feature as map<bigint, float>), key) -> element_at(feature, try_cast(key as integer)), where feature is of type map<integer, float> and key is of type bigint, so as to get rid of map cast function.

Description

Map cast is expensive, this PR tries to get rid of map cast when possible, it currently targets the following cases:

  • cast(feature as map<bigint, float>)[key] -> feature[cast(key as integer)], where feature is of type map<integer, float> and key is of type bigint
  • element_at(cast(feature as map<bigint, float>), key) -> element_at(feature, try_cast(key as integer)), where feature is of type map<integer, float> and key is of type bigint

Notice that here when it's accessing the map using subscript function, we use CAST function in index, and when it's element_at function, we use TRY_CAST function, so that

  • when the key is out of integer range, for feature[key] it will fail both with and without optimization, fail with map key not exists before optimization and with cast failure after optimization
  • when the key is out of integer range, for element_at(feature, key) it will return NULL both before and after optimization

Motivation and Context

Get rid of the expensive map cast expression.

Impact

Get rid of map cast for cost for these expressions.

Test Plan

Unit tests

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Add an optimizer rule to get rid of map cast in map access functions when possible

@feilong-liu feilong-liu requested a review from a team as a code owner March 1, 2024 19:32
@feilong-liu feilong-liu requested a review from presto-oss March 1, 2024 19:32
REMOVE_MAP_CAST,
"Remove map cast when possible",
false,
false),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might need to add this to features config to make this into a config property

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might need to add this to features config to make this into a config property

If this becomes a new feature property, please add documentation for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@feilong-liu can you add a configuration property for this as per @jaystarshot's suggestion? Session properties should always have a corresponding config property.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@feilong-liu can you add a configuration property for this as per @jaystarshot's suggestion? Session properties should always have a corresponding config property.

We are trying to move away from that model for optimizations that we know are safe (famous last words lol) to reduce noise. So feature config is only for features and not for optimizations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a configuration doesn't seem especially noisy to me, but it does allow people to disable the feature for all queries if there ends up being a bug or regression in the optimizer (not everyone has infrastructure set up to easily tack session properties on to all user sessions if needed).

kaikalur
kaikalur previously approved these changes Mar 1, 2024
Type toKeyType = ((MapType) castExpression.getType()).getKeyType();
Type toValueType = ((MapType) castExpression.getType()).getValueType();

if (toValueType.equals(fromValueType) && fromKeyType.equals(INTEGER) && toKeyType.equals(BIGINT) && node.getArguments().get(1).getType().equals(BIGINT)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put it in a separate function

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

assertQuery(enableOptimization, "select element_at(feature, key) from (values (map(array[cast(1 as integer), 2, 3, 4], array[0.3, 0.5, 0.9, 0.1]), cast(2 as bigint)), (map(array[cast(1 as integer), 2, 3, 4], array[0.3, 0.5, 0.9, 0.1]), 400000000000)) t(feature, key)",
"values 0.5, null");
assertQueryFails(enableOptimization, "select feature[key] from (values (map(array[cast(1 as integer), 2, 3, 4], array[0.3, 0.5, 0.9, 0.1]), cast(2 as bigint)), (map(array[cast(1 as integer), 2, 3, 4], array[0.3, 0.5, 0.9, 0.1]), 400000000000)) t(feature, key)",
".*Out of range for integer.*");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try some unrelated ones where this rule should not trigger - like varchar, map<varchar,float> etc


/**
* Remove cast on map if possible. Currently it only supports subscript and element_at function, and only works when map key is of type integer and index is bigint. For example:
* cast(feature as map<bigint, float>)[key] -> feature[cast(key as integer)], where feature is of type map<integer, float> and key is of type bigint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can you put the before and after on separate lines for readability? something like this:

For arguments feature: map<integer, float>, key: bigint
Input: cast(feature as map<bigint, float>)[key]

Output:
feature[cast(key as integer)]

/**
* Remove cast on map if possible. Currently it only supports subscript and element_at function, and only works when map key is of type integer and index is bigint. For example:
* cast(feature as map<bigint, float>)[key] -> feature[cast(key as integer)], where feature is of type map<integer, float> and key is of type bigint
* element_at(cast(feature as map<bigint, float>), key) -> element_at(feature, try_cast(key as integer)), where feature is of type map<integer, float> and key is of type bigint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is the cast added in the first place? parser/analyzer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type promotion - ideally we should fix the type promo. For example:

element_at(map(int, real), bigint)

promotes map to map(bigint, real) instead we should coerce bigint to int? But I think our function resolution logic maybe inadequate for that.

So for now, we just decided to drop these unnecessary casts. This helps a lot in our current prod workloads as there is a lot of mixing of int and bigint in keys and map

Type toKeyType = ((MapType) castExpression.getType()).getKeyType();
Type toValueType = ((MapType) castExpression.getType()).getValueType();

if (toValueType.equals(fromValueType) && fromKeyType.equals(INTEGER) && toKeyType.equals(BIGINT) && node.getArguments().get(1).getType().equals(BIGINT)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we special case the int/bigint pair? shouldn't this work for any cast that is not lossy?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-exact types as keys is dubious (and also not common) so we don't want to do that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good, if we extract this in a function as suggested above we can also extend this in the future to handle other pairs if needed

Rewrite expression from cast(feature as map<bigint, float>)[key] -> feature[cast(key as integer)],
where feature is of type map<integer, float> and key is of type bigint, and
element_at(cast(feature as map<bigint, float>), key) -> element_at(feature, try_cast(key as integer)),
where feature is of type map<integer, float> and key is of type bigint, so as to get rid of map cast function.
@feilong-liu feilong-liu force-pushed the skip_map_cast_int_bigint branch from cf0b50c to 91aa1b1 Compare March 6, 2024 20:49
@feilong-liu feilong-liu merged commit 4629ca5 into prestodb:master Mar 7, 2024
56 checks passed
@feilong-liu feilong-liu deleted the skip_map_cast_int_bigint branch March 7, 2024 23:49
@wanglinsong wanglinsong mentioned this pull request May 1, 2024
48 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants