Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add transform_keys and transform_values Presto functions #2245

Closed

Conversation

mbasmanova
Copy link
Contributor

No description provided.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 10, 2022
@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Aug 10, 2022
…ubator#2245)

Summary: Pull Request resolved: facebookincubator#2245

Differential Revision: D38565343

Pulled By: mbasmanova

fbshipit-source-id: a6a670105d921cf6fd8d88f4d52e112f1a7c00a4
@mbasmanova mbasmanova force-pushed the transform-keys-values branch from 71bc9a4 to 12ba118 Compare August 10, 2022 03:21
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D38565343

Copy link
Contributor

@kgpai kgpai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Masha, looks good. Some minor questions.

keyRows, wrapCapture, context, lambdaArgs, &transformedKeys);
}

// TODO Check for duplicates in transformedKeys.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is checked below right ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Removed.

auto wrapCapture = toWrapCapture<MapVector>(
numKeys, entry.callable, *entry.rows, flatMap);

entry.callable->apply(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: One question: across multiple lambdas , how do we ensure the type of keys remains same ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the job of the query planner. We assume the types are correct similar to different "then" and "else" branches of a switch statement.

// In most cases there will be only one function and the loop will run once.
auto it = args[1]->asUnchecked<FunctionVector>()->iterator(&rows);
while (auto entry = it.next()) {
auto keyRows =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do keyRows change over lambdas ? I would presume typically entry.rows would be similar .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a general case, there is only one lambda function and entry.rows == rows. However, in case when there are multiple lambdas, each lambda applies to a unique subset of rows, hence, keyRows are non-overlapping between lambdas.

// In most cases there will be only one function and the loop will run once.
auto it = args[1]->asUnchecked<FunctionVector>()->iterator(&rows);
while (auto entry = it.next()) {
auto keyRows =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: valueRows ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. Renamed.


VectorPtr transformedValues;

// Loop over lambda functions and apply these to keys of the map.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: apply these to values of a map ?

namespace {

// See documentation at https://prestodb.io/docs/current/functions/map.html
class TransformValuesFunction : public exec::VectorFunction {
Copy link
Contributor

@kgpai kgpai Aug 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It seems values/key transforms can be templatized to one function, however this makes easier reading personally to me at expense of code duplication.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about that and decided not to use templates for readability.

TEST_F(TransformKeysTest, conditional) {
vector_size_t size = 1'000;

// Make 2 columns: the map to transform and a boolean that decided which
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we also have a test with 2 lambdas with different types returned and confirm it fails.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we also have a test with 2 lambdas with different types returned and confirm it fails.

I don't think this testing should apply to each lambda function. Instead, I suggest to add tests to make sure FunctionVector cannot be created to store lambda of different types. I'll do that in a separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#2255 adds sanity checks to switch expression

…ubator#2245)

Summary: Pull Request resolved: facebookincubator#2245

Differential Revision: D38565343

Pulled By: mbasmanova

fbshipit-source-id: 5eecb069a1a23d772d15893feb12f2ec88b4b781
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D38565343

@mbasmanova mbasmanova force-pushed the transform-keys-values branch from 12ba118 to 0e7be68 Compare August 10, 2022 18:56
@mbasmanova
Copy link
Contributor Author

@kgpai Krishna, thank you for review. I replied to comments and update the PR to address these. Would you take another look?

Copy link
Contributor

@kgpai kgpai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Contributor

@kagamiori kagamiori left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thank you Masha for adding these super useful functions!

auto size = sizes[row];
for (auto i = 1; i < size; i++) {
if (mapKeys->equalValueAt(mapKeys.get(), offset + i, offset + i - 1)) {
VELOX_USER_FAIL("{}", kDuplicateKey);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Maybe also print out the value of the duplicate key?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea. Let me add this in a follow-up PR as it should be added to the map() function as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kagamiori #2260 adds duplicate key to the message.

assertEqualVectors(expectedResult, result);
}

TEST_F(TransformKeysTest, dictionaryWithDuplicates) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, what does this case test? (e.g., what's the difference between this and dictionaryWithUniqueValues for transform_keys?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically speaking this function doesn't distinguish between different dictionaries. I was adding these tests for completeness. Ideally, we would enhance Fuzzer to support lambda functions to make sure we get coverage of inputs with all sorts of encodings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants