
KudtfFlatMapper pushdown #3697

Open
agavra opened this issue Oct 29, 2019 · 0 comments


agavra commented Oct 29, 2019

When we build a logical plan, the first thing we do after building the source node is apply the flat map operation for any UDTFs:

```java
  public OutputNode buildPlan() {
    PlanNode currentNode = buildSourceNode();

    // Table functions are flat-mapped first, directly on top of the source...
    if (!analysis.getTableFunctions().isEmpty()) {
      currentNode = buildFlatMapNode(currentNode);
    }

    // ...so the filter, projection and aggregation below all run once per flat-mapped row.
    if (analysis.getWhereExpression().isPresent()) {
      currentNode = buildFilterNode(currentNode, analysis.getWhereExpression().get());
    }

    if (analysis.getGroupByExpressions().isEmpty()) {
      currentNode = buildProjectNode(currentNode);
    } else {
      currentNode = buildAggregateNode(currentNode);
    }

    return buildOutputNode(currentNode);
  }
```
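To make the effect of this ordering concrete, here is a minimal sketch that models the node order of `buildPlan()` on plain Java streams. It is not ksqlDB code: `CurrentPlanOrder`, `Row`, `FlatRow`, `Out` and `expensiveUdf` are hypothetical stand-ins, and `expensiveUdf` simply counts its invocations in place of a UDF that makes a network call. The flat map runs first, and the projection is then evaluated on every flat-mapped row.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model (not ksqlDB code) of the current node order:
// source -> flat map -> project, using plain Java streams.
public class CurrentPlanOrder {

  // Counts invocations of the expensive scalar UDF.
  static final AtomicInteger udfCalls = new AtomicInteger();

  // Stand-in for a UDF with a network call, e.g. udfWithNetworkCall(COL2).
  static String expensiveUdf(String col2) {
    udfCalls.incrementAndGet();
    return col2.toUpperCase();
  }

  // Source row: COL1 is an array column, COL2 a scalar column.
  record Row(List<Integer> col1, String col2) {}

  // Row emitted by the flat map node: one exploded element plus the untouched COL2.
  record FlatRow(int element, String col2) {}

  // Row emitted by the project node.
  record Out(int exploded, String udfResult) {}

  static List<Out> run(List<Row> source) {
    udfCalls.set(0);
    // Flat map node: explode(COL1) yields one intermediate row per array element.
    List<FlatRow> flattened = source.stream()
        .flatMap(r -> r.col1().stream().map(e -> new FlatRow(e, r.col2())))
        .toList();
    // Project node: evaluated once per flat-mapped row, so the UDF runs
    // n times for an input row whose COL1 has n elements.
    return flattened.stream()
        .map(f -> new Out(f.element(), expensiveUdf(f.col2())))
        .toList();
  }

  public static void main(String[] args) {
    List<Out> out = run(List.of(new Row(List.of(1, 2, 3), "a")));
    System.out.println("rows=" + out.size() + ", udfCalls=" + udfCalls.get());
  }
}
```

Running `main` prints `rows=3, udfCalls=3`: a single input row whose array has three elements triggers three UDF calls, even though the UDF's argument never changes between those rows.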

This means that every node built after the flat map (filter, projection or aggregation) operates on the output of the flat map operation rather than on the source rows. This causes a few issues:

  • We may evaluate complex expressions n times more often than necessary, where n is the number of rows the UDTF outputs. Take `SELECT explode(COL1), udfWithNetworkCall(COL2) FROM FOO;` as an example. We should only need to evaluate `udfWithNetworkCall(COL2)` once per input row, since it is a static UDF whose result does not depend on the output of `explode(COL1)`. With the current implementation, however, we call it once for each output of `explode(COL1)`.
  • We need to special-case the evaluation of complex expressions that are inputs to UDTFs, e.g. `SELECT explode(my_udf(COL1))` (see feat: Implement complex expressions for table functions #3683 (comment)). Instead, I think this evaluation should be its own logical node step, so that KudtfFlatMapper does not need to duplicate the expression evaluation logic.
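One possible shape for the pushdown is a projection step placed below the flat map that evaluates, once per input row, both the UDTF argument expressions and any expressions that do not depend on the UDTF output; the flat mapper then only applies the table function to already-evaluated values. The sketch below models this on plain Java streams under stated assumptions: it is not ksqlDB's implementation, and `PushedDownPlanOrder`, `PreProjected`, `myUdf` (standing in for `my_udf`, arbitrarily multiplying by 10) and `expensiveUdf` (standing in for a UDF with a network call) are all hypothetical.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch (not ksqlDB code) of the proposed order: evaluate expressions
// once per input row in a dedicated projection step, then flat map the results.
public class PushedDownPlanOrder {

  // Counts invocations of the expensive scalar UDF.
  static final AtomicInteger udfCalls = new AtomicInteger();

  // Stand-in for udfWithNetworkCall(COL2): should run once per input row.
  static String expensiveUdf(String col2) {
    udfCalls.incrementAndGet();
    return col2.toUpperCase();
  }

  // Stand-in for my_udf(COL1): a complex expression used as a UDTF argument.
  static List<Integer> myUdf(List<Integer> col1) {
    return col1.stream().map(x -> x * 10).toList();
  }

  // Source row: COL1 is an array column, COL2 a scalar column.
  record Row(List<Integer> col1, String col2) {}

  // Output of the pushed-down projection: UDTF argument and UDF result already evaluated.
  record PreProjected(List<Integer> udtfArg, String udfResult) {}

  // Final output row: one exploded element plus the pre-computed UDF result.
  record Out(int exploded, String udfResult) {}

  static List<Out> run(List<Row> source) {
    udfCalls.set(0);
    // Projection step below the flat map: one evaluation per input row.
    List<PreProjected> projected = source.stream()
        .map(r -> new PreProjected(myUdf(r.col1()), expensiveUdf(r.col2())))
        .toList();
    // Flat map step: only applies explode() to the evaluated argument and copies
    // the pre-computed UDF result onto each output row; no expression evaluation here.
    return projected.stream()
        .flatMap(p -> p.udtfArg().stream().map(e -> new Out(e, p.udfResult())))
        .toList();
  }

  public static void main(String[] args) {
    List<Out> out = run(List.of(new Row(List.of(1, 2, 3), "a")));
    System.out.println(out + " udfCalls=" + udfCalls.get());
  }
}
```

For one input row with a three-element array, this produces three output rows while `udfCalls` ends at 1, and the flat map step no longer needs its own expression evaluation logic, which is the separation the second point argues for.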