batch: Execute root fragment task in frontend. #3269

Closed
Tracked by #1977
liurenjie1024 opened this issue Jun 16, 2022 · 4 comments
Assignees
Labels
component/batch Batch related related issue. type/enhancement Improvements to existing implementation.

Comments

@liurenjie1024
Contributor

liurenjie1024 commented Jun 16, 2022

Currently, the root fragment of an MPP query is executed on a compute node. It would be better to move it to the frontend to reduce scheduling latency and the latency of pulling data.

@liurenjie1024 liurenjie1024 changed the title Execute root fragment task in frontend. batch: Execute root fragment task in frontend. Jun 16, 2022
@liurenjie1024 liurenjie1024 added type/enhancement Improvements to existing implementation. component/batch Batch related related issue. labels Jun 16, 2022
@lmatz
Contributor

lmatz commented Jun 16, 2022

There are two cases:

  1. the root fragment is simply a single distribution exchange operator. It will be scheduled at the frontend.
  2. the root fragment has other non-exchange operators. I think we may still schedule them at the compute nodes.
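
The two cases above can be sketched as a small placement decision. This is only an illustration: `PlanNode`, `Placement`, and `place_root_fragment` are hypothetical names, not the project's actual types, and the root fragment is modeled as a flat list of operators.

```rust
/// Hypothetical operator kinds inside a fragment (not the project's real types).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum PlanNode {
    /// Exchange with single distribution: gathers all upstream data.
    SingleExchange,
    Filter,
    Scan,
}

/// Where the root fragment is scheduled.
#[derive(Debug, PartialEq)]
enum Placement {
    Frontend,
    ComputeNode,
}

/// Case 1: the root fragment is exactly one single-distribution exchange,
/// so it runs on the frontend.
/// Case 2: it contains other operators, so it stays on a compute node.
fn place_root_fragment(operators: &[PlanNode]) -> Placement {
    match operators {
        [PlanNode::SingleExchange] => Placement::Frontend,
        _ => Placement::ComputeNode,
    }
}
```

For example, `place_root_fragment(&[PlanNode::SingleExchange])` yields `Placement::Frontend`, while a fragment that also holds a `Filter` yields `Placement::ComputeNode`.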

@fuyufjh
Member

fuyufjh commented Jun 17, 2022

> There are two cases:
>
>   1. the root fragment is simply a single distribution exchange operator. It will be scheduled at the frontend.
>   2. the root fragment has other non-exchange operators. I think we may still schedule them at the compute nodes.

Totally agree. By the way, the plan generated by the optimizer should always satisfy one of the following:

  • Either it has a singleton fragment on top (i.e. one that ends in singleton exchange(s)), e.g. in the cases of global Agg/TopN
  • Or it has a singleton exchange on top to gather the data together

(My point is: if that is not the case, please file a bug :D)
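
One way to read this invariant is as a check over the plan tree: walking down from the root, every path must reach a singleton exchange before it reaches a scan. The sketch below encodes that reading. The `Plan` and `Dist` types and the function `top_is_singleton` are hypothetical, and treating a scan reached without an exchange as a violation is an assumption of this sketch.

```rust
/// Hypothetical distribution of an exchange.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Dist {
    Single,
    Hash,
}

/// Hypothetical, simplified plan tree (not the project's real types).
#[allow(dead_code)]
#[derive(Debug)]
enum Plan {
    /// Fragment boundary; `dist` says how data is redistributed.
    Exchange { dist: Dist, input: Box<Plan> },
    /// Any non-exchange, non-scan operator (Agg, TopN, Filter, ...).
    Operator { name: &'static str, inputs: Vec<Plan> },
    Scan,
}

/// Returns true if the top fragment is a singleton: every path from the
/// root hits a `Dist::Single` exchange before it hits a scan.
fn top_is_singleton(plan: &Plan) -> bool {
    match plan {
        Plan::Exchange { dist, .. } => *dist == Dist::Single,
        Plan::Operator { inputs, .. } => inputs.iter().all(top_is_singleton),
        Plan::Scan => false,
    }
}
```

A global aggregation over a singleton exchange and a bare singleton exchange both pass; an operator sitting directly on a scan, or a hash exchange at the root, does not.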

@liurenjie1024
Contributor Author

For 2, there are two cases:

  • 2.1 No table scan, for example select 1 + 2
  • 2.2 Singleton table scan, for example select * from t where a = 1, where a is a key of t.

Obviously 2.1 can be scheduled on the frontend, while 2.2 can't.

For 2.2, I would suggest adding a singleton exchange on top of the table scan so that all three cases can be executed in the same way, and we don't need to maintain another code branch for execution.
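
A minimal sketch of that normalization, under assumptions: `Plan`, `root_fragment_has_scan`, and `normalize_root` are hypothetical names, and the exchange is wrapped around the whole root plan rather than placed directly above the scan, which is a simplification made here.

```rust
/// Hypothetical, simplified plan tree (not the project's real types).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Singleton exchange; marks a fragment boundary.
    SingleExchange(Box<Plan>),
    Filter(Box<Plan>),
    Scan { table: String },
    /// Constant rows, e.g. `select 1 + 2` (case 2.1).
    Values,
}

/// True if a table scan is reachable from the root without crossing
/// an exchange, i.e. the scan belongs to the root fragment (case 2.2).
fn root_fragment_has_scan(plan: &Plan) -> bool {
    match plan {
        Plan::SingleExchange(_) => false,
        Plan::Filter(input) => root_fragment_has_scan(input),
        Plan::Scan { .. } => true,
        Plan::Values => false,
    }
}

/// Wrap the plan in a singleton exchange when the root fragment would
/// otherwise contain a scan, so the root fragment never scans a table.
fn normalize_root(plan: Plan) -> Plan {
    if root_fragment_has_scan(&plan) {
        Plan::SingleExchange(Box::new(plan))
    } else {
        plan
    }
}
```

After `normalize_root`, the root fragment is either a lone singleton exchange or a scan-free plan such as `Values`, so a single frontend execution path can cover all three cases.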

@fuyufjh
Member

fuyufjh commented Jun 17, 2022

> For 2.2 I would suggest to add a singleton exchange on top of table scan so that all three cases can be executed in same way, and we don't need to maintain another code branch for execution.

+1, but we already have one:

dev=> explain select l_orderkey,l_linenumber from lineitem where l_orderkey = 10;
                               QUERY PLAN
------------------------------------------------------------------------
 BatchExchange { order: [], dist: Single }
   BatchFilter { predicate: ($0 = 10:Int32) }
     BatchScan { table: lineitem, columns: [l_orderkey, l_linenumber] }
(3 rows)
