[NOT A BUG] Why comet does not convert the HashAggregate expression to native in my query? #503
Comments
Could you run a simple query to verify if Comet shuffle can be triggered?
Oh, could you disable
Yes, you can open a PR to update the document. It should only be a temporary limitation, though, and we are working on removing it. We can update the document again once the limitation is removed.
You may also want to set I wonder if we should default this to true.
## Which issue does this PR close?

Closes #503
Closes #191

## Rationale for this change

1. Provide a way to build Comet from source in isolated environments with access to github.com
2. Update the documentation on the compatibility of Spark AQE and Comet Shuffle

## What changes are included in this PR?

- Update the tuning section about the compatibility of Shuffle and Spark AQE
- Add `release-nogit` for building in isolated environments
- Update the docs in the section about the installation process

Changes to be committed:
modified: Makefile
modified: docs/source/user-guide/installation.md
modified: docs/source/user-guide/tuning.md

## How are these changes tested?

I ran both `make release` and `make release-nogit`. The first one created a properties file in `common/target/classes` but the second did not. The flag `-Dmaven.gitcommitid.skip=true` is described in [this comment](git-commit-id/git-commit-id-maven-plugin#392 (comment)).
Describe the bug
I'm running a query that computes aggregations (`sum`, `min`, `max`, `mean`).

I have the following logical plan (I manually truncated some parts):
It is converted to the following Comet plan:
Visualization:

Steps to reproduce
I'm running my own benchmark:
generator --prefix test_data_tiny
Expected behavior
I expected to see a fully native plan, but for some reason the last `HashAggregate` is running on Spark. It even looks to me like it is running in "Spark interpreter mode" (I guess because I request too many aggregations and they exceed the code-size limit for whole-stage codegen, but I'm not 100% sure).

I checked the documentation of the Comet project and it looks like `case-when` expressions and `sum`/`min`/`max`/`mean` expressions are supported. `HashAggregate` is supported too. `Exchange` should be supported too because I turned on Comet shuffle (`--conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager`, `--conf spark.comet.exec.shuffle.enabled=true`, `--conf spark.comet.exec.shuffle.mode=native`).

If the partial aggregation runs in Comet, why doesn't the final one, and why do I have a `ColumnarToRow` instead?

Additional context
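For reference, the three Comet shuffle settings quoted in the expected-behavior section can be written down in one place. This is only a minimal sketch: the keys and values are copied from the report, while the helper name `to_submit_args` and the dictionary name are hypothetical and not part of Spark or Comet.

```python
# Comet shuffle settings exactly as quoted in this report.
COMET_SHUFFLE_CONF = {
    "spark.shuffle.manager": (
        "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager"
    ),
    "spark.comet.exec.shuffle.enabled": "true",
    "spark.comet.exec.shuffle.mode": "native",
}


def to_submit_args(conf):
    """Flatten a {key: value} mapping into `--conf key=value` arguments.

    Hypothetical helper for illustration; it only builds strings and
    does not talk to Spark.
    """
    args = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # Print the flags on one line so they can be pasted after spark-submit.
    print(" ".join(to_submit_args(COMET_SHUFFLE_CONF)))
```

Keeping the settings in a single mapping makes it easier to confirm that the job was really submitted with all three flags when comparing plans.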
I'm ready to provide any additional information or to run any debug query.
Thanks in advance!