[SPARK-28330][SQL] Support ANSI SQL: result offset clause in query expression #27429
Conversation
@gatorsmile @cloud-fan I'm sorry for opening a new PR; I made a mistake while operating on the old one.
Test build #117719 has finished for PR 27429 at commit
Test build #117730 has finished for PR 27429 at commit
retest this please
Test build #117736 has finished for PR 27429 at commit
This is
Can we create a new operator
Let's not add a feature that is disabled by default and not needed by most users. Maintaining a feature has cost. For OFFSET without LIMIT, we already have an operator So a solution can be
@cloud-fan Thanks a lot!
Ah, sorry, I made a mistake. Let's just not support OFFSET-only queries for now; their output is effectively unbounded.
OK.
Test build #117798 has finished for PR 27429 at commit
@cloud-fan I have refactored the implementation.
retest this please
Test build #117867 has finished for PR 27429 at commit
retest this please
Test build #117896 has finished for PR 27429 at commit
Test build #117905 has finished for PR 27429 at commit
Test build #117907 has finished for PR 27429 at commit
retest this please
Test build #117922 has finished for PR 27429 at commit
Test build #121199 has finished for PR 27429 at commit
retest this please
Test build #121215 has finished for PR 27429 at commit
Test build #123718 has finished for PR 27429 at commit
retest this please.
Test build #123756 has finished for PR 27429 at commit
retest this please.
Test build #123764 has finished for PR 27429 at commit
retest this please.
Test build #126825 has finished for PR 27429 at commit
retest this please.
Test build #126863 has finished for PR 27429 at commit
cc @cloud-fan
Kubernetes integration test starting
Kubernetes integration test status success
Test build #129143 has finished for PR 27429 at commit
Kubernetes integration test starting
Kubernetes integration test status success
Test build #130867 has finished for PR 27429 at commit
retest this please
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #130898 has finished for PR 27429 at commit
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
This is an ANSI SQL feature; its feature ID is F861.
For example:
Several mainstream databases support this syntax:
Druid
https://druid.apache.org/docs/latest/querying/sql.html#offset
Kylin
http://kylin.apache.org/docs/tutorial/sql_reference.html#QUERYSYNTAX
Exasol
https://docs.exasol.com/sql/select.htm
Greenplum
http://docs.greenplum.org/6-8/ref_guide/sql_commands/SELECT.html
MySQL
https://dev.mysql.com/doc/refman/5.6/en/select.html
MonetDB
https://www.monetdb.org/Documentation/SQLreference/SQLSyntaxOverview#SELECT
PostgreSQL
https://www.postgresql.org/docs/11/queries-limit.html
SQLite
https://www.sqlite.org/lang_select.html
Vertica
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Statements/SELECT/OFFSETClause.htm?zoom_highlight=offset
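SQLite, one of the databases listed above, already accepts this syntax. The sketch below uses Python's built-in sqlite3 module on a throwaway table named a with one column c; the table, column, and values are illustrative only. Note that SQLite only allows OFFSET after a LIMIT and treats a negative LIMIT as "no upper bound", which mirrors the limit -1 convention used in the design description.

```python
import sqlite3

# Throwaway in-memory table "a" with one integer column "c" holding 0..9.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (c INTEGER)")
conn.executemany("INSERT INTO a (c) VALUES (?)", [(i,) for i in range(10)])


def column(sql):
    """Run a query and return the first column of each result row as a list."""
    return [row[0] for row in conn.execute(sql)]


# Skip the first 2 rows (in c order), then return at most 3 rows.
limit_offset = column("SELECT c FROM a ORDER BY c LIMIT 3 OFFSET 2")

# A negative LIMIT means "no upper bound" in SQLite, so this is effectively OFFSET only.
offset_only = column("SELECT c FROM a ORDER BY c LIMIT -1 OFFSET 7")

print(limit_offset)  # [2, 3, 4]
print(offset_only)   # [7, 8, 9]
```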
Design description:
1. Consider OFFSET as a special case of LIMIT. For example:
SELECT * FROM a limit 10; is similar to SELECT * FROM a limit 10 offset 0;
SELECT * FROM a offset 10; is similar to SELECT * FROM a limit -1 offset 10;
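The rewrite in point 1 can be modelled in plain Python. This is a sketch, not Spark code: the helper names normalize and apply_limit_offset are hypothetical, a table is just a list of rows, and -1 stands for "no limit" as in the example above.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")

NO_LIMIT = -1  # "limit -1" in the examples above means "no limit"


def normalize(limit: Optional[int], offset: Optional[int]) -> Tuple[int, int]:
    """Express LIMIT-only, OFFSET-only, or LIMIT+OFFSET as one (limit, offset) pair."""
    return (NO_LIMIT if limit is None else limit, 0 if offset is None else offset)


def apply_limit_offset(rows: Sequence[T], limit: int, offset: int) -> List[T]:
    """Skip `offset` rows, then keep at most `limit` rows (all of them for NO_LIMIT)."""
    remaining = list(rows[offset:])
    return remaining if limit == NO_LIMIT else remaining[:limit]


rows = list(range(25))
# "limit 10" behaves like "limit 10 offset 0".
print(apply_limit_offset(rows, *normalize(10, None)) == apply_limit_offset(rows, 10, 0))  # True
# "offset 10" behaves like "limit -1 offset 10".
print(apply_limit_offset(rows, *normalize(None, 10)) == rows[10:])  # True
```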
2. The current implementation of LIMIT has good performance. For example,
SELECT * FROM a limit 10;
is parsed into the logical plan as below:
and then into the physical plan as below:
This operator avoids a massive shuffle and has good performance.
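Why LIMIT is cheap can be illustrated with a single-process model of a partitioned table. In this sketch (an assumption for illustration, not Spark's actual operators) each partition first applies a local limit, so at most n rows per partition reach the final single-partition step, which applies the global limit. Without ORDER BY, which rows come back depends on partition order.

```python
from itertools import chain, islice
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def local_limit(partition: Iterable[T], n: int) -> List[T]:
    """Per-partition step: keep at most the first n rows of this partition."""
    return list(islice(partition, n))


def global_limit(partitions: Iterable[Iterable[T]], n: int) -> List[T]:
    """Final step: concatenate the small per-partition results and keep the first n."""
    return list(islice(chain.from_iterable(partitions), n))


def limit(partitions: List[List[T]], n: int) -> List[T]:
    # Only up to n rows per partition move to the final step, not the whole table.
    return global_limit((local_limit(p, n) for p in partitions), n)


partitions = [list(range(0, 100)), list(range(100, 200)), list(range(200, 300))]
print(limit(partitions, 10))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```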
Sometimes, the logical plan is transformed into the physical plan as:
If the SQL contains ORDER BY, as in
SELECT * FROM a order by c limit 10;
it will be transformed into the physical plan as below:
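With ORDER BY, a common strategy (assumed here, since the physical plan itself is not shown) is a per-partition top-n followed by a merge of the small candidate sets. The Python sketch below uses heapq.nsmallest for both steps; take_ordered is a hypothetical name.

```python
import heapq
from typing import List


def take_ordered(partitions: List[List[int]], n: int) -> List[int]:
    """ORDER BY ... LIMIT n over a partitioned table, ascending order."""
    # Each partition keeps only its own n smallest values.
    local_tops = [heapq.nsmallest(n, p) for p in partitions]
    # The global n smallest are guaranteed to be among the local candidates.
    return heapq.nsmallest(n, (x for top in local_tops for x in top))


partitions = [[42, 7, 19, 3], [8, 1, 30], [15, 2, 11, 5]]
print(take_ordered(partitions, 4))  # [1, 2, 3, 5]
```

Only n values per partition cross to the merge step, instead of every row of the table.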
Based on this situation, this PR proposes the following operations. For example,
SELECT * FROM a limit 10 offset 10;
is parsed into the logical plan as below:
After optimization, the above logical plan will be transformed into:
and then into the physical plan as below:
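The concrete optimized plan for limit 10 offset 10 is not reproduced here. One way to reuse the LIMIT strategy, assumed in this sketch, is to have every partition keep its first limit + offset rows (no partition can contribute more rows to the answer) and then, in the single final step, skip offset rows and take limit rows. limit_offset is a hypothetical helper over the same list-of-partitions model.

```python
from itertools import chain, islice
from typing import List, TypeVar

T = TypeVar("T")


def limit_offset(partitions: List[List[T]], limit: int, offset: int) -> List[T]:
    """LIMIT `limit` OFFSET `offset` over a partitioned table (no ORDER BY)."""
    # Local step: rows beyond limit + offset in a partition can never be returned.
    local = [p[: limit + offset] for p in partitions]
    # Global step: concatenate, skip `offset` rows, then keep `limit` rows.
    return list(islice(chain.from_iterable(local), offset, offset + limit))


partitions = [list(range(0, 15)), list(range(15, 30)), list(range(30, 45))]
# The answer spans two partitions: rows 10..14 from the first, 15..19 from the second.
print(limit_offset(partitions, 10, 10))  # [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
```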
Sometimes, the logical plan is transformed into the physical plan as:
If the SQL contains ORDER BY, as in
SELECT * FROM a order by c limit 10 offset 10;
it will be transformed into the physical plan as below:
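For ORDER BY with LIMIT and OFFSET, the same idea can be combined with a per-partition top-n. Under the assumption sketched here, each partition keeps its limit + offset smallest values, the merge keeps the global limit + offset smallest, and the first offset of those are dropped. The function name is hypothetical.

```python
import heapq
from typing import List


def take_ordered_with_offset(partitions: List[List[int]], limit: int, offset: int) -> List[int]:
    """ORDER BY ... LIMIT `limit` OFFSET `offset` over a partitioned table, ascending."""
    k = limit + offset
    # Per-partition top-k: anything outside a partition's k smallest can't be in the answer.
    local_tops = [heapq.nsmallest(k, p) for p in partitions]
    # Merge the candidates, keep the global k smallest (sorted), then skip `offset`.
    merged = heapq.nsmallest(k, (x for top in local_tops for x in top))
    return merged[offset:]


partitions = [[42, 7, 19, 3], [8, 1, 30], [15, 2, 11, 5]]
print(take_ordered_with_offset(partitions, 3, 2))  # [3, 5, 7]
```

The cost of this approach grows with limit + offset, not just limit, which is why a large offset is expensive even when limit is small.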
3. In addition to the above, there is a special case with only OFFSET and no LIMIT. For example,
SELECT * FROM a offset 10;
is parsed into the logical plan as below:
If the offset is very large, this generates a lot of overhead. So this PR rejects an OFFSET clause without a LIMIT clause, although we can parse, transform and execute it.
A balanced idea is to add a configuration item, spark.sql.forceUsingOffsetWithoutLimit, that forces the query to run when the user knows the offset is small enough. The default value of spark.sql.forceUsingOffsetWithoutLimit is false. This PR only raises the idea, so that it can be implemented at a better time in the future.
Note: The original PR to support this feature is #25416. Because the original PR is too old, it has massive conflicts that are hard to resolve, so I opened this new PR to support this feature.
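The rule from point 3 and the proposed switch could be checked at analysis time along these lines. This is only a sketch: spark.sql.forceUsingOffsetWithoutLimit is a proposal in this PR rather than an existing Spark setting, and the function and error names below are hypothetical.

```python
from typing import Optional


class OffsetWithoutLimitError(ValueError):
    """Hypothetical error raised for OFFSET without LIMIT."""


def offset_clause_allowed(limit: Optional[int], offset: Optional[int],
                          force_using_offset_without_limit: bool = False) -> bool:
    """OFFSET is accepted with a LIMIT, or without one only if the override is on."""
    if offset is None or limit is not None:
        return True
    return force_using_offset_without_limit


def check_offset_clause(limit: Optional[int], offset: Optional[int],
                        force_using_offset_without_limit: bool = False) -> None:
    # The flag mirrors the proposed spark.sql.forceUsingOffsetWithoutLimit (default false).
    if not offset_clause_allowed(limit, offset, force_using_offset_without_limit):
        raise OffsetWithoutLimitError("OFFSET without LIMIT is rejected by default")


check_offset_clause(limit=10, offset=10)  # LIMIT + OFFSET: accepted
check_offset_clause(limit=None, offset=10, force_using_offset_without_limit=True)  # forced
try:
    check_offset_clause(limit=None, offset=10)  # OFFSET only: rejected
except OffsetWithoutLimitError as e:
    print("rejected:", e)
```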
Why are the changes needed?
new feature
Does this PR introduce any user-facing change?
'No'
How was this patch tested?
Existing and new unit tests.