[SPARK-7322] [SQL] [WIP] Support Window Function in DataFrame #6104
Merged build triggered. |
Merged build started. |
Test build #32572 has started for PR 6104 at commit |
Test build #32572 has finished for PR 6104 at commit
|
Merged build finished. Test PASSed. |
Test PASSed. |
Hi, in the JIRA the following example is given:
Is there a reason why the aggregation operation has moved from the beginning (the style in the JIRA) to the end (the style above)? Are both still possible? I'd prefer the former, since it seems a bit shorter and more recognizable coming from SQL. On a related note: would it also be an idea to allow creating a separate window (groupBy/orderBy) definition and using this definition in one or more windowed aggregates? For example:
|
It seems to me it'd be easier to have the aggregate function in the front also. @chenghao-intel any reason you designed it this way? Is it to accommodate multiple aggregates for the same window? |
Oh, actually it was my bad, the The window definition should also be quite independent, and it can be given a name; as @rxin said, that would make it simpler to accommodate multiple functions. It would also be easier to create functions (aggregate/window functions) via the object |
For |
@chenghao-intel - I think it's better to follow SQL more closely here. I don't think it is much harder to write multiple aggregates, since the user can easily just add a function to create the window statement. On the contrary, it is more clear how many aggregates we are applying if we put this in the front. |
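To make the "aggregate in front" style concrete, here is a minimal, self-contained Java sketch. It is only an illustration of the shape being discussed, not the API in this PR: `WindowDef`, `Agg`, and the column names `dept` and `salary` are hypothetical, and the code just renders SQL-like strings.

```java
import java.util.List;

// Hypothetical toy types for illustration only; these are not Spark classes.
final class WindowDef {
    private final List<String> partitionCols;
    private final List<String> orderCols;

    WindowDef(List<String> partitionCols, List<String> orderCols) {
        this.partitionCols = List.copyOf(partitionCols);
        this.orderCols = List.copyOf(orderCols);
    }

    // Renders the window clause the way it would appear in SQL.
    String toSql() {
        return "(PARTITION BY " + String.join(", ", partitionCols)
             + " ORDER BY " + String.join(", ", orderCols) + ")";
    }
}

final class Agg {
    private final String expr;

    private Agg(String expr) { this.expr = expr; }

    static Agg avg(String col) { return new Agg("AVG(" + col + ")"); }
    static Agg sum(String col) { return new Agg("SUM(" + col + ")"); }

    // Aggregate first, window second, mirroring SQL's `AVG(x) OVER (...)`.
    String over(WindowDef w) { return expr + " OVER " + w.toSql(); }
}

final class AggregateFirstDemo {
    public static void main(String[] args) {
        // One window definition, reused by two aggregates.
        WindowDef w = new WindowDef(List.of("dept"), List.of("salary"));
        System.out.println(Agg.avg("salary").over(w));
        System.out.println(Agg.sum("salary").over(w));
    }
}
```

Each aggregate reads left to right like its SQL counterpart, and applying several aggregates to the same window only requires holding the window definition in a variable.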
Ok, I will update the code soon. |
Merged build triggered. |
Merged build started. |
Test build #32702 has started for PR 6104 at commit |
I've updated the code and description, however, I have no idea if we can remove the |
Test build #32702 has finished for PR 6104 at commit
|
Merged build finished. Test PASSed. |
Test PASSed. |
There was talk earlier of referencing the window function API that jooq uses when implementing this in SparkSQL. Is it a goal to make this similar to jooq's syntax? http://blog.jooq.org/2013/11/03/probably-the-coolest-sql-feature-window-functions/ |
Yup it is fairly similar. |
Thank you @ash211. Supporting a named window, as on the page you sent, is a really cool idea. I will update the code. |
Merged build triggered. |
Merged build started. |
Test build #32827 has started for PR 6104 at commit |
Test build #32827 has finished for PR 6104 at commit
|
Merged build finished. Test FAILed. |
Test FAILed. |
@rxin Updated! |
Merged build triggered. |
Merged build started. |
}

@Test
public void saveTableAndQueryIt() {
change the function name.
Test build #33254 has started for PR 6104 at commit |
 * based on it.
 */
@Experimental
object Window extends Window()
I thought about this more: it is actually problematic for Java, because partitionBy and orderBy won't become static methods due to conflicts with the Window class. Here's the fix.
Move "class Window" into "object Window" and rename it to WindowSpec, and then just define the two top-level methods partitionBy / orderBy in object Window. If we need another method for a window spec that has no partitionBy/orderBy, we can add one; I don't have a good name for it yet.
@yhuai suggested "allRows"
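For reference, the shape proposed in this thread, as it would look from the Java side, can be sketched roughly as follows. This is a plain-Java stand-in under stated assumptions, not the code in this PR: only the names `Window`, `WindowSpec`, `partitionBy`, and `orderBy` come from the discussion above; the fields, accessors, and use of column-name strings are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Immutable window specification; each builder call returns a new instance.
final class WindowSpec {
    private final List<String> partitionCols;
    private final List<String> orderCols;

    WindowSpec(List<String> partitionCols, List<String> orderCols) {
        this.partitionCols = List.copyOf(partitionCols);
        this.orderCols = List.copyOf(orderCols);
    }

    WindowSpec partitionBy(String... cols) {
        return new WindowSpec(Arrays.asList(cols), orderCols);
    }

    WindowSpec orderBy(String... cols) {
        return new WindowSpec(partitionCols, Arrays.asList(cols));
    }

    List<String> partitionCols() { return partitionCols; }
    List<String> orderCols() { return orderCols; }
}

// Entry point with static methods only, so Java callers can write
// Window.partitionBy(...) without instantiating anything.
final class Window {
    private Window() {}

    static WindowSpec partitionBy(String... cols) {
        return new WindowSpec(Arrays.asList(cols), List.of());
    }

    static WindowSpec orderBy(String... cols) {
        return new WindowSpec(List.of(), Arrays.asList(cols));
    }
}

final class WindowSpecDemo {
    public static void main(String[] args) {
        WindowSpec spec = Window.partitionBy("dept").orderBy("salary");
        System.out.println(spec.partitionCols() + " / " + spec.orderCols());
    }
}
```

Keeping the entry point free of instance state is what lets the two methods act as plain static calls; a spec with neither partitioning nor ordering (the "allRows" idea) is deliberately left out of this sketch.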
Test build #33254 has finished for PR 6104 at commit
|
Merged build finished. Test FAILed. |
Test FAILed. |
Actually I will submit a PR against your branch. |
[SPARK-7322] [SQL] [WIP] Support Window Function in DataFrame
This closes #6104.

Author: Cheng Hao <hao.cheng@intel.com>
Author: Reynold Xin <rxin@databricks.com>

Closes #6343 from rxin/window-df and squashes the following commits:

026d587 [Reynold Xin] Address code review feedback.
dc448fe [Reynold Xin] Fixed Hive tests.
9794d9d [Reynold Xin] Moved Java test package.
9331605 [Reynold Xin] Refactored API.
3313e2a [Reynold Xin] Merge pull request #6104 from chenghao-intel/df_window
d625a64 [Cheng Hao] Update the dataframe window API as suggsted
c141fb1 [Cheng Hao] hide all of properties of the WindowFunctionDefinition
3b1865f [Cheng Hao] scaladoc typos
f3fd2d0 [Cheng Hao] polish the unit test
6847825 [Cheng Hao] Add additional analystcs functions
57e3bc0 [Cheng Hao] typos
24a08ec [Cheng Hao] scaladoc
28222ed [Cheng Hao] fix bug of range/row Frame
1d91865 [Cheng Hao] style issue
53f89f2 [Cheng Hao] remove the over from the functions.scala
964c013 [Cheng Hao] add more unit tests and window functions
64e18a7 [Cheng Hao] Add Window Function support for DataFrame
(cherry picked from commit f6f2eeb) Signed-off-by: Reynold Xin <rxin@databricks.com>
This is a WIP PR for early feedback.
The usage is kind of like:
NTILE, ROW_NUMBER, DENSE_RANK, RANK, CUME_DIST and PERCENT_RANK (which are supported by Hive)
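As a reminder of how three of these ranking functions differ, here is a small plain-Java sketch of the standard SQL semantics of ROW_NUMBER, RANK, and DENSE_RANK over a single partition. It is not taken from this PR; the class `RankingSketch` and the sample values are hypothetical, and the input is assumed to be already sorted by the window's ordering.

```java
import java.util.Arrays;

final class RankingSketch {
    // ROW_NUMBER: consecutive numbers, ties broken by position.
    static int[] rowNumber(int[] sorted) {
        int[] r = new int[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            r[i] = i + 1;
        }
        return r;
    }

    // RANK: tied rows share a rank; the next distinct value skips ahead.
    static int[] rank(int[] sorted) {
        int[] r = new int[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            r[i] = (i > 0 && sorted[i] == sorted[i - 1]) ? r[i - 1] : i + 1;
        }
        return r;
    }

    // DENSE_RANK: tied rows share a rank; no gaps between ranks.
    static int[] denseRank(int[] sorted) {
        int[] r = new int[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            if (i == 0) {
                r[i] = 1;
            } else {
                r[i] = (sorted[i] == sorted[i - 1]) ? r[i - 1] : r[i - 1] + 1;
            }
        }
        return r;
    }

    public static void main(String[] args) {
        int[] values = {10, 20, 20, 30}; // already ordered within one partition
        System.out.println(Arrays.toString(rowNumber(values))); // [1, 2, 3, 4]
        System.out.println(Arrays.toString(rank(values)));      // [1, 2, 2, 4]
        System.out.println(Arrays.toString(denseRank(values))); // [1, 2, 2, 3]
    }
}
```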