[SPARK-4598][WebUI]Task table pagination for the Stage page #7399
I will clean up this PR soon. @pwendell any thoughts about the screenshot?
Test build #37239 has finished for PR 7399 at commit
Yes!!
Can the table sort globally by any field?
Yes. The server sorts all of the data by the selected column and then slices out the rows for the requested page.
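To make the "sort globally, then slice" idea concrete, here is a minimal sketch in Scala. It is not the code from this PR: `TaskRow`, `PagedSort`, and `pageOf` are hypothetical names, and the row type is reduced to three fields for illustration.

```scala
// Hypothetical row type; the real task table has many more columns.
final case class TaskRow(taskId: Long, durationMs: Long, host: String)

object PagedSort {
  /** Sorts all rows by `key`, then returns only the rows of the 1-based `page`. */
  def pageOf[K: Ordering](rows: Seq[TaskRow], page: Int, pageSize: Int, desc: Boolean)(
      key: TaskRow => K): Seq[TaskRow] = {
    require(page >= 1, "page must be >= 1")
    require(pageSize >= 1, "pageSize must be >= 1")
    val ordering = if (desc) Ordering[K].reverse else Ordering[K]
    // Sort the whole data set first so that the order is global, not per page.
    val sorted = rows.sortBy(key)(ordering)
    val from = (page - 1) * pageSize
    // slice tolerates out-of-range indices and returns an empty Seq in that case.
    sorted.slice(from, from + pageSize)
  }
}
```

Because the sort runs over the full data set before slicing, page 2 always continues where page 1 ended, whichever column is chosen.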
Here is the rule for page navigation.
Here are some examples of the page navigation:
And a screenshot for 100k tasks. It takes about 1.6 seconds to return the content from the server.
@@ -231,52 +241,25 @@ private[ui] class StagePage(parent: StagesTab) extends WebUIPage("stage") {
accumulableRow,
accumulables.values.toSeq)

val taskHeadersAndCssClasses: Seq[(String, String)] =
`taskHeadersAndCssClasses` is moved to `TaskPagedTable.headers`.
Test build #37366 has finished for PR 7399 at commit
The REST API provides all the task information in JSON format. It seems easier to enable pagination on the client side by consuming the JSON data. I was actually doing something like this. Is there any reason not to consume the JSON data in the UI?
I chose this way because the current Spark UI doesn't use the REST API. In addition, we cannot write unit tests for JavaScript at the moment, so it's better to use Scala.
stageData.hasBytesSpilled,
currentTime,
page = taskPage,
pageSize = 100, // Show 100 tasks at most in the table
Can we make this something higher? Maybe 200?
Actually, on second thought, I prefer 100.
I took a look at the UI and I like it. I did have two thoughts though:
* < 1 2* 3 4 5 6 7 8 9 10 > >>
*
* This is the first group and the first page, so "<<" and "<" are hidden.
* 1 2* 3 4 5 6 7 8 9 10 > >>
Is this 1* 2 3 4 ?
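The navigation examples in the code comment above could be produced by something like the following sketch. The rules here are partly assumptions: pages are shown in groups of ten, "<<" is hidden in the first group, "<" is hidden on the first page, and ">" and ">>" are hidden symmetrically at the end. `PageNavigation`, `GroupSize`, and `render` are hypothetical names, not the PR's API.

```scala
object PageNavigation {
  // Assumption: ten page links per group, as in the examples in the thread.
  val GroupSize = 10

  /** Renders the navigation bar as plain text, marking the current page with `*`. */
  def render(currentPage: Int, totalPages: Int): String = {
    require(totalPages >= 1, "totalPages must be >= 1")
    require(currentPage >= 1 && currentPage <= totalPages, "currentPage out of range")
    // First and last page number of the group that contains the current page.
    val firstInGroup = (currentPage - 1) / GroupSize * GroupSize + 1
    val lastInGroup = math.min(firstInGroup + GroupSize - 1, totalPages)
    val pages = (firstInGroup to lastInGroup).map { p =>
      if (p == currentPage) s"$p*" else p.toString
    }
    // "<<" jumps to the previous group, "<" to the previous page.
    val prefix =
      (if (firstInGroup > 1) Seq("<<") else Seq.empty) ++
        (if (currentPage > 1) Seq("<") else Seq.empty)
    // ">" jumps to the next page, ">>" to the next group.
    val suffix =
      (if (currentPage < totalPages) Seq(">") else Seq.empty) ++
        (if (lastInGroup < totalPages) Seq(">>") else Seq.empty)
    (prefix ++ pages ++ suffix).mkString(" ")
  }
}
```

Under these assumptions, `PageNavigation.render(1, 100)` yields `1* 2 3 4 5 6 7 8 9 10 > >>`, which is the form the reviewer asks about.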
@zsxwing, I just left some minor comments.
Maybe we should add an error page for the Spark UI? Now this is the default Jetty error page.
…f expand-dag-viz-arrow-true and expand-dag-viz-arrow-false
I was thinking of just displaying the normal stage page, but replacing the task table with an informative message, something like "Page 111 is out of range. Please select a page number between 1 and 100." We probably don't need to add an error page. If that's a lot of work, feel free to file a separate JIRA and do it later.
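A rough sketch of that suggestion in Scala: validate the requested page against the number of available pages and return a message instead of failing. `PageValidation`, `totalPages`, and `validate` are hypothetical names, and treating an empty table as having a single page is an assumption made here for simplicity.

```scala
object PageValidation {
  /** Number of pages needed for `totalRows` rows; an empty table still counts as one page. */
  def totalPages(totalRows: Int, pageSize: Int): Int = {
    require(pageSize >= 1, "pageSize must be >= 1")
    require(totalRows >= 0, "totalRows must be >= 0")
    math.max(1, (totalRows + pageSize - 1) / pageSize)
  }

  /** Right(page) when the page exists, otherwise Left with a user-facing message. */
  def validate(page: Int, totalRows: Int, pageSize: Int): Either[String, Int] = {
    val last = totalPages(totalRows, pageSize)
    if (page < 1 || page > last) {
      Left(s"Page $page is out of range. Please select a page number between 1 and $last.")
    } else {
      Right(page)
    }
  }
}
```

The caller can then render the rest of the stage page as usual and show the `Left` message in place of the task table.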
@andrewor14 the latest commits addressed your comments. Here is the new screenshot:
Test build #37774 has finished for PR 7399 at commit
Hi @zsxwing, thanks for implementing the changes. I tried this out locally and it looks like there are a few usability issues with the new features:
@andrewor14 Here is the new screenshot for the error message.
Also, if the user updates the page size, the page number will be reset to 1.
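The reset rule can be stated in a few lines. This sketch assumes the request carries both the new page size and the one used to render the previous view; `PageState` and `effectivePage` are hypothetical names, and how the previous size actually reaches the server is not described in this thread.

```scala
object PageState {
  /**
   * Returns the page to display. A page number computed under the old page
   * size no longer points at the same rows, so a size change restarts at page 1.
   */
  def effectivePage(requestedPage: Int, pageSize: Int, previousPageSize: Int): Int =
    if (pageSize != previousPageSize) 1 else requestedPage
}
```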
retest this please
Test build #35 has finished for PR 7399 at commit
Test build #37816 has finished for PR 7399 at commit
Looks great!! I'll try this out later today again.
LGTM, I'm merging this into master. Thanks @zsxwing.
This PR adds pagination for the task table to solve the scalability issue of the stage page. Here is the initial screenshot:

<img width="1347" alt="pagination" src="https://cloud.githubusercontent.com/assets/1000778/8679669/9e63863c-2a8e-11e5-94e4-994febcd6717.png">

The task table only shows 100 tasks. There is a page navigation above the table. Users can click the page navigation or type a page number to jump to another page. The table can be sorted by clicking the headers. However, unlike the previous implementation, the sorting is now done on the server, so clicking a table column to sort reloads the web page.

Author: zsxwing <zsxwing@gmail.com>

Closes apache#7399 from zsxwing/task-table-pagination and squashes the following commits:

144f513 [zsxwing] Display the page navigation when the page number is out of range
a3eee22 [zsxwing] Add extra space for the error message
54c5b84 [zsxwing] Reset page to 1 if the user changes the page size
c2f7f39 [zsxwing] Add a text field to let users fill the page size
bad52eb [zsxwing] Display user-friendly error messages
410586b [zsxwing] Scroll down to the tasks table if the url contains any sort column
a0746d1 [zsxwing] Use expand-dag-viz-arrow-job and expand-dag-viz-arrow-stage instead of expand-dag-viz-arrow-true and expand-dag-viz-arrow-false
b123f67 [zsxwing] Use localStorage to remember the user's actions and replay them when loading the page
894a342 [zsxwing] Show the link cursor when hovering for headers and page links and other minor fix
4d4fecf [zsxwing] Address Carson's comments
d9285f0 [zsxwing] Add comments and fix the style
74285fa [zsxwing] Merge branch 'master' into task-table-pagination
db6c859 [zsxwing] Task table pagination for the Stage page
### What changes were proposed in this pull request?

Add pagination support for the structured streaming page. Both tables, `Active Queries` and `Completed Queries`, now have pagination. The pagination framework from #7399 is used to implement it.

* Also, a table is only shown if it has at least one entry.

### Why are the changes needed?

* This helps users analyse their structured streaming queries more easily.
* Other Web UI pages support pagination in their tables, so this makes the Web UI more consistent across pages.
* This can prevent potential OOM errors.

### Does this PR introduce _any_ user-facing change?

Yes. Both tables will support pagination.

### How was this patch tested?

Manually. I will add snapshots soon.

Closes #28485 from iRakson/SPARK-31642.

Authored-by: iRakson <raksonrakesh@gmail.com>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
### What changes were proposed in this pull request?

* Pagination support is added to all tables of the streaming page in the Spark Web UI. Existing classes from #7399 were used to add it.
* Earlier, the streaming page had two tables, `Active Batches` and `Completed Batches`. Now there are three tables: `Running Batches`, `Waiting Batches` and `Completed Batches`. With a large number of waiting and running batches, keeping track of them in a single table is difficult. Other pages also use a separate table for each type of data.
* Earlier, empty tables were shown. Now only non-empty tables are shown. The `Active Batches` table used to show details of waiting batches followed by running batches.

### Why are the changes needed?

Pagination will allow users to analyse the tables more easily. All Spark Web UI pages support pagination apart from the streaming pages, so this adds consistency as well. It might also fix potential OOM errors.

### Does this PR introduce _any_ user-facing change?

Yes. The `Active Batches` table is split into two tables, `Running Batches` and `Waiting Batches`. Pagination support is added to all the tables. Every other functionality is unchanged.

### How was this patch tested?

Manually.

Before changes:

<img width="1667" alt="Screenshot 2020-05-03 at 7 07 14 PM" src="https://user-images.githubusercontent.com/15366835/80915680-8fb44b80-8d71-11ea-9957-c4a3769b8b67.png">

After changes:

<img width="1669" alt="Screenshot 2020-05-03 at 6 51 22 PM" src="https://user-images.githubusercontent.com/15366835/80915694-a9ee2980-8d71-11ea-8fc5-246413a4951d.png">

Closes #28439 from iRakson/streamingPagination.

Authored-by: iRakson <raksonrakesh@gmail.com>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>