[Data] Progress Bar: Sort sample in "rows" and remove the duplicate Sort sample. #47106
@scottjlee With this fix, all progress bars (except two when reading the file) now report rows, and the sort sample problem is also fixed.
if hasattr(result, "num_rows"):
    num_rows = result.num_rows
elif hasattr(result, "__len__"):
    # For outputs that are DataFrames, i.e. sort_sample
    num_rows = len(result)
else:
    num_rows = 1
Could you consolidate this logic and the one in block_until_complete() as a static method in ProgressBar or a utility method in the file?
@@ -125,7 +125,6 @@ def __init__(
    "Sort",
    input_op,
    sub_progress_bar_names=[
        SortTaskSpec.SORT_SAMPLE_SUB_PROGRESS_BAR_NAME,
Thanks for looking into this. I think we actually want to keep this bar here, so that it is initialized as a sub-progress bar for the AbstractAllToAll op, and we should remove the one that is created in SortTaskSpec.sample_boundaries(). We want the "Sort Sample" bar to be a sub-bar of the overall Sort bar.
In terms of the concrete change, we can update sample_boundaries() to reference the sub-progress bar that is part of the operator. This will also require us to either:
(1) change sample_boundaries from a static method to a class method, and use the object's sub-progress bar, or
(2) add a sub_progress_bar parameter to sample_boundaries(), and at the callers of this method, pass the SortTaskSpec instance's progress bar (e.g. here)
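Option (2) can be illustrated with a self-contained sketch. Everything here is a simplified assumption for illustration: StubProgressBar is a stand-in rather than Ray's ProgressBar, and this sample_boundaries works on plain lists of integers rather than Ray blocks. Only the shape of the change (an optional sub_progress_bar parameter with a fallback bar) mirrors the suggestion.

```python
from typing import List, Optional, Sequence


class StubProgressBar:
    """Stand-in for a progress bar; not Ray's ProgressBar class."""

    def __init__(self, name: str, total: int, unit: str):
        self.name, self.total, self.unit = name, total, unit
        self.completed = 0

    def update(self, n: int = 1) -> None:
        self.completed += n


def sample_boundaries(
    blocks: Sequence[List[int]],
    num_reducers: int,
    sub_progress_bar: Optional[StubProgressBar] = None,
) -> List[int]:
    """Return num_reducers - 1 split points taken from sampled values."""
    bar = sub_progress_bar
    if bar is None:
        # Fallback when the caller does not pass an operator-owned bar.
        bar = StubProgressBar("Sort Sample", len(blocks), unit="block")
    samples: List[int] = []
    for block in blocks:
        # Take a strided subset of each block as its sample.
        samples.extend(block[:: max(1, len(block) // 10)])
        # Advance by rows or by blocks, depending on the bar's unit.
        bar.update(len(block) if bar.unit == "row" else 1)
    if not samples:
        return []
    samples.sort()
    step = len(samples) / num_reducers
    return [samples[int(i * step)] for i in range(1, num_reducers)]


# The caller owns the bar, so "Sort Sample" stays a sub-bar of its operator.
sort_bar = StubProgressBar("Sort Sample", total=6, unit="row")
boundaries = sample_boundaries(
    [[3, 1, 2], [6, 5, 4]], num_reducers=2, sub_progress_bar=sort_bar
)
print(boundaries, sort_bar.completed)
```

Passing the bar in keeps the method static and leaves ownership with the caller, at the cost of one extra argument at each call site.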
Changes lgtm, can you upload a new image of what the progress bars look like with these changes + whatever script you used for the example for posterity?
Sure! Let me make it better, and I will @ you and Scott when I have finished!
Hi @scottjlee and @omatthew98, here is what it looks like now. TODO: there is one missing step. I tried different solutions, but the sort sample bar doesn't update even though it finished in fetch_until_complete. I double checked the But at least we have the sort sample inside the main bar now. FYI
@Bye-legumes Nice, the updated bars look great!
For this part, which update are you referring to? In the "after" screenshot you shared, it looks like the Also another small question/request. It looks like in the initially created progress bar for Sort Sample (in the "before execution" screenshot), it shows up as its own bar, and not a sub-bar of
I think what's going on with the
So, I think we can initialize the sub progress bar (or call Regarding the indices in the most recent screenshot you posted looks good actually, since it shows the operators in order with the indices. The description is initialized with
OK. Maybe I can try to solve it later. I just fixed every place where sort sample is called, e.g. in aggregate, so it's also a sub-progress bar there now.
sample_bar = ProgressBar(
    SortTaskSpec.SORT_SAMPLE_SUB_PROGRESS_BAR_NAME,
    len(sample_results),
    unit="block",
Hmm, if we are setting the unit as block here, how come the screenshot in #47106 (comment) shows the unit as row for the Sort Sample bars?
Oh I see, this is only for the case where the bar is not passed from calling sample_boundaries. Understood.
I think if we use row as the unit here, it should still be supported, right? Since you implemented extract_num_rows(), which will get the num_rows from blocks?
Thanks for the fix!
…ort sample. (ray-project#47106)

## Why are these changes needed?

Currently, the sort sample is not in rows and there is a duplicate sort sample progress bar.

![image](https://github.com/user-attachments/assets/30aa9fc3-8e96-473e-a794-da4fc023093a)

With this modification, the sort sample will also be in rows and the additional progress bar will be removed.

![image](https://github.com/user-attachments/assets/f0a3e5b6-3f84-4993-9f03-36a350aa47b0)

In fact there should be only one sort sample progress bar, which is created at https://github.com/ray-project/ray/blob/e066289b374464f1e2692382fdea871eb34e3156/python/ray/data/_internal/planner/exchange/sort_task_spec.py#L166 while the one created in

```
sub_progress_bar_names=[
    SortTaskSpec.SORT_SAMPLE_SUB_PROGRESS_BAR_NAME,
    ExchangeTaskSpec.MAP_SUB_PROGRESS_BAR_NAME,
    ExchangeTaskSpec.REDUCE_SUB_PROGRESS_BAR_NAME,
],
```

should be deleted.

## Related issue number

## Checks

- [√] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [√] I've run `scripts/format.sh` to lint the changes in this PR.
- [√] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - [√] Unit tests
  - [√] Release tests
  - [ ] This PR is not tested :(

---------

Signed-off-by: zhilong <zhilong.chen@mail.mcgill.ca>