[Datasets] Better error message for partition filtering if no file found #27353
A cleaner fix would be to make a PathFilter base class and subclass it with PartitionFilter and ExtensionFilter. That should address the root cause for the user, who was confused because the parameter name contained "partition" even though their filtering had nothing to do with partitions.
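The proposed hierarchy could look roughly like the following minimal sketch. The three class names come from the comment above. Their methods, the hive-style `key=value` partition layout, and the filtering logic are assumptions for illustration, not an existing Ray Datasets API.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List


class PathFilter(ABC):
    """Hypothetical base class: keeps the subset of input paths to read."""

    def __call__(self, paths: List[str]) -> List[str]:
        return [p for p in paths if self.accept(p)]

    @abstractmethod
    def accept(self, path: str) -> bool:
        """Return True if `path` should be read."""


class ExtensionFilter(PathFilter):
    """Hypothetical: keeps only paths ending in one of the given extensions."""

    def __init__(self, extensions: List[str]):
        # Normalize "csv", ".csv" and "CSV" to the same ".csv" suffix.
        self.suffixes = tuple("." + e.lstrip(".").lower() for e in extensions)

    def accept(self, path: str) -> bool:
        return path.lower().endswith(self.suffixes)


class PartitionFilter(PathFilter):
    """Hypothetical: keeps paths whose partition values satisfy `filter_fn`."""

    def __init__(self, filter_fn: Callable[[Dict[str, str]], bool]):
        self.filter_fn = filter_fn

    def accept(self, path: str) -> bool:
        # Parse "key=value" directory components, e.g. "year=2022".
        parts = dict(
            seg.split("=", 1) for seg in path.split("/")[:-1] if "=" in seg
        )
        return self.filter_fn(parts)


paths = ["data/year=2021/a.csv", "data/year=2022/b.csv", "data/year=2022/c.json"]
ext = ExtensionFilter(["csv"])
part = PartitionFilter(lambda kv: kv.get("year") == "2022")
print(part(ext(paths)))  # ['data/year=2022/b.csv']
```

With separate subclasses, an extension-only filter no longer has to be passed through a parameter whose name mentions partitions.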
```python
if len(self._paths) == 0:
    raise ValueError(
        "Not found any input file to read from. Please double "
        "check the input files are having expected extension(s): "
```
This is not necessarily extension filtering (partition filtering alone can yield empty paths), so the message should not point only to file extensions.
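To illustrate the point, here is a minimal sketch in which every file has the expected extension, yet partition filtering alone leaves nothing to read. The helpers `parse_partitions` and `filter_by_partition` are hypothetical, and a hive-style `key=value` directory layout is assumed.

```python
from typing import Callable, Dict, List


def parse_partitions(path: str) -> Dict[str, str]:
    """Parse hive-style "key=value" directory components of a path."""
    return dict(seg.split("=", 1) for seg in path.split("/")[:-1] if "=" in seg)


def filter_by_partition(
    paths: List[str], filter_fn: Callable[[Dict[str, str]], bool]
) -> List[str]:
    """Keep only paths whose partition values satisfy `filter_fn`."""
    return [p for p in paths if filter_fn(parse_partitions(p))]


# Every file has the expected ".parquet" extension...
paths = [
    "s3://bucket/table/country=US/part-0.parquet",
    "s3://bucket/table/country=CA/part-0.parquet",
]
# ...but no partition matches the filter, so nothing is left to read.
remaining = filter_by_partition(paths, lambda kv: kv.get("country") == "FR")
print(remaining)  # []
```

An error message that only mentioned extensions would send this user in the wrong direction.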
@jianoaix - thanks for pointing it out. I agree, so I changed the error message. I am happy to look into a better API for path filtering as a follow-up. Right now the error message, `ValueError: number sections must be larger than 0.`, is pretty confusing and worth fixing. WDYT?
Yes, sorry, I didn't mean that we need to get the API name right in this PR.
For the message itself, if more detailed or specific information is wanted, maybe use isinstance()
to dispatch on PartitionFilter and ExtensionFilter?
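A minimal sketch of the isinstance() dispatch suggested above. The classes are hypothetical stand-ins for the proposed PathFilter hierarchy, and the helper `no_files_error` and its message wording are illustrative assumptions, not the text merged in this PR.

```python
from typing import List


class PathFilter:
    """Hypothetical base class for input-path filters."""


class ExtensionFilter(PathFilter):
    def __init__(self, extensions: List[str]):
        self.extensions = extensions


class PartitionFilter(PathFilter):
    def __init__(self, filter_fn):
        # An arbitrary user-supplied callable; its logic is opaque here.
        self.filter_fn = filter_fn


def no_files_error(path_filter: PathFilter) -> ValueError:
    """Build a ValueError whose hint depends on which filter emptied the paths."""
    msg = "No input files found to read."
    if isinstance(path_filter, ExtensionFilter):
        msg += (
            " Please double check the input files have the expected "
            f"extension(s): {path_filter.extensions}."
        )
    elif isinstance(path_filter, PartitionFilter):
        # filter_fn can be any callable, so only the parameter can be named.
        msg += " Please double check 'partition_filter' field is set properly."
    return ValueError(msg)
```

The PartitionFilter branch shows the limitation discussed in this thread: because `filter_fn` is arbitrary, the message can name the parameter but cannot explain why it matched nothing.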
> maybe use isinstance() to dispatch PartitionFilter and ExtensionFilter?

Hmm, for PathPartitionFilter it seems a little hard to extract a message, since we support an arbitrary filter_fn inside it. I changed the error message to:
"Not found any input file to read from. Please double check 'partition_filter' field is set properly."
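A minimal sketch of where such a guard could sit, assuming (hypothetically) that the reader splits the filtered paths into `min(parallelism, len(paths))` read tasks. With zero paths that would mean zero tasks, so the check runs before any splitting happens. `split_into_tasks` is an illustrative helper, not Ray's implementation; the message text is the one quoted above.

```python
from typing import List


def split_into_tasks(paths: List[str], parallelism: int) -> List[List[str]]:
    """Split `paths` into roughly equal, contiguous chunks, one per read task."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    # Fail fast with an actionable message instead of letting an empty
    # path list turn into zero read tasks further down.
    if len(paths) == 0:
        raise ValueError(
            "Not found any input file to read from. Please double check "
            "'partition_filter' field is set properly."
        )
    num_tasks = min(parallelism, len(paths))
    base, extra = divmod(len(paths), num_tasks)
    chunks, start = [], 0
    for i in range(num_tasks):
        # The first `extra` chunks get one additional path each.
        size = base + (1 if i < extra else 0)
        chunks.append(paths[start:start + size])
        start += size
    return chunks
```

Checking right after filtering keeps low-level details, such as a chunk count of zero, out of the message the user sees.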
Signed-off-by: Cheng Su <scnju13@gmail.com>
LGTM, just a few nits!
Signed-off-by: Cheng Su scnju13@gmail.com
Why are these changes needed?
A user raised issue #26605: when partition filtering of the input files found no files with the required extension, the resulting error message was not actionable.
Before this PR, the error message was:
After this PR, the error message is:
Related issue number
#26605
Checks
- Signed off every commit (`git commit -s`) in this PR.
- Ran `scripts/format.sh` to lint the changes in this PR.