-
Notifications
You must be signed in to change notification settings - Fork 28.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pass datasets trust_remote_code #31406
Pass datasets trust_remote_code #31406
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
I think this is more intricate than I first thought. What about if I make another PR to pin upper version of datasets and then handle this PR more calmly? |
For example, the "hf-internal-testing/librispeech_asr_dummy" dataset name is passed in this test:
Should we define it in |
Sounds good!
Happy with that 👍 |
This reverts commit b767282.
This reverts commit 833fc17.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @albertvillanova ! LGTM, but I have 2 questions:
-
Any reason that we don't have this
trust_remote_dataset_code
in all files inexamples
that hasclass DataTrainingArguments
? (it seems this PR only adds to some files) -
I would like to have @muellerzr 's comment as he deals with
examples
folder frequently -
I guess the name
trust_remote_dataset_code
is because we have usedtrust_remote_code
forModelArguments
, right? Would like @amyeroberts 's opinion on this too.
So if the argument already exists in |
Yeah, I would love the latter if |
Thanks for your reviews, @amyeroberts and @ydshieh. So, you would prefer to have just a single load_dataset(
data_args.dataset_name,
data_args.dataset_config_name,
trust_remote_code=model_args.trust_remote_code, Just to be sure, because I find weird to use a |
Hi @albertvillanova Not really. What I am thinking is:
|
The assignment of the argument by In this PR, I will use
|
Thanks for checking @albertvillanova. That part is rather on our side, but if you want to take it, that would be much appreciated.
Works for me :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for handling all of these cases!
What does this PR do?
Next release of
datasets
will raise an error for script-datasets iftrust_remote_code
is not set to True.Fixes # (issue)
Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.