Range Error for BERT Masked Language Modeling on IMDB #16846
Comments
Hi @Jadiker 👋 In your notebook, after the If you add
Nope, same error:
@Jadiker Thank you for the update :) The problem seems to arise from your custom tokenization function, which is likely not returning the correct data format. See this notebook, which successfully runs your code if we skip We also reserve these GitHub issues for bugs in the repository and/or feature requests. For any other requests, like issues in your custom code, we'd like to invite you to use our forum 🤗 I'm closing this issue, but feel free to reopen with queries that fit the criteria I described.
@gante Thanks for your time and for the information! I really appreciate it. Two comments:
Given that, should I still have posted on the forum first? If the tutorials for data processing and model training can't be combined, how is one supposed to train a model on the processed data? It seemed like something that should be fixed in the code, rather than just discussed on the forum.
Thanks again for engaging with this!
Oops, forgot to change the permissions. Should be okay now.
After looking at the notebook you linked, it seems like the issue is that the tutorial notebook gives two different options for tokenizing text. By using both of them, rather than just the first one, I introduced a bug into the code. Does that sound accurate?
@Jadiker Yeah, the problem seems to be at the dataset preparation stage. To be candid, I also can't find the issue from a quick glance -- I've double-checked the As I mentioned above, we don't have the resources to do proper support in situations like this, but I'd be curious to find the root cause. Perhaps we could improve documentation with the findings :) If you get stuck, I might have capacity to pick it up in a few weeks.
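Since the thread points at the dataset preparation stage without pinning down a root cause, one way to narrow it down is to inspect what the tokenization function actually returns before it reaches the data collator. The following is a minimal sketch, not code from either notebook: `check_tokenized_batch` is a hypothetical helper, and the assumption that each example should carry flat integer lists under `input_ids` and `attention_mask` of equal length is based on what BERT tokenizers normally produce, not on anything confirmed in this issue.

```python
from typing import Any, Dict, List

# Assumption: these are the columns a masked-LM pipeline needs at minimum.
REQUIRED_KEYS = ("input_ids", "attention_mask")


def check_tokenized_batch(batch: Dict[str, List[Any]]) -> List[str]:
    """Return human-readable problems found in a batch of tokenized examples."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in batch]
    if problems:
        return problems

    ids_column, mask_column = batch["input_ids"], batch["attention_mask"]
    if len(ids_column) != len(mask_column):
        return ["input_ids and attention_mask hold different numbers of examples"]

    for i, (ids, mask) in enumerate(zip(ids_column, mask_column)):
        # Each example should be a flat list of token ids, not a nested list.
        if not isinstance(ids, list) or not all(isinstance(t, int) for t in ids):
            problems.append(f"example {i}: input_ids is not a flat list of ints")
        elif not isinstance(mask, list) or len(mask) != len(ids):
            problems.append(f"example {i}: attention_mask does not match input_ids")
    return problems


# Hand-written batches for illustration; the token ids are arbitrary.
good_batch = {"input_ids": [[101, 2023, 102]], "attention_mask": [[1, 1, 1]]}
# Tokenizing already-tokenized output can add an extra level of nesting.
nested_batch = {"input_ids": [[[101, 2023, 102]]], "attention_mask": [[[1, 1, 1]]]}

print(check_tokenized_batch(good_batch))
print(check_tokenized_batch(nested_batch))
```

Running a check like this on a few rows of the processed dataset would show whether applying both tokenization options changed the shape of the examples, which is the kind of format mismatch suggested above.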
System Info
Who can help?
@LysandreJik
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
https://colab.research.google.com/drive/1ZpYRkJVMF5r3MukUheEFtgDvqax4YCxM?usp=sharing
Expected behavior
Evaluation to complete and give me a perplexity score, as it does [here](https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/chapter7/section3_tf.ipynb)
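For context, the perplexity of a language model is conventionally the exponential of the mean cross-entropy loss (in nats) on the evaluation set. A minimal sketch of that final step follows; the function name and the loss value are made up for illustration and are not taken from either notebook.

```python
import math


def perplexity_from_loss(mean_cross_entropy: float) -> float:
    """Convert a mean cross-entropy loss (natural log base) into perplexity."""
    return math.exp(mean_cross_entropy)


# Hypothetical evaluation loss, used only to show the conversion.
example_eval_loss = 2.0
print(f"perplexity: {perplexity_from_loss(example_eval_loss):.2f}")
```

A lower evaluation loss gives a lower perplexity, and a loss of zero corresponds to a perplexity of exactly 1.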