
I was trying to create a custom tokenizer for a language and got this error or warning #19048

Closed

yes-its-shivam opened this issue Sep 15, 2022 · 9 comments

@yes-its-shivam commented Sep 15, 2022

System Info

The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
Moving 11 files to the new cache system
0%
0/11 [00:02<?, ?it/s]
There was a problem when trying to move your cache:

  File "C:\Users\shiva\anaconda3\lib\site-packages\transformers\utils\hub.py", line 1127, in <module>
    move_cache()

  File "C:\Users\shiva\anaconda3\lib\site-packages\transformers\utils\hub.py", line 1090, in move_cache
    move_to_new_cache(

  File "C:\Users\shiva\anaconda3\lib\site-packages\transformers\utils\hub.py", line 1047, in move_to_new_cache
    huggingface_hub.file_download._create_relative_symlink(blob_path, pointer_path)

  File "C:\Users\shiva\anaconda3\lib\site-packages\huggingface_hub\file_download.py", line 841, in _create_relative_symlink
    raise OSError(


(Please file an issue at https://github.com/huggingface/transformers/issues/new/choose and copy paste this whole message and we will do our best to help.)
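A note on recovery: the interrupted migration can be resumed by hand with the function named in the warning itself. A minimal sketch (run it from a session that is allowed to create symlinks, e.g. an elevated prompt on Windows):

# Resume the interrupted cache migration manually.
from transformers.utils import move_cache

move_cache()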

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

# build the tokenizer with the `tokenizers` library
# (assumed here; the original snippet passes in an
# already-built `tokenizer` object without showing it)
from tokenizers import Tokenizer
from tokenizers.models import WordPiece

tokenizer = Tokenizer(WordPiece(unk_token="[UNK]"))

# load the tokenizer in a transformers tokenizer instance
from transformers import PreTrainedTokenizerFast

tokenizer = PreTrainedTokenizerFast(
    tokenizer_object=tokenizer,
    unk_token="[UNK]",
    pad_token="[PAD]",
    cls_token="[CLS]",
    sep_token="[SEP]",
    mask_token="[MASK]",
)

# save the tokenizer
tokenizer.save_pretrained("bert-base-dv-hi")

Expected behavior

The save call should print out:
('bert-base-dv-hi\\tokenizer_config.json',
 'bert-base-dv-hi\\special_tokens_map.json',
 'bert-base-dv-hi\\tokenizer.json')
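(A quick way to check that the save round-trips is to load the folder back; a minimal sketch, assuming the `bert-base-dv-hi` directory written above:)

from transformers import PreTrainedTokenizerFast

# Reload from the directory created by save_pretrained() and verify
# that the special tokens survived the round trip.
tok = PreTrainedTokenizerFast.from_pretrained("bert-base-dv-hi")
print(tok.pad_token, tok.mask_token)  # expected: [PAD] [MASK]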

@LysandreJik (Member)

Hey @yes-its-shivam, thanks for reporting! I think this may have to do with our backend trying to create symlinks for the cached files, and failing to do so!

It seems you're running on Windows, which requires developer mode to be activated (or for Python to be run as an administrator).

To enable your device for development, we recommend reading this guide from Microsoft: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
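(For readers who want to test this directly: the sketch below probes whether the current process may create symlinks at all, in the same way the cache code relies on `os.symlink`. It is an illustration, not part of transformers.)

import os
import tempfile

# Try to create one symlink in a throwaway directory. On Windows this
# raises OSError unless developer mode is enabled or the process is
# elevated; on Linux/macOS it normally succeeds.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "blob")
    dst = os.path.join(tmp, "pointer")
    open(src, "w").close()
    try:
        os.symlink(src, dst)
        print("symlinks supported")
    except OSError:
        print("symlinks NOT supported: enable developer mode or run elevated")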

@BramVanroy (Collaborator) commented Sep 16, 2022

Hi @LysandreJik. As far as I can see, this does not just happen once when moving the cache but also for every new model you download. That means that for every model I download, I would have to find the Python binary of my venv, run it as admin, download the model, and then continue my work, or enable developer mode for Windows, which also requires admin privileges and comes with other things I may not wish to enable on my device (like allowing sideloading of unverified third-party apps).

As I see it, this change means that anyone without admin privileges on their system (using the family computer, school computers, student laptops in class, etc.) cannot use transformers. I'd love to be wrong about this, but at first glance this seems to treat Windows as the unfavored child again. Can we try to look for a way around this?

Edit: this is not something I am eager to have to enable:

[screenshot: Windows developer mode warning dialog]

@LysandreJik (Member)

Thanks for reporting @BramVanroy, I'm currently opening an issue on huggingface_hub so that we may track it.

However, if I'm not mistaken, Developer Mode must be enabled in order to leverage WSL, right? I would expect most developers to use WSL in order to use transformers, but I may be mistaken about that.

@LysandreJik (Member)

Opened an issue here to track all related issues: huggingface/huggingface_hub#1062

@ebolam commented Sep 19, 2022

For the record, you do not need developer mode for WSL. I'm having the same problem, and having to turn on developer mode will drive away some of our user base; the warning will intimidate people out of using it.

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@sgugger (Collaborator) commented Oct 17, 2022

I think the issue has been solved on the huggingface_hub side, as long as you use the latest version. Please let us know otherwise!
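(If you are unsure which version is installed, a quick check; upgrade with `pip install -U huggingface_hub` if it is older than the fix:)

# Print the installed huggingface_hub version.
import huggingface_hub

print(huggingface_hub.__version__)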

@sgugger closed this as completed Oct 17, 2022
@chenye-814 commented

> I think the issue has been solved on the huggingface_hub side, as long as you use the latest version. Please let us know otherwise!

I am using the latest version of huggingface_hub (0.11.0), but I am still facing the same issue.

The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
Moving 0 files to the new cache system
0it [00:00, ?it/s]
0it [00:00, ?it/s]
There was a problem when trying to write in your cache folder (./tmp/). You should set the environment variable TRANSFORMERS_CACHE to a writable directory.
TRANSFORMERS_CACHE = ./tmp/
The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
Moving 0 files to the new cache system
0it [00:00, ?it/s]
0it [00:00, ?it/s]
There was a problem when trying to write in your cache folder (./tmp/). You should set the environment variable TRANSFORMERS_CACHE to a writable directory.

@manzanofab
@chenye-814 did you figure it out? I am having the same issue: "There was a problem when trying to write in your cache folder (/documents). You should set the environment variable TRANSFORMERS_CACHE to a writable directory."
I already set the environment variable TRANSFORMERS_CACHE=documents.
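(A note on the last two reports: both use relative paths, "./tmp/" and "documents", which resolve against the current working directory. A minimal sketch of the usual fix, with the directory name as a placeholder; set the variable before transformers is imported:)

import os

# Point the cache at an absolute, writable directory *before* importing
# transformers; relative values are a common source of this error.
os.environ["TRANSFORMERS_CACHE"] = os.path.abspath("hf_cache")
os.makedirs(os.environ["TRANSFORMERS_CACHE"], exist_ok=True)

from transformers import AutoTokenizer  # the cache now lives in hf_cache/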
