feat: Include HF files such as config.json as part of the recovered checkpoint #132

Open
kmehant opened this issue Feb 27, 2025 · 4 comments

Comments

@kmehant
Collaborator

kmehant commented Feb 27, 2025

Should we copy the following files from the source/base checkpoint into the recovered checkpoint, so that downstream users (e.g. eval) can easily consume the checkpoint after recovery?

  1. config.json
  2. generation_config.json
  3. merges.txt
  4. special_tokens_map.json
  5. tokenizer.json
  6. tokenizer_config.json
  7. vocab.json
@kmehant
Collaborator Author

kmehant commented Feb 27, 2025

@fabianlim @willmj

@fabianlim
Contributor

I'm not sure this is needed, because it can easily be scripted.
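For reference, the copy step can be scripted in a few lines of Python. This is a minimal sketch, assuming the files sit at the root of the base checkpoint directory; `copy_hf_aux_files` and `HF_AUX_FILES` are illustrative names, not part of any existing tool.

```python
import shutil
from pathlib import Path

# Files listed in this issue. Models that ship e.g. tokenizer.model instead
# of vocab.json/merges.txt would need a different list.
HF_AUX_FILES = (
    "config.json",
    "generation_config.json",
    "merges.txt",
    "special_tokens_map.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "vocab.json",
)


def copy_hf_aux_files(base_dir, recovered_dir, filenames=HF_AUX_FILES):
    """Copy HF config/tokenizer files from the base checkpoint into the
    recovered checkpoint.

    Files missing from base_dir are skipped, and files that already exist in
    recovered_dir are not overwritten. Returns the names of copied files.
    """
    base = Path(base_dir)
    dest = Path(recovered_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in filenames:
        src = base / name
        target = dest / name
        if src.is_file() and not target.exists():
            shutil.copy2(src, target)  # copies contents and metadata
            copied.append(name)
    return copied
```

Usage (paths are placeholders): `copy_hf_aux_files("/path/to/base_model", "/path/to/recovered_checkpoint")`. Skipping existing files avoids clobbering anything the recovery step already wrote, such as its own config.json.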

@kmehant
Collaborator Author

kmehant commented Feb 27, 2025

We should decide on automating this somewhere, maybe in fms-hf-tuning, controllable through a flag. We could save both DCP and HF-format safetensors checkpoints together, since users expect a consumable model from the checkpoints flushed out by our stack. WDYT?
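If this were automated behind a flag, the gating could look roughly like the sketch below. The option names (`--base_model_path`, `--save_hf_aux_files`) and the `finalize_checkpoint` hook are hypothetical, not existing fms-hf-tuning options; the sketch only shows the copy step being toggled, not the DCP or safetensors export itself.

```python
import argparse
import shutil
from pathlib import Path

# Files from this issue; assumed to live at the root of the base model dir.
HF_AUX_FILES = ("config.json", "generation_config.json", "merges.txt",
                "special_tokens_map.json", "tokenizer.json",
                "tokenizer_config.json", "vocab.json")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="checkpoint export options (sketch)")
    # Hypothetical option names, for illustration only.
    parser.add_argument("--base_model_path", required=True,
                        help="directory holding the base model's HF files")
    parser.add_argument("--save_hf_aux_files", action="store_true",
                        help="also copy config/tokenizer files into each checkpoint")
    return parser


def finalize_checkpoint(args: argparse.Namespace, checkpoint_dir: str) -> list:
    """Hypothetical post-save step, run after the weights (DCP and/or
    safetensors) are written to checkpoint_dir. Returns copied file names."""
    if not args.save_hf_aux_files:
        return []  # flag off: leave the checkpoint exactly as written
    copied = []
    for name in HF_AUX_FILES:
        src = Path(args.base_model_path) / name
        dst = Path(checkpoint_dir) / name
        if src.is_file() and not dst.exists():
            shutil.copy2(src, dst)
            copied.append(name)
    return copied
```

Keeping the flag off by default leaves current checkpoint output unchanged for users who already script this step themselves.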

@fabianlim
Contributor

@willmj, maybe you can consult @Ssukriti or @anhuong on this, since you guys are more familiar with the users.
