What's Changed
⚡ Auto-fix models that do not set a padding_token
⚡ Auto-fix models released with the wrong padding_token: many models incorrectly reuse eos_token as pad_token, which leads to subtle, hard-to-detect errors in post-training and inference whenever batching is used (which is almost always). See the sketch below for an illustration.
⚡ Compatible with all tokenizers recognized by HF Transformers
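To make the pad_token fix concrete, here is a minimal sketch of the failure mode and of how loading through Tokenicer is expected to behave. The `Tokenicer.load()` entry point and the example model id are assumptions for illustration based on these notes (auto-fix in #5, attribute forwarding in #6), not a definitive API reference.

```python
# Sketch of the pad_token pitfall this release targets; the Tokenicer API shown
# (`Tokenicer.load`) is an assumption based on these release notes, not a spec.
from transformers import AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"  # any HF Transformers-recognized model id

# Plain HF loading often leaves pad_token unset or aliased to eos_token,
# which silently corrupts attention masks and loss masking once inputs are batched.
hf_tok = AutoTokenizer.from_pretrained(model_id)
if hf_tok.pad_token is None or hf_tok.pad_token == hf_tok.eos_token:
    print(f"Risky padding setup: pad_token={hf_tok.pad_token!r}, eos_token={hf_tok.eos_token!r}")

# Tokenicer is expected to select a safe pad_token automatically on load and
# forward everything else to the wrapped HF tokenizer.
from tokenicer import Tokenicer

tok = Tokenicer.load(model_id)
print(tok.pad_token)  # auto-fixed, distinct from eos_token
```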
- Auto fix pad token by @CL-ModelCloud in #5
- Forward to Tokenizer by @CL-ModelCloud in #6
- read requirements.txt in setup.py by @CSY-ModelCloud in #7
- [CI] add tokenicer forward test by @CL-ModelCloud in #10
- add unit tests by @CSY-ModelCloud in #11
- refactor by @Qubitium in #8
- add deepseek_v3 map by @CL-ModelCloud in #15
New Contributors
- @CSY-ModelCloud made their first contribution in #1
- @Qubitium made their first contribution in #3
- @CL-ModelCloud made their first contribution in #5
Full Changelog: https://github.com/ModelCloud/Tokenicer/commits/v0.0.2