What's Changed
⚡ Auto-fix models that do not set a padding_token
⚡ Auto-fix models released with the wrong padding_token: many models incorrectly reuse eos_token as pad_token, which leads to subtle, hard-to-detect errors in post-training and inference whenever batching is used (which is almost always). See the sketch below for an illustration.
⚡ Compatible with all tokenizers recognized by HF Transformers
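To make the pad_token fix concrete, here is a minimal sketch of the failure mode and of how loading through Tokenicer is expected to behave. The `Tokenicer.load()` entry point and the example model id are assumptions for illustration based on these notes (auto-fix in #5, attribute forwarding in #6), not a definitive API reference.

```python
# Sketch of the pad_token pitfall this release targets; the Tokenicer API shown
# (`Tokenicer.load`) is an assumption based on these release notes, not a spec.
from transformers import AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"  # any HF Transformers-recognized model id

# Plain HF loading often leaves pad_token unset or aliased to eos_token,
# which silently corrupts attention masks and loss masking once inputs are batched.
hf_tok = AutoTokenizer.from_pretrained(model_id)
if hf_tok.pad_token is None or hf_tok.pad_token == hf_tok.eos_token:
    print(f"Risky padding setup: pad_token={hf_tok.pad_token!r}, eos_token={hf_tok.eos_token!r}")

# Tokenicer is expected to select a safe pad_token automatically on load and
# forward everything else to the wrapped HF tokenizer.
from tokenicer import Tokenicer

tok = Tokenicer.load(model_id)
print(tok.pad_token)  # auto-fixed, distinct from eos_token
```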
- Auto fix pad token by @CL-ModelCloud in #5
- Forward to Tokenizer by @CL-ModelCloud in #6
- read requirements.txt in setup.py by @CSY-ModelCloud in #7
- [CI] add tokenicer forward test by @CL-ModelCloud in #10
- add unit tests by @CSY-ModelCloud in #11
- refactor by @Qubitium in #8
- add deepseek_v3 map by @CL-ModelCloud in #15
New Contributors
- @CSY-ModelCloud made their first contribution in #1
- @Qubitium made their first contribution in #3
- @CL-ModelCloud made their first contribution in #5
Full Changelog: https://github.com/ModelCloud/Tokenicer/commits/v0.0.2