Commit

updated README.md
cahya-wirawan committed Sep 4, 2024
1 parent 1337ec2 commit 663493e
Showing 1 changed file with 2 additions and 2 deletions.
README.md: 4 changes (2 additions, 2 deletions)
@@ -69,8 +69,8 @@ https://huggingface.co/datasets/cahya/simple-wikipedia/resolve/main/simple-wikip
## Tools using this tokenizer

We also created the [json2bin](https://github.com/cahya-wirawan/json2bin) application to convert datasets from JSONL format
-into binidx format, a data format used for training RWKV models. It supports batch encoding with multithreading and
-can convert a dataset more than 70 times faster than the original json2binidx program written in Python.
+into binidx format, a data format used for training RWKV models. It uses multithreading to scale up the performance and
+can convert a dataset more than 70 times faster (around 360 MB/s) than the original json2binidx program written in Python.

## Changelog
- Version 0.9.0