This repository has been archived by the owner on Jan 9, 2025. It is now read-only.
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
feat(text): add tokenizer for cohere & new gpt-4o (#276)
Because

- users need to count the tokens in each chunk
- the token chunking strategy and token counting should be decoupled
- users need to fetch tokenizers from vendors

This commit

- adds tokenization for Cohere and the GPT-4o family
  - P.S. there are more use cases in Hugging Face; the Hugging Face Python library raises an error when the setting is not correct, e.g. when the token count exceeds the limit for a specific model
- refactors the text task's chunk-text logic for future extensibility and maintainability
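The decoupling described in the commit message can be illustrated with a minimal sketch. This is not the code from this commit: `Tokenizer`, `WhitespaceTokenizer`, `Chunk`, `chunk_by_chars`, `chunk_and_count`, and `over_limit` are all hypothetical names, and the whitespace tokenizer is only a stand-in for a vendor tokenizer (Cohere, GPT-4o, and so on). The idea is that the chunking strategy and the token counter are passed in separately, so either can be swapped without touching the other.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol


class Tokenizer(Protocol):
    """Hypothetical interface: anything that can turn text into tokens."""

    def encode(self, text: str) -> List[str]: ...


class WhitespaceTokenizer:
    """Stand-in tokenizer (assumption): a real one would wrap a vendor encoding."""

    def encode(self, text: str) -> List[str]:
        return text.split()


@dataclass
class Chunk:
    text: str
    token_count: int


def chunk_by_chars(text: str, size: int) -> List[str]:
    """One possible chunking strategy: fixed-size character windows."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def chunk_and_count(
    text: str,
    strategy: Callable[[str], List[str]],
    tokenizer: Tokenizer,
) -> List[Chunk]:
    """Split with any strategy, then count tokens with any tokenizer."""
    return [Chunk(c, len(tokenizer.encode(c))) for c in strategy(text)]


def over_limit(chunks: List[Chunk], limit: int) -> List[Chunk]:
    """Return the chunks whose token count exceeds a model's limit."""
    return [c for c in chunks if c.token_count > limit]


if __name__ == "__main__":
    chunks = chunk_and_count(
        "one two three four",
        lambda t: chunk_by_chars(t, 8),
        WhitespaceTokenizer(),
    )
    for c in chunks:
        print(repr(c.text), c.token_count)
```

Checking counts up front with something like `over_limit` is one way to avoid the kind of error the message mentions, where a chunk's token count exceeds the limit for a specific model.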
1 parent 15fc0d2, commit 5d8cec3
Showing 16 changed files with 969 additions and 216 deletions.