While using https://github.com/tiktoken-go/tokenizer I noticed that it makes extensive use of regexp2, which this library allows me to pre-compile.
I ran into some issues with the tooling. For example, the `go install` command mentioned won't work, because the only tagged version is an older one that still contains a `replace` directive in its `go.mod`. I also hit a few other issues while debugging locally; the small code changes in this PR are what I ended up with. For instance, the tokenizer library calls `MustCompile` inside a struct initializer, which this library doesn't (yet) support. On top of that, it used string concatenation, so it took quite some effort to get it to work.
I really like this tooling and would love to see it get more traction, as regular expressions are notoriously slow in Go.
PS: parts of the code were generated with ChatGPT but have been confirmed and tested by me, just modern lazy evening programming ;)