
Add BPE cache to improve efficiency of BPE Tokenizers #590

Closed · wants to merge 3 commits

Conversation

@sayanshaw24 (Contributor)

@sayanshaw24 requested a review from a team as a code owner on November 2, 2023 at 22:07
@wenbingl (Member) left a comment:

We need to measure the performance more, since the bpe method now does some extra work.

std::list<std::string> dq;

// Store references of keys in cache for efficiency
std::unordered_map<std::string, std::list<std::string>::iterator> references;
@wenbingl (Member):
Why is the map unordered?

@sayanshaw24 (Contributor, Author):

Because that map only stores iterators to the keys (effectively, the addresses of the keys). `dq` is the list we use as the queue in the Queue+Map LRU implementation.
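The Queue+Map LRU scheme being discussed can be sketched as below. This is an illustrative, self-contained sketch, not the PR's actual code: the member names, the `std::vector<int>` value type (standing in for token ids), and the `Get`/`Put` interface are all assumptions. The key idea is that `dq` keeps keys in recency order while the unordered map gives O(1) access to each key's position in `dq`.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Sketch of a Queue+Map LRU cache: `dq_` holds keys with the most recently
// used at the front; `index_` maps each key to its iterator in `dq_` plus
// the cached value, so both lookup and reordering are O(1).
class LRUCache {
 public:
  explicit LRUCache(size_t capacity) : capacity_(capacity) {}

  // On a hit, fills `value`, moves the key to the front, and returns true.
  bool Get(const std::string& key, std::vector<int>& value) {
    auto it = index_.find(key);
    if (it == index_.end()) return false;
    dq_.splice(dq_.begin(), dq_, it->second.first);  // move to front, O(1)
    value = it->second.second;
    return true;
  }

  void Put(const std::string& key, const std::vector<int>& value) {
    auto it = index_.find(key);
    if (it != index_.end()) {
      dq_.splice(dq_.begin(), dq_, it->second.first);  // refresh recency
      it->second.second = value;
      return;
    }
    if (dq_.size() == capacity_) {  // full: evict the least recently used key
      index_.erase(dq_.back());
      dq_.pop_back();
    }
    dq_.push_front(key);
    index_[key] = {dq_.begin(), value};
  }

 private:
  size_t capacity_;
  std::list<std::string> dq_;  // keys, most recently used first
  std::unordered_map<std::string,
                     std::pair<std::list<std::string>::iterator,
                               std::vector<int>>> index_;
};
```

`std::list::splice` is what makes the list (rather than a deque) attractive here: it moves a node to the front without invalidating the iterators stored in the map.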

// We use an LRU cache in C++ to save compute.

// The cache capacity is set to a relatively small 500 to support mobile platforms.
LRUCache bpe_cache = LRUCache(500);
@wenbingl (Member):
Can it be reused for the next Compute call?

@sayanshaw24 (Contributor, Author):

No. If we move it outside so it persists into the next Compute call, it throws errors, because the BPE tokenizer type (GPT2/CLIP/Roberta) can differ between calls, and the tests fail because the ids/offsets come out wrong.

@sayanshaw24 (Contributor, Author):

> Need to measure the perf more since there are some extra efforts around bpe method now.

Added an average tokenization time in microseconds, measured with std::chrono, which is currently printed in Compute after all calls to Tokenize complete. In the future we can refactor this to make it cleaner, and potentially add an optional output for it.
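The timing approach described here can be sketched as follows. This is a hedged illustration, not the PR's code: `TokenizeTimed` stands in for wherever the real Tokenize call happens, and the accumulator names are invented.

```cpp
#include <cassert>
#include <chrono>
#include <string>

// Accumulate the per-call duration of tokenization and report the average in
// microseconds, as the comment above describes. Note the accumulator is an
// integral type, so it is initialized with 0, not 0.0.
long long total_tokenization_time_us = 0;
long long tokenize_calls = 0;

void TokenizeTimed(const std::string& input) {
  auto start = std::chrono::steady_clock::now();
  // ... call the real Tokenize(input) here ...
  (void)input;
  auto end = std::chrono::steady_clock::now();
  total_tokenization_time_us +=
      std::chrono::duration_cast<std::chrono::microseconds>(end - start)
          .count();
  ++tokenize_calls;
}

long long AverageTokenizationTimeUs() {
  return tokenize_calls == 0 ? 0 : total_tokenization_time_us / tokenize_calls;
}
```

`steady_clock` is used rather than `system_clock` so the measurement is monotonic and unaffected by wall-clock adjustments.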

// We use an LRU cache in C++ to save compute.

// The cache capacity is set to a relatively small 500 to support mobile platforms.
LRUCache bpe_cache = LRUCache(500);
@wenbingl (Member):
Add a flag to enable/disable the cache feature.
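One possible shape for the requested flag is sketched below. Everything here is hypothetical: `BpeTokenizerConfig`, `enable_cache`, and `CacheEnabled` are invented names, and in the real extension the flag would more likely be wired to an operator attribute.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical config carrying an enable/disable flag for the BPE cache.
// A disabled cache is modeled as capacity 0, so Tokenize can skip lookups.
struct BpeTokenizerConfig {
  bool enable_cache = true;
  std::size_t cache_capacity = 500;
};

class BpeTokenizer {
 public:
  explicit BpeTokenizer(const BpeTokenizerConfig& cfg)
      : cache_capacity_(cfg.enable_cache ? cfg.cache_capacity : 0) {}

  // True when the cache should be consulted during tokenization.
  bool CacheEnabled() const { return cache_capacity_ > 0; }

 private:
  std::size_t cache_capacity_;
};
```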

@@ -26,6 +27,10 @@ const char BpeModelConf::kModel_GPT2[] = "GPT2";
const char BpeModelConf::kModel_Roberta[] = "Roberta";
const char BpeModelConf::kModel_CLIP[] = "CLIP";

// We specifically measure tokenization performance, as our BPE implementation includes a number of optimizations.
long long total_tokenization_time = 0;
@wenbingl (Member):
This time-measurement code should be removed.


class LRUCache {
// Store keys of cache
std::list<std::string> dq;
@wenbingl (Member):
Given that it's a limited-capacity cache, the container could be a vector here.
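The reviewer's vector suggestion could look roughly like this. It is a sketch under the same assumptions as the discussion above (small fixed capacity, string keys, token-id values; all names invented): with only ~500 entries, a linear scan over a contiguous vector is cheap and cache-friendly, and no separate map of iterators is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Vector-backed LRU cache: entries are kept in recency order with the most
// recently used at the back. Lookup is a linear scan, acceptable for a small
// fixed capacity such as 500.
class SmallLRUCache {
 public:
  explicit SmallLRUCache(size_t capacity) : capacity_(capacity) {}

  bool Get(const std::string& key, std::vector<int>& value) {
    for (size_t i = 0; i < entries_.size(); ++i) {
      if (entries_[i].first == key) {
        value = entries_[i].second;
        // Move the hit entry to the back (most recently used).
        std::rotate(entries_.begin() + i, entries_.begin() + i + 1,
                    entries_.end());
        return true;
      }
    }
    return false;
  }

  void Put(const std::string& key, const std::vector<int>& value) {
    std::vector<int> dummy;
    if (Get(key, dummy)) {  // already present: Get moved it to the back
      entries_.back().second = value;
      return;
    }
    if (entries_.size() == capacity_) {
      entries_.erase(entries_.begin());  // front is least recently used
    }
    entries_.emplace_back(key, value);
  }

 private:
  size_t capacity_;
  std::vector<std::pair<std::string, std::vector<int>>> entries_;
};
```

The trade-off versus the list+map version is O(n) lookup instead of O(1), exchanged for better locality and a simpler single-container design.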

@@ -230,3 +230,70 @@ class TokenWithRegularExp {
private:
std::u32string_view m_text;
};

class LRUCache {
@wenbingl (Member):
Can you find a high-quality C++ LRUCache implementation on GitHub to use as a reference?

@wenbingl wenbingl deleted the sayanshaw/bpe-cache branch February 8, 2024 23:45