
Support for local instances of llama 3 and changes to support reusing trie and homomorphism #74

Merged (6 commits) on Aug 27, 2024

Conversation

MatthewChang (Contributor)

Here are some changes I've made for a project I'm working on. If you like any of them, I can split those changes out to merge separately.

  1. Loosened the name check for Llama 3 models so that local instances of the weights (specified by a path) and fine-tuned versions of the models are picked up properly.

  2. Added support for specifying the device used for tensors during the masking process. I needed this because the LRU cache was holding GPU memory indefinitely. I could also have moved the scores to CPU before calling the masking function, or shrunk the cache size, so this is non-essential; perhaps the cache size should be configurable.

  3. Added support for passing in the trie and homomorphism. In my application, a server handles many requests, and rebuilding the trie for each one adds unnecessary overhead. This change lets you reuse these expensive initialization artifacts.
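To illustrate change 1, here is a minimal sketch of the looser matching idea: instead of comparing against an exact hub name, match a pattern so local paths and fine-tune names are also recognized. This is not the library's actual check; the helper name `is_llama3_tokenizer` and the regex are hypothetical.

```python
import re

def is_llama3_tokenizer(name_or_path: str) -> bool:
    """Hypothetical loose check: accept official hub names, local weight
    directories, and fine-tuned variants whose name mentions llama 3."""
    # Matches "llama3", "llama-3", "Llama_3", etc., anywhere in the string.
    return re.search(r"llama[-_ ]?3", name_or_path.lower()) is not None

# An official hub id, a local path, and a fine-tune all match;
# an unrelated model does not.
assert is_llama3_tokenizer("meta-llama/Meta-Llama-3-8B")
assert is_llama3_tokenizer("/mnt/weights/my-llama3-finetune")
assert not is_llama3_tokenizer("mistralai/Mistral-7B")
```

An exact-string comparison would reject both the local path and the fine-tune even though they use the same tokenizer, which is the situation the PR addresses.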
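Change 3 follows a common dependency-injection pattern: expensive structures are built once and passed into each new processor instead of being rebuilt per request. The sketch below assumes nothing about the library's real classes; `GrammarProcessor`, its toy trie, and the toy homomorphism are all stand-ins for illustration.

```python
class GrammarProcessor:
    """Hypothetical sketch of the reuse pattern: accept pre-built
    structures so a long-running server builds them only once."""

    def __init__(self, vocab, trie=None, homomorphism=None):
        # Rebuild only when the caller did not supply a cached structure.
        self.trie = trie if trie is not None else self._build_trie(vocab)
        self.homomorphism = (homomorphism if homomorphism is not None
                             else self._build_homomorphism(vocab))

    @staticmethod
    def _build_trie(vocab):
        # Toy character trie over the vocabulary (the expensive step).
        root = {}
        for token_id, token in enumerate(vocab):
            node = root
            for ch in token:
                node = node.setdefault(ch, {})
            node["$"] = token_id  # end-of-token marker
        return root

    @staticmethod
    def _build_homomorphism(vocab):
        # Toy stand-in: map token id -> token string.
        return dict(enumerate(vocab))

vocab = ["a", "ab", "b"]
first = GrammarProcessor(vocab)           # pays the build cost once
second = GrammarProcessor(vocab,          # per-request path: reuse
                          trie=first.trie,
                          homomorphism=first.homomorphism)
assert second.trie is first.trie          # same object, no rebuild
```

Keeping the parameters optional preserves the old single-use API: callers who don't cache anything get the same behavior as before.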

@Saibo-creator Saibo-creator self-requested a review August 2, 2024 18:26
@Saibo-creator (Collaborator)

Cool! Thanks for the contribution, @MatthewChang. These look good to me. I will double-check and merge them in the coming days. ❤️

@Saibo-creator Saibo-creator merged commit cc4fb58 into epfl-dlab:main Aug 27, 2024