Support for local instances of llama 3 and changes to support reusing trie and homomorphism #74

MatthewChang · 2024-07-30T00:54:17Z

Here are some changes I've made for a project I'm working on. If you like any of them, I can split them out so they can be merged separately.

Loosened the name check for Llama 3 models so that local copies of the weights (specified by a path) and fine-tuned versions of the models are detected properly.
Added support for specifying the device used for tensors during masking. I needed this because the LRU cache was holding on to GPU memory indefinitely. I could also have moved the scores to the CPU before calling the masking function, or shrunk the cache, so this change is non-essential. Perhaps the cache size should be configurable.
Added support for passing in a prebuilt trie and homomorphism. In my application, a server handles many requests, and rebuilding the trie for each one is unnecessary overhead. This change lets you reuse the results of some expensive initialization steps across requests.
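A minimal Python sketch of the loosened name check and the trie reuse described above. Everything here is hypothetical: `looks_like_llama3`, `TokenTrie`, and `GrammarMasker` are illustrative names, not the library's actual API. The regex is one possible way to make the check looser while still rejecting names such as `llama-30b`; the device option is not modeled.

```python
from __future__ import annotations

import re

# Hypothetical loosened check: match "llama3", "llama-3", or "llama_3" anywhere in a
# model name or local path, but not "llama-30b" (the lookahead rejects a trailing digit).
_LLAMA3_PATTERN = re.compile(r"llama[-_ ]?3(?!\d)", re.IGNORECASE)


def looks_like_llama3(model_name_or_path: str) -> bool:
    return _LLAMA3_PATTERN.search(model_name_or_path) is not None


_END = None  # sentinel key marking "a token ends here"; real keys are single characters


class TokenTrie:
    """Hypothetical stand-in for an expensive-to-build token trie."""

    build_count = 0  # counts how many tries were built, to make reuse visible

    def __init__(self, vocab: dict[str, int]):
        TokenTrie.build_count += 1
        self.root: dict = {}
        for token, token_id in vocab.items():
            node = self.root
            for ch in token:
                node = node.setdefault(ch, {})
            node.setdefault(_END, []).append(token_id)

    def token_ids(self, token: str) -> list[int]:
        # Walk the trie character by character; return the ids stored at the end node.
        node = self.root
        for ch in token:
            if ch not in node:
                return []
            node = node[ch]
        return node.get(_END, [])


class GrammarMasker:
    """Hypothetical per-request object that accepts an optional prebuilt trie."""

    def __init__(self, grammar: str, vocab: dict[str, int], trie: TokenTrie | None = None):
        self.grammar = grammar
        # Reuse the caller's trie when given; otherwise pay the build cost here.
        self.trie = trie if trie is not None else TokenTrie(vocab)


vocab = {"a": 0, "ab": 1, "b": 2}
shared_trie = TokenTrie(vocab)  # built once, e.g. at server start-up
maskers = [GrammarMasker(g, vocab, trie=shared_trie) for g in ("g1", "g2", "g3")]
print(TokenTrie.build_count)  # 1
print(looks_like_llama3("/data/weights/Meta-Llama-3-8B-Instruct"))  # True
```

Because the trie is built once and handed to every per-request object, each request skips the vocabulary walk. The same pattern would apply to a precomputed homomorphism.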

Saibo-creator · 2024-08-02T18:40:39Z

Cool! Thanks for the contribution, @MatthewChang. They look good to me. I will double-check and merge them in the coming days. ❤️

A Meta HPC user for matthewchang and others added 6 commits July 10, 2024 03:10

check for llama 3 name anywhere in path

e8ee94a

add function for grammar switching

1be90cc

add support for reusing trie

9ae6fb6

add support for reusing trie and homomorphism

af0a10c

add support for custom device to fix memory leak

dbc6d24

clean up commits

e1414e2

Saibo-creator self-requested a review August 2, 2024 18:26

Saibo-creator approved these changes Aug 27, 2024


Saibo-creator merged commit cc4fb58 into epfl-dlab:main Aug 27, 2024

Saibo-creator mentioned this pull request Aug 27, 2024

look into the memory leak #87 (Open)
