Minor fixes to speed up llama model load #11448
Conversation
…e_unique<llama_mmap> for cleaner code & to halve the time of running init_mappings
Changes Overview

Replaced new with std::make_unique<llama_mmap> in the model loader:
std::make_unique is the modern, preferred way to create std::unique_ptr instances. It ensures exception safety by preventing memory leaks if an exception occurs during object construction. The behavior of the code remains identical, as std::make_unique internally performs the same memory allocation and object construction as new.

Replaced std::map with std::unordered_map in llama-vocab.cpp:
- Performance improvement: std::unordered_map provides O(1) average time complexity for insertions and lookups, compared to O(log n) for std::map. This is particularly beneficial for large datasets, as it removes the overhead of maintaining a balanced tree.
- Correctness: a custom hash functor (PairHash) was added so that std::pair<std::string, std::string> can be used as a key in std::unordered_map. It combines the hashes of the two strings using XOR (with a shift on the second hash), which is a common and efficient approach.
- Behavioral consistency: the logic of the code remains unchanged, as std::unordered_map and std::map provide the same interface for insertion and lookup.
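A minimal sketch of the new → std::make_unique change described above; the llama_mmap constructor argument and the surrounding function are illustrative placeholders, not the actual llama.cpp call site:

```cpp
#include <memory>

// Illustrative stand-in for llama.cpp's llama_mmap; the real constructor signature differs.
struct llama_mmap {
    explicit llama_mmap(const char * path) { (void) path; /* would mmap the file here */ }
};

void init_mappings_sketch() {
    // Before: explicit new handed to the unique_ptr constructor.
    std::unique_ptr<llama_mmap> mapping_old(new llama_mmap("model.gguf"));

    // After: make_unique allocates and constructs in one expression,
    // which is shorter and exception-safe (no leak window between new and ownership).
    auto mapping_new = std::make_unique<llama_mmap>("model.gguf");
}
```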
I’ve been studying ggml-rpc for potential optimizations (@rgerganov might have suggestions on promising directions for RPC optimization that don’t break the current architecture, since I don’t want to change it and then have to justify that the new architecture is okay). I haven’t done any RPC optimizations yet, but I’ve had more luck with load optimization: this pull request saves ~20% of load time (excluding RPC).
src/llama-vocab.cpp (Outdated)
struct PairHash {
    size_t operator()(const std::pair<std::string, std::string> & p) const {
        return std::hash<std::string>{}(p.first) ^          // combine the two string hashes
               (std::hash<std::string>{}(p.second) << 1);   // shift so swapped pairs hash differently
    }
};
std::unordered_map<std::pair<std::string, std::string>, int, PairHash> bpe_ranks;
What if there is a hash collision of 2 string pairs?
We could use a custom comparator for reliability, but std::unordered_map uses the same default comparator as std::map (std::pair is comparable).
The comparator in unordered_map only checks equality, while map uses ordering comparisons (<, >) to maintain the tree.
Yes, I see. I was a bit confused how this works. It should be OK.
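To make the collision question above concrete, here is a small self-contained sketch (not part of the PR): even with a deliberately bad hash that sends every key to the same bucket, std::unordered_map still tells keys apart via operator== on std::pair, so lookups remain correct and only the bucket scan gets slower.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

// Worst-case hash: every key collides into one bucket.
struct AlwaysCollide {
    std::size_t operator()(const std::pair<std::string, std::string> &) const { return 0; }
};

int main() {
    std::unordered_map<std::pair<std::string, std::string>, int, AlwaysCollide> ranks;
    ranks[{"a", "b"}] = 1;
    ranks[{"c", "d"}] = 2;  // same hash value, but operator== keeps the keys distinct

    assert(ranks.size() == 2);
    assert(ranks.at({"a", "b"}) == 1);
    assert(ranks.at({"c", "d"}) == 2);
    return 0;
}
```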
Minor fixes to speed up llama load (~20% in total, not counting time spent in RPC)