Minor fixes to speed up llama model load #11448
Conversation
…e_unique<llama_mmap> for cleaner code & to halve the time of running init_mappings
Changes Overview

Replaced new with std::make_unique<llama_mmap> in the model loader:
std::make_unique is the modern, preferred way to create std::unique_ptr instances. It ensures exception safety by preventing memory leaks if an exception occurs during object construction. The behavior of the code remains identical, as std::make_unique internally performs the same memory allocation and object construction as new.

Replaced std::map with std::unordered_map in llama-vocab.cpp:
- Performance improvement: std::unordered_map provides O(1) average time complexity for insertions and lookups, compared to O(log n) for std::map. This is particularly beneficial for large datasets, as it removes the overhead of maintaining a balanced tree.
- Correctness: a custom hash functor (PairHash) was added so that std::pair<std::string, std::string> can be used as a key in std::unordered_map. It combines the hashes of the two strings using XOR (with a shift on the second hash), which is a common and efficient approach.
- Behavioral consistency: the logic of the code remains unchanged, as std::unordered_map and std::map provide the same interface for insertion and lookup.
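A minimal sketch of the new → std::make_unique change described above; the llama_mmap constructor argument and the surrounding function are illustrative placeholders, not the actual llama.cpp call site:

```cpp
#include <memory>

// Illustrative stand-in for llama.cpp's llama_mmap; the real constructor signature differs.
struct llama_mmap {
    explicit llama_mmap(const char * path) { (void) path; /* would mmap the file here */ }
};

void init_mappings_sketch() {
    // Before: explicit new handed to the unique_ptr constructor.
    std::unique_ptr<llama_mmap> mapping_old(new llama_mmap("model.gguf"));

    // After: make_unique allocates and constructs in one expression,
    // which is shorter and exception-safe (no leak window between new and ownership).
    auto mapping_new = std::make_unique<llama_mmap>("model.gguf");
}
```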
I’ve been studying ggml-rpc for potential optimizations (@rgerganov might have suggestions on promising directions for RPC optimization that don’t break the current architecture, since I don’t want to change it and then have to justify that the new architecture is okay). I haven’t done any RPC optimizations yet, but I’ve had more luck with load optimization: this pull request saves ~20% of load time (excluding RPC).
src/llama-vocab.cpp (Outdated)
struct PairHash {
    size_t operator()(const std::pair<std::string, std::string> & p) const {
        return std::hash<std::string>{}(p.first) ^          // combine the two string hashes
               (std::hash<std::string>{}(p.second) << 1);   // shift so swapped pairs hash differently
    }
};
std::unordered_map<std::pair<std::string, std::string>, int, PairHash> bpe_ranks;
What if there is a hash collision of 2 string pairs?
We could use a custom comparator for reliability, but std::unordered_map uses the same default comparator as std::map (std::pair is comparable).
The comparator in unordered_map only checks equality, while map uses ordering comparisons (<, >) to maintain the tree.
Yes, I see. I was a bit confused how this works. It should be OK.
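To make the collision question above concrete, here is a small self-contained sketch (not part of the PR): even with a deliberately bad hash that sends every key to the same bucket, std::unordered_map still tells keys apart via operator== on std::pair, so lookups remain correct and only the bucket scan gets slower.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

// Worst-case hash: every key collides into one bucket.
struct AlwaysCollide {
    std::size_t operator()(const std::pair<std::string, std::string> &) const { return 0; }
};

int main() {
    std::unordered_map<std::pair<std::string, std::string>, int, AlwaysCollide> ranks;
    ranks[{"a", "b"}] = 1;
    ranks[{"c", "d"}] = 2;  // same hash value, but operator== keeps the keys distinct

    assert(ranks.size() == 2);
    assert(ranks.at({"a", "b"}) == 1);
    assert(ranks.at({"c", "d"}) == 2);
    return 0;
}
```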
Minor fixes to speed up llama load (~20% in total, not counting time spent in RPC)