Loading weights from an input stream #11307

silcowitz · 2025-01-20T10:44:45Z


I'm interested in finding a way to load the model weights onto the GPUs while the weight data is still being streamed/downloaded from a storage service like S3. The goal is to avoid downloading the weights in full and only then starting llama.cpp to load them; both steps take considerable time on cloud VMs.

I searched around but haven't really found anything that addresses this particular issue.

What would be the best way to accomplish this? Can I write some code that feeds the data to llama.cpp as it arrives from the stream, or is there a much simpler way?
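
To make the idea concrete, here is a rough Python sketch of the pattern I have in mind: one thread keeps pulling chunks from the download while the main thread hands each chunk on as soon as it arrives. This is not llama.cpp code. `stream_to_consumer`, `fake_download`, and the `consume` callback are hypothetical names made up for illustration, and `consume` merely stands in for whatever would pass bytes to the loader. In practice the chunk iterator would come from the storage client's streaming response rather than from `fake_download`.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator

_DONE = object()  # sentinel marking the end of the stream


def stream_to_consumer(
    chunks: Iterable[bytes],
    consume: Callable[[bytes], None],
    max_buffered: int = 8,
) -> int:
    """Pass chunks to consume() while the producer thread keeps fetching.

    Returns the total number of bytes consumed. The bounded queue caps how
    many chunks can sit in memory if the consumer is slower than the download.
    """
    buf: queue.Queue = queue.Queue(maxsize=max_buffered)
    errors = []

    def producer() -> None:
        try:
            for chunk in chunks:
                buf.put(chunk)  # blocks while the queue is full
        except Exception as exc:  # surface download errors to the caller
            errors.append(exc)
        finally:
            buf.put(_DONE)

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()

    total = 0
    while True:
        item = buf.get()
        if item is _DONE:
            break
        consume(item)  # hypothetical hand-off point to the loader
        total += len(item)

    worker.join()
    if errors:
        raise errors[0]
    return total


def fake_download(n_chunks: int = 4, chunk_size: int = 1024) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming download; yields fixed-size chunks."""
    for i in range(n_chunks):
        yield bytes([i]) * chunk_size
```

For example, `stream_to_consumer(fake_download(), received.extend)` with `received = bytearray()` collects the chunks in arrival order. What I don't know is whether llama.cpp has any entry point that could play the role of `consume`.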

Replies: 0 comments
