Tensor parallelism is all you need. Run LLMs on an AI cluster at home using any device. Distribute the workload, divide RAM usage, and increase inference speed.
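As a rough illustration of the idea in that description (not code from the project itself), here is a minimal C++ sketch of row-wise tensor parallelism: the weight matrix of a matrix-vector product, the core operation in each transformer layer, is sharded across worker threads so that each worker stores only its own rows (dividing RAM) and computes its slice of the output in parallel. All names and sizes are illustrative assumptions.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Each worker computes output rows [begin, end) from its own weight shard.
// The shard holds only (end - begin) rows, so memory is split across workers.
void matvec_shard(const std::vector<float>& shard,  // (end-begin) x dim weights
                  const std::vector<float>& input,  // dim
                  std::vector<float>& output,       // full output vector
                  std::size_t begin, std::size_t end, std::size_t dim) {
    for (std::size_t row = begin; row < end; ++row) {
        float acc = 0.0f;
        const float* w = &shard[(row - begin) * dim];
        for (std::size_t col = 0; col < dim; ++col) acc += w[col] * input[col];
        output[row] = acc;  // each worker writes a disjoint output range
    }
}

int main() {
    const std::size_t dim = 8, n_workers = 4;       // illustrative sizes
    const std::size_t rows_per_worker = dim / n_workers;

    std::vector<float> input(dim, 1.0f), output(dim, 0.0f);
    // Each worker owns only its rows of the weight matrix.
    std::vector<std::vector<float>> shards(
        n_workers, std::vector<float>(rows_per_worker * dim, 0.5f));

    std::vector<std::thread> workers;
    for (std::size_t w = 0; w < n_workers; ++w) {
        std::size_t begin = w * rows_per_worker;
        workers.emplace_back(matvec_shard, std::cref(shards[w]),
                             std::cref(input), std::ref(output),
                             begin, begin + rows_per_worker, dim);
    }
    for (auto& t : workers) t.join();  // "gather": output is now complete
    return 0;
}
```

In a real cluster the workers would be separate devices exchanging activations over the network rather than threads sharing memory, but the split-compute-gather pattern is the same.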
A great hands-on project for campus recruiting (fall hiring, spring hiring, and internships): build an LLM inference framework from scratch that supports LLama2/3 and Qwen2.5.
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
LLaVA server (llama.cpp).
Runs LLaMA at extremely high speed.
Llama causal LM fully recreated in LibTorch, designed for use in Unreal Engine 5.
Super easy-to-use library for working with LLaMA/GPT-J. Mirror of: https://gitlab.com/niansa/libjustlm
Multi-model, multi-tasking LLaMA Discord bot. Mirror of: https://gitlab.com/niansa/discord_llama