speculative decoding in llama.cpp : PoC for speeding-up inference via speculative sampling by ggerganov · Pull Request #2926 · ggerganov/llama.cpp #492
Labels
Algorithms
Sorting, Learning or Classifying. All algorithms go here.
llm-experiments
experiments with large language models
llm-serving-optimisations
Tips, tricks and tools to speed up inference of large language models
prompt-engineering
Developing and optimizing prompts to efficiently use language models for various applications and re
TIL
Short notes or tips on coding, Linux, LLMs, ML, etc.
Title: speculative : PoC for speeding-up inference via speculative sampling #2926
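For context, here is a minimal sketch of the speculative sampling loop the PR prototypes: a cheap draft model proposes K tokens, the expensive target model verifies them, each proposal is accepted with probability min(1, p_target/p_draft), and on rejection a replacement token is resampled from the residual distribution. This is an illustrative Python toy (fixed distributions stand in for both models, and the "bonus" token sampled when all K drafts are accepted is omitted), not the actual llama.cpp C++ implementation; all names (`draft_probs`, `target_probs`, `K`) are assumptions for the example.

```python
# Toy sketch of speculative sampling: draft proposes, target verifies.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 8          # toy vocabulary size
K = 4              # number of tokens the draft model proposes per step

def draft_probs(ctx):
    """Cheap draft model: a fixed next-token distribution (ignores context)."""
    p = np.ones(VOCAB); p[0] = 3.0
    return p / p.sum()

def target_probs(ctx):
    """Expensive target model: a different fixed next-token distribution."""
    p = np.arange(1, VOCAB + 1, dtype=float)
    return p / p.sum()

def speculative_step(ctx):
    """Propose K tokens with the draft model, then verify them with the target model."""
    proposed, q = [], []
    for _ in range(K):
        dist = draft_probs(ctx + proposed)
        tok = int(rng.choice(VOCAB, p=dist))
        proposed.append(tok); q.append(dist)

    accepted = []
    for i, tok in enumerate(proposed):
        p = target_probs(ctx + accepted)
        # Accept the drafted token with probability min(1, p_target / p_draft).
        if rng.random() < min(1.0, p[tok] / q[i][tok]):
            accepted.append(tok)
        else:
            # On rejection, resample from the residual max(0, p - q), renormalized,
            # and stop using the remaining drafted tokens.
            residual = np.maximum(p - q[i], 0.0)
            residual /= residual.sum()
            accepted.append(int(rng.choice(VOCAB, p=residual)))
            break
    return accepted

ctx = []
for _ in range(5):
    ctx += speculative_step(ctx)
print("generated tokens:", ctx)
```

The speed-up comes from the target model verifying up to K drafted tokens in one pass instead of generating them one at a time, while the accept/resample rule keeps the output distribution identical to sampling from the target model alone.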
Suggested labels
{ "label-name": "LLM-speed-optimization", "description": "Optimizing LLama model inference speed", "confidence": 80.85 }