speculative decoding in llama.cpp : PoC for speeding-up inference via speculative sampling by ggerganov · Pull Request #2926 · ggerganov/llama.cpp #492
Labels
Algorithms
Sorting, Learning or Classifying. All algorithms go here.
llm-experiments
experiments with large language models
llm-serving-optimisations
Tips, tricks and tools to speed up inference of large language models
prompt-engineering
Developing and optimizing prompts to efficiently use language models for various applications and re
TIL
Short notes or tips on coding, Linux, LLMs, ML, etc.
Title: speculative : PoC for speeding-up inference via speculative sampling #2926
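For context, here is a minimal sketch of the speculative sampling loop the PR prototypes: a cheap draft model proposes K tokens, the expensive target model verifies them, each proposal is accepted with probability min(1, p_target/p_draft), and on rejection a replacement token is resampled from the residual distribution. This is an illustrative Python toy (fixed distributions stand in for both models, and the "bonus" token sampled when all K drafts are accepted is omitted), not the actual llama.cpp C++ implementation; all names (`draft_probs`, `target_probs`, `K`) are assumptions for the example.

```python
# Toy sketch of speculative sampling: draft proposes, target verifies.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 8          # toy vocabulary size
K = 4              # number of tokens the draft model proposes per step

def draft_probs(ctx):
    """Cheap draft model: a fixed next-token distribution (ignores context)."""
    p = np.ones(VOCAB); p[0] = 3.0
    return p / p.sum()

def target_probs(ctx):
    """Expensive target model: a different fixed next-token distribution."""
    p = np.arange(1, VOCAB + 1, dtype=float)
    return p / p.sum()

def speculative_step(ctx):
    """Propose K tokens with the draft model, then verify them with the target model."""
    proposed, q = [], []
    for _ in range(K):
        dist = draft_probs(ctx + proposed)
        tok = int(rng.choice(VOCAB, p=dist))
        proposed.append(tok); q.append(dist)

    accepted = []
    for i, tok in enumerate(proposed):
        p = target_probs(ctx + accepted)
        # Accept the drafted token with probability min(1, p_target / p_draft).
        if rng.random() < min(1.0, p[tok] / q[i][tok]):
            accepted.append(tok)
        else:
            # On rejection, resample from the residual max(0, p - q), renormalized,
            # and stop using the remaining drafted tokens.
            residual = np.maximum(p - q[i], 0.0)
            residual /= residual.sum()
            accepted.append(int(rng.choice(VOCAB, p=residual)))
            break
    return accepted

ctx = []
for _ in range(5):
    ctx += speculative_step(ctx)
print("generated tokens:", ctx)
```

The speed-up comes from the target model verifying up to K drafted tokens in one pass instead of generating them one at a time, while the accept/resample rule keeps the output distribution identical to sampling from the target model alone.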
Suggested labels
{ "label-name": "LLM-speed-optimization", "description": "Optimizing LLama model inference speed", "confidence": 80.85 }