DeepSeek-V2: A Strong, Economical, and Efficient MoE LLM of 236B total parameters #831
Labels
AI-Chatbots
Topics related to advanced chatbot platforms integrating multiple AI models
base-model
LLM base models that have not been fine-tuned for chat
finetuning
Tools for fine-tuning LLMs, e.g. SFT or RLHF
llm
Large Language Models
llm-evaluation
Evaluating Large Language Models performance and behavior through human-written evaluation sets
Models
LLM and ML model repos and links
New-Label
Choose this option if the existing labels are insufficient to describe the content accurately
prompt
Collection of LLM prompts and notes
software-engineering
Best practices for software engineering
DeepSeek-V2: A Strong, Economical, and Efficient MoE LLM of 236B total parameters
Snippet
Notes for DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model with 236B total parameters
Introduction
Today, we're introducing DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token. Compared with DeepSeek 67B, DeepSeek-V2 achieves stronger performance while saving 42.5% of training costs, reducing the KV cache by 93.3%, and boosting the maximum generation throughput to 5.76 times.
We pretrained DeepSeek-V2 on a diverse and high-quality corpus comprising 8.1 trillion tokens. This comprehensive pretraining was followed by a process of Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unleash the model's capabilities. The evaluation results validate the effectiveness of our approach as DeepSeek-V2 achieves remarkable performance on both standard benchmarks and open-ended generation evaluation.
Due to constraints in Hugging Face, the open-source code currently runs slower on GPUs than our internal codebase. To facilitate efficient execution of the model, we offer a dedicated vLLM solution that optimizes inference performance.
Evaluation Results
Base Model
Standard Benchmark
For more evaluation details, such as few-shot settings and prompts, please check our paper.
Evaluation results on the Needle In A Haystack (NIAH) tests. DeepSeek-V2 performs well across all context window lengths up to 128K.
Chat Model
Standard Benchmark
We evaluate our model on AlpacaEval 2.0 and MT-Bench, showing the competitive performance of DeepSeek-V2-Chat-RL on English conversation generation.
Coding Benchmarks
We evaluate our model on LiveCodeBench (0901-0401), a benchmark designed for live coding challenges. As illustrated, DeepSeek-V2 demonstrates considerable proficiency in LiveCodeBench, achieving a Pass@1 score that surpasses several other sophisticated models. This performance highlights the model's effectiveness in tackling live coding tasks.
Model Architecture
DeepSeek-V2 adopts innovative architectures to guarantee economical training and efficient inference: Multi-head Latent Attention (MLA), which compresses the key-value cache into a compact latent vector to enable efficient inference, and the DeepSeekMoE architecture for the feed-forward networks, which enables training strong models at economical cost through sparse computation.
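As a rough illustration of the MLA idea (a minimal sketch, not DeepSeek's actual implementation), the snippet below caches a single low-rank latent per token and reconstructs keys and values from it; all dimensions are made-up placeholders:

```python
# Sketch of low-rank KV compression: cache a small latent instead of full K/V.
# Dimensions are illustrative assumptions, not the model's real sizes.
import torch
import torch.nn as nn

d_model, d_latent, n_heads, d_head = 1024, 128, 16, 64

down_kv = nn.Linear(d_model, d_latent, bias=False)        # compress hidden state to latent
up_k = nn.Linear(d_latent, n_heads * d_head, bias=False)  # reconstruct per-head keys
up_v = nn.Linear(d_latent, n_heads * d_head, bias=False)  # reconstruct per-head values

h = torch.randn(2, 8, d_model)        # (batch, seq, hidden) hidden states
latent_kv = down_kv(h)                # (batch, seq, d_latent) -- this is what gets cached
k = up_k(latent_kv).view(2, 8, n_heads, d_head)
v = up_v(latent_kv).view(2, 8, n_heads, d_head)

# Cache footprint per token: d_latent floats instead of 2 * n_heads * d_head,
# which is where the large KV-cache reduction reported above comes from.
print(latent_kv.shape, k.shape, v.shape)
```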
API Platform
We also provide an OpenAI-compatible API at the DeepSeek Platform: platform.deepseek.com. Sign up to receive millions of free tokens, or pay as you go at an unbeatable price.
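For reference, a minimal sketch of calling the platform through the OpenAI Python client is shown below; the base URL and model name are assumptions to verify against the platform documentation, and the API key is a placeholder:

```python
# Minimal sketch of the OpenAI-compatible endpoint; values below are assumptions.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # placeholder: use your own key from the platform
    base_url="https://api.deepseek.com",  # assumed OpenAI-compatible base URL
)

response = client.chat.completions.create(
    model="deepseek-chat",                # assumed model identifier
    messages=[{"role": "user", "content": "Summarize DeepSeek-V2 in one sentence."}],
)
print(response.choices[0].message.content)
```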
The complete chat template can be found in tokenizer_config.json, located in the Hugging Face model repository. You can also add an optional system message. An example of applying the chat template, with and without a system message, follows:
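A minimal sketch using transformers, assuming apply_chat_template picks up the template shipped in tokenizer_config.json; the message contents are illustrative:

```python
# Apply the packaged chat template via transformers (sketch; messages are made up).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "deepseek-ai/DeepSeek-V2-Chat", trust_remote_code=True
)

# Without a system message
messages = [{"role": "user", "content": "Write a haiku about mixture-of-experts."}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)
print(prompt)

# With an optional system message prepended
messages_with_system = [
    {"role": "system", "content": "You are a helpful assistant."},
    *messages,
]
print(tokenizer.apply_chat_template(
    messages_with_system, add_generation_prompt=True, tokenize=False
))
```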
Inference with vLLM (recommended)
To utilize vLLM for model inference, please merge this Pull Request into your vLLM codebase: vllm-project/vllm#4650.
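Once the patched vLLM is installed, offline inference might look like the sketch below; the parallelism and generation settings are assumptions to adapt to your hardware:

```python
# Sketch of offline inference with vLLM after merging the linked PR; settings are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V2-Chat",
    trust_remote_code=True,
    tensor_parallel_size=8,   # assumption: adjust to your GPU count
    max_model_len=8192,       # assumption: lower this if memory is tight
)

sampling = SamplingParams(temperature=0.3, max_tokens=256)
outputs = llm.generate(["Explain what a mixture-of-experts layer is."], sampling)
print(outputs[0].outputs[0].text)
```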
License
This code repository is licensed under the MIT License. The use of DeepSeek-V2 Base/Chat models is subject to the Model License. DeepSeek-V2 series (including Base and Chat) supports commercial use.
Suggested labels
{'label-name': 'efficient-model-architecture', 'label-description': 'Description about the efficient architecture of DeepSeek-V2 model', 'confidence': 59.28}