DeepSeek-V2: A Strong, Economical, and Efficient MoE LLM of 236B total parameters

Introduction

Today, we're introducing DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token. Compared with DeepSeek 67B, DeepSeek-V2 achieves stronger performance while saving 42.5% of training costs, reducing the KV cache by 93.3%, and boosting the maximum generation throughput to 5.76 times that of DeepSeek 67B.

We pretrained DeepSeek-V2 on a diverse, high-quality corpus of 8.1 trillion tokens. Pretraining was followed by Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock the model's capabilities. The evaluation results validate the effectiveness of our approach: DeepSeek-V2 achieves remarkable performance on both standard benchmarks and open-ended generation evaluations.

Due to the constraints of HuggingFace, the open-source code currently runs slower on GPUs than our internal codebase. To facilitate efficient execution of our model, we provide a dedicated vLLM solution that optimizes inference performance.

Evaluation Results

Base Model

Standard Benchmark

| Benchmark | Domain | LLaMA3 70B | Mixtral 8x22B | DeepSeek-V1 (Dense-67B) | DeepSeek-V2 (MoE-236B) |
|---|---|---|---|---|---|
| MMLU | English | 78.9 | 77.6 | 71.3 | 78.5 |
| BBH | English | 81.0 | 78.9 | 68.7 | 78.9 |
| C-Eval | Chinese | 67.5 | 58.6 | 66.1 | 81.7 |
| CMMLU | Chinese | 69.3 | 60.0 | 70.8 | 84.0 |
| HumanEval | Code | 48.2 | 53.1 | 45.1 | 48.8 |
| MBPP | Code | 68.6 | 64.2 | 57.4 | 66.6 |
| GSM8K | Math | 83.0 | 80.3 | 63.4 | 79.2 |
| MATH | Math | 42.2 | 42.5 | 18.7 | 43.6 |

For more evaluation details, such as few-shot settings and prompts, please check our paper.

Needle In A Haystack

Evaluation results on the Needle In A Haystack (NIAH) tests. DeepSeek-V2 performs well across all context window lengths up to 128K.

Chat Model

Standard Benchmark

| Benchmark | Domain | QWen1.5 72B Chat | Mixtral 8x22B | LLaMA3 70B Instruct | DeepSeek-V1 Chat (SFT) | DeepSeek-V2 Chat (SFT) | DeepSeek-V2 Chat (RL) |
|---|---|---|---|---|---|---|---|
| MMLU | English | 76.2 | 77.8 | 80.3 | 71.1 | 78.4 | 77.8 |
| BBH | English | 65.9 | 78.4 | 80.1 | 71.7 | 81.3 | 79.7 |
| C-Eval | Chinese | 82.2 | 60.0 | 67.9 | 65.2 | 80.9 | 78.0 |
| CMMLU | Chinese | 82.9 | 61.0 | 70.7 | 67.8 | 82.4 | 81.6 |
| HumanEval | Code | 68.9 | 75.0 | 76.2 | 73.8 | 76.8 | 81.1 |
| MBPP | Code | 52.2 | 64.4 | 69.8 | 61.4 | 70.4 | 72.0 |
| LiveCodeBench (0901-0401) | Code | 18.8 | 25.0 | 30.5 | 18.3 | 28.7 | 32.5 |
| GSM8K | Math | 81.9 | 87.9 | 93.2 | 84.1 | 90.8 | 92.2 |
| MATH | Math | 40.6 | 49.8 | 48.5 | 32.6 | 52.7 | 53.9 |

MTBench

We evaluate our model on AlpacaEval 2.0 and MTBench, showing the competitive performance of DeepSeek-V2-Chat-RL on English conversation generation.

Coding Benchmarks

We evaluate our model on LiveCodeBench (0901-0401), a benchmark designed for live coding challenges. DeepSeek-V2 demonstrates considerable proficiency on LiveCodeBench, achieving a Pass@1 score that surpasses several other sophisticated models. This performance highlights the model's effectiveness in tackling live coding tasks.

Model Architecture

DeepSeek-V2 adopts innovative architectures to guarantee economical training and efficient inference:

  • For attention, we design MLA (Multi-head Latent Attention), which utilizes low-rank key-value joint compression to eliminate the bottleneck of the inference-time key-value cache, thus supporting efficient inference (see the sketches after this list).
  • For Feed-Forward Networks (FFNs), we adopt DeepSeekMoE architecture, a high-performance MoE architecture that enables training stronger models at lower costs.
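To make the MLA idea concrete, here is a minimal PyTorch sketch of low-rank key-value joint compression. The dimension names are illustrative, not the paper's exact configuration, and MLA's decoupled rotary-position path and query compression are omitted. Only the small latent c_kv is cached per token; per-head keys and values are re-expanded from it at attention time, which is what drives the 93.3% KV-cache reduction cited above.

import torch
import torch.nn as nn

class LowRankKVCompression(nn.Module):
    def __init__(self, d_model=5120, d_latent=512, n_heads=16, d_head=128):
        super().__init__()
        self.down_kv = nn.Linear(d_model, d_latent, bias=False)        # compress hidden state
        self.up_k = nn.Linear(d_latent, n_heads * d_head, bias=False)  # re-expand to keys
        self.up_v = nn.Linear(d_latent, n_heads * d_head, bias=False)  # re-expand to values

    def forward(self, h):
        # h: (batch, seq, d_model) hidden states
        c_kv = self.down_kv(h)  # (batch, seq, d_latent) -- the only tensor that is cached
        k = self.up_k(c_kv)     # reconstructed keys
        v = self.up_v(c_kv)     # reconstructed values
        return c_kv, k, v

For the FFN side, a similarly hedged sketch of the generic top-k expert routing that underlies an MoE layer (DeepSeekMoE adds fine-grained expert segmentation and shared experts, which this sketch omits). Because only the k selected experts run for each token, just 21B of the 236B parameters are activated per token:

class TopKMoE(nn.Module):
    def __init__(self, d_model=5120, d_hidden=1536, n_experts=8, k=2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts, bias=False)  # router scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):
        # x: (tokens, d_model); pick the k highest-scoring experts per token
        weights, idx = self.gate(x).softmax(dim=-1).topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e  # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out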


API Platform

We also provide an OpenAI-compatible API at the DeepSeek Platform: platform.deepseek.com. Sign up to receive millions of free tokens, or pay as you go at an unbeatable price.
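Because the endpoint is OpenAI-compatible, the standard openai Python SDK works against it. A minimal sketch, assuming an API key issued at platform.deepseek.com; the base URL and chat model identifier follow the platform's documented conventions:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # issued at platform.deepseek.com
    base_url="https://api.deepseek.com",  # the OpenAI-compatible endpoint
)
response = client.chat.completions.create(
    model="deepseek-chat",                # hosted DeepSeek-V2 chat model
    messages=[{"role": "user", "content": "Who are you?"}],
)
print(response.choices[0].message.content)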


The complete chat template can be found within tokenizer_config.json in the Hugging Face model repository.

An example of the chat template is as follows:

<|begin_of_sentence|>User: {user_message_1}

Assistant: {assistant_message_1}<|end_of_sentence|>User: {user_message_2}

Assistant:

You can also add an optional system message:

<|begin_of_sentence|>{system_message}

User: {user_message_1}

Assistant: {assistant_message_1}<|end_of_sentence|>User: {user_message_2}

Assistant:
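Rather than assembling this string by hand, you can ask the tokenizer to render it, since the template ships in tokenizer_config.json. A minimal sketch; the printed prompt should match the format shown above:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "deepseek-ai/DeepSeek-V2-Chat", trust_remote_code=True
)
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
# tokenize=False returns the rendered prompt string instead of token ids;
# add_generation_prompt=True appends the trailing "Assistant:" turn.
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))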

Inference with vLLM (recommended)

To utilize vLLM for model inference, please merge this Pull Request into your vLLM codebase: vllm-project/vllm#4650.

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# Context length and tensor-parallel degree (number of GPUs to shard across).
max_model_len, tp_size = 8192, 8
model_name = "deepseek-ai/DeepSeek-V2-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# enforce_eager=True disables CUDA graph capture, trading some speed for lower memory use.
llm = LLM(model=model_name, tensor_parallel_size=tp_size, max_model_len=max_model_len, trust_remote_code=True, enforce_eager=True)
# Stop generation at the model's end-of-sequence token.
sampling_params = SamplingParams(temperature=0.3, max_tokens=256, stop_token_ids=[tokenizer.eos_token_id])

messages_list = [
    [{"role": "user", "content": "Who are you?"}],
    [{"role": "user", "content": "Translate the following content into Chinese directly: DeepSeek-V2 adopts innovative architectures to guarantee economical training and efficient inference."}],
    [{"role": "user", "content": "Write a piece of quicksort code in C++."}],
]

# Render each conversation into prompt token ids with the model's chat template.
prompt_token_ids = [tokenizer.apply_chat_template(messages, add_generation_prompt=True) for messages in messages_list]

outputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params)

generated_text = [output.outputs[0].text for output in outputs]
print(generated_text)

License

This code repository is licensed under the MIT License. The use of DeepSeek-V2 Base/Chat models is subject to the Model License. DeepSeek-V2 series (including Base and Chat) supports commercial use.

Suggested labels

{'label-name': 'efficient-model-architecture', 'label-description': 'Description about the efficient architecture of DeepSeek-V2 model', 'confidence': 59.28}

Related content

#383 similarity score: 0.93
#726 similarity score: 0.91
#189 similarity score: 0.9
#628 similarity score: 0.9
#498 similarity score: 0.9
#324 similarity score: 0.88
