[FEATURE] DeepSeek V2 Chat Support #48
@LRL-ModelCloud has been assigned to this task. The model has been downloaded, and the work should be completed soon.
Can you provide a quantized model for DeepSeek V2 Chat? I encountered an OOM error during the quantization process.
@Xu-Chen Which GPU model did you use for the DeepSeek V2 quantization? I want to check whether the OOM is code related or whether DeepSeek V2 is just a little special and requires more VRAM.
Quant code:
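(The snippet itself was not preserved in this thread. Below is a minimal sketch of what a `max_memory`-based load might have looked like, inferred from the follow-up comments. The model id, `QuantizeConfig` values, and per-GPU memory caps are illustrative assumptions, not from the original post.)

```python
# Hypothetical reconstruction of the reporter's load call; the original
# snippet was not captured. Assumes 8 x A800-80GB GPUs.
from gptqmodel import GPTQModel, QuantizeConfig

quantize_config = QuantizeConfig(bits=4, group_size=128)  # assumed values

# Cap per-GPU usage; per the following comments, removing this max_memory
# argument is what let the quantization run without OOM.
max_memory = {i: "78GiB" for i in range(8)}

model = GPTQModel.from_pretrained(
    "deepseek-ai/DeepSeek-V2-Chat",  # assumed model id
    quantize_config,
    max_memory=max_memory,
)
```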
Is it not possible to use the GPUs to load the model? GPUs: 8 × A800-80GB.
Deleting max_memory=max_memory makes it run. Is there a way to load the model onto the GPUs and then perform the quantization in parallel to improve quantization speed?
Remove all options and use just the base call. GPTQModel will select the best dtype, and accelerate will automatically handle splitting the model weights across devices.

```python
model = GPTQModel.from_pretrained(
    args.model_id,
    quantize_config,
)
```
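For context, a complete minimal script built around that call might look like the sketch below, assuming the gptqmodel package layout from around the time of this issue. The tokenizer usage, calibration text, and save path are illustrative assumptions, not from this thread.

```python
# Minimal end-to-end sketch, assuming GPTQModel's from_pretrained /
# save_quantized API; config values, calibration text, and paths are
# illustrative, not taken from this thread.
from gptqmodel import GPTQModel, QuantizeConfig
from transformers import AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V2-Chat"  # assumed model id
quantize_config = QuantizeConfig(bits=4, group_size=128)  # assumed values

# No dtype, device_map, or max_memory overrides: GPTQModel selects the
# dtype and accelerate splits the weights across the available GPUs.
model = GPTQModel.from_pretrained(model_id, quantize_config)

# A tiny illustrative calibration set; a real run needs a representative
# dataset (e.g. a slice of C4) rather than two hand-written sentences.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
calibration = [
    tokenizer("GPTQ quantizes weights layer by layer using calibration data."),
    tokenizer("DeepSeek V2 is a large MoE model, so it needs ample VRAM."),
]

model.quantize(calibration)
model.save_quantized("DeepSeek-V2-Chat-gptq-4bit")
```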
Related issue: AutoGPTQ/AutoGPTQ#664