📖A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism etc. 🎉🎉
Light-field imaging application for plenoptic cameras
[ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection
Light field geometry estimator for plenoptic cameras
xKV: Cross-Layer SVD for KV-Cache Compression
An efficient and scalable attention module designed to reduce memory usage and improve inference speed in large language models. Implements Multi-Head Latent Attention (MLA) as a drop-in replacement for traditional multi-head attention (MHA); see the MLA sketch after this list.
Latentformer is a transformer model with latent attention designed for efficient training. It features learnable positional embeddings, rotary position encoding, and MLA to improve training and inference speed while maintaining model quality.
A minimal Transformer implementation for quickly assembling various modern Transformer architectures.
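For readers unfamiliar with MLA, the following is a minimal, illustrative sketch of the core idea behind several repositories above: keys and values are compressed into a shared low-rank latent that can be cached instead of full per-head K/V tensors, and expanded back at attention time. It assumes PyTorch; all class and parameter names (e.g. `MultiHeadLatentAttention`, `d_latent`) are hypothetical and not taken from any listed repository, and production MLA implementations additionally handle decoupled rotary embeddings and latent-only KV caching.

```python
# Minimal MLA sketch (illustrative only, not any listed repo's implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadLatentAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int, d_latent: int):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        # Keys/values are compressed into a shared low-rank latent ...
        self.kv_down = nn.Linear(d_model, d_latent, bias=False)
        # ... and expanded back to per-head keys/values at attention time.
        self.k_up = nn.Linear(d_latent, d_model, bias=False)
        self.v_up = nn.Linear(d_latent, d_model, bias=False)
        self.out_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q = self.q_proj(x)
        kv_latent = self.kv_down(x)  # (b, t, d_latent): what a KV cache would store
        k = self.k_up(kv_latent)
        v = self.v_up(kv_latent)
        # Reshape to (b, n_heads, t, d_head) and run standard causal attention.
        q, k, v = (z.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
                   for z in (q, k, v))
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(b, t, d)
        return self.out_proj(out)
```

Because only `kv_latent` needs to be cached during generation, the cache footprint scales with `d_latent` rather than with `n_heads * d_head`, which is the memory saving these projects target.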