A state-of-the-art Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated per token, built on Multi-head Latent Attention (MLA). Pre-trained on 14.8T tokens, it supports a 128K context length and achieves state-of-the-art results on reasoning, math, and code tasks. It can be deployed efficiently on NVIDIA and AMD GPUs.
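
As a rough illustration of what "37B activated" out of "671B total" means, the sketch below implements a toy top-k routed MoE layer: a gating network selects k experts per token, so only those experts' parameters participate in that token's forward pass. The class name, sizes, and routing details here are illustrative assumptions, not DeepSeek-V3's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy mixture-of-experts layer: each token is routed to only k of
    num_experts feed-forward experts, so just a fraction of the layer's
    parameters is active per token. All sizes are illustrative."""

    def __init__(self, d_model=64, d_ff=256, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                      # x: (tokens, d_model)
        scores = self.gate(x)                  # (tokens, num_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the selected experts
        out = torch.zeros_like(x)
        for slot in range(self.k):             # run only the chosen experts
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

tokens = torch.randn(10, 64)
print(TopKMoE()(tokens).shape)                 # torch.Size([10, 64])
```

With k=2 of 8 experts selected per token, only about a quarter of the expert parameters are touched for any given token, which is the same sparsity idea that lets a 671B-parameter model run with 37B activated parameters per token.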