- HitNet: Hybrid Ternary Recurrent Neural Network
- Learning Compact Recurrent Neural Networks with Block-Term Tensor Decomposition
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
- Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
- Q8BERT: Quantized 8Bit BERT
- Reducing Transformer Depth on Demand with Structured Dropout
- BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
- Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
- Are Sixteen Heads Really Better than One?
- Structured Pruning of Large Language Models
- Pruning a BERT-based Question Answering Model
- DynaBERT: Dynamic BERT with Adaptive Width and Depth
- TinyBERT: Distilling BERT for Natural Language Understanding
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
- A Tensorized Transformer for Language Modeling
- Low-Rank Bottleneck in Multi-head Attention Models
- Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning
- What Does BERT Look at? An Analysis of BERT’s Attention
- Visualizing and Understanding Neural Machine Translation
- An Analysis of Encoder Representations in Transformer-Based Machine Translation