- Position-based Scaled Gradient for Model Quantization and Sparse Training
- Robust Quantization: One Model to Rule Them All
- ConvBERT: Improving BERT with Span-based Dynamic Convolution
- FleXOR: Trainable Fractional Quantization
- Storage Efficient and Dynamic Flexible Runtime Channel Pruning via Deep Reinforcement Learning
- BERT Loses Patience: Fast and Robust Inference with Early Exit
- Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
- Divide and Conquer: Leveraging Intermediate Feature Representations for Quantized Training of Neural Networks
- Up or Down? Adaptive Rounding for Post-Training Quantization
- Towards Accurate Post-training Network Quantization via Bit-Split and Stitching
- Differentiable Product Quantization for End-to-End Embedding Compression
- Multi-Precision Policy Enforced Training (MuPPET): A precision-switching strategy for quantised fixed-point training of CNNs
- Online Learned Continual Compression with Adaptive Quantization Modules
- Variational Bayesian Quantization
- Deep Molecular Programming: A Natural Implementation of Binary-Weight ReLU Neural Networks
- Training Binary Neural Networks through Learning with Noisy Supervision
- Training Binary Neural Networks using the Bayesian Learning Rule
- Adversarial Neural Pruning with Latent Vulnerability Suppression
- Operation-Aware Soft Channel Pruning using Differentiable Masks
- DropNet: Reducing Neural Network Complexity via Iterative Pruning
- Good Subnetworks Provably Exist: Pruning via Greedy Forward Selection
- Proving the Lottery Ticket Hypothesis: Pruning is All You Need
- PENNI: Pruned Kernel Sharing for Efficient CNN Inference
- GAN Compression: Efficient Architectures for Interactive Conditional GANs
- Structured Multi-Hashing for Model Compression
- Structured Compression by Weight Encryption for Unstructured Pruning and Quantization
- Training Quantized Neural Networks With a Full-Precision Auxiliary Module
- Automatic Neural Network Compression by Sparsity-Quantization Joint Learning: A Constrained Optimization-based Approach
- Adaptive Loss-aware Quantization for Multi-bit Networks
- ZeroQ: A Novel Zero Shot Quantization Framework
- BiDet: An Efficient Binarized Object Detector
- Forward and Backward Information Retention for Accurate Binary Neural Networks
- Binarizing MobileNet via Evolution-Based Searching
- Group Sparsity: The Hinge Between Filter Pruning and Decomposition for Network Compression
- Neural Network Pruning with Residual-Connections and Limited-Data
- HRank: Filter Pruning using High-Rank Feature Map
- DMCP: Differentiable Markov Channel Pruning for Neural Networks
- Learning Filter Pruning Criteria for Deep Convolutional Neural Networks Acceleration
- Discrete Model Compression with Resource Constraint for Deep Neural Networks
- Few Sample Knowledge Distillation for Efficient Network Compression
- The Knowledge Within: Methods for Data-Free Model Compression
- Low-rank Compression of Neural Nets: Learning the Rank of Each Layer
- APQ: Joint Search for Network Architecture, Pruning and Quantization Policy
- Mixed Precision DNNs: All you need is a good parametrization
- Comparing Fine-tuning and Rewinding in Neural Network Pruning
- A Signal Propagation Perspective for Pruning Neural Networks at Initialization
- Data-Independent Neural Pruning via Coresets
- One-Shot Pruning of Recurrent Neural Networks by Jacobian Spectrum Evaluation
- Lookahead: A Far-sighted Alternative of Magnitude-based Pruning
- Dynamic Model Pruning with Feedback
- Provable Filter Pruning for Efficient Neural Networks
- Linear Symmetric Quantization of Neural Networks for Low-precision Integer Hardware
- AutoQ: Automated Kernel-Wise Neural Network Quantization
- Additive Powers-of-Two Quantization: A Non-uniform Discretization for Neural Networks
- Learned Step Size Quantization
- Sampling-Free Learning of Bayesian Quantized Neural Networks
- Gradient $\ell_1$ Regularization for Quantization Robustness
- BinaryDuo: Reducing Gradient Mismatch in Binary Activation Network by Coupling Binary Activations
- Training binary neural networks with real-to-binary convolutions
- Critical initialisation in continuous approximations of binary neural networks
- In Search for a SAT-friendly Binarized Neural Network Architecture