NeurIPS

  • Position-based Scaled Gradient for Model Quantization and Sparse Training
  • ConvBERT: Improving BERT with Span-based Dynamic Convolution

Quantization

  • Robust Quantization: One Model to Rule Them All
  • FleXOR: Trainable Fractional Quantization

Pruning

Structure Pruning

  • Storage Efficient and Dynamic Flexible Runtime Channel Pruning via Deep Reinforcement Learning
  • BERT Loses Patience: Fast and Robust Inference with Early Exit

ICML

  • Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

Quantization

  • Divide and Conquer: Leveraging Intermediate Feature Representations for Quantized Training of Neural Networks
  • Up or Down? Adaptive Rounding for Post-Training Quantization
  • Towards Accurate Post-training Network Quantization via Bit-Split and Stitching
  • Differentiable Product Quantization for End-to-End Embedding Compression
  • Multi-Precision Policy Enforced Training (MuPPET): A precision-switching strategy for quantised fixed-point training of CNNs
  • Online Learned Continual Compression with Adaptive Quantization Modules
  • Variational Bayesian Quantization
  • Deep Molecular Programming: A Natural Implementation of Binary-Weight ReLU Neural Networks
  • Training Binary Neural Networks through Learning with Noisy Supervision
  • Training Binary Neural Networks using the Bayesian Learning Rule
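
As background for the quantization entries above, the sketch below shows plain symmetric per-tensor fake quantization in PyTorch, the uniform-rounding baseline that post-training methods such as adaptive rounding improve on. The bit-width, rounding, and clamping choices are illustrative assumptions, not any listed paper's method.

```python
# Minimal sketch: symmetric per-tensor uniform ("fake") quantization of a weight tensor.
# Illustrative baseline only; bit-width and clamping are assumptions, not from any paper above.
import torch

def fake_quantize(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Quantize w to signed integers, then dequantize back to float."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 127 for 8 bits
    scale = w.abs().max() / qmax              # per-tensor symmetric scale
    w_int = torch.clamp(torch.round(w / scale), -qmax, qmax)
    return w_int * scale                      # dequantized approximation of w

w = torch.randn(256, 256)
w_q = fake_quantize(w, num_bits=4)
print((w - w_q).abs().mean())                 # mean quantization error
```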

Pruning

  • Adversarial Neural Pruning with Latent Vulnerability Suppression
  • Operation-Aware Soft Channel Pruning using Differentiable Masks
  • DropNet: Reducing Neural Network Complexity via Iterative Pruning
  • Good Subnetworks Provably Exist: Pruning via Greedy Forward Selection
  • Proving the Lottery Ticket Hypothesis: Pruning is All You Need
  • PENNI: Pruned Kernel Sharing for Efficient CNN Inference
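
As background for the pruning entries above, the sketch below shows one-shot global magnitude pruning in PyTorch, the simple baseline that these papers refine with greedy selection, differentiable masks, or iterative schedules. The model and sparsity level are placeholders.

```python
# Minimal sketch: one-shot global magnitude pruning (unstructured).
# Illustrative baseline only; not the method of any specific paper above.
import torch
import torch.nn as nn

def magnitude_prune(model: nn.Module, sparsity: float = 0.5) -> None:
    """Zero out the globally smallest-magnitude weights until `sparsity` of them are removed."""
    weights = torch.cat([p.detach().abs().flatten()
                         for p in model.parameters() if p.dim() > 1])
    threshold = torch.quantile(weights, sparsity)       # global magnitude cutoff
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() > 1:                              # skip biases / norm parameters
                p.mul_((p.abs() > threshold).float())    # apply binary mask in place

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
magnitude_prune(model, sparsity=0.8)
```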

CVPR

  • GAN Compression: Efficient Architectures for Interactive Conditional GANs
  • Structured Multi-Hashing for Model Compression

Quantization

  • Structured Compression by Weight Encryption for Unstructured Pruning and Quantization
  • Training Quantized Neural Networks With a Full-Precision Auxiliary Module
  • Automatic Neural Network Compression by Sparsity-Quantization Joint Learning: A Constrained Optimization-based Approach
  • Adaptive Loss-aware Quantization for Multi-bit Networks
  • ZeroQ: A Novel Zero Shot Quantization Framework
  • BiDet: An Efficient Binarized Object Detector
  • Forward and Backward Information Retention for Accurate Binary Neural Networks
  • Binarizing MobileNet via Evolution-Based Searching

Pruning

Structure Pruning

  • Group Sparsity: The Hinge Between Filter Pruning and Decomposition for Network Compression
  • Neural Network Pruning with Residual-Connections and Limited-Data
  • HRank: Filter Pruning using High-Rank Feature Map
  • DMCP: Differentiable Markov Channel Pruning for Neural Networks
  • Learning Filter Pruning Criteria for Deep Convolutional Neural Networks Acceleration
  • Discrete Model Compression with Resource Constraint for Deep Neural Networks
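
For context on the structured (filter/channel) pruning papers above, the sketch below drops convolution filters by L1 norm, the simplest filter-ranking criterion; the papers above learn better criteria or make the selection differentiable. Rebuilding the downstream layer is omitted here, and the keep ratio is a placeholder.

```python
# Minimal sketch: L1-norm filter pruning for a single conv layer (structured pruning).
# Illustrative criterion only; adjusting the following layer is omitted for brevity.
import torch
import torch.nn as nn

def keep_strongest_filters(conv: nn.Conv2d, keep_ratio: float = 0.5) -> nn.Conv2d:
    """Return a thinner conv layer keeping the filters with the largest L1 norms."""
    norms = conv.weight.detach().abs().sum(dim=(1, 2, 3))   # one score per output filter
    n_keep = max(1, int(keep_ratio * conv.out_channels))
    keep = torch.topk(norms, n_keep).indices.sort().values
    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    pruned.weight.data = conv.weight.detach()[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.detach()[keep].clone()
    return pruned

conv = nn.Conv2d(64, 128, kernel_size=3, padding=1)
thin = keep_strongest_filters(conv, keep_ratio=0.25)
print(thin.weight.shape)    # torch.Size([32, 64, 3, 3])
```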

Distillation

  • Few Sample Knowledge Distillation for Efficient Network Compression
  • The Knowledge Within: Methods for Data-Free Model Compression
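
For readers new to distillation, the sketch below is the standard soft-label distillation loss (KL divergence between temperature-softened teacher and student logits). It is background only, not the few-sample or data-free procedures the two papers above propose; the temperature is an illustrative choice.

```python
# Minimal sketch: standard soft-label knowledge distillation loss (teacher -> student).
# Background only; the papers above address the harder few-sample / data-free settings.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature: float = 4.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student distributions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # T^2 keeps gradient magnitudes comparable across temperatures (standard KD scaling).
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

student_logits = torch.randn(8, 10)
teacher_logits = torch.randn(8, 10)
print(kd_loss(student_logits, teacher_logits))
```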

Low-Rank Approximation

  • Low-rank Compression of Neural Nets: Learning the Rank of Each Layer
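
To illustrate the basic operation behind low-rank compression, the sketch below replaces a linear layer's weight with a rank-r truncated SVD factorization (two smaller layers). Choosing the rank per layer is what the paper above learns; the fixed rank here is just a placeholder.

```python
# Minimal sketch: rank-r truncated SVD factorization of a linear layer.
# Illustrative only; the per-layer rank is a placeholder, not a learned quantity.
import torch
import torch.nn as nn

def factorize_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Approximate layer (out x in) by two smaller layers (in -> rank -> out)."""
    U, S, Vh = torch.linalg.svd(layer.weight.detach(), full_matrices=False)
    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data = (torch.diag(S[:rank]) @ Vh[:rank]).contiguous()   # rank x in
    second.weight.data = U[:, :rank].contiguous()                         # out x rank
    if layer.bias is not None:
        second.bias.data = layer.bias.detach().clone()
    return nn.Sequential(first, second)

layer = nn.Linear(512, 512)
compressed = factorize_linear(layer, rank=64)
x = torch.randn(4, 512)
print((layer(x) - compressed(x)).abs().max())   # approximation error
```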

NAS

  • APQ: Joint Search for Network Architecture, Pruning and Quantization Policy

ICLR

Pruning

  • Comparing Rewinding and Fine-tuning in Neural Network Pruning
  • A Signal Propagation Perspective for Pruning Neural Networks at Initialization
  • Data-Independent Neural Pruning via Coresets
  • One-Shot Pruning of Recurrent Neural Networks by Jacobian Spectrum Evaluation
  • Lookahead: A Far-sighted Alternative of Magnitude-based Pruning
  • Dynamic Model Pruning with Feedback

Structure Pruning

  • Provable Filter Pruning for Efficient Neural Networks

Quantization

  • Linear Symmetric Quantization of Neural Networks for Low-precision Integer Hardware
  • AutoQ: Automated Kernel-Wise Neural Network Quantization
  • Additive Powers-of-Two Quantization: A Non-uniform Discretization for Neural Networks
  • Learned Step Size Quantization
  • Sampling-Free Learning of Bayesian Quantized Neural Networks
  • Gradient $\ell_1$ Regularization for Quantization Robustness
  • BinaryDuo: Reducing Gradient Mismatch in Binary Activation Network by Coupling Binary Activations
  • Training binary neural networks with real-to-binary convolutions
  • Critical initialisation in continuous approximations of binary neural networks
  • Mixed Precision DNNs: All you need is a good parametrization

NAS

  • In Search for a SAT-friendly Binarized Neural Network Architecture