- Comparing Fine-tuning and Rewinding in Neural Network Pruning
- A Signal Propagation Perspective for Pruning Neural Networks at Initialization
- Data-Independent Neural Pruning via Coresets
- One-Shot Pruning of Recurrent Neural Networks by Jacobian Spectrum Evaluation
- Lookahead: A Far-sighted Alternative of Magnitude-based Pruning
- Dynamic Model Pruning with Feedback
- Linear Symmetric Quantization of Neural Networks for Low-precision Integer Hardware
- AutoQ: Automated Kernel-Wise Neural Network Quantization
- Additive Powers-of-Two Quantization: A Non-uniform Discretization for Neural Networks
- Learned Step Size Quantization
- Sampling-Free Learning of Bayesian Quantized Neural Networks
- Gradient $\ell_1$ Regularization for Quantization Robustness
- BinaryDuo: Reducing Gradient Mismatch in Binary Activation Network by Coupling Binary Activations
- Training binary neural networks with real-to-binary convolutions
- Critical initialisation in continuous approximations of binary neural networks
- Efficient and Effective Quantization for Sparse DNNs
- Focused Quantization for Sparse CNNs [paper]
- Point-Voxel CNN for Efficient 3D Deep Learning [paper]
- Model Compression with Adversarial Robustness: A Unified Optimization Framework [paper]
- MetaQuant: Learning to Quantize by Learning to Penetrate Non-differentiable Quantization [paper] [codes]
- Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization [paper]
- Post-training 4-bit quantization of convolutional networks for rapid-deployment
- Qsparse-local-SGD: Distributed SGD with Quantization, Sparsification, and Local Computations [paper]
- PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization [paper]
- AutoPrune: Automatic Network Pruning by Regularizing Auxiliary Parameters
- Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask
- Global Sparse Momentum SGD for Pruning Very Deep Neural Networks [paper] [codes]
- Channel Gating Neural Network
- Gate Decorator: Global Filter Pruning Method for Accelerating Deep Convolutional Neural Networks
- Positive-Unlabeled Compression on the Cloud [paper]
- Einconv: Exploring Unexplored Tensor Decompositions for Convolutional Neural Networks [paper] [codes]
- A Tensorized Transformer for Language Modeling [paper]
- Shallow RNN: Accurate Time-series Classification on Resource Constrained Devices [paper]
- CondConv: Conditionally Parameterized Convolutions for Efficient Inference [paper]
- SCAN: A Scalable Neural Networks Framework Towards Compact and Efficient Models [paper]
- Constrained deep neural network architecture search for IoT devices accounting for hardware calibration [paper]
- DATA: Differentiable ArchiTecture Approximation [paper]
- Efficient Forward Architecture Search [paper]
- Hybrid 8-bit Floating Point (HFP8) Training and Inference for Deep Neural Networks [paper]
- E2-Train: Training State-of-the-art CNNs with Over 80% Energy Savings [paper]
- Backprop with Approximate Activations for Memory-efficient Network Training [paper]
- Dimension-Free Bounds for Low-Precision Training [paper]
- A Mean Field Theory of Quantized Deep Networks: The Quantization-Depth Trade-Off [paper]
- Learning to Quantize Deep Networks by Optimizing Quantization Intervals with Task Loss
- Simultaneously Optimizing Weight and Quantizer of Ternary Neural Network using Truncated Gaussian Approximation
- Structured Pruning of Neural Networks with Budget-Aware Regularization
- Towards Optimal Structured CNN Pruning via Generative Adversarial Learning
- Centripetal SGD for Pruning Very Deep Convolutional Networks with Complicated Structure
- Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration
- ECC: Platform-Independent Energy-Constrained Deep Neural Network Compression via a Bilinear Regression Model
- Cascaded Projection: End-to-End Network Compression and Acceleration
- Accelerating Convolutional Neural Networks via Activation Map Compression
- Energy-Constrained Compression for Deep Neural Networks via Weighted Sparse Projection and Layer Input Masking
- Factorized Convolutional Neural Networks
- Exploiting Kernel Sparsity and Entropy for Interpretable CNN Compression
- A Main/Subsidiary Network Framework for Simplifying Binary Neural Networks
- Binary Ensemble Neural Network: More Bits per Network or More Networks per Bit?
- Cross Domain Model Compression by Structurally Weight Sharing
- Improving Neural Network Quantization without Retraining using Outlier Channel Splitting
- Same, Same But Different - Recovering Neural Network Quantization Error Through Weight Factorization
- Parameter Efficient Training of Deep Convolutional Neural Networks by Dynamic Sparse Reparameterization
- Variational inference for sparse network reconstruction from count data
- Collaborative Channel Pruning for Deep Networks
- Understanding Straight-Through Estimator in Training Activation Quantized Neural Nets
- Minimal Random Code Learning: Getting Bits back from Compressed Model Parameters
- Scalable Methods for 8-bit Training of Neural Networks
- Heterogeneous Bitwidth Binarization in Convolutional Neural Networks
- HitNet: Hybrid Ternary Recurrent Neural Network
- WSNet: Compact and Efficient Networks Through Weight Sampling
- Espresso: Efficient Forward Propagation for BCNNs
- An Empirical study of Binary Neural Networks' Optimisation
- Learning Discrete Weights Using the Local Reparameterization Trick
- On the Universal Approximability and Complexity Bounds of Quantized ReLU Neural Networks
- Learning To Share: Simultaneous Parameter Tying and Sparsification in Deep Learning
- Bi-Real Net: Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm
- Value-aware Quantization for Training and Inference of Neural Networks
- LSQ++: Lower running time and higher recall in multi-codebook quantization
- LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
- NISP: Pruning Networks using Neuron Importance Score Propagation
- SYQ: Learning Symmetric Quantization For Efficient Deep Neural Networks
- Improving Deep Neural Network Sparsity through Decorrelation Regularization
- Fixed Point Quantization of Deep Convolutional Networks
- Expectation Backpropagation: Parameter-Free Training of Multilayer Neural Networks with Continuous or Discrete Weights