A collection of papers that try to explain the mysteries of deep learning with theory and empirical evidence. See also the curated resource of deep learning theory papers maintained by Prof. Boris Hanin at Princeton.
- The Principles of Deep Learning Theory, Jun. 18 2021.
- Overfitting or perfect fitting? Risk bounds for classification and regression rules that interpolate, NeurIPS 2018.
- The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning, ICML 2018.
- Training Neural Networks for and by Interpolation, ICML 2019.
- Learning in High Dimension Always Amounts to Extrapolation, Oct. 18 2021, Yann LeCun et al.
- Benign, Tempered, or Catastrophic: A Taxonomy of Overfitting, Jul. 14 2022.
- Benign Overfitting for Two-layer ReLU Networks, Mar. 7 2023.
- On Margin Maximization in Linear and ReLU Networks, Nathan Srebro's group, 2021.
- The Role of Permutation Invariance in Linear Mode Connectivity of Neural Networks, 2021, Google Research.
- On the Implicit Biases of Architecture & Gradient Descent, 2021, Yisong Yue's group.
- "This paper finds that while typical networks that fit the training data already generalise fairly well, gradient descent can further improve generalisation by selecting networks with a large margin."
- The Deep Bootstrap Framework: Good Online Learners Are Good Offline Generalizers, Feb. 2021, ICLR 2021.
- No One Representation to Rule Them All: Overlapping Features of Training Methods, Oct. 26 2021.
- Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias, Oct. 26 2021.
- Diversity and Generalization in Neural Network Ensembles, Oct. 6 2021.
- The Limitations of Large Width in Neural Networks: A Deep Gaussian Process Perspective, Jun. 11 2021.
- How Tight Can PAC-Bayes be in the Small Data Regime?, Oct. 27 2021.
- Towards a Unified Information-Theoretic Framework for Generalization, Nov. 9 2021, NeurIPS 2021, Daniel Roy's group. (non-vacuous generalization bounds)
SGD, loss landscape, learning dynamics, stochasticity, SGD for feature learning, learning curricula, etc.
- Don't Decay the Learning Rate, Increase the Batch Size, Nov. 2017, ICLR 2018. (A minimal sketch of this recipe appears after this group of papers.)
- Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss, COLT 2020.
- Stochastic Training is Not Necessary for Generalization, Tom Goldstein's group, NeurIPS 2021.
- Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect, Tuo Zhao's group.
- Momentum Doesn't Change The Implicit Bias.
- On the Implicit Biases of Architecture & Gradient Descent, 2021, Yisong Yue's group. (implicit bias of GD)
- Parameter Prediction for Unseen Deep Architectures, Oct. 25 2021.
- Gradient Starvation: A Learning Proclivity in Neural Networks, Oct. 26 2021, NeurIPS 2021.
- What training reveals about neural network complexity, Oct. 29 2021.
- A Loss Curvature Perspective on Training Instabilities of Deep Learning Models, ICLR 2022 submission.
- Permutation-Based SGD: Is Random Optimal?, ICLR 2022 submission.
- A General Analysis of Example-Selection for Stochastic Gradient Descent, ICLR 2022 submission.
- How many degrees of freedom do we need to train deep networks: a loss landscape perspective, Jul. 13 2021.
- The Benefits of Implicit Regularization from SGD in Least Squares Problems, Aug. 10 2021, NeurIPS 2021.
- The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion, Dec. 2 2022.
- Understanding Gradient Descent on Edge of Stability in Deep Learning, May 22 2022.
- Understanding Edge-of-Stability Training Dynamics with a Minimalist Example, Oct. 7 2022.
- Neural Networks can Learn Representations with Gradient Descent, Jun. 30 2022, COLT 2022.
- Git Re-Basin: Merging Models modulo Permutation Symmetries, Sep. 11 2022. tweet1, tweet2, tweet3.
- The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima, Oct. 4 2022.
- From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent, Oct. 13 2022.
- Grokking phase transitions in learning local rules with gradient descent, Oct. 26 2022.
- High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation, Jimmy Ba et al., arXiv May 3 2022.
- Exact learning dynamics of deep linear networks with prior knowledge, NeurIPS 2022. (learning dynamics)
- Handbook of Convergence Theorems for (Stochastic) Gradient Methods, Jan. 26 2023.
- Learning sparse features can lead to overfitting in neural networks, Jun. 24 2022.
- Limitations of the NTK for Understanding Generalization in Deep Learning, Jun. 20 2022.
- The Influence of Learning Rule on Representation Dynamics in Wide Neural Networks, Oct. 5 2022.
- Does Knowledge Distillation Really Work?, Jun. 10 2021, NeurIPS 2021.
- Understanding Why Generalized Reweighting Does Not Improve Over ERM, Jan. 28 2022.
- Limitation of characterizing implicit regularization by data-independent functions, Jan. 28 2022.
- Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data, Oct. 2022.
- Neural Networks Efficiently Learn Low-Dimensional Representations with SGD, Sep. 29 2022. (SGD)
- Deep Learning meets Nonparametric Regression: Are Weight-Decayed DNNs Locally Adaptive?, Jun. 13 2022.
- Feature learning in neural networks and kernel machines that recursively learn features, Dec. 28 2022, Mikhail Belkin's group.
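The "Don't Decay the Learning Rate, Increase the Batch Size" entry above proposes a concrete recipe: hold the learning rate fixed and enlarge the batch at the points where one would otherwise decay. Below is a minimal sketch of that recipe under assumed settings (toy data, arbitrary epochs and growth factors); it is an illustration, not code from the paper.

```python
# Minimal sketch (assumed PyTorch training loop): instead of multiplying the
# learning rate by 1/k at chosen epochs, multiply the batch size by k, which
# the ICLR 2018 paper argues preserves a similar SGD noise scale.
import torch
from torch.utils.data import DataLoader, TensorDataset

data = TensorDataset(torch.randn(4096, 32), torch.randint(0, 10, (4096,)))
model = torch.nn.Linear(32, 10)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
loss_fn = torch.nn.CrossEntropyLoss()

batch_size = 128                      # starting batch size (illustrative)
growth_epochs = {30: 5, 60: 5}        # epoch -> growth factor (illustrative)

for epoch in range(90):
    if epoch in growth_epochs:
        batch_size *= growth_epochs[epoch]   # grow the batch instead of decaying lr
    loader = DataLoader(data, batch_size=batch_size, shuffle=True)
    for x, y in loader:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
```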
- Exploring the Limits of Large Scale Pre-training, Google Research.
- The Power of Contrast for Feature Learning: A Theoretical Analysis, Oct. 2021. James Zou's group.
- Sharp Learning Bounds for Contrastive Unsupervised Representation Learning, Oct. 2021. RIKEN AIP.
- Can contrastive learning avoid shortcut solutions?, Jun. 21 2021. MIT and University of Pittsburgh.
- Intriguing Properties of Contrastive Losses, Oct. 23 2021. Google Research.
- Stochastic Contrastive Learning, Oct. 2021. (interpretability)
- How Does Contrastive Pre-training Connect Disparate Domains?, NeurIPS 2021.
- Contrastive Learning Can Find An Optimal Basis for Approximately View-Invariant Functions, arXiv Oct. 4 2022.
- Do More Negative Samples Necessarily Hurt In Contrastive Learning?, Jun. 22 2022, ICML 2022. (A minimal InfoNCE-style loss sketch appears after this group of papers.)
- Understanding Deep Contrastive Learning via Coordinate-wise Optimization, NeurIPS 2022.
- Understanding Contrastive Learning Requires Incorporating Inductive Biases, Feb. 28 2022.
- Feature Dropout: Revisiting the Role of Augmentations in Contrastive Learning, Dec. 15 2022.
- Emergence of Invariance and Disentanglement in Deep Representations, JMLR 2018.
- Grounding Representation Similarity with Statistical Testing, Nov. 3 2021. (representation comparison)
- Revisiting Model Stitching to Compare Neural Representations, Jun. 14 2021. (representation comparison)
- Comparing Text Representations: A Theory-Driven Approach, Sep. 2021. (sentence embedding)
- Discovering and Explaining The Representation Bottleneck of DNNs, ICLR 2022 submission.
- A theory of representation learning in deep neural networks gives a deep generalisation of kernel methods, Apr. 23 2023, ICML 2023.
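Many of the contrastive learning papers above analyze losses from the InfoNCE family. The sketch below is a minimal, self-contained version of such a loss for two augmented views per example; the shapes, temperature, and function name are illustrative assumptions, not taken from any single paper.

```python
# Minimal InfoNCE-style contrastive loss sketch. Assumption: z1[i] and z2[i]
# are embeddings of two augmented views of example i; all other rows of the
# batch serve as negatives.
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature       # (N, N) cosine-similarity matrix
    targets = torch.arange(z1.size(0))       # positive pairs sit on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage with random embeddings; in practice z1 and z2 would come from an
# encoder applied to two augmentations of the same batch.
loss = info_nce(torch.randn(8, 16), torch.randn(8, 16))
```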
- Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle, Mar. 24 2023. (A minimal double-descent sketch appears after this group of papers.)
- An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers, Oct. 18 2022.
- Neural Tangent Kernel Eigenvalues Accurately Predict Generalization, UCB, NeurIPS 2021 spotlight.
- Predicting Unreliable Predictions by Shattering a Neural Network, 2021, Yoshua Bengio's group.
- On Predicting Generalization using GANs, Nov. 28 2021.
- Intrinsic Dimension, Persistent Homology and Generalization in Neural Networks, Nov. 25 2021.
- On the Maximum Hessian Eigenvalue and Generalization, Jun. 22 2022.
- Can You Learn the Same Model Twice? Investigating Reproducibility and Double Descent from the Decision Boundary Perspective, Dec. 16 2021.
- Deep Learning Through the Lens of Example Difficulty, Google Research 2021.
- Deep Learning on a Data Diet: Finding Important Examples Early in Training, Jul. 15 2021, NeurIPS 2021.
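As a companion to the double-descent entries above, the sketch below runs ridgeless (minimum-norm) regression on random ReLU features of synthetic data and sweeps the number of features; a test-error peak near the interpolation threshold (features ≈ training points) is the usual double-descent signature. All sizes and the noise level are illustrative assumptions.

```python
# Sketch: test error of minimum-norm least squares with random ReLU features,
# swept over the number of features. A spike near n_feat == n_train typically
# appears, i.e. the classic double-descent curve (synthetic data).
import numpy as np

rng = np.random.default_rng(0)
d, n_train, n_test = 20, 100, 1000
w_true = rng.normal(size=d)
X_tr, X_te = rng.normal(size=(n_train, d)), rng.normal(size=(n_test, d))
y_tr = X_tr @ w_true + 0.1 * rng.normal(size=n_train)
y_te = X_te @ w_true

for n_feat in [10, 50, 90, 100, 110, 200, 1000]:
    W = rng.normal(size=(d, n_feat)) / np.sqrt(d)        # random projection
    F_tr, F_te = np.maximum(X_tr @ W, 0), np.maximum(X_te @ W, 0)
    coef, *_ = np.linalg.lstsq(F_tr, y_tr, rcond=None)   # min-norm solution
    print(n_feat, np.mean((F_te @ coef - y_te) ** 2))
```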
See here for a detailed discussion of spurious correlation.
- Can You Win Everything with A Lottery Ticket?, TMLR 2022.
- Network size and weights size for memorization with two-layers neural networks, Nov. 3 2020.
- What Do Neural Networks Learn When Trained With Random Labels?, NeurIPS 2020. (A minimal random-label memorization sketch appears at the end of this group of papers.)
- Neural Networks Learning and Memorization with (almost) no Over-Parameterization, NeurIPS 2020.
- On the geometry of generalization and memorization in deep neural networks, ICLR 2021.
- The Curious Case of Benign Memorization, Oct. 25 2022.
- "only the very last layers are used for memorization, while preceding layers encode performant features which remain largely unaffected by the label noise"
- Distinguishing rule and exemplar-based generalization in learning systems, ICML 2022.
- The experimental setting has been applied to study the in-context learning ability of Transformers (tweet).
- Unintended memorisation of unique features in neural networks, May 20 2022.
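The random-label and benign-memorization entries above rest on a simple probe: shuffle the labels so they carry no information about the inputs and check that the network still drives training error to zero. The sketch below is a toy version of that probe; the architecture, data sizes, and optimizer are arbitrary illustrative choices, not taken from the papers.

```python
# Random-label memorization probe: a small MLP fits uniformly random labels
# on fixed random inputs, i.e. it memorizes the training set.
import torch

torch.manual_seed(0)
x = torch.randn(512, 64)              # fixed "inputs"
y = torch.randint(0, 10, (512,))      # labels carry no information about x

model = torch.nn.Sequential(
    torch.nn.Linear(64, 256), torch.nn.ReLU(), torch.nn.Linear(256, 10)
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()

for step in range(2000):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
    if step % 500 == 0:
        acc = (model(x).argmax(dim=1) == y).float().mean().item()
        print(f"step {step}: train loss {loss.item():.3f}, train acc {acc:.2f}")
```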