We use this repository to keep track of the slides we are making for a theoretical review of neural-network-based models. Some useful background references:
- A Princeton course on deep learning theory: https://github.com/leiwu1990/course.math_theory_nn
- Slides for a summer school on Deep Learning at SJTU 2020: folder "dl summer school 2020"
- Telgarsky's notes on deep learning theory: https://mjt.cs.illinois.edu/dlt/#vc-dimension-of-linear-predictors
This folder contains notes we made and presented during lab meetings. Below are their brief descriptions:
- Nov30.2020_1, Nov30.2020_2: We discussed the high-level ideas of three papers (Hieber, Chen et al., and Arora et al.) and compared them.
The following is a list of papers for which we are preparing presentation slides.
- The PDF files of the corresponding papers are in folder "papers".
- The corresponding LaTeX sources are in folder "slides source files".
- Nonparametric regression using deep neural networks with ReLU activation function; J Schmidt-Hieber - arXiv preprint arXiv:1708.06633, 2017
- papers/1708.06633.pdf
- slides source files/Hieber_approx.xxx for the functional approximation part
- slides source files/Hieber_Risk.xxx for the minimax estimation rate part
- Optimal approximation of piecewise smooth functions using deep ReLU neural networks; P Petersen, F Voigtlaender - Neural Networks, 2018 - Elsevier
- papers/1709.05289.pdf
- slides source files/Petersen.xxx
- Error bounds for approximations with deep ReLU networks; D Yarotsky - Neural Networks, 2017 - Elsevier
- papers/1610.01145.pdf
- slides source files/Yarotsky.xxx
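A construction that recurs in the three approximation papers above is the ReLU "sawtooth" trick for squaring: on [0, 1], x^2 = x - sum_{s >= 1} g_s(x) / 4^s, where g is the hat function (realizable with a few ReLU units) and g_s its s-fold composition, so each extra layer of depth shrinks the error by a factor of 4. A quick NumPy check of this identity (our own illustration, not code from the slides):

```python
# Numerical check of the sawtooth identity behind the ReLU approximation
# bounds: x^2 = x - sum_{s>=1} g_s(x) / 4^s on [0, 1].
import numpy as np

def hat(x):
    # Hat function on [0, 1]: rises to 1 at x = 1/2, back to 0 at x = 1.
    # Expressible with ReLUs as 2*relu(x) - 4*relu(x - 0.5).
    return 2 * np.minimum(x, 1 - x)

def approx_square(x, m):
    # Truncate the series after m sawtooth levels (~ m network layers).
    out, g = x.copy(), x.copy()
    for s in range(1, m + 1):
        g = hat(g)
        out -= g / 4 ** s
    return out

x = np.linspace(0, 1, 1001)
for m in (1, 3, 6):
    err = np.max(np.abs(approx_square(x, m) - x ** 2))
    print(f"m = {m}: sup error = {err:.2e}  (theory: 4^-(m+1) = {4.0 ** -(m + 1):.2e})")
```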
The following papers are possibly in the pipeline.
- Universality of deep convolutional neural networks; DX Zhou - Applied and Computational Harmonic Analysis, 2020 - Elsevier
- papers/1805.10769.pdf
- Fast learning rates for plug-in classifiers; JY Audibert, AB Tsybakov - The Annals of Statistics, 2007
- papers/1183667286.pdf
- Optimal aggregation of classifiers in statistical learning; AB Tsybakov - The Annals of Statistics, 2004
- papers/1079120131.pdf
- Smooth discrimination analysis; E Mammen, AB Tsybakov - The Annals of Statistics, 1999
- papers/1017939240.pdf
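The common thread in the three classification papers above is the Tsybakov margin (low-noise) condition; for quick reference, its standard statement (our paraphrase, not quoted from the papers):

```latex
% Margin (low-noise) condition of Mammen-Tsybakov, as commonly stated.
% \eta(x) = P(Y = 1 \mid X = x) denotes the regression function.
\exists\, C > 0,\ \alpha \ge 0 \ \text{such that}\quad
P_X\left( 0 < \left|\eta(X) - \tfrac{1}{2}\right| \le t \right) \le C\, t^{\alpha}
\quad \text{for all } t > 0.
% Larger \alpha puts less mass near the decision boundary and yields the
% fast rates for plug-in and ERM-type classifiers studied in these papers.
```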
- A Theoretical Analysis of Deep Q-Learning; Fan et al. (2020); a theoretical analysis of deep reinforcement learning.
- papers/1901.00137.pdf [https://arxiv.org/pdf/1901.00137.pdf]
- Understanding Implicit Regularization in Over-Parameterized Nonlinear Statistical Model. Jianqing Fan, Zhuoran Yang, Mengxin Yu (2020)
- papers/2007.08322.pdf [https://arxiv.org/abs/2007.08322]
- Gradient Descent Provably Optimizes Over-parameterized Neural Networks, Simon S. Du, Xiyu Zhai, Barnabas Poczos, Aarti Singh (2018)
- papers/1810.02054.pdf [https://arxiv.org/abs/1810.02054]
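As a toy illustration of the setting in the Du et al. paper just above (our own sketch, not the authors' code): a sufficiently wide two-layer ReLU network with a frozen random second layer, trained by full-batch gradient descent on a small dataset, should drive the squared training loss toward zero.

```python
# Toy sketch of GD on an over-parameterized two-layer ReLU net
# (our illustration of the Du et al. setting; width m >> n samples).
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 10, 10, 2000
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs, as in the paper
y = rng.standard_normal(n)

W = rng.standard_normal((m, d))                 # trained first layer
a = rng.choice([-1.0, 1.0], size=m)             # frozen random second layer

lr = 0.5
for t in range(1501):
    Z = X @ W.T                                 # (n, m) pre-activations
    pred = np.maximum(Z, 0.0) @ a / np.sqrt(m)
    r = pred - y
    if t % 300 == 0:
        print(f"step {t:4d}: loss = {0.5 * np.dot(r, r):.3e}")
    # gradient of 0.5 * ||pred - y||^2 with respect to W
    G = ((Z > 0) * (r[:, None] * a[None, :]) / np.sqrt(m)).T @ X
    W -= lr * G
```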
- ROOT-SGD: Sharp Nonasymptotics and Asymptotic Efficiency in a Single Algorithm, Chris Junchi Li, Wenlong Mou, Martin J. Wainwright, Michael I. Jordan (2020)
- papers/ [http://www.optimization-online.org/DB_FILE/2020/08/7979.pdf]; also at [https://arxiv.org/pdf/2008.12690.pdf]
- {Euclidean, Metric, Wasserstein} Gradient Flows: an overview, Filippo Santambrogio
- papers/surveyGradFlows.pdf
- Mean-Field Analysis of Two-Layer Neural Networks: Non-Asymptotic Rates and Generalization Bounds, Zixiang Chen, Yuan Cao, Quanquan Gu, Tong Zhang (2020)
- papers/2002.04026.pdf
- How Much Over-parametrization is Sufficient to Learn Deep ReLU Networks, Zixiang Chen, Yuan Cao, Difan Zou, Quanquan Gu (2020)
- papers/1911.12360.pdf
- Convergence Theory of Learning Over-parameterized ResNet: A Full Characterization, Huishuai Zhang, Da Yu, Mingyang Yi, Wei Chen, Tie-Yan Liu (2019)
- papers/1903.07120.pdf
- Implicit Regularization via Hadamard Product Over-Parametrization in High-Dimensional Linear Regression, Peng Zhao, Yun Yang, and Qiao-Chu He (2019)
- papers/1903.09367.pdf
- Implicit Regularization for Optimal Sparse Recovery, Tomas Vaškevicius, Varun Kanade, Patrick Rebeschini (2019)
- papers/1909.05122.pdf
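Both implicit-regularization papers above study multiplicative over-parametrizations of a linear model. A minimal sketch of the phenomenon (our own, with details that differ from both papers): write beta = u*u - v*v, run plain gradient descent from a small initialization on noiseless sparse least squares, and the iterates approach the sparse solution with no explicit penalty.

```python
# Implicit regularization via Hadamard-product over-parametrization:
# a toy sparse-recovery run (our sketch; parametrization beta = u*u - v*v).
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 100, 200, 5                          # under-determined: d > n, k-sparse truth
X = rng.standard_normal((n, d)) / np.sqrt(n)   # columns roughly unit-norm
beta_true = np.zeros(d)
beta_true[rng.choice(d, size=k, replace=False)] = rng.choice([-1.0, 1.0], size=k)
y = X @ beta_true                              # noiseless observations

alpha = 1e-4                                   # small init scale drives the implicit bias
u = np.full(d, alpha)
v = np.full(d, alpha)
lr = 0.2
for t in range(10001):
    beta = u * u - v * v
    g = X.T @ (X @ beta - y)                   # gradient of 0.5*||X beta - y||^2 in beta
    u -= lr * 2.0 * u * g                      # chain rule through beta = u*u - v*v
    v += lr * 2.0 * v * g
    if t % 2500 == 0:
        print(f"step {t:5d}: ||beta - beta*|| = {np.linalg.norm(beta - beta_true):.3e}")
```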
- Neural Tangent Kernel: Convergence and Generalization in Neural Networks, Arthur Jacot, Franck Gabriel, Clement Hongler (2018)
- papers/1806.07572.pdf
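For reference, the empirical NTK of Jacot et al. is the Gram matrix of parameter gradients, Theta(x, x') = <grad_theta f(x), grad_theta f(x')>, which concentrates and stays nearly fixed during training as the width grows. A minimal sketch for a two-layer ReLU net (our own illustration):

```python
# Empirical neural tangent kernel of a two-layer ReLU network,
# computed directly from parameter gradients (our sketch).
import numpy as np

rng = np.random.default_rng(2)
d, m = 3, 50_000                         # input dim, width (large for concentration)
W = rng.standard_normal((m, d)) / np.sqrt(d)
a = rng.standard_normal(m)

def grad_f(x):
    # f(x) = (1/sqrt(m)) * sum_k a_k relu(w_k . x); gradient w.r.t. (W, a).
    z = W @ x
    act = np.maximum(z, 0.0)
    gW = ((z > 0) * a)[:, None] * x[None, :] / np.sqrt(m)   # d f / d W
    ga = act / np.sqrt(m)                                   # d f / d a
    return np.concatenate([gW.ravel(), ga])

x1 = np.array([1.0, 0.0, 0.0])
x2 = np.array([0.0, 1.0, 0.0])
print("Theta(x1, x1) =", grad_f(x1) @ grad_f(x1))
print("Theta(x1, x2) =", grad_f(x1) @ grad_f(x2))
```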
- Kernel Alignment Risk Estimator: Risk Prediction from Training Data, Jacot et al. (2020)
- papers/2006.09796.pdf
- On Exact Computation with an Infinitely Wide Neural Net, Arora et al. (2019)
- papers/1904.11955.pdf
- Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks, Arora et al. (2019)
- papers/ [https://arxiv.org/abs/1901.08584]
- Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime, Atsushi Nitanda, Taiji Suzuki (2020); a minimal averaging sketch follows below.
- papers/ [https://arxiv.org/abs/2006.12297]
"We analyze the convergence of the averaged stochastic gradient descent for over-parameterized two-layer neural networks for regression problems. It was recently found that, under the neural tangent..."
- Why ResNet Works? Residuals Generalize, He et al. (2020)
- papers/08984747.pdf
- On the Similarity between the Laplace and Neural Tangent Kernels, Amnon Geifman, Abhay Yadav, Yoni Kasten, Meirav Galun, David Jacobs, Ronen Basri (2020)
- papers/08984747.pdf
- Deep Neural Tangent Kernel and Laplace Kernel Have the Same RKHS, Lin Chen, Sheng Xu (2020)
- papers/08984747.pdf
- A Convergence Theory for Deep Learning via Over-Parameterization -- by Allen-Zhu, Li, Song (June 2019)
- papers/1811.03962.pdf
- Deep learning: a statistical viewpoint -- by Bartlett, Montanari, and Rakhlin (March 2021)
- papers/2103.09177.pdf
- Regularization matters: A nonparametric perspective on overparametrized neural network. Wenjia Wang, Tianyang Hu, Cong Lin, and Guang Cheng (July 2020)
- papers/2007.02486.pdf
- A course project description by Andrea Montanari at Stanford; a good resource for finding relevant references.
- papers/Andrea_Course_proj2021.pdf