Impact of Orthogonal Initialization in Deep Learning

Dynamical Isometry as a Consequence of Weight Orthogonality

Ester Hlav, 2019


How does orthogonal initialization of weight matrices help improve the training of neural networks? What happens if we further impose orthogonality during training? We investigate dynamical isometry and its positive effect on convergence during training.

What is Dynamical Isometry?

Dynamical isometry is achieved when the singular values of the network's input-output Jacobian all concentrate around one. When the Jacobian J is well-conditioned, i.e. its singular values are close to one, J is a norm-preserving mapping, the mean of the spectral density of its singular values equals one, and dynamical isometry is reached. When a neural network achieves dynamical isometry, the gradient avoids both the chaotic (exploding gradient) and the ordered (vanishing gradient) regime, which leads to better and faster convergence.
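
To make the definition concrete, here is a minimal sketch (in PyTorch, not code from this project) comparing the singular values of the input-output Jacobian of a deep linear network under Gaussian versus orthogonal initialization. For a linear network the Jacobian is just the product of the weight matrices, so orthogonal weights give singular values exactly equal to one. The function name and dimensions below are illustrative.

```python
# Sketch: Jacobian singular values under Gaussian vs. orthogonal initialization.
import torch

depth, width = 50, 128

def jacobian_singular_values(init_fn):
    # For a deep linear network y = W_L ... W_1 x, the input-output Jacobian
    # is the product of the weight matrices.
    jacobian = torch.eye(width)
    for _ in range(depth):
        w = torch.empty(width, width)
        init_fn(w)          # in-place initializer from torch.nn.init
        jacobian = w @ jacobian
    return torch.linalg.svdvals(jacobian)

gaussian = jacobian_singular_values(lambda w: torch.nn.init.normal_(w, std=width ** -0.5))
orthogonal = jacobian_singular_values(torch.nn.init.orthogonal_)

print("Gaussian   min/max singular value:", gaussian.min().item(), gaussian.max().item())
print("Orthogonal min/max singular value:", orthogonal.min().item(), orthogonal.max().item())
```

With Gaussian initialization the singular values spread out as depth grows (some explode, some vanish), whereas the orthogonal product stays norm-preserving.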

Research Questions

Effect of Orthogonal Initialization on:
A) Vanishing and Exploding Gradient
B) Difference in Speed of Convergence between Deep and Shallow Neural Networks
C) Accuracy of Non-Linear Neural Networks with vs. without Dynamical Isometry

Effect of Orthogonal Regularization:
D) Can an excessive orthogonality constraint (i.e. hard regularization) hurt performance?
E) Under which conditions (e.g. depth) is enforcing orthogonality more beneficial?


Empirical Results

While the first part of the project studies the mathematical consequences of dynamical isometry, in the second half we conduct experiments on recurrent neural networks (RNNs) and impose an orthogonal regularization constraint with gain during training. Although the results of orthogonal regularization are inconclusive on some datasets, on the Sequential MNIST dataset a gain-adjusted regularizer outperforms a single-soft regularizer.
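
As a hedged sketch of what such a constraint can look like (the exact regularizer used in these experiments may differ), a common "soft" form penalizes the deviation of W^T W from gain^2 · I: gain = 1 recovers a plain soft orthogonality penalty, while a gain-adjusted variant targets a scaled identity. The function name, weights, and coefficients below are illustrative.

```python
# Sketch: soft orthogonality penalty with gain, added to the task loss.
import torch

def soft_orthogonality_penalty(weight: torch.Tensor, gain: float = 1.0) -> torch.Tensor:
    """Squared Frobenius norm of W^T W - gain^2 * I."""
    target = (gain ** 2) * torch.eye(weight.shape[1], device=weight.device)
    return ((weight.t() @ weight - target) ** 2).sum()

# Hypothetical usage: penalize an RNN's recurrent weights alongside the task loss.
# rnn = torch.nn.RNN(input_size=1, hidden_size=128, nonlinearity="tanh")
# loss = task_loss + 1e-3 * soft_orthogonality_penalty(rnn.weight_hh_l0, gain=1.4)
```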