A simple neural network library in Go from scratch. 0 dependencies*
There are no neural-network-related dependencies; the only dependency is golang/protobuf, which is used to persist the weights to a file.
func main() {
	g := gone.New(
		0.1,
		gone.MSE(),
		gone.Layer{
			Nodes: 2,
		},
		gone.Layer{
			Nodes:     4,
			Activator: gone.Sigmoid(),
		},
		gone.Layer{
			Nodes: 1,
		},
	)
	g.Train(gone.SGD(), gone.DataSet{
		{
			Inputs:  []float64{1, 0},
			Targets: []float64{1},
		},
		{
			Inputs:  []float64{0, 1},
			Targets: []float64{1},
		},
		{
			Inputs:  []float64{1, 1},
			Targets: []float64{0},
		},
		{
			Inputs:  []float64{0, 0},
			Targets: []float64{0},
		},
	}, 5000)
	g.Predict([]float64{1, 1})
}
g.Save("test.gone")
g, err := gone.Load("test.gone")
- Types of task:
  - Classification - `softmax` (soon to be implemented) as the last layer's activation function
  - Regression - `sigmoid` as the last layer's activation function
- Bias
- Matrix, rather than a single number
- Feedforward (Predict)
- Train
- Support shuffling the data
- Epochs
- Backpropagation
- Batching
- Different loss functions
- Mean Squared Error
- Cross Entropy Error
- Saving data - Done thanks to protobuf
- Loading data
- Adam optimizer
- Nesterov + Momentum for GD
- Fix MSE computation in debug mode (not used in actual backpropagation)
- Somehow persist configurations for Activation, Loss and Optimizer functions in the protobuf messages (???, if we want to do it like TensorFlow, we'd have to use `interface{}` and do type assertions)
- Convolutional Layers
- Flatten layer
- Copy
- Crossover
- Mutate
- Gaussian Mutator
NOTE: all of the matrix operations below were migrated to github.com/fr3fou/matrigo
- Randomize
- Transpose
- Scale
- AddMatrix
- Add
- SubtractMatrix
- Subtract
- Multiply
- Multiply
- Flatten
- Unflatten
- NewFromArray - makes a single row
- Map
- Fold
- Methods to support chaining
n.Weights[i].
	Multiply(output).                         // weighted sum of the previous layer
	Add(n.Layers[i+1].Bias).                  // bias
	Map(func(val float64, x, y int) float64 { // activation
		return n.Layers[i+1].Activator.F(val)
	})
- Derivatives ~
- Partial Derivatives ~
- Linear vs non-linear problems (activation function)
- Gradient Descent
- (Batch) Gradient Descent (GD)
- Stochastic Gradient Descent (SGD)
- Mini-Batch Gradient Descent (MBGD?)
- Softmax (needed for multi class classification!)
- Mean Squared Error
- Cross Entropy Error (needed for multi class classification!)
- How to determine how many layers and nodes to use
- One Hot Encoding
- Convolutional Layers
- Reinforcement learning
- Genetic Algorithms~
- Neuroevolution~
- Simulated Annealing
- Q-Learning
- Linear vs Logistic Regression
- 3D inputs (regarding Video and CNNs)
These are some (stupid) questions I have that confuse me:
- Is Neuroevolution considered Reinforcement learning?
- How is training done with HUGE datasets when they can't fit on your storage device?
- Imagine your dataset is a couple of TB in size; what do you do?
- Is Q-Learning only done with a single agent (unlike genetic algorithms / neuroevolution)?
- Is Q-Learning the only method for Reinforcement Learning?
- What's the difference between a Convolutional Neuron and a normal weight matrix?
- Is Deep Learning really just a Neural Network with a lot of layers? (more than 2)
- Why do you need multiple CNN layers? Is it to go to a smaller and smaller version of the image? (when working with images that is) (because of MaxPooling?) Why can't you go directly to the smallest size (512x512 -> 16x16 vs 512x512 -> 256x256 -> 128x128 -> ...)?
- So if images are stored in a 2D array (but with the RGB channels, making it a 3D array with 3 layers), do we use `Conv2D` or `Conv3D`?
- If 3D inputs are used for videos, how is that represented? Is a single input basically an array of 2D arrays (array of images - frames)? So basically a single observation is a single video and your entire dataset is a lot of videos, right?
- XOR Problem
- Digit Classifier
- Flappy Bird AI
- David Josephs - was of HUGE help with algebra and other ML-related questions; also helped me spot some nasty bugs!
Note: some of the references weren't used during development, but they are listed in this section because they were helpful guidance throughout my AI journey.
- https://www.analyticsvidhya.com/blog/2020/01/fundamentals-deep-learning-activation-functions-when-to-use-them/
- https://www.youtube.com/watch?v=XJ7HLz9VYz0&list=PLRqwX-V7Uu6Y7MdSCaIfsxc561QI0U0Tb
- https://www.youtube.com/watch?v=aircAruvnKk&list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi
- http://matrixmultiplication.xyz/
- https://www.khanacademy.org/math/precalculus/x9e81a4f98389efdf:matrices/x9e81a4f98389efdf:properties-of-matrix-addition-and-scalar-multiplication/a/properties-of-matrix-addition
- https://www.wikiwand.com/en/Matrix_(mathematics)
- https://www.wikiwand.com/en/Activation_function
- https://www.wikiwand.com/en/Delta_rule
- https://www.jeremyjordan.me/intro-to-neural-networks/
- https://www.arxiv-vanity.com/papers/2003.02139/
- https://machinelearningmastery.com/gentle-introduction-mini-batch-gradient-descent-configure-batch-size/
- http://neuralnetworksanddeeplearning.com/chap2.html
- https://arxiv.org/pdf/1802.01528.pdf
- https://github.com/stevenmiller888/mind/blob/master/index.js
- https://github.com/stevenmiller888/go-mind
- https://medium.com/yottabytes/everything-you-need-to-know-about-gradient-descent-applied-to-neural-networks-d70f85e0cc14
- https://towardsdatascience.com/deep-learning-which-loss-and-activation-functions-should-i-use-ac02f1c56aa8
- https://github.com/milosgajdos/go-neural
- https://ml-cheatsheet.readthedocs.io/en/latest/forwardpropagation.html
- https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
- https://medium.com/coinmonks/representing-neural-network-with-vectors-and-matrices-c6b0e64db9fb
- https://towardsdatascience.com/classifying-cat-pics-with-a-logistic-regression-model-e35dfb9159bb
- https://towardsdatascience.com/math-neural-network-from-scratch-in-python-d6da9f29ce65
- https://towardsdatascience.com/gradient-descent-from-scratch-e8b75fa986cc
- https://rstudio-pubs-static.s3.amazonaws.com/337306_79a7966fad184532ab3ad66b322fe96e.html
- https://gombru.github.io/2018/05/23/cross_entropy_loss
- https://medium.com/@tomvykruta/memory-aid-for-softmax-and-cross-entropy-loss-5704c66d795d
- https://cs.stackexchange.com/questions/90228/crossover-operator-in-genetic-algorithms-in-neural-networks
- https://stackoverflow.com/questions/54625643/where-is-the-gaussian-distribution-function-in-the-pseudocode-below
- https://www.wikiwand.com/en/Normal_distribution
- https://www.khanacademy.org/math/multivariable-calculus/multivariable-derivatives/partial-derivative-and-gradient-articles/a/introduction-to-partial-derivatives
- https://www.khanacademy.org/math/multivariable-calculus/multivariable-derivatives/partial-derivative-and-gradient-articles/a/the-gradient
- https://towardsdatascience.com/e2e-the-every-purpose-ml-method-5d4f20dafee4