The fundamental unit of each Neural Network model is the simple Perceptron (or single neuron). The Perceptron is the simplest mathematical model of the biological neuron and it is based on the Rosenblatt [Rosenblatt58theperceptron] model, which identifies a neuron as a computational unit with inputs, synaptic weights and an activation threshold (or function). Following the biological model of Hodgkin and Huxley [HHmodel] (H-H model), we have an action potential, i.e the output of the neuron, given by

$$
y = f\left( \sum_{i=1}^{N} w_i x_i + b \right)
$$

where $x_i$ are the input values, $w_i$ the corresponding synaptic weights, $b$ the bias (activation threshold) and $f$ the activation function.

The connection weights can be tuned by an iterative updating rule

$$
w_i \leftarrow w_i + \eta \left( t - y \right) x_i
$$

where $t$ is the desired output and $\eta > 0$ is the gain factor (or learning rate).
In other words, we first have to compute the difference between the current output and the desired one, i.e the error, or cost function, or loss function 1, and weight this error by the gain factor and the corresponding input.
Repeating the error computation and the updating rule, we can bring the weights to convergence.
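As a reference for the formulas above, the following is a minimal sketch of the discrete Perceptron (it is not the code of the gists cited below); the class and method names, the gain factor `eta` and the maximum number of steps are illustrative assumptions.

```python
class Perceptron:
    """Minimal sketch of the discrete Perceptron with Heaviside step activation."""

    def __init__(self, n_inputs, eta=0.1, max_steps=100):
        self.w = [0.] * n_inputs    # synaptic weights
        self.b = 0.                 # bias (activation threshold)
        self.eta = eta              # gain factor (learning rate)
        self.max_steps = max_steps  # extra stop criterion for non-separable data

    def predict(self, x):
        # Heaviside step of the weighted sum of the inputs
        return 1 if sum(wi * xi for wi, xi in zip(self.w, x)) + self.b > 0. else 0

    def fit(self, X, T):
        for _ in range(self.max_steps):
            errors = 0
            for x, t in zip(X, T):
                y = self.predict(x)
                # updating rule: error weighted by the gain factor and the input
                self.w = [wi + self.eta * (t - y) * xi for wi, xi in zip(self.w, x)]
                self.b += self.eta * (t - y)
                errors += int(t != y)
            if errors == 0:         # convergence on linearly separable data
                break
        return self
```

The maximum number of steps acts as the extra stop criterion mentioned in the footnotes, so the loop also terminates on non-separable data.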
From a geometrical point-of-view, this process is equivalent to the placement of a hyper-plane, defined by the weights, which divides the n-dimensional input space into two half-spaces, i.e the two desired classes.
The mathematical formulation already highlights the numerous limits of this model. The output function is a simple linear combination of the inputs with a vector of weights, so only linearly separable problems can be learned 2 by the Perceptron 3. Moreover, we can manage only two classes, since a hyper-plane divides the space into only two half-spaces.
A key role is played by the activation function. The classical activation function used in the discrete Perceptron model is the unit step function (or Heaviside step function). If we choose a continuous, and so differentiable, activation function we can treat the problem using a continuous cost function. In this case we can define it as

$$
E(w) = \frac{1}{2} \sum_{j=1}^{M} \left( t_j - y_j \right)^2
$$

where in this case both the desired output $t_j$ and the computed output $y_j$ are real values. Minimizing this cost function with respect to the weights via gradient descent leads (here written for a linear activation) to the updating rule

$$
\Delta w_i = -\eta \, \frac{\partial E}{\partial w_i} = \eta \sum_{j=1}^{M} \left( t_j - y_j \right) x_{ij}
$$

where $\eta$ is again the gain factor, which looks identical to the previous updating rule, but in this case we are managing real numbers and not simple class labels. Moreover, in this way we compute the weight updates according to the full set of training samples and not for each single sample (this approach is the so-called batch update; computing it on small subsets of data leads to the mini-batch variant).
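As a minimal sketch of this batch formulation (assuming a linear activation and the sum-of-squares cost defined above; the function name and the gain factor `eta` are illustrative), a single full-batch update could look like:

```python
def batch_update(w, b, X, T, eta=0.01):
    """One full-batch gradient-descent step on the sum-of-squares cost (sketch)."""
    grad_w = [0.] * len(w)
    grad_b = 0.
    for x, t in zip(X, T):
        # continuous output: linear combination of the inputs (linear activation)
        y = sum(wi * xi for wi, xi in zip(w, x)) + b
        # accumulate the error contribution of this sample
        grad_w = [g + (t - y) * xi for g, xi in zip(grad_w, x)]
        grad_b += (t - y)
    # single update computed on the full set of training samples
    w = [wi + eta * g for wi, g in zip(w, grad_w)]
    b = b + eta * grad_b
    return w, b
```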
To implement this kind of model in pure Python we do not need extra libraries, but we can just use the native keywords of the language.
A possible implementation of this model was developed and released in an on-line gist.
In this simple snippet we examine the functionality of the Simple Perceptron model across different logical functions and we prove its fast convergence on linearly separable datasets 4.
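Along the same lines of the cited gist (though not its actual code), the sketch shown above can be exercised on the basic logical functions; the truth-table layout below is just an illustrative assumption:

```python
# truth tables of the basic logical functions
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = {
    'AND': [0, 0, 0, 1],
    'OR':  [0, 1, 1, 1],
    'XOR': [0, 1, 1, 0],   # not linearly separable: no convergence expected
}

for name, T in targets.items():
    p = Perceptron(n_inputs=2, eta=0.1, max_steps=100).fit(X, T)
    predictions = [p.predict(x) for x in X]
    print(f'{name}: converged={predictions == T}, predictions={predictions}')
```

On the AND and OR tables the weights are expected to converge within a few epochs, while on XOR the training stops only because of the maximum-step criterion.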
An equivalent C++ implementation of the model is also provided and can be found in this other gist.
The model is too naive for a meaningful discussion about computational efficiency.
Thus, we can just observe how a learning algorithm can be easily implemented using basic programming-language keywords, either in Python or in C++.
Footnotes
1. There are multiple loss functions in the Neural Network world. We will further discuss their use and their effect on a learning model in the next section. ↩
2. A classical example of such a learning problem is given by the XOR logic function. Since the XOR output is not linearly separable, the Perceptron cannot converge. ↩
3. We handle the non-linearly separable case by introducing an extra stop criterion during the weight tuning, given by a maximum number of steps. ↩