A standard (non-convolutional) neural network to classify the MNIST dataset.
- Step 1 : Setting up the database
- Step 2 : Creating the neural network
- Step 3 : Training the model on the dataset
- Step 4 : Testing the Model
- Step 5 : Saving the model
- Step 6 : Logging of Parameters during Model Training and Testing
- To view results for any random picture in the dataset, the following code can be used:
- Further Improvements
The MNIST Database contains gray-scale images of dimension 28x28, where each image represents a handwritten digit that the network has to identify.
We need to download the MNIST dataset and transform the images to tensors, which we will feed into the model. This is achieved by:
mnist-dataset-classification/MNIST Classification Model..py
Lines 8 to 9 in 0fa674e
Here, 'train' represents our training dataset and 'test' represents the testing dataset.
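The referenced lines aren't reproduced here, so below is a minimal sketch of what this step typically looks like with torchvision (the ./data download path is an assumption; the train and test names match the prose):

from torchvision import datasets, transforms

# Download MNIST and convert each 28x28 image to a tensor
transform = transforms.ToTensor()
train = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test = datasets.MNIST(root='./data', train=False, download=True, transform=transform)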
To find the number of samples in each dataset, we can simply use:
mnist-dataset-classification/MNIST Classification Model..py
Lines 10 to 11 in 0fa674e
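A sketch of that check, assuming the train and test objects from the step above:

print("No. of training examples:", len(train))  # 60,000 samples
print("No. of testing examples:", len(test))    # 10,000 samples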
To see an example of the images in the dataset, we can use:
image,label = test[0] #to display the first image in test dataset along with its corresponding number
plt.imshow(image.numpy().squeeze(), cmap='gray_r');
print("\nThe Number is : " ,label,"\n")
The more training examples used to estimate the error gradient, the more accurate that estimate is and the more likely it is that the network's weights will be updated in a way that improves the model's performance.
A smaller batch size produces a noisier estimate, which leads to noisy updates to the model, i.e. several updates with potentially very different estimates of the error gradient. However, these noisy updates sometimes lead to a more robust model and often contribute to faster learning.
Various types of gradient descent:
- Batch gradient descent: the whole dataset is treated as one batch.
- Stochastic gradient descent: the batch size is set to one example.
- Mini-batch gradient descent: the batch size is set somewhere between one and the total number of examples in the training dataset.
Given that we have a fairly large dataset, we will not set the batch size equal to the whole dataset.
Smaller batch sizes also give us certain benefits, such as:
- Lower generalization error.
- A single batch of training data fits easily in memory.
We will use mini-batch gradient descent so that we update our parameters frequently and can still use a vectorized implementation for faster computation.
A batch size of around 30 examples is suitable here.
We use a DataLoader to randomly break our datasets into small batches, as sketched below:
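A minimal sketch of this step, assuming torch.utils.data.DataLoader and the batch size of 30 mentioned above (the loader variable names are illustrative):

from torch.utils.data import DataLoader

# Shuffle so each epoch sees the examples in a different random order
train_loader = DataLoader(train, batch_size=30, shuffle=True)
test_loader = DataLoader(test, batch_size=30, shuffle=True)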
Deciding on the number of hidden layers and neurons:
This is a topic of very elaborate discussion, but to keep it simple, the discussions in the AI FAQs were followed while making this model. Thus, the number of hidden layers was set to one, and the number of hidden nodes in that layer to 490 (following the rule of thumb: the number of hidden neurons should be 2/3 the size of the input layer, plus the size of the output layer).
The input layer has 784 nodes, one for each of the 28 x 28 pixels in an image, while the output layer has 10 nodes, one for each digit (0 to 9).
This is implemented as :
mnist-dataset-classification/MNIST Classification Model..py
Lines 16 to 18 in 0fa674e
A wide range of activation functions and formulations could be used and explored in depth, but for simplicity LeakyReLU has been used for the hidden layer (PyTorch LeakyReLU). The input and output layers have linear activation (PyTorch Linear), and LogSoftmax has been used to formulate the output (PyTorch LogSoftmax).
The implementation is in :
mnist-dataset-classification/MNIST Classification Model..py
Lines 20 to 23 in 0fa674e
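The exact definition lives in the referenced lines; a sketch of an equivalent network (784 -> 490 -> 10, LeakyReLU on the hidden layer, LogSoftmax on the output), assuming an nn.Sequential style, would be:

from torch import nn

model = nn.Sequential(
    nn.Linear(784, 490),    # input layer (28x28 pixels) -> hidden layer
    nn.LeakyReLU(),         # activation for the hidden layer
    nn.Linear(490, 10),     # hidden layer -> output layer (one node per digit)
    nn.LogSoftmax(dim=1)    # log-probabilities over the 10 classes
)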
Similar to the above, many loss functions could be used to compute the loss, but again for simplicity NLLLoss, i.e. Negative Log Likelihood Loss, has been used (PyTorch NLLLoss).
We have used SGD as the optimization algorithm here, with learning rate (lr) = 0.003 and momentum = 0.9, as generally suggested. (Typical lr values range from 0.0001 up to 1, and it is up to us to find a suitable value by cross-validation.)
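A minimal sketch of these two choices, assuming the model sketched above (the criterion and optimizer names are illustrative):

from torch import nn, optim

criterion = nn.NLLLoss()                                           # negative log likelihood loss
optimizer = optim.SGD(model.parameters(), lr=0.003, momentum=0.9)  # SGD with momentum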
To calculate the total training time, the time module has been used (lines 34 and 48).
Trial and error can be used to find a suitable number of epochs; for this code it has been set to 18.
The overall training is done as:
mnist-dataset-classification/MNIST Classification Model..py
Lines 33 to 49 in a014ffa
mnist-dataset-classification/MNIST Classification Model..py
Lines 51 to 66 in a014ffa
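The referenced lines contain the repository's actual training and testing code; the sketch below only illustrates the typical pattern under the choices described above (18 epochs, timing with the time module, and the model, criterion, optimizer and loaders from the earlier sketches), including a simple evaluation pass for Step 4.

import time
import torch as tch

epochs = 18
start_time = time.time()

# Training loop (Step 3)
for epoch in range(epochs):
    running_loss = 0
    for images, labels in train_loader:
        images = images.view(images.shape[0], -1)   # flatten each 28x28 image into 784 inputs
        optimizer.zero_grad()                       # clear gradients from the previous update
        output = model(images)
        loss = criterion(output, labels)
        loss.backward()                             # backpropagate the loss
        optimizer.step()                            # update the weights
        running_loss += loss.item()
    print("Epoch {} - Training loss: {:.4f}".format(epoch, running_loss / len(train_loader)))

print("Training Time (in minutes) =", (time.time() - start_time) / 60)

# Evaluation on the test set (Step 4)
correct, total = 0, 0
with tch.no_grad():                                 # no gradients needed while testing
    for images, labels in test_loader:
        images = images.view(images.shape[0], -1)
        predictions = tch.argmax(model(images), dim=1)
        correct += (predictions == labels).sum().item()
        total += labels.shape[0]
print("Model Accuracy =", correct / total)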
To log and visualize the model parameters, TensorBoard has been used. For now, it logs loss vs. epoch data, and the resulting graph can be accessed using:
tensorboard --logdir=runs
The Logging happens at :
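The referenced logging lines aren't shown here; a minimal sketch of the usual pattern with torch.utils.tensorboard.SummaryWriter (the tag and the dummy loss values are purely illustrative):

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter('runs')   # event files go under ./runs, matching --logdir=runs above

# In the real training loop this is called once per epoch with that epoch's loss
for epoch, epoch_loss in enumerate([2.1, 1.4, 0.9]):   # dummy values for illustration only
    writer.add_scalar('Loss vs Epoch', epoch_loss, epoch)

writer.close()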
The following type of graph is produced as a result; it may vary if you change the algorithms or other parameters of the model:
The code below also creates a graph displaying the probabilities returned by the model for a random picture from the dataset:
import numpy as np
import matplotlib.pyplot as plt
import torch as tch

def view_classify(img, ps):
    # Show the image next to a bar chart of the class probabilities
    ps = ps.cpu().data.numpy().squeeze()
    fig, (ax1, ax2) = plt.subplots(figsize=(6, 9), ncols=2)
    ax1.imshow(img.resize_(1, 28, 28).numpy().squeeze())
    ax1.axis('off')
    ax2.barh(np.arange(10), ps)
    ax2.set_aspect(0.1)
    ax2.set_yticks(np.arange(10))
    ax2.set_yticklabels(np.arange(10))
    ax2.set_title('Class Probability')
    ax2.set_xlim(0, 1.1)
    plt.tight_layout()

# Pick a random sample from the training dataset and run it through the model
img, label = train[np.random.randint(0, 10001)]
image = img.view(1, 784)
with tch.no_grad():
    logps = model(image)
ps = tch.exp(logps)
probab = list(ps.numpy()[0])
print("Predicted Digit =", probab.index(max(probab)))
view_classify(image.view(1, 28, 28), ps)
Model accuracy: the accuracy of the model with this code is approximately 97.8% to 98.02%, with a training time of approximately 3.5 to 4 minutes.
- Working on expanding logging and graphing to other parameters to give a more comprehensive assessment of the model's performance.
- Looking to test with different algorithms to strike a balance between training time and accuracy.