Comparing the accuracies of different convolutional neural networks for the task of classifying Indians into North and South Indians using facial image data
This project was made for the CS-1651 Mini Project course at MNNIT Allahabad
The documents made for the project can be found in the `docs` branch.
The first course of action was to collect images of North Indian and South Indian faces. We used the Google Custom Search API to fetch images by searching for common regional surnames. However, this mostly returned images of people who were popular on the internet and did not provide us with images of the average civilian.
This posed a problem. Fortunately, we came across CNSIFD (Centre for Neuroscience Indian Face Dataset), which contains 500 x 500 grayscale images of normalised faces in an elliptical cropped window.
The dataset provided two `.mat` files that were of interest to us:
- `cnsifd_imgs.mat`
- `cnsifd_info.mat`

`cnsifd_imgs.mat` contained all the face image data, while `cnsifd_info.mat` contained the labels for the face images along with any additional information.
The image files were extracted from `cnsifd_imgs.mat` in `.csv` format. There are 1647 `.csv` files in total, each an image matrix of around 500 x 350 dimensions. We used the `imshow` function in Octave to check whether they really are images. Each `.csv` matrix produces a pre-processed face image (grayscale, elliptically cropped).
For example, the first 10 image files have the following dimensions.
```
octave:32> for i = 1:10
> size(cnsifd_imgs{1,i})
> end
ans =
   501   380
ans =
   501   356
ans =
   501   350
ans =
   501   353
ans =
   501   382
ans =
   501   368
ans =
   501   371
ans =
   501   389
ans =
   501   368
ans =
   501   350
```
Each of the face data matrices has been converted to `.csv` and stored in the `cnsifd/cnsifd-imgs` folder.
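The export round trip can be sanity-checked in Python as well. This is a minimal sketch with a random stand-in matrix and a hypothetical filename; the real files live in `cnsifd/cnsifd-imgs`.

```python
import numpy as np

# Hypothetical stand-in for one face matrix exported from cnsifd_imgs.mat;
# a real file would live under cnsifd/cnsifd-imgs/.
img = np.random.rand(501, 380)
np.savetxt("face_0001.csv", img, delimiter=",")

# Load it back the way the downstream pipeline would read a face .csv
loaded = np.loadtxt("face_0001.csv", delimiter=",")
print(loaded.shape)  # (501, 380), matching the size() output seen in Octave
```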
The `cnsifd_info.mat` file had the following fields:
```
fields =
{
  [1,1] = source_dataset: nfaces x 1, 1-SET1, 2-SET2
  [2,1] = region: nfaces x 1, 1-north, 0-south
  [3,1] = gender: nfaces x 1, 1-male, 0-female
  [4,1] = age: nfaces x 1 declared age in years or nan if not declared
  [5,1] = weight: nfaces x 1 declared weight in kg or nan if not declared
  [6,1] = height: nfaces x 1 declared height in cms or nan if not declared
  [7,1] = pc: nfaces x 1 percentage correct in a north/south categorisation task, nan if not declared
  [8,1] = landmarks: 76 x 2, x,y coordinates of aam landmarks
  [9,1] = landmarksm: 80 x 2, x,y coordinates of aam landmarks
  [10,1] = intensity_landmarks: 31 x 3 landmark ids for face patches
  [11,1] = spatial_landmarks: 32 x 3 landmark ids for face distance measurements
  [12,1] = spatial: nfaces x 23, spatial measurements
  [13,1] = intensity: nfaces x 31
  [14,1] = [](0x0)
  [15,1] = [](0x0)
  [16,1] = [](0x0)
  [17,1] = [](0x0)
  [18,1] = [](0x0)
  [19,1] = allfeatures: nfaces x 1446 matrix of all features together in order spatial, intensity, spatial_ratio and intensity_ratio
  [20,1] = allfeatures_raw: nfaces x 1446 normalised matrix of all features together in order spatial, intensity, spatial_ratio and intensity_ratio
  [21,1] = bf.bflandmarks: nfaces x ? x 2 selected aam landmarks for each face
  [22,1] = bf.bftriangles: nfaces x ? x 3 x 2 x,y coordinates of triangle vertices
  [23,1] = bf.bfspatial: nfaces x ? measurements between landmarks
  [24,1] = bf.bfintensity: nfaces x ? measurements on triangles on faces
  [25,1] = cnn: three fields having CNN-F, CNN-A, CNN-G features
  [26,1] = moments: nfaces x 7 intensity moments
  [27,1] = lbp: nfaces x 1328 local binary patterns
  [28,1] = hog: nfaces x 6723 histograms of gradients at multiple scales
  [29,1] = siex: nfaces x 1647 exhaustive measurements from triangulating the face
  [30,1] = spatial_ratio: nfaces x 231, spatial ratio measurements
  [31,1] = spatial_product: nfaces x 231, spatial product measurements
  [32,1] = intensity_ratio: nfaces x 465, intensity ratio measurements
  [33,1] = intensity_product: nfaces x 465, intensity product measurements
}
```
The first seven `nfaces x 1` fields were used to make the `info.csv` file. The headings in the `.csv` were added manually for additional reference.
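A sketch of how `info.csv` could be assembled with pandas. The column names match the first seven fields above; the values here are illustrative stand-ins for the arrays read from `cnsifd_info.mat`.

```python
import numpy as np
import pandas as pd

# Stand-in data for the first seven nfaces x 1 fields of cnsifd_info.mat
info = pd.DataFrame({
    "source_dataset": [1, 1, 2, 2],          # 1-SET1, 2-SET2
    "region": [1, 0, 1, 0],                  # 1-north, 0-south
    "gender": [1, 0, 0, 1],                  # 1-male, 0-female
    "age": [21.0, np.nan, 34.0, 28.0],       # years, nan if not declared
    "weight": [65.0, 54.0, np.nan, 70.0],    # kg, nan if not declared
    "height": [172.0, 160.0, 168.0, np.nan], # cm, nan if not declared
    "pc": [0.8, 0.7, np.nan, 0.9],           # percentage correct, nan if not declared
})
info.to_csv("info.csv", index=False)  # headers double as the manual headings
```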
Summary of the face dataset
Age and gender distribution of the samples
Gender distribution table
Once the data from the CNSIFD dataset was converted to `.csv`, the rows without a label (i.e. images whose label was `NaN`) were removed from consideration. The `.csv` files were updated to remove those rows, and the indices were updated accordingly.
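The `NaN`-label filtering can be sketched with pandas; the data below is illustrative, standing in for the exported CNSIFD rows.

```python
import numpy as np
import pandas as pd

# Illustrative rows: two faces have no region label (NaN)
info = pd.DataFrame({
    "region": [1.0, np.nan, 0.0, np.nan, 1.0],
    "gender": [1, 0, 0, 1, 0],
})

# Drop unlabelled rows, then rebuild the index so it stays contiguous
labelled = info.dropna(subset=["region"]).reset_index(drop=True)
print(len(labelled))  # 3
```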
Using the updated `.csv` files, the required image matrices were converted to `.png` images using `matplotlib.pyplot.imsave`.
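The csv-to-png step, sketched with a random stand-in matrix (the `Agg` backend avoids needing a display; the filename is hypothetical).

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend: no display required
import matplotlib.pyplot as plt

# A random 501 x 350 matrix stands in for one face loaded from its .csv
img = np.random.rand(501, 350)

# imsave writes the matrix directly as an image; cmap="gray" keeps it grayscale
plt.imsave("face_0001.png", img, cmap="gray")
```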
The images were then separated into training, test and cross-validation sets using the `split_folders` library.
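Conceptually the library shuffles the files and slices them by ratio; the sketch below shows that idea on a list of indices. The 80/10/10 ratio and the seed are assumptions for illustration, not the project's actual settings.

```python
import random

# Shuffle-and-slice split, the idea behind split_folders' ratio-based split
random.seed(1337)
indices = list(range(100))
random.shuffle(indices)

train = indices[:80]   # 80% training
val = indices[80:90]   # 10% cross validation
test = indices[90:]    # 10% test
print(len(train), len(val), len(test))  # 80 10 10
```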
AlexNet is a convolutional neural network that is 8 layers deep and can classify images into 1000 object categories. It is composed of 5 convolutional layers followed by 3 fully connected layers. AlexNet, proposed by Alex Krizhevsky, uses ReLU instead of the tanh or sigmoid activations that were the earlier standard for neural networks. The advantage of ReLU over the sigmoid function is that it trains much faster: the derivative of the sigmoid becomes very small in its saturating region, so the weight updates almost vanish. The architecture also reduced overfitting by using a Dropout layer after every fully connected layer.
AlexNet neural network structure
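The vanishing-gradient point above can be made concrete numerically. This is a minimal sketch, not tied to any particular network: deep in the sigmoid's saturating region its derivative is tiny, while ReLU's derivative stays at 1 for positive inputs.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# A pre-activation deep in the saturating region
x = 10.0

# Derivative of sigmoid: s(x) * (1 - s(x)) -> nearly zero, so updates vanish
sig_grad = sigmoid(x) * (1.0 - sigmoid(x))

# Derivative of ReLU: 1 for any positive input, so the gradient passes through
relu_grad = 1.0 if x > 0 else 0.0

print(sig_grad, relu_grad)
```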
VGG16 is a convolutional neural network model proposed by K. Simonyan and A. Zisserman from the University of Oxford in the paper "Very Deep Convolutional Networks for Large-Scale Image Recognition". VGG19 has a similar architecture to VGG16 but with three additional convolutional layers: it consists of a total of 16 convolutional layers and 3 dense layers.
VGG16 architecture
VGG16 neural network structure
VGG19 neural network structure
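For reference, VGG16's convolutional configuration (configuration "D" in the paper, where `'M'` marks a max-pooling stage) can be written out and counted: 13 convolutional layers plus the 3 dense layers give the 16 weight layers in the name, and VGG19 adds three more convolutional layers.

```python
# VGG16 convolutional configuration; numbers are output channels, 'M' is max-pool
cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
       512, 512, 512, 'M', 512, 512, 512, 'M']

conv_layers = sum(1 for v in cfg if v != 'M')
print(conv_layers)       # 13 convolutional layers
print(conv_layers + 3)   # 16 weight layers including the 3 dense layers
```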
Previous models before ResNet were deep neural networks in which many convolutional layers were stacked one after the other. It was believed that deeper networks perform better. However, it turned out that this was not really true.
Deep networks face the following problems:
- network becomes difficult to optimize
- vanishing/exploding gradients
- degradation problem (accuracy first saturates and then degrades)
To address these problems, the authors of the ResNet architecture came up with the idea of skip connections, with the hypothesis that deeper layers should be able to learn at least as well as shallower layers. A possible solution was to copy the activations from shallower layers and set the additional layers to an identity mapping; skip connections are what make this possible.
ResNet neural network structure
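The identity-mapping idea can be sketched in a few lines of NumPy (the weight matrices here are hypothetical, not ResNet's actual convolutions): a residual block computes `F(x) + x`, so if the extra layers learn nothing useful (`F(x) = 0`) the block still passes its input through unchanged.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    # F(x): two weighted transforms with a nonlinearity in between
    out = relu(x @ w1)
    out = out @ w2
    # Skip connection: add the input back before the final activation
    return relu(out + x)

# With all-zero weights the block reduces to the identity mapping
x = np.array([1.0, 2.0, 3.0])
w_zero = np.zeros((3, 3))
y = residual_block(x, w_zero, w_zero)
print(y)  # [1. 2. 3.]
```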
Confusion Matrices (left to right): AlexNet; VGG16; VGG19; ResNet50; ResNet152
Confusion bar chart
Accuracy bar chart
Training set vs validation set bar chart
- Google Colab - Research tool for machine learning education and research
- JupyterLab - Next-generation web-based user interface for Project Jupyter
- Octave - Open-source software featuring a high-level programming language, primarily intended for numerical computations
- MATLAB - Multi-paradigm numerical computing environment and proprietary programming language developed by MathWorks
- fast.ai - Easy to use deep learning library
- PyTorch - Open-source machine learning library for Python, based on Torch
- Keras - Open-source neural-network library written in Python, capable of running on top of TensorFlow, Microsoft Cognitive Toolkit, Theano, or PlaidML
- pandas - Software library written for the Python programming language for data manipulation and analysis
- Matplotlib - Plotting library for the Python programming language
- NumPy - Library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays
- SciPy - Free and open-source Python library used for scientific computing and technical computing
- seaborn - A Python data visualization library based on matplotlib
- Split Folders - Automatically split folders with files (i.e. images) into training, validation and test (dataset) folders
- Google Sheets - Spreadsheet program included as part of a free, web-based software office suite offered by Google within its Google Drive service
- Google Drive - File storage and synchronization service developed by Google
(University Project group under the mentorship of Prof. Suneeta Agarwal)
- Tuhin Subhra Patra - armag-pro (Team Leader)
- Rajat Dipta Biswas - rajatdiptabiswas
- Upmanyu Jamwal
- S Pranav Ganesh
See also the list of contributors who participated in this project.
This project is licensed under the GNU General Public License v3.0 - see the LICENSE.md file for details
- Are you from North or South India? A hard race classification task reveals systematic representational differences between humans and machines | Harish Katti, S. P. Arun | Centre for Neuroscience, Indian Institute of Science, Bangalore, India
- fast.ai course | Image Classification
- Coursera | Convolutional Neural Networks by Andrew Ng
- Documentations - JupyterLab, Octave, MATLAB, Matplotlib, PyTorch, Keras, NumPy, SciPy, pandas, seaborn
- StackOverflow