
Classification of Indians into North and South Indians

Comparing the accuracies of different convolutional neural networks for the task of classifying Indians into North and South Indians using facial image data

This project was made for the CS-1651 Mini Project course at MNNIT Allahabad

The documents made for the project can be found in the docs branch

Dataset description

The first course of action was to collect images of North Indian and South Indian faces. We used the Google Custom Search API to fetch images by searching for common regional surnames. This mostly returned images of people who were popular on the internet and did not give us images of the average civilian.

This posed a problem. Fortunately, we came across CNSIFD (the Centre for Neuroscience Indian Face Dataset), which contains 500 x 500 grayscale images with normalised faces in an elliptically cropped window.

The dataset provided two .mat files that were of interest to us.

  • cnsifd_imgs.mat
  • cnsifd_info.mat

cnsifd_imgs.mat contains all the face image data, while cnsifd_info.mat contains the labels for the face images along with any additional information.

The image matrices were extracted from cnsifd_imgs.mat into .csv format, giving 1647 .csv files in total. Each .csv file is an image matrix of roughly 500 x 350 pixels. We used the imshow function in Octave to verify that they really are images; each matrix produces a pre-processed face image (black and white, elliptically cropped).

For example, the first 10 image files have the following dimensions.

octave:32> for i = 1:10
> size(cnsifd_imgs{1,i})
> end
ans =
   501   380
ans =
   501   356
ans =
   501   350
ans =
   501   353
ans =
   501   382
ans =
   501   368
ans =
   501   371
ans =
   501   389
ans =
   501   368
ans =
   501   350

Each face data matrix has been converted to .csv and stored in the cnsifd/cnsifd-imgs folder.

The cnsifd_info.mat file had the following fields.

fields =
   {
      [1,1] = source_dataset: nfaces x 1, 1-SET1, 2-SET2
      [2,1] = region: nfaces x 1, 1-north, 0-south
      [3,1] = gender: nfaces x 1, 1-male, 0-female
      [4,1] = age: nfaces x 1 declared age in years or nan if not declared
      [5,1] = weight: nfaces x 1 declared weight in kg or nan if not declared
      [6,1] = height: nfaces x 1 declared height in cms or nan if not declared
      [7,1] = pc: nfaces x 1 percentage correct in a north/south categorisation task, nan if not declared
      [8,1] = landmarks: 76 x 2, x,y coordinates of aam landmarks
      [9,1] = landmarksm: 80 x 2, x,y coordinates of aam landmarks
      [10,1] = intensity_landmarks: 31 x 3 landmark ids for face patches
      [11,1] = spatial_landmarks: 32 x 3 landmark ids for face distance measurements
      [12,1] = spatial: nfaces x 23 , spatial measurements
      [13,1] = intensity: nfaces x 31 
      [14,1] = [](0x0)
      [15,1] = [](0x0)
      [16,1] = [](0x0)
      [17,1] = [](0x0)
      [18,1] = [](0x0)
      [19,1] = allfeatures : nfaces x 1446 matrix of all features together in order spatial, intensity, spatial_ratio and intensity_ratio
      [20,1] = allfeatures_raw : nfaces x 1446 normalised matrix of all features together in order spatial, intensity, spatial_ratio and intensity_ratio
      [21,1] = bf.bflandmarks: nfaces x ?  x 2 selected aam landmarks for each face
      [22,1] = bf.bftriangles: nfaces x ? x 3 x 2 x,y coordinates of triangle vertices
      [23,1] = bf.bfspatial: nfaces x ?  measurements between landmarks
      [24,1] = bf.bfintensity: nfaces x ?  measurements on triangles on faces
      [25,1] = cnn: three fields having CNN-F, CNN-A, CNN-G features
      [26,1] = moments: nfaces x 7 intensity moments
      [27,1] = lbp: nfaces x 1328 local binary patterns
      [28,1] = hog: nfaces x 6723 histograms of gradients at multiple scales
      [29,1] = siex: nfaces x 1647 exhaustive measurements from triangulating the face
      [30,1] = spatial_ratio: nfaces x 231 , spatial ratio measurements
      [31,1] = spatial_product: nfaces x 231 , spatial product measurements
      [32,1] = intensity_ratio: nfaces x 465 , intensity ratio measurements
      [33,1] = intensity_product: nfaces x 465 , intensity product measurements
   }

The first seven nfaces x 1 fields were used to build the info.csv file. The column headings in the .csv were added manually for additional reference.

Summary of the face dataset

Age and gender distribution of the samples

Gender distribution table

Once the data from the CNSIFD dataset had been converted to .csv, rows without a label (i.e. images whose label was NaN) were removed from consideration. The .csv files were updated to drop those rows and the indices were renumbered accordingly. Using the updated .csv, the required image matrices were converted to .png images with matplotlib.pyplot.imsave.
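The label-filtering step can be sketched with pandas (the "region" column name comes from the field list above; the file paths in the comment are hypothetical):

```python
import pandas as pd

def drop_unlabelled(info):
    """Drop rows whose region label is NaN and renumber the remainder."""
    return info.dropna(subset=["region"]).reset_index(drop=True)

# For each surviving row i, the corresponding face matrix can then be
# written out as a grayscale PNG, roughly:
#   img = np.loadtxt(f"cnsifd/cnsifd-imgs/face_{i:04d}.csv", delimiter=",")
#   matplotlib.pyplot.imsave(f"images/face_{i:04d}.png", img, cmap="gray")
```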

Images obtained from the dataset

The images were then separated into training, test and cross validation sets using the split_folders library.
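What split_folders does can be sketched in plain Python: shuffle the items once with a fixed seed, then cut them into train / validation / test chunks by ratio (the function below is an illustrative stand-in, not the library's API):

```python
import random

def split_indices(n, ratio=(0.8, 0.1, 0.1), seed=1337):
    """Shuffle indices 0..n-1 once, then cut into train/val/test by ratio."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # deterministic given the seed
    n_train = int(n * ratio[0])
    n_val = int(n * ratio[1])
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```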

Convolutional Neural Networks used

AlexNet

AlexNet is a convolutional neural network that is 8 layers deep and can classify images into 1000 object categories. It is composed of 5 convolutional layers followed by 3 fully connected layers. AlexNet, proposed by Alex Krizhevsky, uses ReLU instead of the tanh or sigmoid activations that were the earlier standard for neural networks. The advantage of ReLU over sigmoid is that it trains much faster: the derivative of the sigmoid becomes very small in its saturating regions, so the weight updates almost vanish. The architecture also reduced overfitting by applying Dropout in the fully connected layers.
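The vanishing-gradient argument for ReLU can be made concrete with the two derivatives. The sigmoid gradient peaks at 0.25 and decays towards zero for large inputs, while the ReLU gradient stays at 1 wherever the unit is active:

```python
import math

def sigmoid_grad(x):
    """d/dx sigmoid(x) = s * (1 - s); peaks at 0.25, vanishes for large |x|."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def relu_grad(x):
    """d/dx ReLU(x); constant 1 on the active side, so updates don't shrink."""
    return 1.0 if x > 0 else 0.0
```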

AlexNet neural network structure

VGG

VGG16 is a convolutional neural network model proposed by K. Simonyan and A. Zisserman from the University of Oxford in the paper "Very Deep Convolutional Networks for Large-Scale Image Recognition". VGG19 has a similar architecture to VGG16 but with three additional convolutional layers: it consists of a total of 16 convolution layers and 3 dense layers.
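The layer counts can be sanity-checked from the per-stage convolution counts in the paper's configurations D (VGG16) and E (VGG19); the weighted depth is the conv layers plus the three fully connected layers:

```python
# Conv layers per stage, from the VGG paper's configurations D and E.
VGG16_STAGES = [2, 2, 3, 3, 3]   # 13 conv layers
VGG19_STAGES = [2, 2, 4, 4, 4]   # 16 conv layers
FC_LAYERS = 3

def depth(stages):
    """Weighted depth = conv layers + fully connected layers."""
    return sum(stages) + FC_LAYERS
```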

VGG16 architecture

VGG16 neural network structure

VGG19 neural network structure

ResNet

Models before ResNet were deep neural networks in which many convolutional layers were stacked one after another, on the assumption that deeper networks perform better. It turned out that this is not true beyond a point.

Deep networks face the following problems:

  • the network becomes difficult to optimize
  • vanishing/exploding gradients
  • the degradation problem (accuracy first saturates, then degrades)

To address these problems, the authors of the ResNet architecture introduced skip connections, with the hypothesis that deeper layers should be able to learn at least as well as shallower ones. A simple way to guarantee this is to carry the activations of shallower layers forward and let the additional layers learn an identity mapping; skip connections make exactly that possible.
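The residual idea can be sketched in a few lines of NumPy: the block computes ReLU(F(x) + x), so if the learned branch F contributes nothing (all-zero weights in this toy example), the block reduces to an identity mapping for non-negative activations:

```python
import numpy as np

def residual_block(x, w1, w2):
    """Toy residual block: y = ReLU(F(x) + x), F = two linear layers with ReLU.

    With w1 = w2 = 0, F(x) = 0 and the block passes non-negative
    activations through unchanged, which is exactly what makes 'extra'
    depth easy for the network to absorb.
    """
    relu = lambda z: np.maximum(z, 0.0)
    f = relu(x @ w1) @ w2       # the learned branch F(x)
    return relu(f + x)          # skip connection adds x back in
```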

ResNet neural network structure

Results

Confusion Matrices (left to right): AlexNet; VGG16; VGG19; ResNet50; ResNet152

Confusion bar chart

Accuracy bar chart

Training set vs validation set bar chart

Built With

  • Google Colab - Research tool for machine learning education and research
  • JupyterLab - Next-generation web-based user interface for Project Jupyter
  • Octave - Open-source software featuring a high-level programming language, primarily intended for numerical computations
  • MATLAB - Multi-paradigm numerical computing environment and proprietary programming language developed by MathWorks
  • fast.ai - Easy to use deep learning library
  • PyTorch - Open-source machine learning library for Python, based on Torch
  • Keras - Open-source neural-network library written in Python, capable of running on top of TensorFlow, Microsoft Cognitive Toolkit, Theano, or PlaidML
  • pandas - Software library written for the Python programming language for data manipulation and analysis
  • Matplotlib - Plotting library for the Python programming language
  • NumPy - Library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays
  • SciPy - Free and open-source Python library used for scientific computing and technical computing
  • seaborn - A Python data visualization library based on matplotlib
  • Split Folders - Automatically split folders with files (i.e. images) into training, validation and test (dataset) folders
  • Google Sheets - Spreadsheet program included as part of a free, web-based software office suite offered by Google within its Google Drive service
  • Google Drive - File storage and synchronization service developed by Google

Authors

(University Project group under the mentorship of Prof. Suneeta Agarwal)

See also the list of contributors who participated in this project.

License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE.md file for details

Acknowledgments
