
CarND-Traffic-Sign-Classifier-Project

The goal of this project is to train a convolutional neural network with Tensorflow to classify images from the German Traffic Sign dataset. The dataset consists of 32x32 RGB images of 43 different traffic signs. The aim is to reach at least 93% accuracy on the validation set.


Exploration

The German Traffic Sign dataset consists of three separate parts: a training set (34799 examples), a validation set (4410 examples) and a test set (12630 examples). All images are in RGB and resized to 32x32. In total, there are 43 distinct traffic signs. Here are 10 examples of each type:

image1 ... image43 (ten sample images for each of the 43 sign classes)

Some of the images are dark, whereas others are very light, not to mention the ones that are so hazy that even a human eye would struggle to label them correctly.

The dataset is imbalanced, but the class distributions of the three sets are comparable:

table
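These statistics are easy to reproduce; here is a minimal sketch, assuming the dataset ships as the usual pickled splits (train.p, valid.p, test.p) with 'features' and 'labels' keys, as in the Udacity starter code:

```python
import pickle

import numpy as np

def load(path):
    # Each pickle holds a dict with the image array and the label array.
    with open(path, 'rb') as f:
        data = pickle.load(f)
    return data['features'], data['labels']

X_train, y_train = load('train.p')
X_valid, y_valid = load('valid.p')
X_test, y_test = load('test.p')

print('train/valid/test sizes:', len(y_train), len(y_valid), len(y_test))

# Relative per-class frequencies, so the three splits can be compared.
for name, y in [('train', y_train), ('valid', y_valid), ('test', y_test)]:
    counts = np.bincount(y, minlength=43)
    print(name, np.round(counts / counts.sum(), 3))
```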

Some of the darkest images in the training data:

image44

Preprocessing

Since the traffic signs are already distinguishable by shape and pictogram, we can omit the color information and convert the images to grayscale. The conversion makes dark images more distinguishable and also improved the classification results for me. Here is how the dark image set above looks in grayscale:

image45
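A minimal sketch of the conversion using OpenCV (the exact grayscale weighting used in the notebook may differ):

```python
import cv2
import numpy as np

def to_grayscale(images):
    # Convert a batch of 32x32 RGB images to single-channel grayscale,
    # keeping a trailing channel axis for the network input.
    gray = np.array([cv2.cvtColor(img, cv2.COLOR_RGB2GRAY) for img in images])
    return gray[..., np.newaxis]

X_train_gray = to_grayscale(X_train)
```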

To improve the contrast of the images I use two methods:

For local contrast normalization, in the first step, a (normalized) Gaussian-weighted sum of the surrounding pixel intensities is subtracted from each pixel:

formula1

In the second step, these values are divided by the square root of the weighted sum of squares of the values over a spatial neighborhood:

formula2

where

formula3

and

formula4

As a result, dark and light regions change only slightly, but the edges in the images become clearer.

image46
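A sketch of this two-step normalization, using scipy's Gaussian filtering for the weighted sums; the kernel width and the floor on the divisor are illustrative choices, not necessarily those used in the notebook:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_contrast_normalize(image, sigma=2.0):
    image = image.astype(np.float32)
    # Step 1: subtract the (normalized) Gaussian-weighted local mean.
    v = image - gaussian_filter(image, sigma)
    # Step 2: divide by the Gaussian-weighted local standard deviation.
    local_std = np.sqrt(gaussian_filter(v ** 2, sigma))
    # Floor the divisor at its mean so flat regions are not blown up.
    return v / np.maximum(local_std, local_std.mean())
```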

Another method for making the contrast more visible is contrast limited adaptive histogram equalization (CLAHE), which spreads the intensities over a wider range.

image47
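A sketch with OpenCV's CLAHE implementation; the clip limit and tile size below are illustrative values:

```python
import cv2

# CLAHE operates on 8-bit single-channel images.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(4, 4))

def equalize(gray_image):
    # Apply contrast limited adaptive histogram equalization.
    return clahe.apply(gray_image)
```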

Augmentations

I artificially increase the size of the dataset by adding distorted images to the original data, in order to make the classifier more robust to potential deformations. The augmentations are rotations by an angle selected uniformly from the interval [-20, 20] degrees, and distortions that shift each corner of the image by an offset selected uniformly from [-0.1, 0.1] of the image size. For each image in the training dataset I create an additional randomly rotated copy. Here are some examples:

image48

Then, for each image in this enlarged dataset, I add one copy whose four corners are randomly distorted.

image49

As a result, the training data grows to 4x its original size.
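A sketch of both distortions with OpenCV; the exact warping code in the notebook may differ:

```python
import cv2
import numpy as np

def random_rotation(image, max_angle=20):
    # Rotate around the image center by a uniform angle in [-20, 20] degrees.
    angle = np.random.uniform(-max_angle, max_angle)
    h, w = image.shape[:2]
    matrix = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(image, matrix, (w, h))

def random_corner_distortion(image, magnitude=0.1):
    # Shift each corner by up to `magnitude` of the image size (here the
    # images are square, so one side length is used for both axes).
    h, w = image.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    offsets = np.random.uniform(-magnitude * w, magnitude * w, src.shape)
    dst = (src + offsets).astype(np.float32)
    matrix = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(image, matrix, (w, h))
```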

Architecture

I use a neural network architecture similar to the one in Cireşan et al., 2012.

| Layer |
| --- |
| convolutional, 100 features, 7x7 kernel |
| dropout, probability 0.3 |
| maxpool, 2x2 kernel, 2x2 strides |
| convolutional, 150 features, 4x4 kernel |
| dropout, probability 0.3 |
| maxpool, 2x2 kernel, 2x2 strides |
| convolutional, 250 features, 2x2 kernel |
| dropout, probability 0.3 |
| maxpool, 2x2 strides |
| dense, size 300 |
| dropout, probability 0.35 |
| softmax, 43 |
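The notebook builds this network directly in Tensorflow; the sketch below is a tf.keras equivalent. The activation functions are an assumption (the table does not specify them), and "probability" is read as the drop rate:

```python
import tensorflow as tf

def build_model():
    # ReLU activations are assumed; the output layer matches the 43 classes.
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(100, 7, activation='relu', input_shape=(32, 32, 1)),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.MaxPool2D(pool_size=2, strides=2),
        tf.keras.layers.Conv2D(150, 4, activation='relu'),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.MaxPool2D(pool_size=2, strides=2),
        tf.keras.layers.Conv2D(250, 2, activation='relu'),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.MaxPool2D(pool_size=2, strides=2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(300, activation='relu'),
        tf.keras.layers.Dropout(0.35),
        tf.keras.layers.Dense(43, activation='softmax'),
    ])
```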

For training the model I use the Adam (adaptive moment estimation) optimizer with a batch size of 64 and train for at most 50 epochs. Early stopping acts as a regularizer and lets us stop training before the model starts overfitting the training data; I use the value of the loss function on the validation data to decide when to stop. The dropout probabilities I train with are 0.3 after the convolutional layers and 0.35 for the dense layer. I prioritized model simplicity and training speed; however, one can elaborate on this task as described here.
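A sketch of this training setup in tf.keras; the patience value and the array names (X_train_aug, X_valid_pre, and so on) are placeholders for the augmented and preprocessed data:

```python
model = build_model()
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Stop once validation loss stops improving and keep the best weights.
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=5, restore_best_weights=True)

model.fit(X_train_aug, y_train_aug,
          validation_data=(X_valid_pre, y_valid),
          batch_size=64, epochs=50,
          callbacks=[early_stopping])
```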

Solution

Since I use a fairly simple architecture, I train the model 5 times. I repeat the same procedure for both preprocessing methods (contrast normalization and histogram equalization), which results in 10 models. I average the output probabilities of the models to make the final prediction. Each model reaches 99% accuracy on the training and validation sets. The accuracy on the test set ranges between 97.5% and 98.5% for the individual models, and averaging the probabilities gives ~99% accuracy on the test data. More detailed results for each class are below:
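A sketch of the averaging step; `models` and `test_inputs` are hypothetical names for the 10 trained networks and the test set preprocessed with each model's respective method:

```python
import numpy as np

# Stack the per-model softmax outputs, average them, and take the argmax.
all_probs = np.stack([m.predict(x) for m, x in zip(models, test_inputs)])
mean_probs = all_probs.mean(axis=0)      # average of the softmax outputs
predictions = mean_probs.argmax(axis=1)  # final ensemble prediction
```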

         precision    recall  f1-score   support

      0       1.00      1.00      1.00        60
      1       0.99      1.00      1.00       720
      2       1.00      1.00      1.00       750
      3       0.99      0.97      0.98       450
      4       1.00      1.00      1.00       660
      5       0.98      0.99      0.99       630
      6       1.00      0.99      0.99       150
      7       0.99      1.00      1.00       450
      8       1.00      1.00      1.00       450
      9       0.99      1.00      0.99       480
     10       1.00      1.00      1.00       660
     11       0.93      0.98      0.95       420
     12       0.99      1.00      1.00       690
     13       1.00      1.00      1.00       720
     14       0.99      1.00      1.00       270
     15       1.00      1.00      1.00       210
     16       1.00      1.00      1.00       150
     17       1.00      0.99      0.99       360
     18       0.99      0.97      0.98       390
     19       1.00      1.00      1.00        60
     20       0.97      1.00      0.98        90
     21       0.90      1.00      0.95        90
     22       1.00      0.84      0.91       120
     23       0.94      1.00      0.97       150
     24       1.00      0.99      0.99        90
     25       0.95      0.99      0.97       480
     26       0.99      1.00      0.99       180
     27       0.97      0.50      0.66        60
     28       0.99      1.00      1.00       150
     29       0.96      1.00      0.98        90
     30       0.93      0.87      0.90       150
     31       1.00      1.00      1.00       270
     32       1.00      1.00      1.00        60
     33       1.00      1.00      1.00       210
     34       1.00      1.00      1.00       120
     35       1.00      0.99      0.99       390
     36       0.99      0.98      0.99       120
     37       1.00      1.00      1.00        60
     38       1.00      1.00      1.00       690
     39       1.00      1.00      1.00        90
     40       0.99      0.97      0.98        90
     41       1.00      0.90      0.95        60
     42       1.00      1.00      1.00        90

As we can see above, approximately half of the classes are classified perfectly (precision and recall are both 1.00). The overall classification accuracy is close to 99%, and only for one class ("Pedestrians", class 27) does recall drop to 0.50, meaning about half of the images of that class are misclassified.
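The table above is the standard scikit-learn classification report; given the ensemble predictions it can be produced like this:

```python
from sklearn.metrics import accuracy_score, classification_report

print('test accuracy:', accuracy_score(y_test, predictions))
print(classification_report(y_test, predictions))
```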

Test a Model on New Images

To check the classification results on new images, I found 10 images (three of which I shot myself) that seemed interesting to me. Only three of them (the dirty and the yellowish "No entry" signs and "Priority road") belong to classes present in the German Traffic Sign dataset. The rest of the images merely resemble signs in the dataset, and the expectation that the classes predicted with high probability correspond to similar-looking signs holds in almost all cases.

The first sign I shot was a dirty "End of speed zone 30 km/h" sign. Since this sign is not among the classes in the dataset, we can expect the probabilities to be low. Moreover, the sign is dirty (which is why I took the picture), and this should also influence the predictions.

image50

It is difficult for the human eye (at least for mine) to classify the resized image, and the models will probably perform badly too. The predicted probabilities confirm the hypotheses above:

image51

image52

As we see, the highest probability is 0.223, meaning the model average is quite unsure about the sign. We can view the whole distribution in the bar chart:

image53

Let's check the second image:

image54

This sign is dirty again; however, the "No entry" class exists in our dataset and the resized image resembles the training examples. We can expect one class to be predicted with high probability.

image55

image56

image57

The third image is also of a class from our dataset ("No entry"); however, its color is completely different from the training examples. There is only one yellow sign in the original data ("Priority road"), and the models predict similar probabilities for these two classes ("No entry" and "Priority road"). Overall, the highest probability is low.

image58

image59

image60

image61

The fourth image ("End of zone 30 km/h") does not exist in the dataset either, yet the resized image is clearer than the first one. The sign itself looks similar to the speed limit signs (30, 80, 20 km/h) and to the "End of zone 80 km/h" sign, so some of those should appear among the top 5 predictions.

image62

image63

image64

image65

The next image is of a "Speed limit 130 km/h" sign, which again does not exist in the original dataset. Visually similar classes are the speed limit signs for 100, 120, 30, 20 and 80 km/h, and they will most probably appear in the top 5. Once more, the probabilities should be low.

image66

image67

image68

image69

Let's plot the next image:

image70

It is difficult to judge this one; however, the trunk of the body resembles the "General caution" sign.

image71

image72

image73

The "Priority road" sign exists in our dataset, and one can expect this image to be classified correctly.

image74

image75

image76

image77

Now, some funny images that look like the "No entry" sign. Some are classified correctly with more than 0.5 probability, but none gets close to 0.9.

image78

image79

image80

image81

image82

image83

image84

image85

image86

image87

image88

image89

As we can see, the results are generally good both on the test data and on the new images.

Visualize the Neural Network's State with Test Images

Neural networks are usually considered a black box. However, one can plot the activations of different layers to see the intermediate results and get an idea of what is going on inside the network. For example, for the test image below

image54

here is how the outputs of the convolutional layers look:

1st convolutional layer

image90

2nd convolutional layer

image91

3rd convolutional layer

image92

From the first and second plots it is clear that the network detects the horizontal line in the image of the sign.
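The notebook produces these plots with raw Tensorflow; a rough tf.keras sketch of the same idea, plotting every feature map of a chosen convolutional layer:

```python
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

def plot_feature_maps(model, image, layer_index, columns=10):
    # Probe model that returns the activations of the chosen layer.
    probe = tf.keras.Model(model.inputs, model.layers[layer_index].output)
    maps = probe.predict(image[np.newaxis])[0]  # shape (h, w, n_features)
    rows = int(np.ceil(maps.shape[-1] / columns))
    plt.figure(figsize=(columns, rows))
    for i in range(maps.shape[-1]):
        plt.subplot(rows, columns, i + 1)
        plt.imshow(maps[..., i], cmap='gray')
        plt.axis('off')
    plt.show()
```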
