Convolution Neural Network

facial

Julio Gonzalez

June 8, 2019

Estimated Reading time: ~14 mins


1. Introduction

2. Neural Networks

3. Applied

4. Conclusion

1. Introduction

Hello! I'm the Stats Whisper and I'm back with another Data Science topic that is one of the hottest things in research now: deep learning. Even if you lend a very small ear to what is going on in the scientific community, you have probably heard of deep learning. It is the backbone of some of the coolest things today like self-driving cars, facial-recognition, virtual digital assistants and accurately estimating how a human on the other side of a wall is standing/sitting/walking just from perturbations in Wi-Fi signals caused by that human (don't know what good that is for but hey).

Believe it or not, neural networks have been around for a while but the slashed costs of computing power paired with the increased computing capacity are the impetus behind the exponential growth of the field.

With the expansion of the field grew all types of neural networks. Today, we will be focusing on a particular type of neural network called a convolution neural network (CNN for short). Research has found these models to be excellent when it comes to image recognition. And image recognition we shall.

2. Neural Networks

Okay, so before we dive into what exactly a convolution neural network is, let us strip it down to its essence: just a plain and simple vanilla neural network. Like the depths of the universe, you can go as far, wide and deep into this topic and you'll probably never get to the bottom of it because of the presence of a very active area of research that is pushing the field even further.

So, what exactly is a neural network? Well, it's actually pretty simple: a function. A function that can be very complex but a function nevertheless.

At its most basic core, a neural network has 3 parts:

  • input layer
  • hidden layer
  • output layer

basic nn

As the name suggests, the input layer is where the data is feed into the network. The mysterious hidden layer is where all the action is happening and output layer is the result you wish to acquire.

To make this more relatable, take the ubiquitous line function we have all been exposed to in Algebra class: \begin{align} y & = mx + b. \end{align}The x in this case is the input layer. The m and b can be taken as the hidden layer while the y is the output layer. If you know the x coordinate and the slope of the line, you can calculate what the y coordinate would be for that line.

Easy right?

A convolution neural network does the exact same thing by using an image as input, crunches some numbers then does a prediction as an output.

cnn

There is a great amount of content out there that goes into good detail and further explains how the mechanics of a convolution neural network work. It can get very mathematical and technical quick so you can easily lose an audience. I would like to keep the scope of this project narrow but if you are really interested to go further, I found this video to be a lifesaver for my IS 6733 class while I was learning neural networks in grad school.

3. Application

Have you ever wondered how in the world your new iPhone has the capacity to determine whether or not it's your face with amazing accuracy in order to unlock using Face ID? Well you have just witnessed the power of neural networks first hand. Apple trained the neural networks using billions of images then took the model and installed it into your iPhone. Pretty cool, huh?

Since neural networks are great at determining whose face that image belongs to, what if we could wield this technology to find my celebrity doppelgänger.

In my much younger (and more athletic days), some of my friends would say I resembled soccer superstar Neymar while others would disagree. Well, let's use a neural network to settle the debate for us. Let's feed it a bunch of pictures for different individuals, including Neymar, and have the CNN decide who I resemble the most.

To make this worthwhile, let's use 7 different athletes and see what the model returns. If I indeed look like Neymar then when I feed it an image of myself then, in theory, it will output a prediction of Neymar. Sounds like a plan, right? All right, let's roll.

3.1 Loading the Data

So first let's load the data. We'll be using Google's Colab cloud product to leverage the powerful GPU available thus slashing runtimes instead of waiting forever for it to run locally on my MacBook.

To make this a novel project, I acquired the data from Google Images after being downloaded and cropped by hand. Since variety is the spice of life and, more importantly, to reduce the chances of a spurious result, the data is made up of images for different sport superstars that include Neymar, Leonel Messi, Cristiano Ronaldo, Kevin Durant, Odell Beckham, Steph Curry and Arron Rodgers.

The data was split 80/20 for training and validation purposes.

In [0]:
#loading google drive to access data. 
from google.colab import drive
drive.mount('/content/gdrive')

from keras.preprocessing.image import ImageDataGenerator
from keras import backend as K
from keras.callbacks import TensorBoard
K.set_image_dim_ordering('th')
from time import time
batch_size = 64

train_datagen = ImageDataGenerator(data_format="channels_last")

test_datagen = ImageDataGenerator(data_format="channels_last")

train_generator = train_datagen.flow_from_directory(
        '/content/gdrive/My Drive/CNN/data/train',  # this is the target directory
        target_size=(197, 197),  # all images will be resized to 197x197
        batch_size=batch_size,
        color_mode='rgb',
        class_mode='categorical')  

# this is a similar generator, for validation data
validation_generator = test_datagen.flow_from_directory(
        '/content/gdrive/My Drive/CNN/data/validation',
        target_size=(197, 197),
        batch_size=batch_size,
        color_mode='rgb',
        class_mode='categorical')

label_map = validation_generator.class_indices
Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=email%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdocs.test%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.photos.readonly%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fpeopleapi.readonly&response_type=code

Enter your authorization code:
··········
Mounted at /content/gdrive
Found 166 images belonging to 7 classes.
Found 42 images belonging to 7 classes.

3.2 Transfer Learning

Instead of creating a CNN from scratch and having to use valuable time and resources training/testing different models, what we can do instead is use a model that has already been trained then slightly customize it to fit our needs. The formal term for this technique is called "transfer learning".

Transfer learning is great because you can take a neural network and its respective weights, painstakingly developed by the pros, then reuse those models for other purposes.

I'll be using Keras primarily because of the way it simplifies long cumbersome TensorFlow code into a few simple lines and because Keras comes pre-trained models available to execute transfer learning.

Today, we'll be using a CNN developed by AI researchers at Google called MobileNetV2. While most of the available models have comparable accuracy rates, this model is "lightweight" with its fewer total parameters in the model and thus faster to train. Here's an illustration of the architecture: mobilenetv2

For the most part, we will be using the same structure except for the input layer to be for an image of 197x197x3, adding a fully-connected layer and configuring the output layer to be for 7 classes.

In [0]:
#importing the relevant libraries and modules for CNN
from keras.applications.mobilenet_v2 import MobileNetV2
from keras.preprocessing import image
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D, Input, Dropout
from keras import backend as K
from keras.callbacks import TensorBoard
import keras

input_tensor = Input(shape=(197, 197, 3))

# create the base pre-trained model
base_model = MobileNetV2(input_tensor=input_tensor, weights='imagenet', include_top=False)


# add a global spatial average pooling layer
x = base_model.output
x = GlobalAveragePooling2D()(x)
# let's add a fully-connected layer
x = Dense(512, activation='relu')(x)
# since we have 7 classes, we need a final layer that predicts 7 classes. 
predictions = Dense(7, activation='softmax')(x)

custom_model = Model(inputs=base_model.input, outputs=predictions)

# compile the model
custom_model.compile(optimizer=keras.optimizers.Adam(lr=0.00009), loss='categorical_crossentropy',metrics=['accuracy'])

tensorboard = TensorBoard(log_dir="/content/gdrive/My Drive/CNN/tensorBoard/".format(time()), write_images=True, write_graph=True)

custom_model.fit_generator(
        train_generator,
        steps_per_epoch=10,
        callbacks = [tensorboard],
        epochs=15,
        validation_data=validation_generator,
        validation_steps=10,
        verbose=1)

#Save resulting model weights for future use
custom_model.save("/content/gdrive/My Drive/CNN/custom_model_weights.h5")

#Save custom model architecture
with open('/content/gdrive/My Drive/CNN/custom_model_architecture.json', 'w') as f:
    f.write(custom_model.to_json())
/usr/local/lib/python3.6/dist-packages/keras_applications/mobilenet_v2.py:295: UserWarning: MobileNet shape is undefined. Weights for input shape(224, 224) will be loaded.
  warnings.warn('MobileNet shape is undefined.'
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Epoch 1/15
10/10 [==============================] - 290s 29s/step - loss: 1.2219 - acc: 0.6980 - val_loss: 1.2320 - val_acc: 0.6190
Epoch 2/15
10/10 [==============================] - 260s 26s/step - loss: 0.1809 - acc: 1.0000 - val_loss: 0.7974 - val_acc: 0.8095
Epoch 3/15
10/10 [==============================] - 251s 25s/step - loss: 0.0377 - acc: 1.0000 - val_loss: 0.6328 - val_acc: 0.8095
Epoch 4/15
10/10 [==============================] - 266s 27s/step - loss: 0.0139 - acc: 1.0000 - val_loss: 0.5646 - val_acc: 0.8571
Epoch 5/15
10/10 [==============================] - 263s 26s/step - loss: 0.0077 - acc: 1.0000 - val_loss: 0.5361 - val_acc: 0.8571
Epoch 6/15
10/10 [==============================] - 252s 25s/step - loss: 0.0057 - acc: 1.0000 - val_loss: 0.5191 - val_acc: 0.8810
Epoch 7/15
10/10 [==============================] - 258s 26s/step - loss: 0.0044 - acc: 1.0000 - val_loss: 0.5077 - val_acc: 0.8810
Epoch 8/15
10/10 [==============================] - 263s 26s/step - loss: 0.0033 - acc: 1.0000 - val_loss: 0.4984 - val_acc: 0.8810
Epoch 9/15
10/10 [==============================] - 257s 26s/step - loss: 0.0033 - acc: 1.0000 - val_loss: 0.4921 - val_acc: 0.8810
Epoch 10/15
10/10 [==============================] - 270s 27s/step - loss: 0.0030 - acc: 1.0000 - val_loss: 0.4880 - val_acc: 0.8810
Epoch 11/15
10/10 [==============================] - 260s 26s/step - loss: 0.0023 - acc: 1.0000 - val_loss: 0.4829 - val_acc: 0.8810
Epoch 12/15
10/10 [==============================] - 251s 25s/step - loss: 0.0023 - acc: 1.0000 - val_loss: 0.4782 - val_acc: 0.8810
Epoch 13/15
10/10 [==============================] - 258s 26s/step - loss: 0.0021 - acc: 1.0000 - val_loss: 0.4747 - val_acc: 0.8810
Epoch 14/15
10/10 [==============================] - 258s 26s/step - loss: 0.0019 - acc: 1.0000 - val_loss: 0.4709 - val_acc: 0.8810
Epoch 15/15
10/10 [==============================] - 250s 25s/step - loss: 0.0018 - acc: 1.0000 - val_loss: 0.4679 - val_acc: 0.8810

For those not familiar with the output, we want to look at the last column labeled "val_acc". The number outputted is the percentage of correctly labeled images in decimal form for the validation set. You'll notice that the accuracy of the model rapidly increases after each iteration. That's because the model is leveraging the features it previously learned and putting them to use to make sense of the images that are being feed. You are witnessing the power of transfer learning in action.

After 15 iterations, you'll notice the model settles at about an 88% accuracy rate for classifying images to their respective classes. Transfer learning allows us to get a respectable classification rate with a relatively small dataset and few iterations.

3.3 CNN Applied

So now with a fully trained model at hand, we can use deep learning to help us in our quest for truth. If Neymar is indeed my doppelgänger then when I feed the model some images of myself, then it should output a higher probability comparted to the rest of the classes. The converse would imply that we'll have to look elsewhere for my doppelgänger.

In [0]:
#To make the evaluation of the images simpler, a function was created to display the results
from matplotlib.pyplot import imshow
import matplotlib.pyplot as plt

    
def model_output(prediction,label_map):    
    results = {list(label_map.keys())[i]:prediction[0][i] for i in range(len(prediction[0]))}
    for key in sorted(results, key=lambda k: results[k],reverse=True):
        print('%12s'%key,round(results[key]*100,2),"%")
    print()

def predict_image(image_path,model,label_map):
    img = image.load_img(image_path, target_size=(197, 197))
    imshow(img, interpolation='none')
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = x.reshape(1,197,197,3)
    x = preprocess_input(x)
    probabilities = model.predict(x)
    model_output(probabilities,label_map)


predict_image("data/julio1.jpg",custom_model,label_map)
      Neymar 41.63 %
     Ronaldo 17.9 %
       Messi 16.09 %
     Rodgers 13.66 %
     Beckham 6.95 %
      Irving 2.39 %
       Curry 1.37 %

In [0]:
predict_image("data/julio2.jpg",custom_model,label_map)
      Neymar 38.91 %
     Ronaldo 26.25 %
       Messi 14.74 %
     Rodgers 12.49 %
     Beckham 4.07 %
      Irving 2.07 %
       Curry 1.46 %

In [0]:
predict_image("data/julio3.jpg",custom_model,label_map)
     Ronaldo 32.5 %
     Rodgers 23.36 %
      Neymar 20.84 %
       Messi 15.39 %
     Beckham 4.93 %
      Irving 1.76 %
       Curry 1.22 %

In [0]:
predict_image("data/julio4.jpg",custom_model,label_map)
      Neymar 47.19 %
     Ronaldo 18.63 %
       Messi 15.06 %
     Rodgers 13.06 %
     Beckham 3.14 %
      Irving 1.96 %
       Curry 0.97 %

4. Conclusion

And there you have it folks. For 3 of 4 images, the model outputted a higher probability that the image was Neymar while only 1 image was predicted to be Ronaldo. Pro-Neymar camps seems to come out on top for now but not without raising a good point.

While the model correctly classified the images in the validation set 88% of the time, it wasn't perfect. A number of factors like image lighting, facial hair/different hairstyle, wearing of hats can impact model accuracy. This highlights the importance of, not just having a good amount of data, but of having high quality data that has these twists and turns that will provide a more robust model. Generally, neural networks require a large amount of data to work their magic. Fortunately, there is something called data augmentation that can help in acquiring more data when none is available. The idea is simple: get an image then modify in several different ways to have different variations of that same image. At a basic level, you can flip, crop, scale or translate the images to get "new" images that forces your model to learn the true features of the image. Here's an example: basic augmentation

You can also go a step further with more advanced data augmentation techniques such as conditional GANs or gaussian noise addition. advanced augmentation

In summary, neural networks are powerful algorithms for particular tasks when implemented correctly. But the key here is when implemented correctly. In comparisons to other machine learning methods, neural networks are hard to get right because they have such a large number of parameters to adjust compounded by the fact of the infinite model architectures you can implement. As aforementioned with respect to data size and data quality, you also have to take into account computing time and power. It falls in line with the No Free Lunch Theorem, where no algorithm will win out all the time. Nevertheless, I hope this project highlights how great of a tool neural networks are to have in your back pocket for modern problems.

Thanks for reading.

© 2019 julio gonzalez. Powered by hard work, creativity and coffee.