# 2D CNN in TensorFlow 2.0 on CIFAR-10 – Object Recognition in Images

## What is a CNN

This notebook demonstrates training a simple Convolutional Neural Network (CNN) to classify CIFAR images.

`Convolutional Neural Networks` (ConvNets or CNNs) are a category of neural networks that have proven very effective in areas such as image recognition and classification. Unlike traditional multilayer perceptron architectures, a CNN uses two operations, `convolution` and `pooling`, to reduce an image to its essential features, and then uses those features to understand and classify the image.

## Important Terms of CNN

## Convolution Layer

Convolution is the first layer to extract features from an input image. It preserves the spatial relationship between pixels by learning image features from small squares of input data. Convolution is a mathematical operation that takes two inputs: an `image matrix` and a `filter` (also called a `kernel`). The filter is slid across the image matrix, and at each position the overlapping values are multiplied element-wise and summed. The resulting output matrix is called a `Feature Map`.

Convolving an image with different filters can perform operations such as edge detection, blurring and sharpening.
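As a rough illustration of how a filter produces a feature map, here is a minimal NumPy sketch. The helper `conv2d_valid`, the toy image and the vertical-edge kernel are all made up for this example, and the loop computes what deep learning libraries call convolution (strictly speaking a cross-correlation, since the kernel is not flipped).

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            # Element-wise multiply the patch with the kernel, then sum.
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

# Toy 5x5 image: dark columns (0) on the left, bright columns (1) on the right.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# Hypothetical vertical-edge kernel: responds where brightness rises left to right.
edge_kernel = np.array([[-1, 0, 1],
                        [-1, 0, 1],
                        [-1, 0, 1]], dtype=float)

feature_map = conv2d_valid(image, edge_kernel)
print(feature_map)  # every row is [3. 3. 0.]: strong response at the edge, none in the flat region
```

In a CNN the kernel values are not hand-picked like this; they are weights learned during training.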

## Activation Function

Since convolution is a linear operation and images are far from linear, nonlinearity layers are often placed directly after the convolution layer to introduce `nonlinearity` to the activation map.

There are several types of nonlinear operations, the popular ones being:

`Sigmoid`: The sigmoid nonlinearity has the mathematical form f(x) = 1 / (1 + exp(-x)). It takes a real-valued number and squashes it into the range between 0 and 1. Sigmoid suffers from the `vanishing gradient` problem: when the unit saturates near 0 or 1, its local gradient becomes very small, and backpropagation effectively kills the gradient flowing through it.

`Tanh`: Tanh squashes a real-valued number into the range [-1, 1]. Like sigmoid, its activation saturates, but unlike sigmoid its output is `zero-centered`.

`ReLU`: The Rectified Linear Unit (ReLU) computes the function f(x) = max(0, x). In other words, the activation is simply thresholded at zero. Compared with sigmoid and tanh, ReLU does not saturate for positive inputs and is cheap to compute, and it has been reported to accelerate convergence by about a factor of six.

`Leaky ReLU`: Leaky ReLU is an improved version of the ReLU function. It addresses the problem that a ReLU unit outputs 0, and therefore passes back no gradient, for every negative input. Instead of defining the function as 0 for negative values of x, Leaky ReLU defines it as a very small linear component of x.

`Maxout`: The Maxout activation is a generalization of the ReLU and Leaky ReLU functions. It is a learnable activation function.

`ELU`: The `Exponential Linear Unit`, or ELU for short, is also a variant of the Rectified Linear Unit (ReLU) that modifies the slope of the negative part of the function. Unlike the Leaky ReLU and parametric ReLU functions, which use a straight line, ELU uses an exponential curve to define the negative values.
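The activation functions described above can be sketched in a few lines of NumPy. This is only an illustration: the `alpha` defaults (0.01 for Leaky ReLU, 1.0 for ELU) are common choices assumed here, not values taken from this tutorial, and Maxout is left out because it needs learnable weights.

```python
import numpy as np

def sigmoid(x):
    # f(x) = 1 / (1 + exp(-x)), output in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Zero-centered, output between -1 and 1
    return np.tanh(x)

def relu(x):
    # f(x) = max(0, x)
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Small linear slope `alpha` (assumed value) for negative inputs
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # Exponential curve alpha * (exp(x) - 1) for negative inputs (alpha is an assumed value)
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

x = np.array([-2.0, 0.0, 2.0])
for fn in (sigmoid, tanh, relu, leaky_relu, elu):
    print(f"{fn.__name__:<10}", fn(x))
```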

## Filter | Kernel Size | Number of Filters

Convolution uses a `kernel` to extract certain `features` from an input image. A kernel is a matrix that is `slid` across the image and multiplied with the input so that the output is enhanced in a certain desirable manner.

More precisely, a kernel is a matrix of weights that are multiplied with the input to extract relevant features. The dimensionality of the kernel matrix is how the convolution gets its name. For example, in `2D convolutions`, the kernel matrix is a `2D matrix`.

A `filter`, however, is a concatenation of `multiple kernels`, with each kernel assigned to a particular channel of the input. Filters always have one more dimension than kernels. For example, in 2D convolutions, filters are 3D arrays. So for a CNN layer with kernel dimensions h*w and k input channels, the filter dimensions are k*h*w.

A typical convolution layer actually consists of multiple such filters.
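To make the kernel/filter distinction concrete, the sketch below applies a bank of filters to a single window of a multi-channel input. The sizes (3x3 kernels, 3 channels, 32 filters) and the channels-last array layout are assumptions chosen for illustration; the values are random rather than learned.

```python
import numpy as np

rng = np.random.default_rng(0)

h, w, k = 3, 3, 3       # kernel height, kernel width, input channels
num_filters = 32        # a layer holds several filters

# One h x w window cut out of a k-channel input image.
patch = rng.random((h, w, k))

# Each filter stacks k kernels of size h x w, one per input channel.
filters = rng.random((num_filters, h, w, k))

# A single filter collapses the whole window into one number:
# multiply every kernel with its channel, then sum over everything.
single_response = np.sum(patch * filters[0])

# Applying all filters gives one value per filter, i.e. one value
# in each of the layer's output feature maps.
responses = np.tensordot(filters, patch, axes=([1, 2, 3], [0, 1, 2]))

print(filters[0].shape)   # (3, 3, 3): one filter
print(responses.shape)    # (32,): one response per filter
```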

## Stride Size

`Stride` is the number of pixels by which the filter shifts over the input matrix. When the stride is 1, we move the filter 1 pixel at a time. When the stride is 2, we move the filter 2 pixels at a time, and so on. The figure below shows how convolution works with a stride of 1.

## Padding

`Padding` means adding extra pixels at the boundary of the data. Sometimes the filter does not fit the input image perfectly, and in that case we can use padding.

We have two options:

- Pad the picture with zeros (zero-padding) so that it fits
- Drop the part of the image where the filter did not fit. This is called valid padding, which keeps only the valid part of the image.
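The effect of stride and padding on the output size can be worked out with a little arithmetic. The helpers below are hypothetical names written for this example; the formulas follow the `"valid"` and `"same"` conventions used by TensorFlow.

```python
import math

def conv_output_size(input_size, kernel_size, stride=1, padding="valid"):
    """Output length along one spatial dimension."""
    if padding == "valid":
        # Only positions where the kernel fits entirely inside the input.
        return (input_size - kernel_size) // stride + 1
    if padding == "same":
        # Zero-padding is added so the output depends only on the stride.
        return math.ceil(input_size / stride)
    raise ValueError(f"unknown padding: {padding!r}")

def same_padding_total(input_size, kernel_size, stride=1):
    """Total number of zeros added along one dimension for 'same' padding."""
    out = conv_output_size(input_size, kernel_size, stride, "same")
    return max((out - 1) * stride + kernel_size - input_size, 0)

print(conv_output_size(32, 3, stride=1, padding="same"))   # 32
print(conv_output_size(32, 3, stride=1, padding="valid"))  # 30
print(conv_output_size(32, 2, stride=2, padding="valid"))  # 16
print(same_padding_total(32, 3, stride=1))                 # 2 (one pixel on each side)
```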

## Pooling Layer

A `pooling layer` is a new layer added after the convolutional layer, specifically after a nonlinearity (e.g. ReLU) has been applied to the feature maps output by the convolutional layer.

Pooling layers shrink the feature maps, and therefore the number of parameters needed in later layers, when the images are too large. `Spatial pooling`, also called `subsampling` or `downsampling`, reduces the dimensionality of each feature map but retains the important information.

Spatial pooling can be of different types:

- Max Pooling
- Average Pooling
- Sum Pooling

`Max pooling` takes the largest element from each patch of the rectified feature map. Calculating the average value of each patch of the feature map is called `average pooling`. Taking the sum of all elements in each patch of the feature map is called `sum pooling`.
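The three pooling types differ only in how each patch is reduced to a single number. A minimal NumPy sketch, with a hypothetical `pool2d` helper and a made-up 4x4 feature map:

```python
import numpy as np

def pool2d(feature_map, pool_size=2, stride=2, mode="max"):
    """Reduce each pool_size x pool_size patch of a 2D feature map to one value."""
    reducers = {"max": np.max, "average": np.mean, "sum": np.sum}
    reduce_fn = reducers[mode]
    out_h = (feature_map.shape[0] - pool_size) // stride + 1
    out_w = (feature_map.shape[1] - pool_size) // stride + 1
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = feature_map[i * stride:i * stride + pool_size,
                                j * stride:j * stride + pool_size]
            pooled[i, j] = reduce_fn(patch)
    return pooled

feature_map = np.array([[1, 3, 2, 4],
                        [5, 7, 6, 8],
                        [9, 2, 1, 0],
                        [3, 4, 5, 6]], dtype=float)

print(pool2d(feature_map, mode="max"))      # [[7. 8.] [9. 6.]]
print(pool2d(feature_map, mode="average"))  # [[4.  5. ] [4.5 3. ]]
print(pool2d(feature_map, mode="sum"))      # [[16. 20.] [18. 12.]]
```

In every case the 4x4 map is reduced to 2x2, which is the downsampling effect described above.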

## Flattening and Dense Layer

`Flattening` converts the data into a 1-dimensional array for input to the next layer. We flatten the output of the convolutional layers to create a single long feature vector, which is then connected to the final classification model, called a `fully-connected layer`.

`Fully connected layer`: A traditional multilayer perceptron structure. Its input is a one-dimensional vector representing the output of the previous layers. Its output is a list of probabilities for different possible labels attached to the image (e.g. dog, cat, bird). The label that receives the highest probability is the classification decision.
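A small sketch of flattening followed by a fully connected layer, under made-up assumptions: a 4x4x2 stack of feature maps, three labels, and random (untrained) weights. It computes raw scores only; applying a softmax would turn them into probabilities without changing which label ranks highest.

```python
import numpy as np

rng = np.random.default_rng(0)
labels = ["dog", "cat", "bird"]

# Hypothetical output of the convolution/pooling stages: 4 x 4 maps, 2 channels.
feature_maps = rng.random((4, 4, 2))

# Flattening: one long 1-dimensional feature vector.
feature_vector = feature_maps.reshape(-1)
print(feature_vector.shape)  # (32,)

# Fully connected layer: every input is connected to every output unit.
weights = rng.normal(size=(feature_vector.size, len(labels)))  # random, untrained
bias = np.zeros(len(labels))
scores = feature_vector @ weights + bias

# The label with the highest score is the classification decision.
predicted = labels[int(np.argmax(scores))]
print(scores, predicted)
```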

## Download Data and Model Building

    !pip install tensorflow
    !pip install mlxtend

    import tensorflow as tf
    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Flatten, Dense, Conv2D, MaxPool2D, Dropout

    print(tf.__version__)

2.1.1

    import numpy as np
    import matplotlib.pyplot as plt
    import matplotlib
    from tensorflow.keras.datasets import cifar10

The `CIFAR10` dataset contains 60,000 color images in `10 classes`, with 6,000 images in each class. The dataset is divided into 50,000 training images and 10,000 testing images. The classes are mutually exclusive and there is no overlap between them.

    (X_train, y_train), (X_test, y_test) = cifar10.load_data()

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz 170500096/170498071 [==============================] - 50s 0us/step

    classes_name = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

    X_train.max()

255

    # Scale pixel values from [0, 255] to [0, 1]
    X_train = X_train/255
    X_test = X_test/255
    X_train.shape, X_test.shape

((50000, 32, 32, 3), (10000, 32, 32, 3))

## Verify the data

To verify that the dataset looks correct, let’s plot the first image from the test set.

    plt.imshow(X_test[0])

<matplotlib.image.AxesImage at 0x7fc1e4167ed0>

    y_test

array([[3], [8], [8], ..., [5], [1], [7]], dtype=uint8)

## Build CNN Model

The 8 lines of code below define the model using a common pattern: a stack of `Conv2D`, `MaxPooling2D` (imported here under its alias `MaxPool2D`), `Dropout`, `Flatten` and `Dense` layers.

As input, a `Conv2D` layer takes tensors of shape (image_height, image_width, color_channels), ignoring the batch size. In this example, we configure the first Conv2D layer to process inputs of shape (32, 32, 3), which is the format of CIFAR images.

The `MaxPool2D()` layer `downsamples` the input along its spatial dimensions (height and width) by taking the maximum value over the window defined by `pool_size` (2, 2) for each channel of the input. The window is shifted by `strides` (2) along each dimension. With the `"valid"` padding option, the resulting output has a spatial size of floor((input_size - pool_size) / strides) + 1, so the 32x32 feature maps here become 16x16.

`Dropout()` randomly sets the outputs of hidden units to 0 at each update during training, which helps prevent overfitting. The value passed to dropout specifies the probability with which outputs of the layer are dropped.
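The idea behind dropout can be sketched in NumPy as follows. The `dropout` function is a hypothetical stand-in written for this example, not the Keras implementation; like the Keras layer, it scales the surviving values by 1 / (1 - rate) so that the expected sum of the activations is unchanged, and it does nothing outside of training.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate`; rescale the survivors."""
    if not training or rate == 0.0:
        return activations
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones(10000)

dropped = dropout(activations, rate=0.5, rng=rng)
print(np.mean(dropped == 0.0))  # roughly 0.5: about half the units are dropped
print(dropped.mean())           # roughly 1.0: the rescaling keeps the average unchanged

# At inference time the layer passes its input through untouched.
unchanged = dropout(activations, rate=0.5, rng=rng, training=False)
```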

`Flatten()` is used to convert the data into a 1-dimensional array for input to the next layer.

The `Dense()` layer is the regular densely connected neural network layer, here with 128 neurons. The output layer is also a dense layer, with 10 neurons for the 10 classes.

The activation function used in the output layer is `softmax`. Softmax converts a real vector to a vector of categorical probabilities. The elements of the output vector are in the range (0, 1) and sum to 1. Softmax is often used as the activation for the last layer of a classification network because the result can be interpreted as a probability distribution.
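The softmax properties mentioned above are easy to check numerically. This is a minimal NumPy sketch with made-up input scores; subtracting the maximum before exponentiating is a common trick to avoid overflow and does not change the result.

```python
import numpy as np

def softmax(logits):
    # Shift by the maximum for numerical stability; the output is unchanged.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

logits = np.array([2.0, 1.0, 0.1])   # hypothetical raw scores for 3 classes
probs = softmax(logits)

print(probs)             # every element lies in (0, 1)
print(probs.sum())       # sums to 1 (up to floating-point rounding)
print(np.argmax(probs))  # 0: the largest score gets the largest probability
```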

    model = Sequential()
    model.add(Conv2D(filters=32, kernel_size=(3, 3), padding='same', activation='relu', input_shape=[32, 32, 3]))
    model.add(Conv2D(filters=32, kernel_size=(3, 3), padding='same', activation='relu'))
    model.add(MaxPool2D(pool_size=(2, 2), strides=2, padding='valid'))
    model.add(Dropout(0.5))
    model.add(Flatten())
    model.add(Dense(units=128, activation='relu'))
    model.add(Dense(units=10, activation='softmax'))

    model.summary()

    Model: "sequential"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    conv2d (Conv2D)              (None, 32, 32, 32)        896
    _________________________________________________________________
    conv2d_1 (Conv2D)            (None, 32, 32, 32)        9248
    _________________________________________________________________
    max_pooling2d (MaxPooling2D) (None, 16, 16, 32)        0
    _________________________________________________________________
    dropout (Dropout)            (None, 16, 16, 32)        0
    _________________________________________________________________
    flatten (Flatten)            (None, 8192)              0
    _________________________________________________________________
    dense (Dense)                (None, 128)               1048704
    _________________________________________________________________
    dense_1 (Dense)              (None, 10)                1290
    =================================================================
    Total params: 1,060,138
    Trainable params: 1,060,138
    Non-trainable params: 0
    _________________________________________________________________
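The parameter counts in the summary can be reproduced by hand: each filter has one weight per kernel element per input channel plus one bias, and each dense unit has one weight per input plus one bias. The helper names below are made up for this check.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # (weights per filter + 1 bias) * number of filters
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(inputs, units):
    # (weights per unit + 1 bias) * number of units
    return (inputs + 1) * units

counts = {
    "conv2d":   conv2d_params(3, 3, 3, 32),      # RGB input, 32 filters
    "conv2d_1": conv2d_params(3, 3, 32, 32),     # 32 input channels, 32 filters
    "dense":    dense_params(16 * 16 * 32, 128), # flattened 16x16x32 = 8192 inputs
    "dense_1":  dense_params(128, 10),
}

for name, count in counts.items():
    print(f"{name:<9} {count}")
print("total    ", sum(counts.values()))  # 1060138, matching the summary
```

Pooling, dropout and flatten layers have no trainable weights, which is why they show 0 parameters. Nearly all of the model's parameters sit in the first dense layer.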

## Compile and train the model

Here we are `compiling` the model and `fitting` it to the training data. We train the model for 10 `epochs`; an epoch is one pass over the entire training data. `validation_data` is the data on which the `loss` and any model metrics are evaluated at the end of each epoch. The model is not trained on this data. The `sparse_` variants of the loss and metric are used because the labels are integer class indices rather than one-hot vectors. Because metrics = `['sparse_categorical_accuracy']`, the model is evaluated on its `accuracy`.
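To show what the loss and metric named above actually measure, here is a simplified NumPy sketch on three made-up predictions. The function names mirror the Keras strings for readability, but these are hypothetical re-implementations and skip details such as the clipping Keras applies to avoid log(0).

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, probs):
    """Mean negative log-probability assigned to the true class."""
    y_true = np.asarray(y_true).reshape(-1)            # labels arrive as shape (n, 1)
    picked = probs[np.arange(len(y_true)), y_true]     # probability of the true class
    return float(np.mean(-np.log(picked)))

def sparse_categorical_accuracy(y_true, probs):
    """Fraction of samples whose most probable class is the true class."""
    y_true = np.asarray(y_true).reshape(-1)
    return float(np.mean(np.argmax(probs, axis=-1) == y_true))

# Integer labels (as in y_train) and made-up softmax outputs for 3 classes.
y_true = np.array([[0], [2], [1]])
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.5, 0.4, 0.1]])

loss = sparse_categorical_crossentropy(y_true, probs)
accuracy = sparse_categorical_accuracy(y_true, probs)
print(loss, accuracy)  # the third sample is misclassified, so accuracy is 2/3
```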

    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['sparse_categorical_accuracy'])

    history = model.fit(X_train, y_train, batch_size=10, epochs=10, verbose=1,
                        validation_data=(X_test, y_test))

    Train on 50000 samples, validate on 10000 samples
    Epoch 1/10
    50000/50000 [==============================] - 177s 4ms/sample - loss: 1.4127 - sparse_categorical_accuracy: 0.4918 - val_loss: 1.1079 - val_sparse_categorical_accuracy: 0.6095
    Epoch 2/10
    50000/50000 [==============================] - 159s 3ms/sample - loss: 1.1058 - sparse_categorical_accuracy: 0.6091 - val_loss: 1.0284 - val_sparse_categorical_accuracy: 0.6377
    Epoch 3/10
    50000/50000 [==============================] - 146s 3ms/sample - loss: 0.9946 - sparse_categorical_accuracy: 0.6477 - val_loss: 0.9682 - val_sparse_categorical_accuracy: 0.6564

We will now plot the `model accuracy` and `model loss`. For model accuracy we plot the training accuracy and validation accuracy, and for model loss we plot the training loss and validation loss.

    # Plot training & validation accuracy values
    epoch_range = range(1, 11)
    plt.plot(epoch_range, history.history['sparse_categorical_accuracy'])
    plt.plot(epoch_range, history.history['val_sparse_categorical_accuracy'])
    plt.title('Model accuracy')
    plt.ylabel('Accuracy')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Val'], loc='upper left')
    plt.show()

    # Plot training & validation loss values
    plt.plot(epoch_range, history.history['loss'])
    plt.plot(epoch_range, history.history['val_loss'])
    plt.title('Model loss')
    plt.ylabel('Loss')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Val'], loc='upper left')
    plt.show()

    # Evaluate the trained model on the test set
    test_loss, test_acc = model.evaluate(X_test, y_test, verbose=2)

10000/10000 - 5s - loss: 0.9383 - sparse_categorical_accuracy: 0.6830

    from mlxtend.plotting import plot_confusion_matrix
    from sklearn.metrics import confusion_matrix

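    # predict_classes() was removed in later TensorFlow releases; taking the
    # argmax of the predicted probabilities gives the same class indices.
    y_pred = np.argmax(model.predict(X_test), axis=-1)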

    y_pred

array([3, 8, 8, ..., 5, 1, 7])

    y_test

array([[3], [8], [8], ..., [5], [1], [7]], dtype=uint8)

    mat = confusion_matrix(y_test, y_pred)
    mat

    array([[737,  27,  22,  17,  14,   4,  12,  14, 106,  47],
           [ 20, 821,   3,  12,   0,   7,   5,   4,  47,  81],
           [ 95,   8, 476,  97,  83, 110,  67,  30,  21,  13],
           [ 34,  14,  42, 520,  52, 203,  58,  43,  21,  13],
           [ 22,   4,  74, 118, 570,  69,  54,  66,  21,   2],
           [ 23,   5,  34, 213,  24, 610,  17,  47,  16,  11],
           [ 10,   8,  34,  80,  42,  40, 760,   8,  13,   5],
           [ 26,   5,  23,  45,  51,  76,   5, 743,  12,  14],
           [ 56,  41,   9,  10,   3,   4,   3,   2, 843,  29],
           [ 43, 116,   5,  18,   6,   4,   5,  21,  32, 750]])

    plot_confusion_matrix(mat, figsize=(9, 9), class_names=classes_name, show_normed=True)

(<Figure size 648x648 with 1 Axes>, <matplotlib.axes._subplots.AxesSubplot at 0x7fc12758d910>)
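The confusion matrix can also be summarized numerically. The sketch below copies the matrix printed above into NumPy and computes, for each class, the fraction of its test images that were classified correctly (the diagonal divided by the row sum, since scikit-learn puts true labels on the rows). It is an add-on illustration rather than part of the original notebook run.

```python
import numpy as np

classes_name = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                'dog', 'frog', 'horse', 'ship', 'truck']

# Confusion matrix from the run above: rows are true classes, columns are predictions.
mat = np.array([
    [737,  27,  22,  17,  14,   4,  12,  14, 106,  47],
    [ 20, 821,   3,  12,   0,   7,   5,   4,  47,  81],
    [ 95,   8, 476,  97,  83, 110,  67,  30,  21,  13],
    [ 34,  14,  42, 520,  52, 203,  58,  43,  21,  13],
    [ 22,   4,  74, 118, 570,  69,  54,  66,  21,   2],
    [ 23,   5,  34, 213,  24, 610,  17,  47,  16,  11],
    [ 10,   8,  34,  80,  42,  40, 760,   8,  13,   5],
    [ 26,   5,  23,  45,  51,  76,   5, 743,  12,  14],
    [ 56,  41,   9,  10,   3,   4,   3,   2, 843,  29],
    [ 43, 116,   5,  18,   6,   4,   5,  21,  32, 750],
])

# Per-class accuracy (recall): correct predictions / all images of that class.
per_class_recall = np.diag(mat) / mat.sum(axis=1)
overall_accuracy = np.trace(mat) / mat.sum()

# Print classes from weakest to strongest.
for name, recall in sorted(zip(classes_name, per_class_recall), key=lambda p: p[1]):
    print(f"{name:<11}{recall:.3f}")
print(f"overall    {overall_accuracy:.3f}")  # 0.683, matching model.evaluate
```

The four weakest classes in this listing are bird, cat, deer and dog. The matrix also shows cat and dog being confused with each other particularly often (203 and 213 images).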

## Conclusion

In this tutorial we trained a simple Convolutional Neural Network (CNN) to classify CIFAR images. From the learning curves we observed that after 3 epochs the validation accuracy falls below the training accuracy, which suggests that the model is overfitting: it has enough capacity to start memorizing the training data instead of generalizing. We also evaluated the model using a confusion matrix and observed that it is least accurate on the bird, cat, deer and dog labels.