Convolutional Neural Networks (CNNs) use learned filters to extract spatial features from images, which makes them a standard architecture for object recognition. This tutorial builds a small 2D CNN in TensorFlow 2 with Conv2D, MaxPool2D, and Dropout layers and trains it to classify the 10 object categories of the CIFAR-10 dataset.
Download Data and Model Building
!pip install tensorflow
!pip install mlxtend
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Flatten, Dense, Conv2D, MaxPool2D, Dropout
print(tf.__version__)
2.1.1
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
from tensorflow.keras.datasets import cifar10
The CIFAR-10 dataset contains 60,000 color images in 10 mutually exclusive classes, with 6,000 images in each class. The dataset is divided into 50,000 training images and 10,000 test images.
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170500096/170498071 [==============================] - 50s 0us/step
classes_name = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
X_train.max()
255
X_train = X_train/255
X_test = X_test/255
X_train.shape, X_test.shape
((50000, 32, 32, 3), (10000, 32, 32, 3))
Verify the data
Plot the first test image to confirm the data loaded correctly:
plt.imshow(X_test[0])
y_test
array([[3],
[8],
[8],
...,
[5],
[1],
[7]], dtype=uint8)
Build CNN Model
The 8 lines of code below define the model using a common pattern: a stack of Conv2D, MaxPool2D, Dropout, Flatten, and Dense layers.
As input, a Conv2D layer takes tensors of shape (image_height, image_width, color_channels), ignoring the batch size. In this example, the first Conv2D layer is configured to process inputs of shape (32, 32, 3), which is the format of CIFAR-10 images.
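To make the input shape concrete, here is a minimal NumPy sketch of what a single 3x3 filter with padding='same' and a ReLU activation computes on one 32x32x3 image. The helper name `conv2d_same_single_filter` is hypothetical and the loop is deliberately naive; Keras uses optimized kernels, so treat this as an illustration rather than the library's implementation.

```python
import numpy as np

def conv2d_same_single_filter(image, kernel, bias=0.0):
    """Naive stride-1 'same' convolution of one filter, followed by ReLU.

    image:  array of shape (height, width, channels)
    kernel: array of shape (k, k, channels), with k odd
    """
    k = kernel.shape[0]
    pad = k // 2
    # Zero-pad height and width so the output keeps the input's spatial size.
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="constant")
    height, width, _ = image.shape
    out = np.empty((height, width), dtype=np.float64)
    for i in range(height):
        for j in range(width):
            window = padded[i:i + k, j:j + k, :]
            # One output value sums over the window and all input channels.
            out[i, j] = np.sum(window * kernel) + bias
    return np.maximum(out, 0.0)  # ReLU

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))          # stand-in for one normalized CIFAR-10 image
kernel = rng.standard_normal((3, 3, 3))  # one filter spans all 3 color channels
feature_map = conv2d_same_single_filter(image, kernel)
print(feature_map.shape)  # (32, 32)
```

A layer with filters=32 repeats this with 32 different kernels and stacks the results, which is where the (32, 32, 32) output shape of the first layer comes from.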
The MaxPool2D() layer downsamples its input by taking the maximum value over a window of size pool_size (here (2, 2)), separately for each channel. The window is shifted by strides (here 2) in each spatial dimension. With the "valid" padding option, the output has a spatial size of floor((input_size - pool_size) / strides) + 1, so each 32x32 feature map becomes 16x16.
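With "valid" padding, the pooled size along each spatial dimension is floor((input_size - pool_size) / strides) + 1. The sketch below checks that rule and performs 2x2 max pooling in NumPy. The function names `pooled_size` and `max_pool_2x2` are hypothetical helpers for illustration, and the reshape trick assumes the height and width are even.

```python
import numpy as np

def pooled_size(input_size, pool_size=2, strides=2):
    """Spatial output size of a pooling layer with 'valid' padding."""
    return (input_size - pool_size) // strides + 1

def max_pool_2x2(feature_maps):
    """2x2 max pooling with stride 2 on a (height, width, channels) array.

    Assumes height and width are even.
    """
    h, w, c = feature_maps.shape
    # Split each spatial axis into (blocks, 2) and take the max inside each block.
    blocks = feature_maps.reshape(h // 2, 2, w // 2, 2, c)
    return blocks.max(axis=(1, 3))

print(pooled_size(32))  # 16

x = np.arange(16, dtype=float).reshape(4, 4, 1)
print(max_pool_2x2(x)[..., 0])
# [[ 5.  7.]
#  [13. 15.]]
```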
Dropout() randomly sets the outgoing edges of hidden units to 0 at each update of the training phase. The value passed to Dropout specifies the probability with which each output of the layer is dropped (0.5 in this model).
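The following NumPy sketch shows the idea behind dropout in its common "inverted" form, where the surviving values are scaled up by 1 / (1 - rate) so the expected activation stays the same. The `dropout` function here is a hypothetical stand-in written for illustration, not the Keras layer itself.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        # At inference time the layer passes its input through unchanged.
        return activations
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(42)
activations = np.ones((1000, 100))
dropped = dropout(activations, rate=0.5, rng=rng)

print(np.unique(dropped))  # [0. 2.]
print(dropped.mean())      # close to 1.0, so the expected activation is preserved
```

Because the mask is resampled on every call, a different random half of the units is silenced at each training step.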
Flatten() converts the feature maps into a 1-dimensional array so they can be fed to the next layer.
The Dense() layer is a regular fully connected layer, here with 128 neurons. The output layer is also a Dense layer, with 10 neurons for the 10 classes.
The output layer uses the softmax activation. Softmax converts a real vector to a vector of categorical probabilities. The elements of the output vector are in the range (0, 1) and sum to 1. Softmax is often used as the activation for the last layer of a classification network because the result can be interpreted as a probability distribution.
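As a small worked example, softmax can be written in a few lines of NumPy. The three input scores below are made up for illustration (the model in this tutorial produces 10), and subtracting the maximum is a standard numerical-stability trick that does not change the result.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

logits = np.array([2.0, 1.0, 0.1])  # made-up scores for three classes
probs = softmax(logits)

print(probs.round(3))                # [0.659 0.242 0.099]
print(np.isclose(probs.sum(), 1.0))  # True
print(int(np.argmax(probs)))         # 0, the index of the most likely class
```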
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(3, 3), padding='same', activation='relu', input_shape = [32, 32, 3]))
model.add(Conv2D(filters=32, kernel_size=(3, 3), padding='same', activation='relu'))
model.add(MaxPool2D(pool_size=(2,2), strides=2, padding='valid'))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(units = 128, activation='relu'))
model.add(Dense(units=10, activation='softmax'))
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 32, 32, 32) 896
_________________________________________________________________
conv2d_1 (Conv2D) (None, 32, 32, 32) 9248
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 16, 16, 32) 0
_________________________________________________________________
dropout (Dropout) (None, 16, 16, 32) 0
_________________________________________________________________
flatten (Flatten) (None, 8192) 0
_________________________________________________________________
dense (Dense) (None, 128) 1048704
_________________________________________________________________
dense_1 (Dense) (None, 10) 1290
=================================================================
Total params: 1,060,138
Trainable params: 1,060,138
Non-trainable params: 0
_________________________________________________________________
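The parameter counts in the summary can be reproduced by hand. The sketch below uses two hypothetical helper functions, `conv2d_params` and `dense_params`, that count weights plus biases for each layer type; pooling, dropout, and flatten layers have no trainable parameters.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Each filter has kernel_h * kernel_w * in_channels weights plus one bias."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(in_features, units):
    """Each unit has in_features weights plus one bias."""
    return (in_features + 1) * units

conv1 = conv2d_params(3, 3, 3, 32)    # 896
conv2 = conv2d_params(3, 3, 32, 32)   # 9248
flat = 16 * 16 * 32                   # 8192 values after pooling and Flatten
dense1 = dense_params(flat, 128)      # 1048704
dense2 = dense_params(128, 10)        # 1290

total = conv1 + conv2 + dense1 + dense2
print(total)  # 1060138
```

Almost all of the parameters sit in the first Dense layer, because it connects every one of the 8,192 flattened values to each of its 128 neurons.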
Compile and train the model
Here we compile the model and fit it to the training data for 10 epochs. An epoch is one pass over the entire training set. validation_data is the data on which the loss and any model metrics are evaluated at the end of each epoch; the model is not trained on this data. Because metrics=['sparse_categorical_accuracy'], the model is evaluated on its accuracy. The "sparse" variants of the loss and the metric are used because the labels are integer class indices rather than one-hot vectors.
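To show what the loss and metric measure when labels are integer class indices, here is a minimal NumPy sketch. The function names mirror the Keras string identifiers, but the implementations are simplified illustrations; the labels and probabilities are made up, and the `eps` clipping value is an assumption that only guards against log(0).

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, probs, eps=1e-7):
    """Mean negative log-probability assigned to the true class.

    y_true: integer labels, shape (n,) or (n, 1)
    probs:  predicted class probabilities, shape (n, num_classes)
    """
    labels = np.asarray(y_true).reshape(-1)
    picked = probs[np.arange(labels.size), labels]
    return float(np.mean(-np.log(np.clip(picked, eps, 1.0))))

def sparse_categorical_accuracy(y_true, probs):
    """Fraction of samples whose highest-probability class equals the label."""
    labels = np.asarray(y_true).reshape(-1)
    return float(np.mean(np.argmax(probs, axis=-1) == labels))

y_true = np.array([[0], [2], [1]])  # same (n, 1) layout as y_train
probs = np.array([
    [0.7, 0.2, 0.1],  # correct: class 0
    [0.1, 0.3, 0.6],  # correct: class 2
    [0.5, 0.4, 0.1],  # wrong: predicts class 0, label is 1
])

print(sparse_categorical_accuracy(y_true, probs))      # 2 of 3 correct
print(sparse_categorical_crossentropy(y_true, probs))  # roughly 0.595
```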
model.compile(optimizer='adam', loss = 'sparse_categorical_crossentropy', metrics=['sparse_categorical_accuracy'])
history = model.fit(X_train, y_train, batch_size=10, epochs=10, verbose=1, validation_data=(X_test, y_test))
Train on 50000 samples, validate on 10000 samples
Epoch 1/10
50000/50000 [==============================] - 177s 4ms/sample - loss: 1.4127 - sparse_categorical_accuracy: 0.4918 - val_loss: 1.1079 - val_sparse_categorical_accuracy: 0.6095
Epoch 2/10
50000/50000 [==============================] - 159s 3ms/sample - loss: 1.1058 - sparse_categorical_accuracy: 0.6091 - val_loss: 1.0284 - val_sparse_categorical_accuracy: 0.6377
Epoch 3/10
50000/50000 [==============================] - 146s 3ms/sample - loss: 0.9946 - sparse_categorical_accuracy: 0.6477 - val_loss: 0.9682 - val_sparse_categorical_accuracy: 0.6564
We now plot the training and validation accuracy, followed by the training and validation loss.
# Plot training & validation accuracy values
epoch_range = range(1, 11)
plt.plot(epoch_range, history.history['sparse_categorical_accuracy'])
plt.plot(epoch_range, history.history['val_sparse_categorical_accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper left')
plt.show()
# Plot training & validation loss values
plt.plot(epoch_range, history.history['loss'])
plt.plot(epoch_range, history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper left')
plt.show()
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=2)
10000/10000 - 5s - loss: 0.9383 - sparse_categorical_accuracy: 0.6830
from mlxtend.plotting import plot_confusion_matrix
from sklearn.metrics import confusion_matrix
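# predict_classes() was removed in TensorFlow 2.6; taking the argmax of predict() works in all 2.x versions
y_pred = np.argmax(model.predict(X_test), axis=-1)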
y_pred
array([3, 8, 8, ..., 5, 1, 7])
y_test
array([[3],
[8],
[8],
...,
[5],
[1],
[7]], dtype=uint8)
mat = confusion_matrix(y_test, y_pred)
mat
array([[737, 27, 22, 17, 14, 4, 12, 14, 106, 47],
[ 20, 821, 3, 12, 0, 7, 5, 4, 47, 81],
[ 95, 8, 476, 97, 83, 110, 67, 30, 21, 13],
[ 34, 14, 42, 520, 52, 203, 58, 43, 21, 13],
[ 22, 4, 74, 118, 570, 69, 54, 66, 21, 2],
[ 23, 5, 34, 213, 24, 610, 17, 47, 16, 11],
[ 10, 8, 34, 80, 42, 40, 760, 8, 13, 5],
[ 26, 5, 23, 45, 51, 76, 5, 743, 12, 14],
[ 56, 41, 9, 10, 3, 4, 3, 2, 843, 29],
[ 43, 116, 5, 18, 6, 4, 5, 21, 32, 750]])
plot_confusion_matrix(mat,figsize=(9,9), class_names=classes_name, show_normed=True)
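The plot shows row-normalized values, and the same per-class numbers can be computed directly from the matrix. The sketch below copies the matrix printed above and assumes the scikit-learn convention that rows are true classes and columns are predicted classes; the variable names `recall`, `accuracy`, and the confusion lookup are introduced here for illustration.

```python
classes_name = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                'dog', 'frog', 'horse', 'ship', 'truck']

# Confusion matrix copied from the output above (rows: true class, columns: predicted class).
mat = [
    [737,  27,  22,  17,  14,   4,  12,  14, 106,  47],
    [ 20, 821,   3,  12,   0,   7,   5,   4,  47,  81],
    [ 95,   8, 476,  97,  83, 110,  67,  30,  21,  13],
    [ 34,  14,  42, 520,  52, 203,  58,  43,  21,  13],
    [ 22,   4,  74, 118, 570,  69,  54,  66,  21,   2],
    [ 23,   5,  34, 213,  24, 610,  17,  47,  16,  11],
    [ 10,   8,  34,  80,  42,  40, 760,   8,  13,   5],
    [ 26,   5,  23,  45,  51,  76,   5, 743,  12,  14],
    [ 56,  41,   9,  10,   3,   4,   3,   2, 843,  29],
    [ 43, 116,   5,  18,   6,   4,   5,  21,  32, 750],
]

# Recall per class: correct predictions divided by all images of that class.
recall = {name: mat[i][i] / sum(mat[i]) for i, name in enumerate(classes_name)}
accuracy = sum(mat[i][i] for i in range(len(mat))) / sum(sum(row) for row in mat)

# Largest off-diagonal entry: the most frequent single confusion.
count, true_name, pred_name = max(
    (mat[i][j], classes_name[i], classes_name[j])
    for i in range(len(mat)) for j in range(len(mat)) if i != j
)

for name, value in sorted(recall.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<10} {value:.3f}")
print(f"overall accuracy: {accuracy:.4f}")  # 0.6830
print(f"most frequent confusion: {true_name} predicted as {pred_name} ({count} images)")
```

Sorting by recall puts ship (0.843) and automobile (0.821) at the top and bird (0.476) at the bottom, and the largest single confusion is dog predicted as cat.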
Conclusion
In this tutorial you built a 2D CNN in TensorFlow 2 to classify images from the CIFAR-10 dataset into 10 object categories. After 10 epochs the model reached about 68% test accuracy. The confusion matrix revealed that visually similar classes, especially bird, cat, deer, and dog, are the hardest to separate, while ship and automobile scored the highest.
Key takeaways:
- A single block of two Conv2D layers followed by MaxPool2D and Dropout achieves a solid baseline on CIFAR-10.
- Validation accuracy diverging from training accuracy after epoch 3 is a sign of overfitting; reduce model complexity or add stronger regularization.
- The confusion matrix exposes per-class weaknesses far better than a single accuracy number; always inspect it when classes are visually similar.
Next steps:
- Try Dog vs Cat Classification with CNN to apply a deeper VGG16-inspired architecture with ImageDataGenerator augmentation.
- Explore Image Classification with Pre-trained VGG-16 to see how transfer learning dramatically boosts accuracy on image tasks.
- Add BatchNormalization after each convolutional block and tune the Dropout rate to reduce the overfitting gap observed in this model.
