2D CNN on CIFAR-10 with TensorFlow 2.0

Convolutional Neural Networks (CNNs) use learned filters to extract spatial features from images, which has made them a standard architecture for object recognition. This tutorial builds a small 2D CNN in TensorFlow 2 with Conv2D, MaxPool2D, and Dropout layers and trains it to classify the 10 object categories of the CIFAR-10 dataset.

Download Data and Model Building

BASH

!pip install tensorflow
!pip install mlxtend

PYTHON

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Flatten, Dense, Conv2D, MaxPool2D, Dropout
print(tf.__version__)

OUTPUT

2.1.1

PYTHON

import numpy as np
import matplotlib.pyplot as plt
import matplotlib
from tensorflow.keras.datasets import cifar10

The CIFAR-10 dataset contains 60,000 color images in 10 classes, with 6,000 images per class. It is divided into 50,000 training images and 10,000 test images. The classes are mutually exclusive, so no image belongs to more than one class.

PYTHON

(X_train, y_train), (X_test, y_test) = cifar10.load_data()

OUTPUT

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170500096/170498071 [==============================] - 50s 0us/step

PYTHON

classes_name = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

PYTHON

X_train.max()

OUTPUT

255

PYTHON

X_train = X_train/255
X_test = X_test/255
X_train.shape, X_test.shape

OUTPUT

((50000, 32, 32, 3), (10000, 32, 32, 3))
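As a hedged illustration of the scaling step above, the sketch below applies the same division to a small random stand-in batch rather than the real CIFAR-10 arrays (the name fake_batch is hypothetical). It shows that dividing a uint8 array by 255 produces float64 values in [0, 1], and that an optional cast to float32 halves the memory used.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for a few CIFAR-10 images: uint8 values in [0, 255].
fake_batch = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

scaled = fake_batch / 255             # true division promotes uint8 to float64
scaled32 = scaled.astype(np.float32)  # optional cast: half the memory of float64

print(scaled.dtype, scaled.min() >= 0.0, scaled.max() <= 1.0)
print(scaled32.nbytes, scaled.nbytes)
```

The cast is not part of the original tutorial; it only matters when memory is tight.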

Verify the data

Plot the first test image to confirm the data loaded correctly:

PYTHON

plt.imshow(X_test[0])

PYTHON

y_test

OUTPUT

array([[3],
       [8],
       [8],
       ...,
       [5],
       [1],
       [7]], dtype=uint8)
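To read these integer labels as class names, a small sketch can index into the classes_name list defined earlier. It uses only the first three labels shown in the output above, copied by hand (first_labels is a hypothetical name).

```python
# Class names in CIFAR-10 label order, as defined earlier in this tutorial.
classes_name = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                'dog', 'frog', 'horse', 'ship', 'truck']

# First three test labels copied from the output above; each row holds one label.
first_labels = [[3], [8], [8]]

names = [classes_name[row[0]] for row in first_labels]
print(names)
```

With the real arrays, classes_name[y_test[0][0]] would give the name of the class for the image plotted above.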

Build CNN Model

The eight lines of code below define the model using a common pattern: a stack of Conv2D, MaxPool2D, Dropout, Flatten, and Dense layers.

As input, a Conv2D layer takes tensors of shape (image_height, image_width, color_channels), ignoring the batch size. In this example, the first Conv2D layer is configured to process inputs of shape (32, 32, 3), which is the format of CIFAR-10 images.
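To make the shapes concrete, here is a minimal NumPy sketch of what a stride-1 convolution with padding='same' and ReLU computes for one image. It is a naive illustration with random weights, not how TensorFlow implements the layer, and the function name conv2d_same is hypothetical.

```python
import numpy as np

def conv2d_same(image, kernels, bias):
    """Naive stride-1 'same' convolution followed by ReLU.

    image:   (H, W, C) input
    kernels: (kh, kw, C, F) filter bank with odd kh and kw
    bias:    (F,) one bias per filter
    """
    h, w, _ = image.shape
    kh, kw, _, n_filters = kernels.shape
    pad_h, pad_w = kh // 2, kw // 2
    # Zero-pad the spatial axes so the output keeps the input height and width.
    padded = np.pad(image, ((pad_h, pad_h), (pad_w, pad_w), (0, 0)))
    out = np.zeros((h, w, n_filters))
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + kh, j:j + kw, :]  # (kh, kw, C)
            # Contract the patch against every filter at once, giving (F,).
            out[i, j, :] = np.tensordot(patch, kernels, axes=3) + bias
    return np.maximum(out, 0.0)  # ReLU

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))                    # one CIFAR-sized image
kernels = rng.standard_normal((3, 3, 3, 32)) * 0.1  # 32 filters of size 3x3x3
bias = np.zeros(32)

features = conv2d_same(image, kernels, bias)
print(features.shape)
```

The output keeps the 32 by 32 spatial size and gains one channel per filter, which is why the first two layers in the model summary both report (None, 32, 32, 32).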

MaxPool2D() downsamples its input by taking the maximum value over each window of size pool_size (here (2, 2)) in every feature channel. The window is shifted by strides (here 2) along each spatial dimension. With the "valid" padding option, the output size along each spatial dimension is floor((input_size - pool_size) / strides) + 1, so the 32 x 32 feature maps shrink to 16 x 16.
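The pooling operation can be sketched in a few lines of NumPy. This is an illustrative re-implementation for the 2x2, stride-2 case only, under the assumption that the height and width are even. The helper names max_pool_2x2 and pooled_size are hypothetical.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on an (H, W, C) array; H and W must be even."""
    h, w, c = feature_map.shape
    # Split each spatial axis into (number of windows, position inside window).
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2, c)
    return blocks.max(axis=(1, 3))

def pooled_size(input_size, pool_size=2, stride=2):
    """Output length along one spatial axis with 'valid' padding."""
    return (input_size - pool_size) // stride + 1

x = np.arange(16, dtype=float).reshape(4, 4, 1)
pooled = max_pool_2x2(x)
print(pooled[..., 0])   # [[ 5.  7.] [13. 15.]]
print(pooled_size(32))  # 16
```

Each output value is the largest entry of one non-overlapping 2x2 window, so a 32-pixel axis is reduced to 16.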

Dropout() randomly sets a fraction of its inputs to 0 at each update during training, which helps reduce overfitting. The value passed to Dropout is the probability with which each unit is dropped (0.5 here). The layer is inactive at inference time.
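A minimal NumPy sketch of training-time dropout follows. It uses the "inverted dropout" form, in which the surviving values are scaled by 1 / (1 - rate) so the expected sum is unchanged, matching the behavior the Keras Dropout layer documents. The function name dropout_train is hypothetical.

```python
import numpy as np

def dropout_train(x, rate, rng):
    """Zero each element with probability `rate` and rescale the survivors."""
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones(1000)
dropped = dropout_train(activations, rate=0.5, rng=rng)

print(np.unique(dropped))  # values are either 0.0 (dropped) or 2.0 (kept, rescaled)
print(dropped.mean())      # stays close to 1.0 on average
```

At inference time no mask is applied and the activations pass through unchanged.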

Flatten() reshapes each sample's feature maps into a 1-dimensional vector so they can be fed to the Dense layers that follow.

Dense() is a regular fully connected layer. Here the hidden Dense layer has 128 neurons with ReLU activation. The output layer is also a Dense layer, with 10 neurons for the 10 classes.

The output layer uses the softmax activation. Softmax converts a vector of real numbers into a vector of categorical probabilities: every element lies in the range (0, 1) and the elements sum to 1. It is commonly used as the activation of the last layer of a classification network because the result can be interpreted as a probability distribution over the classes.
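The softmax computation can be sketched in NumPy as follows. Subtracting the row maximum before exponentiating is a common numerical-stability trick. It is an assumption of this sketch, not something the tutorial specifies, and it does not change the result.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1],
                   [1000.0, 1000.0, 1000.0]])  # large values would overflow a naive exp
probs = softmax(logits)

print(probs)
print(probs.sum(axis=-1))  # each row sums to 1
```

The predicted class is the index of the largest probability in each row, which is what argmax over the model output returns.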

PYTHON

model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(3, 3), padding='same', activation='relu', input_shape = [32, 32, 3]))

model.add(Conv2D(filters=32, kernel_size=(3, 3), padding='same', activation='relu'))
model.add(MaxPool2D(pool_size=(2,2), strides=2, padding='valid'))
model.add(Dropout(0.5))

model.add(Flatten())
model.add(Dense(units = 128, activation='relu'))
model.add(Dense(units=10, activation='softmax'))

PYTHON

model.summary()

OUTPUT

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 32, 32, 32)        896
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 32, 32, 32)        9248
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 16, 16, 32)        0
_________________________________________________________________
dropout (Dropout)            (None, 16, 16, 32)        0
_________________________________________________________________
flatten (Flatten)            (None, 8192)              0
_________________________________________________________________
dense (Dense)                (None, 128)               1048704
_________________________________________________________________
dense_1 (Dense)              (None, 10)                1290
=================================================================
Total params: 1,060,138
Trainable params: 1,060,138
Non-trainable params: 0
_________________________________________________________________
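The parameter counts in the summary can be reproduced by hand. The sketch below assumes the usual formulas: a Conv2D layer has (kernel_height * kernel_width * input_channels + 1) * filters parameters, and a Dense layer has (inputs + 1) * units, where the + 1 accounts for the bias. The helper names are hypothetical.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return (inputs + 1) * units

flat_features = 16 * 16 * 32  # Flatten output after pooling: 8192

counts = {
    "conv2d": conv2d_params(3, 3, 3, 32),      # RGB input, 32 filters
    "conv2d_1": conv2d_params(3, 3, 32, 32),   # 32 input channels, 32 filters
    "dense": dense_params(flat_features, 128),
    "dense_1": dense_params(128, 10),
}
total = sum(counts.values())

print(counts)
print(total)
```

Nearly all of the parameters sit in the first Dense layer, because it connects all 8,192 flattened features to 128 units.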

Compile and train the model

Here we compile the model and fit it to the training data for 10 epochs. An epoch is one full pass over the training data; with batch_size=10, each epoch performs 5,000 weight updates. validation_data is the data on which the loss and metrics are evaluated at the end of each epoch. The model is not trained on this data, and in this tutorial the test set doubles as the validation set. Because the labels are integer class indices rather than one-hot vectors, the loss is sparse_categorical_crossentropy, and with metrics=['sparse_categorical_accuracy'] Keras reports the fraction of correctly classified images.

PYTHON

model.compile(optimizer='adam', loss = 'sparse_categorical_crossentropy', metrics=['sparse_categorical_accuracy'])
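To show what the loss and metric named in compile() measure, here is a hedged NumPy sketch on three made-up predictions. It is an illustration of the definitions, not the Keras implementation. The clipping constant eps is an assumption added for numerical safety.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, probs, eps=1e-7):
    """Mean negative log-probability assigned to the true integer label."""
    y_true = np.asarray(y_true).ravel()
    picked = probs[np.arange(len(y_true)), y_true]
    return -np.log(np.clip(picked, eps, 1.0)).mean()

def sparse_categorical_accuracy(y_true, probs):
    """Fraction of samples whose most probable class equals the true label."""
    y_true = np.asarray(y_true).ravel()
    return (probs.argmax(axis=-1) == y_true).mean()

# Made-up example: three samples, three classes, labels stored as a column like y_train.
y_true = np.array([[0], [2], [1]])
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.5, 0.4, 0.1]])

loss = sparse_categorical_crossentropy(y_true, probs)
acc = sparse_categorical_accuracy(y_true, probs)
print(round(loss, 4), round(acc, 4))
```

The third sample is misclassified (class 0 is predicted, class 1 is true), so the accuracy is 2/3 while the loss also reflects how confident each prediction was.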

PYTHON

history = model.fit(X_train, y_train, batch_size=10, epochs=10, verbose=1, validation_data=(X_test, y_test))

OUTPUT

Train on 50000 samples, validate on 10000 samples
Epoch 1/10
50000/50000 [==============================] - 177s 4ms/sample - loss: 1.4127 - sparse_categorical_accuracy: 0.4918 - val_loss: 1.1079 - val_sparse_categorical_accuracy: 0.6095
Epoch 2/10
50000/50000 [==============================] - 159s 3ms/sample - loss: 1.1058 - sparse_categorical_accuracy: 0.6091 - val_loss: 1.0284 - val_sparse_categorical_accuracy: 0.6377
Epoch 3/10
50000/50000 [==============================] - 146s 3ms/sample - loss: 0.9946 - sparse_categorical_accuracy: 0.6477 - val_loss: 0.9682 - val_sparse_categorical_accuracy: 0.6564

Next, plot the training and validation accuracy for each epoch, followed by the training and validation loss, and then evaluate the model on the test set.

PYTHON

# Plot training & validation accuracy values
epoch_range = range(1, 11)
plt.plot(epoch_range, history.history['sparse_categorical_accuracy'])
plt.plot(epoch_range, history.history['val_sparse_categorical_accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper left')
plt.show()

# Plot training & validation loss values
plt.plot(epoch_range, history.history['loss'])
plt.plot(epoch_range, history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper left')
plt.show()

test_loss, test_acc = model.evaluate(X_test,  y_test, verbose=2)

OUTPUT

10000/10000 - 5s - loss: 0.9383 - sparse_categorical_accuracy: 0.6830

PYTHON

from mlxtend.plotting import plot_confusion_matrix
from sklearn.metrics import confusion_matrix

PYTHON

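# predict_classes() is not available in newer TensorFlow releases; taking the
# argmax over the predicted probabilities gives the same class indices.
y_pred = np.argmax(model.predict(X_test), axis=-1)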

PYTHON

y_pred

OUTPUT

array([3, 8, 8, ..., 5, 1, 7])

PYTHON

y_test

OUTPUT

array([[3],
       [8],
       [8],
       ...,
       [5],
       [1],
       [7]], dtype=uint8)
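Note that the predictions are a flat array while the labels are stored as a column. A hedged sketch with a few hand-written values (y_true_col and y_hat are hypothetical stand-ins, not the real arrays) shows why the labels should be flattened before comparing the two with NumPy:

```python
import numpy as np

y_true_col = np.array([[3], [8], [8], [5]], dtype=np.uint8)  # shape (4, 1), like y_test
y_hat = np.array([3, 8, 1, 5])                                # shape (4,), like y_pred

broadcast = (y_hat == y_true_col)          # broadcasts to a (4, 4) matrix, not wanted
matches = (y_hat == y_true_col.ravel())    # elementwise comparison, shape (4,)
accuracy = matches.mean()

print(broadcast.shape, matches.shape, accuracy)
```

Without ravel(), the comparison silently produces an N by N matrix, and its mean is not the accuracy.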

PYTHON

mat = confusion_matrix(y_test, y_pred)
mat

OUTPUT

array([[737,  27,  22,  17,  14,   4,  12,  14, 106,  47],
       [ 20, 821,   3,  12,   0,   7,   5,   4,  47,  81],
       [ 95,   8, 476,  97,  83, 110,  67,  30,  21,  13],
       [ 34,  14,  42, 520,  52, 203,  58,  43,  21,  13],
       [ 22,   4,  74, 118, 570,  69,  54,  66,  21,   2],
       [ 23,   5,  34, 213,  24, 610,  17,  47,  16,  11],
       [ 10,   8,  34,  80,  42,  40, 760,   8,  13,   5],
       [ 26,   5,  23,  45,  51,  76,   5, 743,  12,  14],
       [ 56,  41,   9,  10,   3,   4,   3,   2, 843,  29],
       [ 43, 116,   5,  18,   6,   4,   5,  21,  32, 750]])

PYTHON

plot_confusion_matrix(mat,figsize=(9,9), class_names=classes_name, show_normed=True)
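The per-class figures can also be read off the matrix numerically. The sketch below copies the confusion matrix printed above (rows are true classes, columns are predictions, as scikit-learn's confusion_matrix returns them) and computes per-class recall, overall accuracy, and the most frequent confusion. The variable names cm and class_names are chosen for this example.

```python
import numpy as np

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# Confusion matrix copied from the output above.
cm = np.array([
    [737,  27,  22,  17,  14,   4,  12,  14, 106,  47],
    [ 20, 821,   3,  12,   0,   7,   5,   4,  47,  81],
    [ 95,   8, 476,  97,  83, 110,  67,  30,  21,  13],
    [ 34,  14,  42, 520,  52, 203,  58,  43,  21,  13],
    [ 22,   4,  74, 118, 570,  69,  54,  66,  21,   2],
    [ 23,   5,  34, 213,  24, 610,  17,  47,  16,  11],
    [ 10,   8,  34,  80,  42,  40, 760,   8,  13,   5],
    [ 26,   5,  23,  45,  51,  76,   5, 743,  12,  14],
    [ 56,  41,   9,  10,   3,   4,   3,   2, 843,  29],
    [ 43, 116,   5,  18,   6,   4,   5,  21,  32, 750],
])

recall = cm.diagonal() / cm.sum(axis=1)  # fraction of each true class predicted correctly
overall = cm.trace() / cm.sum()          # same quantity as the test accuracy

# Largest off-diagonal entry: the most common (true class, predicted class) mistake.
off_diag = cm.copy()
np.fill_diagonal(off_diag, 0)
true_idx, pred_idx = np.unravel_index(off_diag.argmax(), off_diag.shape)

for name, r in sorted(zip(class_names, recall), key=lambda pair: pair[1]):
    print(f"{name:<10} {r:.3f}")
print("overall accuracy:", overall)
print("most common mistake:", class_names[true_idx], "predicted as", class_names[pred_idx])
```

Sorting by recall puts bird, cat, deer, and dog at the bottom and ship and automobile at the top, and the trace reproduces the 0.6830 test accuracy reported by model.evaluate.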

Conclusion

In this tutorial you built a 2D CNN in TensorFlow 2.0 to classify images from the CIFAR-10 dataset into 10 object categories. After 10 epochs the model reached ~68% test accuracy. The confusion matrix showed that bird, cat, deer, and dog are the hardest classes to separate, with cat and dog confused with each other most often, while ship and automobile scored the highest.

Key takeaways:

A single block of two Conv2D layers followed by MaxPool2D and Dropout achieves a reasonable baseline of about 68% test accuracy on CIFAR-10.
Validation accuracy diverging from training accuracy after epoch 3 is a sign of overfitting; reduce model complexity or add stronger regularization.
The confusion matrix exposes per-class weaknesses far better than a single accuracy number; always inspect it when classes are visually similar.

Next steps:

Try Dog vs Cat Classification with CNN to apply a deeper VGG16-inspired architecture with ImageDataGenerator augmentation.
Explore Image Classification with Pre-trained VGG-16 to see how transfer learning dramatically boosts accuracy on image tasks.
Add BatchNormalization after each convolutional block and tune Dropout rate to reduce the overfitting gap observed in this model.
