2D CNN on CIFAR-10 with TensorFlow 2.0

Train a 2D Convolutional Neural Network on CIFAR-10 using TensorFlow 2.0. Covers Conv2D, MaxPooling, Dropout, model training, and confusion matrix evaluation.

May 18, 2026 at 10:30 AM · 6 min read

Topics You Will Master

Convolutional layer fundamentals: filters, strides, and padding
MaxPooling2D and Dropout for spatial down-sampling and regularization
Building a CNN architecture with the Keras Sequential API
CIFAR-10 data loading, normalization, and integer class labels
Confusion matrix and per-class accuracy evaluation

Best For

Developers learning image classification with convolutional neural networks.

Expected Outcome

A trained 2D CNN that reaches roughly 68% test accuracy on CIFAR-10 after 10 epochs.

Convolutional Neural Networks (CNNs) use learned filters to extract spatial features from images, which has made them a standard architecture for object recognition. This tutorial builds a small 2D CNN in TensorFlow 2.0 from Conv2D, MaxPooling, and Dropout layers and trains it to classify the 10 object categories of the CIFAR-10 dataset.

Download Data and Model Building

BASH
!pip install tensorflow
!pip install mlxtend
PYTHON
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Flatten, Dense, Conv2D, MaxPool2D, Dropout
print(tf.__version__)
OUTPUT
2.1.1
PYTHON
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
from tensorflow.keras.datasets import cifar10

The CIFAR-10 dataset contains 60,000 color images of 32x32 pixels in 10 classes, with 6,000 images per class. It is split into 50,000 training images and 10,000 test images. The classes are mutually exclusive, so each image belongs to exactly one class.

PYTHON
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
OUTPUT
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170500096/170498071 [==============================] - 50s 0us/step
PYTHON
classes_name = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
PYTHON
X_train.max()
OUTPUT
255
PYTHON
X_train = X_train/255
X_test = X_test/255
X_train.shape, X_test.shape
OUTPUT
((50000, 32, 32, 3), (10000, 32, 32, 3))
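
The scaling step can be illustrated on a tiny array. This is a minimal sketch, assuming only NumPy; the `pixels` array is made up for illustration. One detail worth knowing is that dividing a uint8 array by 255 yields float64, so casting to float32 first halves the memory needed for the images.

```python
import numpy as np

# Hypothetical 1x3 "image" holding the darkest, a mid, and the brightest pixel value.
pixels = np.array([[0, 127, 255]], dtype=np.uint8)

# Plain division, as in the tutorial: NumPy promotes the result to float64.
scaled_f64 = pixels / 255

# Casting first keeps the data in float32, which is usually enough for training.
scaled_f32 = pixels.astype("float32") / 255.0

print(scaled_f64.dtype, scaled_f32.dtype)
print(scaled_f32.min(), scaled_f32.max())
```

Either variant maps the pixel range [0, 255] to [0, 1]; the float32 version is an optional memory optimization, not something the original code requires.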

Verify the data

Plot the first test image to confirm the data loaded correctly:

PYTHON
plt.imshow(X_test[0])
PYTHON
y_test
OUTPUT
array([[3],
       [8],
       [8],
       ...,
       [5],
       [1],
       [7]], dtype=uint8)

Build CNN Model

The 8 lines of code below define the model using a common pattern: a stack of Conv2D, MaxPooling2D, and Dropout layers, followed by Flatten and Dense layers.

As input, a Conv2D layer takes tensors of shape (image_height, image_width, color_channels), ignoring the batch size. In this example, the first Conv2D layer is configured to process inputs of shape (32, 32, 3), which is the format of CIFAR-10 images.
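
To make the effect of padding='same' concrete, here is a rough NumPy sketch of what a single filter does to a single channel. It is a simplification under stated assumptions: one channel, one kernel, stride 1, no bias, and no activation, whereas a real Conv2D layer sums over all input channels and learns its kernels. The helper name `conv2d_same` is hypothetical.

```python
import numpy as np

def conv2d_same(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Naive stride-1 'same' cross-correlation of one (H, W) channel with one odd-sized (k, k) kernel."""
    k = kernel.shape[0]
    pad = k // 2
    # Zero-pad the border so the output keeps the input's height and width.
    padded = np.pad(image, pad, mode="constant")
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            # Element-wise product of the kernel with the window centered on (i, j).
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * kernel)
    return out

image = np.arange(32 * 32, dtype=float).reshape(32, 32)

# An identity kernel (1 in the center, 0 elsewhere) should return the image unchanged.
identity = np.zeros((3, 3))
identity[1, 1] = 1.0

out = conv2d_same(image, identity)
print(out.shape)
```

With 'same' padding and stride 1 the spatial size stays 32x32, which is why both Conv2D layers in the model summary report an output of (None, 32, 32, 32).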

The MaxPool2D() layer downsamples its input along the spatial dimensions (height and width) by taking the maximum value over a window defined by pool_size, here (2, 2), for each channel. The window is shifted by strides, here 2, in each dimension. With the "valid" padding option, the output has a spatial size of floor((input_size - pool_size) / strides) + 1 along each dimension, so a 32x32 feature map becomes 16x16.
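
The pooling arithmetic can be checked with a short NumPy sketch. The helpers `pooled_size` and `max_pool_2x2` are hypothetical names used only for this illustration, and the 4x4 `demo` array is made up; the sketch handles a single channel with even height and width.

```python
import numpy as np

def pooled_size(input_size: int, pool_size: int = 2, stride: int = 2) -> int:
    """Spatial output size of a pooling layer with 'valid' padding."""
    return (input_size - pool_size) // stride + 1

def max_pool_2x2(feature_map: np.ndarray) -> np.ndarray:
    """2x2 max pooling with stride 2 on one (H, W) feature map with even H and W."""
    h, w = feature_map.shape
    # Split the map into non-overlapping 2x2 windows, then take the max of each window.
    windows = feature_map.reshape(h // 2, 2, w // 2, 2)
    return windows.max(axis=(1, 3))

demo = np.array([[1, 2, 5, 6],
                 [3, 4, 7, 8],
                 [9, 1, 2, 3],
                 [0, 5, 4, 1]])

pooled = max_pool_2x2(demo)
print(pooled_size(32))  # 16
print(pooled)           # [[4 8]
                        #  [9 4]]
```

Each output value is the largest entry of one 2x2 window, so the layer keeps the strongest activation in each neighborhood while quartering the number of values.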

Dropout() randomly sets the outputs of hidden units to 0 at each update during training, which helps reduce overfitting. The value passed to Dropout is the probability with which each output is dropped; here it is 0.5.
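
A rough NumPy sketch of the idea follows. It assumes the common "inverted dropout" formulation, in which the surviving values are scaled by 1 / (1 - rate) during training so that the expected activation stays the same; the function name `dropout_forward` and the all-ones input are hypothetical.

```python
import numpy as np

def dropout_forward(x: np.ndarray, rate: float, rng: np.random.Generator,
                    training: bool = True) -> np.ndarray:
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        # At inference time dropout is a no-op.
        return x
    keep_mask = rng.random(x.shape) >= rate
    return np.where(keep_mask, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(seed=0)
activations = np.ones((4, 8))

dropped = dropout_forward(activations, rate=0.5, rng=rng)
unchanged = dropout_forward(activations, rate=0.5, rng=rng, training=False)

print(dropped)
```

With rate=0.5 every entry of `dropped` is either 0.0 or 2.0, and the layer does nothing when `training` is False, which matches how the model behaves during evaluation and prediction.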

Flatten() is used to convert the data into a 1-dimensional array for inputting it to the next layer.

The Dense() layer is a regular fully connected layer, here with 128 neurons. The output layer is also a dense layer, with 10 neurons for the 10 classes.

The convolutional and hidden dense layers use the ReLU activation, and the output layer uses softmax. Softmax converts a real vector to a vector of categorical probabilities: the elements of the output vector are in the range (0, 1) and sum to 1. It is often used as the activation for the last layer of a classification network because the result can be interpreted as a probability distribution.
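
The softmax definition above can be sketched in a few lines of NumPy. The `logits` values are made up for illustration; subtracting the maximum before exponentiating is a standard numerical-stability trick and does not change the result.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Softmax over the last axis, shifted by the max for numerical stability."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Hypothetical raw scores for three classes.
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)

print(probs, probs.sum())
```

The largest logit receives the largest probability, and the predicted class is simply the index of the maximum, which is what an argmax over the model's output computes.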

PYTHON
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(3, 3), padding='same', activation='relu', input_shape = [32, 32, 3]))

model.add(Conv2D(filters=32, kernel_size=(3, 3), padding='same', activation='relu'))
model.add(MaxPool2D(pool_size=(2,2), strides=2, padding='valid'))
model.add(Dropout(0.5))

model.add(Flatten())
model.add(Dense(units = 128, activation='relu'))
model.add(Dense(units=10, activation='softmax'))
PYTHON
model.summary()
OUTPUT
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 32, 32, 32)        896
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 32, 32, 32)        9248
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 16, 16, 32)        0
_________________________________________________________________
dropout (Dropout)            (None, 16, 16, 32)        0
_________________________________________________________________
flatten (Flatten)            (None, 8192)              0
_________________________________________________________________
dense (Dense)                (None, 128)               1048704
_________________________________________________________________
dense_1 (Dense)              (None, 10)                1290
=================================================================
Total params: 1,060,138
Trainable params: 1,060,138
Non-trainable params: 0
_________________________________________________________________
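
The parameter counts in the summary can be reproduced by hand. The sketch below uses two hypothetical helper functions: a Conv2D layer has one weight per kernel position and input channel plus one bias for each filter, and a Dense layer has one weight per input plus one bias for each unit.

```python
def conv2d_params(kernel_h: int, kernel_w: int, in_channels: int, filters: int) -> int:
    """Weights plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(in_features: int, units: int) -> int:
    """Weights plus one bias per unit."""
    return (in_features + 1) * units

# After pooling, the feature map is 16 x 16 with 32 channels, so Flatten yields 8192 values.
flattened = 16 * 16 * 32

counts = {
    "conv2d": conv2d_params(3, 3, 3, 32),     # 896
    "conv2d_1": conv2d_params(3, 3, 32, 32),  # 9248
    "dense": dense_params(flattened, 128),    # 1048704
    "dense_1": dense_params(128, 10),         # 1290
}
total = sum(counts.values())

print(counts)
print(total)  # 1060138
```

Almost all of the parameters sit in the first Dense layer, which is typical when a large feature map is flattened directly into a fully connected layer.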

Compile and train the model

Next, compile the model and fit it to the training data for 10 epochs. An epoch is one full pass over the training data. The loss is sparse_categorical_crossentropy because the labels are integer class indices rather than one-hot vectors. validation_data is the data on which the loss and the metrics are evaluated at the end of each epoch; the model is not trained on it. With metrics=['sparse_categorical_accuracy'], accuracy is reported alongside the loss.

PYTHON
model.compile(optimizer='adam', loss = 'sparse_categorical_crossentropy', metrics=['sparse_categorical_accuracy'])
PYTHON
history = model.fit(X_train, y_train, batch_size=10, epochs=10, verbose=1, validation_data=(X_test, y_test))
OUTPUT
Train on 50000 samples, validate on 10000 samples
Epoch 1/10
50000/50000 [==============================] - 177s 4ms/sample - loss: 1.4127 - sparse_categorical_accuracy: 0.4918 - val_loss: 1.1079 - val_sparse_categorical_accuracy: 0.6095
Epoch 2/10
50000/50000 [==============================] - 159s 3ms/sample - loss: 1.1058 - sparse_categorical_accuracy: 0.6091 - val_loss: 1.0284 - val_sparse_categorical_accuracy: 0.6377
Epoch 3/10
50000/50000 [==============================] - 146s 3ms/sample - loss: 0.9946 - sparse_categorical_accuracy: 0.6477 - val_loss: 0.9682 - val_sparse_categorical_accuracy: 0.6564
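
The loss and metric values in this log can be understood with a small NumPy sketch. It is an illustration of the definitions, not the Keras implementation: the function names mirror the Keras identifiers, but the bodies, the `eps` value, and the two-sample batch are assumptions made for this example.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, probs, eps=1e-7):
    """Mean negative log-probability assigned to the true integer label."""
    y_true = np.asarray(y_true).reshape(-1)   # accepts (N,) or (N, 1) labels
    probs = np.clip(probs, eps, 1.0)          # eps guards against log(0); value is an assumption
    picked = probs[np.arange(len(y_true)), y_true]
    return float(-np.log(picked).mean())

def sparse_categorical_accuracy(y_true, probs):
    """Fraction of samples whose highest-probability class equals the label."""
    y_true = np.asarray(y_true).reshape(-1)
    return float((np.argmax(probs, axis=-1) == y_true).mean())

# Hypothetical batch of two predictions over 10 classes, with labels shaped like y_test.
y_true = np.array([[3], [8]])
confident = np.full(10, 0.01)
confident[3] = 0.91                 # confident and correct
uncertain = np.full(10, 0.1)        # uniform guess
probs = np.stack([confident, uncertain])

loss = sparse_categorical_crossentropy(y_true, probs)
acc = sparse_categorical_accuracy(y_true, probs)
print(loss, acc)
```

A confident correct prediction contributes little to the loss, while a uniform guess over 10 classes contributes -ln(0.1), about 2.30, which is roughly the loss of an untrained model on CIFAR-10.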

Next, plot the training history in two figures: one comparing training and validation accuracy, and one comparing training and validation loss.

PYTHON
# Plot training & validation accuracy values
epoch_range = range(1, 11)
plt.plot(epoch_range, history.history['sparse_categorical_accuracy'])
plt.plot(epoch_range, history.history['val_sparse_categorical_accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper left')
plt.show()

# Plot training & validation loss values
plt.plot(epoch_range, history.history['loss'])
plt.plot(epoch_range, history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper left')
plt.show()

test_loss, test_acc = model.evaluate(X_test,  y_test, verbose=2)
OUTPUT
10000/10000 - 5s - loss: 0.9383 - sparse_categorical_accuracy: 0.6830
PYTHON
from mlxtend.plotting import plot_confusion_matrix
from sklearn.metrics import confusion_matrix
PYTHON
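# predict_classes() was removed in TensorFlow 2.6; argmax over predict() works across 2.x releases
y_pred = np.argmax(model.predict(X_test), axis=-1)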
PYTHON
y_pred
OUTPUT
array([3, 8, 8, ..., 5, 1, 7])
PYTHON
y_test
OUTPUT
array([[3],
       [8],
       [8],
       ...,
       [5],
       [1],
       [7]], dtype=uint8)
PYTHON
mat = confusion_matrix(y_test, y_pred)
mat
OUTPUT
array([[737,  27,  22,  17,  14,   4,  12,  14, 106,  47],
       [ 20, 821,   3,  12,   0,   7,   5,   4,  47,  81],
       [ 95,   8, 476,  97,  83, 110,  67,  30,  21,  13],
       [ 34,  14,  42, 520,  52, 203,  58,  43,  21,  13],
       [ 22,   4,  74, 118, 570,  69,  54,  66,  21,   2],
       [ 23,   5,  34, 213,  24, 610,  17,  47,  16,  11],
       [ 10,   8,  34,  80,  42,  40, 760,   8,  13,   5],
       [ 26,   5,  23,  45,  51,  76,   5, 743,  12,  14],
       [ 56,  41,   9,  10,   3,   4,   3,   2, 843,  29],
       [ 43, 116,   5,  18,   6,   4,   5,  21,  32, 750]])
PYTHON
plot_confusion_matrix(mat,figsize=(9,9), class_names=classes_name, show_normed=True)
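
Per-class accuracy can be read directly off the matrix printed above. The sketch below copies that matrix and the class names from the tutorial and uses only NumPy; rows are true classes and columns are predicted classes, so the diagonal divided by the row sums gives the fraction of each class that was classified correctly.

```python
import numpy as np

classes_name = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                'dog', 'frog', 'horse', 'ship', 'truck']

# Confusion matrix from the tutorial output (rows: true class, columns: predicted class).
mat = np.array([
    [737,  27,  22,  17,  14,   4,  12,  14, 106,  47],
    [ 20, 821,   3,  12,   0,   7,   5,   4,  47,  81],
    [ 95,   8, 476,  97,  83, 110,  67,  30,  21,  13],
    [ 34,  14,  42, 520,  52, 203,  58,  43,  21,  13],
    [ 22,   4,  74, 118, 570,  69,  54,  66,  21,   2],
    [ 23,   5,  34, 213,  24, 610,  17,  47,  16,  11],
    [ 10,   8,  34,  80,  42,  40, 760,   8,  13,   5],
    [ 26,   5,  23,  45,  51,  76,   5, 743,  12,  14],
    [ 56,  41,   9,  10,   3,   4,   3,   2, 843,  29],
    [ 43, 116,   5,  18,   6,   4,   5,  21,  32, 750],
])

per_class_acc = mat.diagonal() / mat.sum(axis=1)
overall_acc = mat.trace() / mat.sum()

# Print classes from weakest to strongest.
for name, acc in sorted(zip(classes_name, per_class_acc), key=lambda pair: pair[1]):
    print(f"{name:<10} {acc:.3f}")
print(f"overall    {overall_acc:.3f}")

# Find the single most frequent mistake (largest off-diagonal entry).
off_diag = mat.copy()
np.fill_diagonal(off_diag, 0)
true_idx, pred_idx = np.unravel_index(off_diag.argmax(), off_diag.shape)
print(f"most common error: {classes_name[true_idx]} predicted as {classes_name[pred_idx]}")
```

On this matrix the weakest class is bird and the strongest is ship, and the overall value matches the 0.6830 reported by model.evaluate.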

Conclusion

In this tutorial you built a 2D CNN in TensorFlow 2.0 to classify images from the CIFAR-10 dataset into 10 object categories. After 10 epochs the model reached about 68% test accuracy. The confusion matrix shows that the animal classes bird, cat, deer, and dog are the hardest to separate, with cat and dog confused with each other most often, while ship and automobile score the highest.

Key takeaways:

  • Two Conv2D layers followed by MaxPool2D and Dropout give a reasonable baseline of about 68% test accuracy on CIFAR-10.
  • Validation accuracy diverging from training accuracy after epoch 3 is a sign of overfitting; reduce model complexity or add stronger regularization.
  • The confusion matrix exposes per-class weaknesses far better than a single accuracy number; always inspect it when classes are visually similar.
