#cnn#convolutional neural network#Image Classification#ImageNet#Keras#pretrained model#roshan#Tensorflow#VGG#VGG16

Image Classification with Pre-trained VGG-16

Classify objects using the pre-trained VGG-16 model in Keras. Covers VGG architecture, loading ImageNet weights, image preprocessing, and top-k predictions.

May 23, 2026 at 3:00 PM

Topics You Will Master

VGG-16 architecture: 13 convolutional layers in 5 blocks, followed by 3 fully-connected layers
Loading ImageNet pre-trained weights via the Keras applications API
Image preprocessing: resizing, array conversion, and preprocess_input
Forward-pass inference and decode_predictions for top-k class labels (top=2 in this tutorial)
Visualizing feature maps from intermediate VGG-16 layers
Best For

Developers learning to deploy pre-trained models for image recognition tasks.

Expected Outcome

A working VGG-16 pipeline that returns top-k predictions on new images.

How to Use a Pre-trained VGG16 Model to Predict Objects

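VGG-16 is a 16-layer deep CNN (13 convolutional and 3 fully-connected layers). It comes from the VGG team's entry in the 2014 ImageNet challenge (ILSVRC), which placed first in the localization task and second in classification. The network stacks small 3×3 convolution filters with a growing number of channels and achieves 92.7% top-5 accuracy on ImageNet. This tutorial uses Keras's pre-trained VGG-16 with ImageNet weights to classify new images via forward-pass inference.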

Importing Libraries

PYTHON
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing.image import load_img, img_to_array

import os
PYTHON
# Create the VGG16 model with pre-trained ImageNet weights (downloaded on first use)

model = VGG16()
OUTPUT
Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels.h5
553467904/553467096 [==============================] - 255s 0us/step
PYTHON
model.summary()
OUTPUT
Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         [(None, 224, 224, 3)]     0
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              102764544
_________________________________________________________________
fc2 (Dense)                  (None, 4096)              16781312
_________________________________________________________________
predictions (Dense)          (None, 1000)              4097000
=================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
_________________________________________________________________
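
The parameter counts in the summary can be reproduced by hand, which is a handy check on how the 13 convolutional and 3 dense layers fit together. The following is a minimal sketch in plain Python. The helper names conv2d_params and dense_params are made up for this illustration and are not part of Keras, and the sketch assumes every convolution uses a 3×3 kernel with a bias term, which is what the numbers in the summary imply.

```python
def conv2d_params(kernel, in_channels, out_channels):
    """Weights (kernel x kernel x in x out) plus one bias per output channel."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(in_units, out_units):
    """Weights (in x out) plus one bias per output unit."""
    return in_units * out_units + out_units


# (input channels, output channels) for the 13 conv layers, read off the summary
conv_channels = [
    (3, 64), (64, 64),                       # block1
    (64, 128), (128, 128),                   # block2
    (128, 256), (256, 256), (256, 256),      # block3
    (256, 512), (512, 512), (512, 512),      # block4
    (512, 512), (512, 512), (512, 512),      # block5
]

# (input units, output units) for fc1, fc2 and predictions
dense_units = [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]

conv_total = sum(conv2d_params(3, i, o) for i, o in conv_channels)
dense_total = sum(dense_params(i, o) for i, o in dense_units)
total = conv_total + dense_total

print(conv2d_params(3, 3, 64))  # 1792, matches block1_conv1
print(conv_total)               # 14714688
print(dense_total)              # 123642856
print(total)                    # 138357544, matches "Total params"
```

Most of the parameters sit in the fully-connected layers: fc1 alone accounts for roughly three quarters of the total.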

The code below performs the following steps:

  • Load 8 sample images from disk, resized to 224×224.
  • Convert each image to an array and reshape it into a batch of one.
  • Preprocess the batch with preprocess_input() and predict the output.
  • Decode the prediction; top=2 in decode_predictions() returns the two most probable classes for each image.
PYTHON
# Run each sample image through the pre-trained VGG16 model.
# top=2 in decode_predictions() keeps the two most probable classes per image.

for file in sorted(os.listdir('sample')):  # sorted for a stable order
    print(file)
    full_path = os.path.join('sample', file)

    # Load at the 224x224 input size VGG16 expects
    image = load_img(full_path, target_size=(224, 224))
    image = img_to_array(image)
    # Add a batch dimension: (224, 224, 3) -> (1, 224, 224, 3)
    image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
    # Convert RGB to BGR and subtract the ImageNet channel means
    image = preprocess_input(image)
    y_pred = model.predict(image)
    label = decode_predictions(y_pred, top=2)
    print(label)
    print()
OUTPUT
bottle1.jpeg
[[('n04557648', 'water_bottle', 0.6603951), ('n04560804', 'water_jug', 0.08577988)]]

bottle2.jpeg
[[('n04557648', 'water_bottle', 0.5169559), ('n04560804', 'water_jug', 0.2630159)]]

bottle3.jpeg
[[('n04557648', 'water_bottle', 0.88239855), ('n04560804', 'water_jug', 0.051655706)]]

monitor.jpeg
[[('n03782006', 'monitor', 0.46309018), ('n03179701', 'desk', 0.16822667)]]

mouse.jpeg
[[('n03793489', 'mouse', 0.37214068), ('n03657121', 'lens_cap', 0.1903602)]]

mug.jpeg
[[('n03063599', 'coffee_mug', 0.46725288), ('n03950228', 'pitcher', 0.1496518)]]

pen.jpeg
[[('n02783161', 'ballpoint', 0.6506707), ('n04116512', 'rubber_eraser', 0.12477029)]]

wallet.jpeg
[[('n04026417', 'purse', 0.530347), ('n04548362', 'wallet', 0.24484588)]]
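
Each printed result is a nested list: one inner list per image in the batch, holding (WordNet ID, label, probability) tuples sorted from most to least probable. As a small illustration of reading that structure, the sketch below pulls out the top-1 label and its margin over the runner-up. The two entries are copied from the output above; the summarize helper is a hypothetical name used only for this example.

```python
# Two results copied verbatim from the tutorial output.
results = {
    "mouse.jpeg": [[("n03793489", "mouse", 0.37214068),
                    ("n03657121", "lens_cap", 0.1903602)]],
    "wallet.jpeg": [[("n04026417", "purse", 0.530347),
                     ("n04548362", "wallet", 0.24484588)]],
}


def summarize(decoded):
    """Return (top-1 label, top-1 score, margin over runner-up) for a one-image batch."""
    candidates = decoded[0]  # predictions for the first (and only) image
    (_, best_label, best_score), (_, _, second_score) = candidates[:2]
    return best_label, best_score, best_score - second_score


for filename, decoded in results.items():
    label, score, margin = summarize(decoded)
    print(f"{filename}: {label} ({score:.1%}), margin over runner-up {margin:.1%}")
```

A small margin, or a top-1 label that is a near neighbour of the true class (purse for a wallet), is a hint that looking at more than one prediction is worthwhile.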

Challenges of VGG-16:

  • It is very slow to train: the original VGG models took 2-3 weeks to train on four NVIDIA Titan Black GPUs.
  • The ImageNet-trained VGG-16 weights are 528 MB, so the model is costly in disk space and download bandwidth.
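
The 528 MB figure follows almost directly from the parameter count in the model summary. A quick back-of-the-envelope check, assuming each parameter is stored as a 32-bit float (an assumption about the weight file, not something the summary states):

```python
TOTAL_PARAMS = 138_357_544   # "Total params" from model.summary()
BYTES_PER_FLOAT32 = 4        # assumed storage type for each weight

weights_bytes = TOTAL_PARAMS * BYTES_PER_FLOAT32
weights_mib = weights_bytes / 2**20

print(weights_bytes)       # 553430176
print(round(weights_mib))  # 528
```

The download log earlier reports 553,467,096 bytes, slightly more than this estimate; the small difference is presumably file-format overhead.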

Conclusion

In this tutorial you used Keras's pre-trained VGG-16 with ImageNet weights to classify 8 sample images with a single forward pass each, no training required. The top-1 label was correct for the water bottles, the computer monitor, the mouse, the coffee mug, and the ballpoint pen. The wallet was labelled as a purse (53%), with wallet as the second prediction (24%). Top-1 confidence ranged from 37% (mouse) to 88% (bottle3).

Key takeaways:

  • Pre-trained VGG-16 weights are loaded via VGG16() with the default weights="imagenet". This downloads 528 MB of weights once and caches them, so inference is as simple as calling model.predict() without any fine-tuning.
  • preprocess_input() must be applied after converting an image to a NumPy array. It converts the channels from RGB to BGR and subtracts the ImageNet per-channel mean, matching the normalization used during VGG's original training.
  • decode_predictions(y_pred, top=2) maps the 1000-class softmax output to human-readable ImageNet labels and confidence scores; top=2 returns the two most likely classes per image.
  • VGG-16's 138M parameters make it accurate but heavy (528 MB, and 2-3 weeks to train from scratch on GPUs). For production, lighter models such as MobileNet or EfficientNet can reach comparable accuracy at a fraction of the size.
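
To make the preprocessing takeaway concrete, here is a rough NumPy sketch of what preprocess_input() does for VGG-16 inputs. It is an illustration, not the Keras implementation: the function name vgg_preprocess_sketch is hypothetical, and the mean values are the ones commonly cited for this Caffe-style ImageNet preprocessing. In real code, keep calling preprocess_input().

```python
import numpy as np

# Per-channel ImageNet means in BGR order (commonly cited values, assumed here).
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def vgg_preprocess_sketch(rgb_batch):
    """Flip RGB to BGR, then subtract the per-channel mean (no scaling)."""
    x = np.asarray(rgb_batch, dtype=np.float32)
    x = x[..., ::-1]               # reverse the channel axis: RGB -> BGR
    return x - IMAGENET_MEAN_BGR   # broadcasts over batch, height and width


# A pure-red 224x224 image standing in for a loaded photo.
image = np.zeros((224, 224, 3), dtype=np.float32)
image[..., 0] = 255.0

# Adding the batch dimension; equivalent to the reshape used in the tutorial.
batch = np.expand_dims(image, axis=0)
out = vgg_preprocess_sketch(batch)

print(batch.shape)   # (1, 224, 224, 3)
print(out[0, 0, 0])  # roughly [-103.939 -116.779  131.32]
```

After the flip, the red value ends up in the last channel, which is why only the third number is positive.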

Next steps:

  • Adapt VGG-16 for a custom dataset via transfer learning by replacing the final Dense(1000) layer with Dense(num_classes) and fine-tuning only the top layers.
  • Compare against multi-label classification in Multi-Label Movie Poster Classification with CNN to see how custom CNNs compare with pre-trained backbones.
  • Visualize intermediate feature maps from block3_pool or block5_pool to understand what spatial features VGG-16 learns at each depth.
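
As a starting point for the first and last of these steps, the sketch below defines two small helpers. Both names (replace_classifier and feature_extractor) are hypothetical and written for this illustration. The first freezes every existing layer and attaches a new softmax layer to the output of the second-to-last layer, which is one simple reading of "replace the final Dense layer"; other fine-tuning strategies are possible. The second builds a sub-model that returns the activations of a named layer.

```python
from tensorflow.keras import Model, layers


def replace_classifier(base, num_classes):
    """Freeze `base` and swap its last layer for a new softmax head."""
    for layer in base.layers:
        layer.trainable = False
    # Output of the second-to-last layer (fc2 when `base` is the full VGG16).
    features = base.layers[-2].output
    outputs = layers.Dense(num_classes, activation="softmax",
                           name="new_predictions")(features)
    return Model(inputs=base.input, outputs=outputs)


def feature_extractor(base, layer_name):
    """Return a model whose output is the activation of one named layer."""
    return Model(inputs=base.input, outputs=base.get_layer(layer_name).output)


# Intended use with the tutorial's model (downloads the weights if not cached):
# from tensorflow.keras.applications.vgg16 import VGG16
# vgg = VGG16()
# classifier = replace_classifier(vgg, num_classes=5)   # 5 is a placeholder
# maps = feature_extractor(vgg, "block3_pool").predict(image)  # (1, 28, 28, 256)
```

The new classifier still has to be compiled and trained on the custom dataset before it is useful; only the new_predictions layer is trainable in this sketch.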
