
Sentiment Classification Using BERT

Fine-tune BERT for IMDB movie review sentiment classification using ktrain. Covers Transformer architecture, BERT tokenization, and one-cycle fine-tuning.

May 24, 2026 at 8:15 AM · 6 min read

Topics You Will Master

  • Transformer self-attention and positional encoding fundamentals
  • BERT pre-training: masked language model and next-sentence prediction
  • ktrain Learner API for one-cycle fine-tuning on classification tasks
  • IMDB dataset tokenization with BERT WordPiece tokenizer
  • Evaluating fine-tuned BERT with classification report and confusion matrix

Best For

Developers learning to fine-tune large pre-trained language models.

Expected Outcome

A fine-tuned BERT model that classifies IMDB movie reviews as positive or negative.

Sentiment Classification with BERT

BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained transformer encoder that captures deep bidirectional context, which made it state-of-the-art on many NLP benchmarks when it was released. This tutorial fine-tunes BERT on the IMDB movie reviews dataset for binary sentiment classification using the ktrain one-cycle training API.

What is ktrain

ktrain is a library to help build, train, debug, and deploy neural networks in the deep learning software framework, Keras.

(ktrain uses tf.keras in TensorFlow instead of standalone Keras.) Inspired by the fastai library, ktrain lets you do the following with only a few lines of code:

  • estimate an optimal learning rate for your model given your data using a learning rate finder
  • employ learning rate schedules such as the triangular learning rate policy, 1cycle policy, and SGDR to more effectively train your model
  • employ fast and easy-to-use pre-canned models for both text classification (e.g., NBSVM, fastText, GRU with pre-trained word embeddings) and image classification (e.g., ResNet, Wide Residual Networks, Inception)
  • load and preprocess text and image data from a variety of formats
  • inspect data points that were misclassified to help improve your model
  • leverage a simple prediction API for saving and deploying both models and data-preprocessing steps to make predictions on new raw data

ktrain GitHub: amaiya/ktrain

Notebook Setup

BASH
pip install ktrain

Importing Libraries

PYTHON
import tensorflow as tf
import pandas as pd
import numpy as np
import ktrain
from ktrain import text
PYTHON
tf.__version__
OUTPUT
'2.1.0'

Downloading the dataset

BASH
git clone https://github.com/laxmimerit/IMDB-Movie-Reviews-Large-Dataset-50k.git
OUTPUT
Cloning into 'IMDB-Movie-Reviews-Large-Dataset-50k'...
PYTHON
#loading the train dataset

data_train = pd.read_excel('IMDB-Movie-Reviews-Large-Dataset-50k/train.xlsx', dtype = str)
PYTHON
#loading the test dataset

data_test = pd.read_excel('IMDB-Movie-Reviews-Large-Dataset-50k/test.xlsx', dtype = str)
PYTHON
#dimension of the dataset

print("Size of train dataset: ",data_train.shape)
print("Size of test dataset: ",data_test.shape)
OUTPUT
Size of train dataset:  (25000, 2)
Size of test dataset:  (25000, 2)

Observation: Both the train and test datasets have 25000 rows and 2 columns.

PYTHON
#printing last rows of train dataset

data_train.tail()
OUTPUT
       Reviews                                             Sentiment
24995  Everyone plays their part pretty well in this ...  pos
24996  It happened with Assault on Prescient 13 in 20...  neg
24997  My God. This movie was awful. I can't complain...  neg
24998  When I first popped in Happy Birthday to Me, I...  neg
24999  So why does this show suck? Unfortunately, tha...  neg
PYTHON
#printing head rows of test dataset

data_test.head()
OUTPUT
   Reviews                                             Sentiment
0  Who would have thought that a movie about a ma...  pos
1  After realizing what is going on around us ......  pos
2  I grew up watching the original Disney Cindere...  neg
3  David Mamet wrote the screenplay and made his ...  pos
4  Admittedly, I didn't have high expectations of...  neg

Preprocessing the train and test data

PYTHON
# text.texts_from_df returns two (X, y) tuples plus a preprocessor object (preproc)
# maxlen is the maximum number of tokens kept per review; longer reviews are truncated
# preprocess_mode = 'bert' applies BERT-specific tokenization and encoding to the text corpus

(X_train, y_train), (X_test, y_test), preproc = text.texts_from_df(train_df=data_train,
                                                                   text_column = 'Reviews',
                                                                   label_columns = 'Sentiment',
                                                                   val_df = data_test,
                                                                   maxlen = 500,
                                                                   preprocess_mode = 'bert')
OUTPUT
downloading pretrained BERT model (uncased_L-12_H-768_A-12.zip)...
[██████████████████████████████████████████████████]
extracting pretrained BERT model...
done.

cleanup downloaded zip...
done.

preprocessing train...
language: en

Is Multi-Label? False
preprocessing test...
language: en

Observation:

  1. The language of the reviews is detected as English
  2. Also, this is not a multilabel classification
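
To make the preprocessing step less of a black box, here is a rough sketch of what BERT-style input preparation does: split words into WordPiece sub-tokens, truncate to maxlen, wrap the sequence in [CLS] and [SEP], and pad. This is not ktrain's actual code. The tiny vocabulary and the helper names `wordpiece` and `encode` are made up for illustration, and the sketch assumes maxlen counts the two special tokens.

```python
# Toy illustration of BERT-style input preparation (not ktrain's actual code).
# The vocabulary is hypothetical; real BERT ships a vocab file with ~30k entries.
TOY_VOCAB = {"[PAD]", "[UNK]", "[CLS]", "[SEP]",
             "the", "movie", "was", "bor", "##ing", "great"}

def wordpiece(word, vocab):
    """Greedy longest-match-first split of one lowercase word into sub-tokens."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a continuation piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # nothing matched: treat the whole word as unknown
        pieces.append(piece)
        start = end
    return pieces

def encode(text, vocab, maxlen):
    """Tokenize, truncate to maxlen, add [CLS]/[SEP], and pad with [PAD]."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(wordpiece(word, vocab))
    tokens = tokens[:maxlen - 2]  # leave room for [CLS] and [SEP]
    tokens = ["[CLS]"] + tokens + ["[SEP]"]
    n_pad = maxlen - len(tokens)
    mask = [1] * len(tokens) + [0] * n_pad  # 1 = real token, 0 = padding
    return tokens + ["[PAD]"] * n_pad, mask

tokens, mask = encode("The movie was boring", TOY_VOCAB, maxlen=8)
print(tokens)
print(mask)
```

Note how "boring" is not in the toy vocabulary but is still representable as "bor" + "##ing". With maxlen = 500, the same truncation logic means that only the beginning of very long reviews reaches the model.
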
PYTHON
# name = 'bert' selects the BERT text classification model

model = text.text_classifier(name = 'bert',
                             train_data = (X_train, y_train),
                             preproc = preproc)
OUTPUT
Is Multi-Label? False
maxlen is 500
done.
PYTHON
# batch size is set to 6 because the ktrain documentation recommends it for BERT with maxlen = 500

learner = ktrain.get_learner(model=model, train_data=(X_train, y_train),
                   val_data = (X_test, y_test),
                   batch_size = 6)
PYTHON
# Optionally, estimate a good learning rate with the learning rate finder:
# learner.lr_find()
# learner.lr_plot()

# With BERT on this dataset the search can take days, so it is left commented out here.
PYTHON
# fit is a basic training loop, whereas fit_onecycle applies the one-cycle learning rate policy

learner.fit_onecycle(lr = 2e-5, epochs = 1)

predictor = ktrain.get_predictor(learner.model, preproc)
predictor.save('/content/drive/My Drive/bert')
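
To get a feel for what the one-cycle policy does with lr = 2e-5, here is a simplified sketch of such a schedule: the rate warms up linearly to the peak over the first half of training and cools down over the second half. It is not ktrain's exact implementation. The function name `one_cycle_lr` and the `start_div` parameter (how far below the peak the schedule starts) are assumptions made for this example.

```python
import math

def one_cycle_lr(step, total_steps, max_lr, start_div=10.0):
    """Simplified one-cycle schedule: linear warm-up to max_lr over the first
    half of training, then linear decay back down over the second half.
    start_div is an assumption: the schedule starts at max_lr / start_div."""
    min_lr = max_lr / start_div
    half = total_steps / 2
    if step <= half:
        frac = step / half                   # 0 -> 1 during warm-up
    else:
        frac = (total_steps - step) / half   # 1 -> 0 during cool-down
    return min_lr + frac * (max_lr - min_lr)

# 25,000 training reviews at batch size 6 gives the number of steps in one epoch
steps = math.ceil(25000 / 6)
print("steps per epoch:", steps)

# Learning rate at the start, the midpoint, and the end of the epoch
for s in (0, steps // 2, steps):
    print(s, one_cycle_lr(s, steps, max_lr=2e-5))
```

With a batch size of 6, a single epoch is already several thousand weight updates, which is part of why one epoch at a small learning rate can be enough for fine-tuning.
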
PYTHON
#sample dataset to test on

data = ['this movie was horrible, the plot was really boring. acting was okay',
        'the fild is really sucked. there is not plot and acting was bad',
        'what a beautiful movie. great plot. acting was good. will see it again']
PYTHON
predictor.predict(data)
OUTPUT
['neg', 'neg', 'pos']

Interpretation of the above results:

  1. 'this movie was horrible, the plot was really boring. acting was okay' - neg
  2. 'the fild is really sucked. there is not plot and acting was bad' - neg
  3. 'what a beautiful movie. great plot. acting was good. will see it again' - pos
PYTHON
#return_proba = True returns the predicted probability for each class

predictor.predict(data, return_proba=True)
OUTPUT
array([[0.99797565, 0.00202436],
       [0.99606663, 0.00393336],
       [0.00292433, 0.9970757 ]], dtype=float32)
PYTHON
#classes available

predictor.get_classes()
OUTPUT
['neg', 'pos']
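
The columns of the probability array follow the order of the class list, so the label is simply the class with the highest probability in each row. A minimal sketch of that mapping, using the numbers printed above (the helper name `to_labels` is made up for this example):

```python
classes = ['neg', 'pos']
probs = [[0.99797565, 0.00202436],
         [0.99606663, 0.00393336],
         [0.00292433, 0.9970757]]

def to_labels(prob_rows, class_names):
    """Pick the most probable class for each row, with its probability."""
    labels = []
    for row in prob_rows:
        best = max(range(len(row)), key=row.__getitem__)  # argmax over the row
        labels.append((class_names[best], row[best]))
    return labels

for label, confidence in to_labels(probs, classes):
    print(f"{label}  {confidence:.4f}")
```

This reproduces the labels returned by predictor.predict(data) and also shows how confident the model is in each one.
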
PYTHON
# saving model and weights

predictor.save('/content/drive/My Drive/bert')
PYTHON
!zip -r /content/bert.zip /content/bert
OUTPUT
  adding: content/bert/ (stored 0%)
  adding: content/bert/tf_model.h5 (deflated 11%)
  adding: content/bert/tf_model.preproc (deflated 52%)
PYTHON
#loading the model (the path must point to the folder that predictor.save wrote to)

predictor_load = ktrain.load_predictor('/content/bert')
PYTHON
#predicting the data

predictor_load.predict(data)
OUTPUT
['neg', 'neg', 'pos']
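
Three hand-written sentences are only a smoke test. To judge the model properly you would compare predictions against the true labels of the test set and look at a confusion matrix and per-class precision and recall. The sketch below shows the arithmetic in plain Python. The labels in it are hypothetical and are not results from this tutorial, and the helper names are made up for the example.

```python
def confusion_matrix(y_true, y_pred, class_names):
    """Rows are true classes, columns are predicted classes."""
    index = {name: i for i, name in enumerate(class_names)}
    matrix = [[0] * len(class_names) for _ in class_names]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def precision_recall(matrix, k):
    """Precision and recall for class k, read off the confusion matrix."""
    tp = matrix[k][k]
    predicted_k = sum(row[k] for row in matrix)  # column sum
    actual_k = sum(matrix[k])                    # row sum
    precision = tp / predicted_k if predicted_k else 0.0
    recall = tp / actual_k if actual_k else 0.0
    return precision, recall

# Hypothetical labels for illustration only; not results from this tutorial.
y_true = ['neg', 'neg', 'pos', 'pos', 'pos', 'neg']
y_pred = ['neg', 'pos', 'pos', 'pos', 'neg', 'neg']
classes = ['neg', 'pos']

cm = confusion_matrix(y_true, y_pred, classes)
print(cm)
for k, name in enumerate(classes):
    p, r = precision_recall(cm, k)
    print(name, round(p, 3), round(r, 3))
```

To apply this to the tutorial's data, y_true would be list(data_test['Sentiment']) and y_pred the output of predictor_load.predict(list(data_test['Reviews'])). Expect that call to take a long time on 25000 reviews.
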

Conclusion

In this tutorial you fine-tuned a pre-trained BERT model on the 50k IMDB sentiment dataset (25k reviews for training, 25k for validation) using the ktrain one-cycle training API. After downloading BERT's uncased weights, tokenizing reviews with the WordPiece tokenizer at maxlen=500, and fine-tuning with a learning rate of 2e-5 for a single epoch, the model correctly classified all three hand-written sample reviews, including a negative one with a misspelling and awkward grammar. The predictor was then saved and reloaded to demonstrate deployment.

Key takeaways:

  • BERT's bidirectional self-attention reads the full context of each token at once (left and right), unlike unidirectional LSTMs, which helps it transfer to downstream tasks with little fine-tuning.
  • ktrain's text.texts_from_df handles all BERT-specific preprocessing (WordPiece tokenization, [CLS]/[SEP] token insertion, attention masks) in one call, hiding the boilerplate that normally requires the transformers library.
  • The one-cycle learning rate schedule (fit_onecycle) warms up to a peak rate and then cools down, which often trains faster and more stably than a constant rate. Even a single epoch can produce strong results on fine-tuning tasks.
  • For production, predictor.save() serializes both the model weights and the preprocessing pipeline together, so inference requires no additional setup beyond ktrain.load_predictor().
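
The first takeaway can be made concrete with a tiny scaled dot-product self-attention sketch. It is heavily simplified: the 2-d "embeddings" are made up, there is a single head, and queries, keys, and values are the raw input vectors rather than learned projections as in real BERT. It only illustrates that every token attends to tokens on both its left and its right.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with identity projections
    (queries = keys = values = the input vectors), for illustration only."""
    d = len(vectors[0])
    weights = []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        weights.append(softmax(scores))
    # Each output vector is a weighted average of all input vectors
    outputs = [[sum(w * v[j] for w, v in zip(row, vectors)) for j in range(d)]
               for row in weights]
    return weights, outputs

tokens = ["plot", "was", "boring"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # made-up 2-d vectors

weights, outputs = self_attention(embeddings)
for tok, row in zip(tokens, weights):
    print(tok, [round(w, 2) for w in row])
```

Each printed row sums to 1, and the middle token "was" puts weight on both "plot" (to its left) and "boring" (to its right). A left-to-right model would only see "plot" at that position.
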

Next steps:

  • Compare BERT against its compressed variant in the tutorial "DistilBERT: Smaller, Faster, Cheaper, Lighter" to see the speed-accuracy trade-off.
  • Apply the same fine-tuning workflow to multi-class sentiment with a custom dataset to extend beyond binary classification.
  • Try learner.lr_find() and learner.lr_plot() to choose the learning rate empirically rather than reusing the 2e-5 from this tutorial.
