Credit Card Fraud Detection using CNN

Classification using CNN

It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase. In this project we are going to build a CNN model that predicts whether a transaction is genuine or fraudulent.

Dataset

We are going to use the Credit Card Fraud Detection dataset from Kaggle. It contains anonymized credit card transactions labeled as fraudulent or genuine. You can download it from here.

The dataset contains transactions made by credit cards in September 2013 by European cardholders. It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.
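
To make the imbalance concrete, the short check below recomputes the fraud share from the two counts quoted above and shows the accuracy a model would reach by labelling every transaction as genuine. It uses only the figures from the dataset description; the variable names are illustrative.

```python
# Counts quoted in the dataset description.
total_transactions = 284_807
fraud_transactions = 492

fraud_share = fraud_transactions / total_transactions
# Accuracy of a trivial "always genuine" classifier (it never flags fraud).
always_genuine_accuracy = 1 - fraud_share

print(f"Fraud share: {fraud_share:.4%}")
print(f"Accuracy of always predicting genuine: {always_genuine_accuracy:.4%}")
```

A model that never detects a single fraud would still score above 99.8% accuracy on the raw data, which is why accuracy alone is a poor guide before the classes are balanced.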

It contains only numerical input variables, which (apart from 'Time' and 'Amount') are the result of a PCA transformation. 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. 'Amount' is the transaction amount; this feature can be used for example-dependent cost-sensitive learning. 'Class' is the response variable and takes the value 1 in case of fraud and 0 otherwise.

Tensorflow Installation

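We are going to use TensorFlow to build the model. You can install it by running the first command below. If your machine has a GPU, you can use the second command instead. From TensorFlow 2.1 onward, the standard tensorflow package already includes GPU support on Linux and Windows, so the second command is usually unnecessary.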

BASH

!pip install tensorflow
!pip install tensorflow-gpu

PYTHON

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Flatten, Dense, Dropout, BatchNormalization
from tensorflow.keras.layers import Conv1D, MaxPool1D
from tensorflow.keras.optimizers import Adam
print(tf.__version__)

OUTPUT

2.1.0

PYTHON

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

Now we will read the dataset using read_csv() in a pandas dataframe.

PYTHON

data = pd.read_csv('creditcard.csv')
data.head()

OUTPUT

	Time	V1	V2	V3	V4	V5	V6	V7	V8	V9	...	V21	V22	V23	V24	V25	V26	V27	V28	Amount
0	0.0	-1.359807	-0.072781	2.536347	1.378155	-0.338321	0.462388	0.239599	0.098698	0.363787	...	-0.018307	0.277838	-0.110474	0.066928	0.128539	-0.189115	0.133558	-0.021053	149.62
1	0.0	1.191857	0.266151	0.166480	0.448154	0.060018	-0.082361	-0.078803	0.085102	-0.255425	...	-0.225775	-0.638672	0.101288	-0.339846	0.167170	0.125895	-0.008983	0.014724	2.69
2	1.0	-1.358354	-1.340163	1.773209	0.379780	-0.503198	1.800499	0.791461	0.247676	-1.514654	...	0.247998	0.771679	0.909412	-0.689281	-0.327642	-0.139097	-0.055353	-0.059752	378.66
3	1.0	-0.966272	-0.185226	1.792993	-0.863291	-0.010309	1.247203	0.237609	0.377436	-1.387024	...	-0.108300	0.005274	-0.190321	-1.175575	0.647376	-0.221929	0.062723	0.061458	123.50
4	2.0	-1.158233	0.877737	1.548718	0.403034	-0.407193	0.095921	0.592941	-0.270533	0.817739	...	-0.009431	0.798278	-0.137458	0.141267	-0.206010	0.502292	0.219422	0.215153	69.99

5 rows × 31 columns

The dataset has 284807 rows and 31 columns.

PYTHON

data.shape

OUTPUT

(284807, 31)

Now we will see if any null values are present in the data.

PYTHON

data.isnull().sum()

OUTPUT

Time      0
V1        0
V2        0
V3        0
V4        0
V5        0
V6        0
V7        0
V8        0
V9        0
V10       0
V11       0
V12       0
V13       0
V14       0
V15       0
V16       0
V17       0
V18       0
V19       0
V20       0
V21       0
V22       0
V23       0
V24       0
V25       0
V26       0
V27       0
V28       0
Amount    0
Class     0
dtype: int64

As no null values are present, we can go ahead and get more information about the data from data.info(). We can see that every column is float64 except Class, which is int64.

PYTHON

data.info()

OUTPUT

RangeIndex: 284807 entries, 0 to 284806
Data columns (total 31 columns):
Time      284807 non-null float64
V1        284807 non-null float64
V2        284807 non-null float64
V3        284807 non-null float64
V4        284807 non-null float64
V5        284807 non-null float64
V6        284807 non-null float64
V7        284807 non-null float64
V8        284807 non-null float64
V9        284807 non-null float64
V10       284807 non-null float64
V11       284807 non-null float64
V12       284807 non-null float64
V13       284807 non-null float64
V14       284807 non-null float64
V15       284807 non-null float64
V16       284807 non-null float64
V17       284807 non-null float64
V18       284807 non-null float64
V19       284807 non-null float64
V20       284807 non-null float64
V21       284807 non-null float64
V22       284807 non-null float64
V23       284807 non-null float64
V24       284807 non-null float64
V25       284807 non-null float64
V26       284807 non-null float64
V27       284807 non-null float64
V28       284807 non-null float64
Amount    284807 non-null float64
Class     284807 non-null int64
dtypes: float64(30), int64(1)
memory usage: 67.4 MB

value_counts() returns a Series containing counts of unique values. This data has 2 classes, 0 and 1. We can see that there are far more rows with label 0 than with label 1, so the data is highly unbalanced.

PYTHON

data['Class'].value_counts()

OUTPUT

0    284315
1       492
Name: Class, dtype: int64

Balance Dataset

Here we will create a variable non_fraud which will contain the data of all the genuine transactions i.e. the transactions with ['Class']==0. fraud will contain the data of all the fraudulent transactions i.e. the transactions with ['Class']==1. The shape attribute tells us that non_fraud has 284315 rows and 31 columns and fraud has 492 rows and 31 columns.

PYTHON

non_fraud = data[data['Class']==0]
fraud = data[data['Class']==1]
non_fraud.shape, fraud.shape

OUTPUT

((284315, 31), (492, 31))

To balance the data we will randomly select 492 transactions from non_fraud. Now you can see that non_fraud has 492 rows.

PYTHON

non_fraud = non_fraud.sample(fraud.shape[0])
non_fraud.shape

OUTPUT

(492, 31)
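
The call to sample() above draws a different subset on every run, so results will vary between runs. As a sketch of the same random under-sampling idea with a fixed seed, the NumPy example below works on a small made-up label array (the sizes, the seed, and all names are assumptions for illustration, not the tutorial's data). With pandas, passing random_state to sample() should serve the same purpose.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the draw is repeatable

# Toy labels: 1000 genuine (0) and 10 fraud (1); purely illustrative.
labels = np.array([0] * 1000 + [1] * 10)

fraud_idx = np.flatnonzero(labels == 1)
genuine_idx = np.flatnonzero(labels == 0)

# Draw as many genuine rows as there are fraud rows, without replacement.
sampled_genuine_idx = rng.choice(genuine_idx, size=fraud_idx.size, replace=False)

balanced_idx = np.concatenate([fraud_idx, sampled_genuine_idx])
balanced_labels = labels[balanced_idx]

print(np.bincount(balanced_labels))  # counts of class 0 and class 1
```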

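Now we will create the new balanced dataset by concatenating fraud and non_fraud. Because ignore_index=True, the resulting axis will be labeled 0, 1, …, n - 1. We use pd.concat() here because DataFrame.append() was deprecated and then removed in pandas 2.0.

PYTHON

data = pd.concat([fraud, non_fraud], ignore_index=True)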
data.head()

OUTPUT

	Time	V1	V2	V3	V4	V5	V6	V7	V8	V9	...	V21	V22	V23	V24	V25	V26	V27	V28	Amount	Class
0	406.0	-2.312227	1.951992	-1.609851	3.997906	-0.522188	-1.426545	-2.537387	1.391657	-2.770089	...	0.517232	-0.035049	-0.465211	0.320198	0.044519	0.177840	0.261145	-0.143276	0.00	1
1	472.0	-3.043541	-3.157307	1.088463	2.288644	1.359805	-1.064823	0.325574	-0.067794	-0.270953	...	0.661696	0.435477	1.375966	-0.293803	0.279798	-0.145362	-0.252773	0.035764	529.00	1
2	4462.0	-2.303350	1.759247	-0.359745	2.330243	-0.821628	-0.075788	0.562320	-0.399147	-0.238253	...	-0.294166	-0.932391	0.172726	-0.087330	-0.156114	-0.542628	0.039566	-0.153029	239.93	1
3	6986.0	-4.397974	1.358367	-2.592844	2.679787	-1.128131	-1.706536	-3.496197	-0.248778	-0.247768	...	0.573574	0.176968	-0.436207	-0.053502	0.252405	-0.657488	-0.827136	0.849573	59.00	1
4	7519.0	1.234235	3.019740	-4.304597	4.732795	3.624201	-1.357746	1.713445	-0.496358	-1.282858	...	-0.379068	-0.704181	-0.656805	-1.632653	1.488901	0.566797	-0.010016	0.146793	1.00	1

5 rows × 31 columns

PYTHON

data['Class'].value_counts()

OUTPUT

1    492
0    492
Name: Class, dtype: int64

Now we will separate the feature space and the class. X will contain the feature space and y will contain the class label.

PYTHON

X = data.drop('Class', axis = 1)
y = data['Class']

Now we will split the data into training and testing set with the help of train_test_split(). test_size = 0.2 will keep 20% data for testing and 80% data will be used for training the model. random_state controls the shuffling applied to the data before applying the split. stratify = y means that the data is split in a stratified fashion, using y as the class labels.

PYTHON

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0, stratify = y)
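
As a quick check of what stratify does, the sketch below runs the same call on placeholder data with the balanced dataset's dimensions (984 rows, 30 features, 492 rows per class). The zero-filled feature matrix and the names ending in _demo are stand-ins, not the real data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same shape as the balanced dataset.
X_demo = np.zeros((984, 30))
y_demo = np.array([1] * 492 + [0] * 492)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

print(X_tr.shape, X_te.shape)
# Class counts in each split; stratification keeps them close to 50/50.
print(np.bincount(y_tr), np.bincount(y_te))
```

Because 984 is not divisible evenly into an 80/20 split per class, the two classes can differ by one row in each split.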

We can see that there are 787 samples for training and 197 samples for testing.

PYTHON

X_train.shape, X_test.shape

OUTPUT

((787, 30), (197, 30))

Now we are going to bring the features onto the same scale. StandardScaler() standardizes the features by removing the mean and scaling to unit variance. We will fit the scaler only on the training dataset, but we will transform both the training and the testing dataset.

PYTHON

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

y_train = y_train.to_numpy()
y_test = y_test.to_numpy()

X_train.shape

OUTPUT

(787, 30)
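
The scaler is fitted on the training split only so that the test split does not influence the statistics used for scaling. Below is a minimal NumPy sketch of the arithmetic behind fit and transform, on made-up numbers (the arrays and names are illustrative assumptions).

```python
import numpy as np

# Tiny made-up splits: 4 training rows and 2 test rows, 2 features each.
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
test = np.array([[2.5, 25.0], [5.0, 50.0]])

# "fit": learn per-feature mean and standard deviation from training data only.
mean = train.mean(axis=0)
std = train.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# "transform": apply the training statistics to both splits.
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
print(test_scaled)
```

The scaled training features have mean 0 and standard deviation 1 by construction; the scaled test features generally do not, and that is expected.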

Our data is 2-dimensional, with shape (samples, features), but Conv1D() expects 3-dimensional input with shape (samples, steps, channels). So we have to reshape() the data, adding a channel axis of size 1.

PYTHON

X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)

X_train.shape, X_test.shape

OUTPUT

((787, 30, 1), (197, 30, 1))
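
The reshape only adds a trailing axis of length 1, so each of the 30 features becomes one step with a single channel; no values change. A small NumPy sketch on a made-up array (names are illustrative):

```python
import numpy as np

# Made-up batch: 5 samples with 30 features each.
batch = np.arange(5 * 30, dtype=float).reshape(5, 30)

# Add a channel axis so each sample becomes (steps=30, channels=1).
batch_3d = batch.reshape(batch.shape[0], batch.shape[1], 1)

# np.expand_dims gives the same result.
batch_3d_alt = np.expand_dims(batch, axis=-1)

print(batch.shape, batch_3d.shape)
```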

Build CNN

A Sequential() model is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor.

Conv1D() is a 1D convolution layer. This layer is very effective for deriving features from a fixed-length segment of the overall dataset, where it is not so important where the feature is located in the segment. In the first Conv1D() layer we are learning a total of 32 filters with a convolutional window of size 2. The input_shape argument specifies the shape of one input sample; here it is given on the first layer so that Keras can build the model and model.summary() can report output shapes. We will be using the ReLU activation function. The rectified linear activation function, or ReLU for short, is a piecewise linear function that outputs the input directly if it is positive and outputs zero otherwise.
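
To see where the output length comes from, here is a rough NumPy sketch of what a single Conv1D filter with window size 2 computes on one sample, assuming no padding and a stride of 1 (the Keras defaults). The weights, the input, and the helper name conv1d_single_filter are made up for illustration; the real layer learns 32 such filters.

```python
import numpy as np

def conv1d_single_filter(x, kernel, bias=0.0):
    """Slide `kernel` over 1-D input `x` (no padding, stride 1), then apply ReLU."""
    k = len(kernel)
    out = np.array([np.dot(x[i:i + k], kernel) + bias
                    for i in range(len(x) - k + 1)])
    return np.maximum(out, 0.0)  # ReLU: negative values become 0

x = np.linspace(-1.0, 1.0, 30)   # one made-up sample with 30 features
kernel = np.array([0.5, -0.25])  # made-up weights for a window of size 2

features = conv1d_single_filter(x, kernel)
print(features.shape)  # 30 inputs with a window of 2 give 29 outputs
```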

BatchNormalization() allows each layer of a network to learn by itself a little bit more independently of other layers. To increase the stability of a neural network, batch normalization normalizes the output of a previous activation layer by subtracting the batch mean and dividing by the batch standard deviation. It applies a transformation that maintains the mean output close to 0 and the output standard deviation close to 1.
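
A minimal NumPy sketch of the training-time computation follows, assuming the learned scale (gamma) and shift (beta) are still at their initial values of 1 and 0, and using an epsilon of 0.001. The input batch and the helper name batch_norm_train are illustrative assumptions. At inference time Keras uses moving averages of the mean and variance instead of the batch statistics, which this sketch does not cover.

```python
import numpy as np

def batch_norm_train(x, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize each feature (column) using the statistics of the batch."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
activations = rng.normal(loc=5.0, scale=3.0, size=(64, 4))  # made-up batch

normalized = batch_norm_train(activations)
print(normalized.mean(axis=0).round(3), normalized.std(axis=0).round(3))
```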

Dropout() is used to randomly set the outgoing edges of hidden units to 0 at each update of the training phase. The value passed in dropout specifies the probability at which outputs of the layer are dropped out.
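
The sketch below illustrates the training-time behaviour in NumPy: each value is zeroed with probability rate, and the surviving values are scaled by 1 / (1 - rate) so that the expected sum is unchanged. The activations and the helper name dropout_train are made up for illustration. At inference time dropout is not applied.

```python
import numpy as np

def dropout_train(x, rate, rng):
    """Zero each element with probability `rate`; rescale the survivors."""
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones((1000, 64))  # made-up activations, all equal to 1

dropped = dropout_train(activations, rate=0.2, rng=rng)

print((dropped == 0).mean())  # fraction zeroed, close to 0.2
print(dropped.mean())         # close to 1.0 thanks to the rescaling
```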

Flatten() is used to convert the data into a 1-dimensional array for inputting it to the next layer.

Dense() is the regular fully connected neural network layer. The output layer is also a dense layer, with 1 neuron, because this is a binary classification problem and we are predicting a single value. The sigmoid function is used because its output lies between 0 and 1, so it can be read as the probability of the positive class (fraud).
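
As a small illustration of the output layer, the sketch below applies the sigmoid function to a few made-up pre-activation values and turns the resulting probabilities into class labels. The 0.5 threshold is an assumed decision rule, not something the model fixes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])  # made-up pre-activation values
probabilities = sigmoid(logits)

# Assumed decision rule: flag as fraud when the probability reaches 0.5.
predicted_class = (probabilities >= 0.5).astype(int)

print(probabilities.round(3))
print(predicted_class)
```

Lowering the threshold would flag more transactions as fraud, trading precision for recall.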

PYTHON

epochs = 20
model = Sequential()
model.add(Conv1D(32, 2, activation='relu', input_shape = X_train[0].shape))
model.add(BatchNormalization())
model.add(Dropout(0.2))

model.add(Conv1D(64, 2, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))

model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))

model.add(Dense(1, activation='sigmoid'))

PYTHON

model.summary()

OUTPUT

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv1d (Conv1D)              (None, 29, 32)            96
_________________________________________________________________
batch_normalization (BatchNo (None, 29, 32)            128
_________________________________________________________________
dropout (Dropout)            (None, 29, 32)            0
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 28, 64)            4160
_________________________________________________________________
batch_normalization_1 (Batch (None, 28, 64)            256
_________________________________________________________________
dropout_1 (Dropout)          (None, 28, 64)            0
_________________________________________________________________
flatten (Flatten)            (None, 1792)              0
_________________________________________________________________
dense (Dense)                (None, 64)                114752
_________________________________________________________________
dropout_2 (Dropout)          (None, 64)                0
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 65
=================================================================
Total params: 119,457
Trainable params: 119,265
Non-trainable params: 192
_________________________________________________________________
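
The parameter counts in the summary can be reproduced by hand. The arithmetic below follows the usual counting rules for these layer types and uses only the layer sizes from the model above; the variable names are illustrative.

```python
# Conv1D: filters * (kernel_size * input_channels) weights + filters biases
conv1 = 32 * (2 * 1) + 32
conv2 = 64 * (2 * 32) + 64

# BatchNormalization: gamma, beta, moving mean, moving variance per channel
bn1 = 4 * 32
bn2 = 4 * 64
non_trainable = 2 * 32 + 2 * 64  # the moving mean and variance are not trained

# Dense: inputs * units weights + units biases
flat = 28 * 64
dense1 = flat * 64 + 64
dense2 = 64 * 1 + 1

total = conv1 + bn1 + conv2 + bn2 + dense1 + dense2
print(conv1, bn1, conv2, bn2, dense1, dense2)
print(total, total - non_trainable, non_trainable)
```

Almost all of the parameters sit in the first Dense layer, because Flatten() hands it 1792 inputs.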

Now we will compile and fit the model. We are using the Adam optimizer with a learning rate of 0.0001. We will use 20 epochs to train the model. An epoch is an iteration over the entire data provided. validation_data is the data on which to evaluate the loss and any model metrics at the end of each epoch. As metrics = ['accuracy'], accuracy will be reported for both the training and the validation data.

PYTHON

model.compile(optimizer=Adam(learning_rate=0.0001), loss = 'binary_crossentropy', metrics=['accuracy'])
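
For reference, the binary_crossentropy loss named in the compile call can be sketched in NumPy as follows. The labels, the predicted probabilities, and the clipping constant are assumptions for illustration; the Keras implementation may differ in numerical details.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; `eps` guards against log(0)."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

y_true = np.array([1, 0, 1, 0])               # made-up labels
confident = np.array([0.9, 0.1, 0.8, 0.2])    # mostly right
uncertain = np.array([0.5, 0.5, 0.5, 0.5])    # no information

print(binary_crossentropy(y_true, confident))
print(binary_crossentropy(y_true, uncertain))  # equals ln(2), about 0.693
```

A loss near 0.693 therefore means the model is doing no better than guessing, which gives a useful baseline for reading the training log.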

PYTHON

history = model.fit(X_train, y_train, epochs=epochs, validation_data=(X_test, y_test), verbose=1)

OUTPUT

Train on 787 samples, validate on 197 samples
Epoch 15/20 787/787 [==============================] - 0s 397us/sample - loss: 0.2179 - accuracy: 0.9365 - val_loss: 0.2355 - val_accuracy: 0.8985
Epoch 16/20 787/787 [==============================] - 0s 359us/sample - loss: 0.2070 - accuracy: 0.9276 - val_loss: 0.2271 - val_accuracy: 0.8985
Epoch 17/20 787/787 [==============================] - 0s 379us/sample - loss: 0.2030 - accuracy: 0.9314 - val_loss: 0.2206 - val_accuracy: 0.8985
Epoch 18/20 787/787 [==============================] - 0s 329us/sample - loss: 0.2192 - accuracy: 0.9276 - val_loss: 0.2189 - val_accuracy: 0.9036
Epoch 19/20 787/787 [==============================] - 0s 368us/sample - loss: 0.1896 - accuracy: 0.9352 - val_loss: 0.2180 - val_accuracy: 0.8985
Epoch 20/20 787/787 [==============================] - 0s 399us/sample - loss: 0.2067 - accuracy: 0.9199 - val_loss: 0.2183 - val_accuracy: 0.8934

Now we will visualize the results.

PYTHON

def plot_learningCurve(history, epoch):
  # Plot training & validation accuracy values
  epoch_range = range(1, epoch+1)
  plt.plot(epoch_range, history.history['accuracy'])
  plt.plot(epoch_range, history.history['val_accuracy'])
  plt.title('Model accuracy')
  plt.ylabel('Accuracy')
  plt.xlabel('Epoch')
  plt.legend(['Train', 'Val'], loc='upper left')
  plt.show()

  # Plot training & validation loss values
  plt.plot(epoch_range, history.history['loss'])
  plt.plot(epoch_range, history.history['val_loss'])
  plt.title('Model loss')
  plt.ylabel('Loss')
  plt.xlabel('Epoch')
  plt.legend(['Train', 'Val'], loc='upper left')
  plt.show()

PYTHON

plot_learningCurve(history, epochs)

We can see that the training accuracy is higher than the validation accuracy, which suggests that the model is overfitting. We can add a MaxPool layer and increase the number of epochs to try to improve our accuracy.

Adding MaxPool

PYTHON

epochs = 50
model = Sequential()
model.add(Conv1D(32, 2, activation='relu', input_shape = X_train[0].shape))
model.add(BatchNormalization())
model.add(MaxPool1D(2))
model.add(Dropout(0.2))

model.add(Conv1D(64, 2, activation='relu'))
model.add(BatchNormalization())
model.add(MaxPool1D(2))
model.add(Dropout(0.5))

model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))

model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer=Adam(learning_rate=0.0001), loss = 'binary_crossentropy', metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=epochs, validation_data=(X_test, y_test), verbose=1)

OUTPUT

Train on 787 samples, validate on 197 samples
Epoch 45/50 787/787 [==============================] - 0s 211us/sample - loss: 0.2494 - accuracy: 0.9187 - val_loss: 0.2509 - val_accuracy: 0.9137
Epoch 46/50 787/787 [==============================] - 0s 212us/sample - loss: 0.2390 - accuracy: 0.9136 - val_loss: 0.2498 - val_accuracy: 0.9137
Epoch 47/50 787/787 [==============================] - 0s 225us/sample - loss: 0.2490 - accuracy: 0.9111 - val_loss: 0.2466 - val_accuracy: 0.9137
Epoch 48/50 787/787 [==============================] - 0s 210us/sample - loss: 0.2435 - accuracy: 0.9149 - val_loss: 0.2443 - val_accuracy: 0.9137
Epoch 49/50 787/787 [==============================] - 0s 192us/sample - loss: 0.2413 - accuracy: 0.9136 - val_loss: 0.2453 - val_accuracy: 0.9137
Epoch 50/50 787/787 [==============================] - 0s 194us/sample - loss: 0.2445 - accuracy: 0.9123 - val_loss: 0.2449 - val_accuracy: 0.9137
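
To illustrate what MaxPool1D(2) does to the sequence length, the NumPy sketch below pools a made-up sequence and then traces the length through the two conv/pool blocks of this model. It assumes the Keras defaults of no padding and a pooling stride equal to the pool size; the helper name max_pool_1d is illustrative.

```python
import numpy as np

def max_pool_1d(x, pool_size=2):
    """Max over non-overlapping windows; leftover steps at the end are dropped."""
    n_windows = len(x) // pool_size
    return x[:n_windows * pool_size].reshape(n_windows, pool_size).max(axis=1)

x = np.array([1.0, 3.0, 2.0, 0.0, 5.0, 4.0, 7.0])  # made-up sequence of 7 steps
print(max_pool_1d(x))  # [3. 2. 5.]

# Sequence length through the second model: conv (window 2) then pool (size 2).
length = 30
for _ in range(2):
    length = length - 2 + 1   # Conv1D, no padding
    length = length // 2      # MaxPool1D(2)
print(length, length * 64)    # steps left, and size after Flatten
```

Under these assumptions the flattened vector shrinks from 1792 values to 384, so the first Dense layer has far fewer weights to fit on only 787 training samples.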

Now we will again visualize the results.

PYTHON

plot_learningCurve(history, epochs)

We can see that re-training with these changes gives a better result: validation accuracy at the final epoch rises from 0.8934 to 0.9137, and it is now close to the training accuracy.

Conclusion

In this tutorial you built two 1D CNN variants to detect credit card fraud from a highly imbalanced dataset. After under-sampling to balance the 492 fraud and 492 genuine transactions, the baseline model without MaxPool1D reached ~90% test accuracy but showed clear overfitting; adding MaxPool1D and extending to 50 epochs pushed validation accuracy to ~91.4% with a tighter train/val gap.

Key takeaways:

Under-sampling to balance classes is a fast starting point for imbalanced data, but it discards the majority of genuine transaction data. SMOTE oversampling is an alternative worth evaluating for production models.
MaxPool1D reduces spatial resolution between convolution blocks, acting as a regularizer that helps prevent overfitting on small, balanced datasets.
The fraud detection task rewards high recall for class 1 (fraud) over raw accuracy, so always examine precision/recall alongside the overall accuracy metric.
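
As a sketch of the last point, precision and recall for the fraud class can be computed from the predictions with a few lines of NumPy. The labels and predictions below are made up for illustration and are not results from the models in this tutorial.

```python
import numpy as np

# Made-up test labels and predictions (1 = fraud, 0 = genuine).
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 0])

tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # fraud caught
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # genuine flagged as fraud
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # fraud missed

accuracy = float(np.mean(y_true == y_pred))
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the accuracy is 0.80 while one in four frauds is still missed, which the recall of 0.75 makes visible.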

Next steps:

Apply the same 1D CNN approach to Bank Customer Satisfaction which has a larger dataset and similar tabular-to-CNN pipeline.
Compare this CNN approach against the ANN baseline in Building Your First ANN with TensorFlow 2.0.
Experiment with SMOTE oversampling instead of under-sampling to retain all 284,315 genuine transactions in training.
