Credit Card Fraud Detection using CNN
Classification using CNN
It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase. In this project we are going to build a model using a CNN that predicts whether a transaction is genuine or fraudulent.
Dataset
We are going to use the Credit Card Fraud Detection dataset from Kaggle. It contains anonymized credit card transactions labeled as fraudulent or genuine. You can download it from Kaggle.
The dataset contains transactions made by credit cards in September 2013 by European cardholders. It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for only 0.172% of all transactions.
It contains only numerical input variables, which are the result of a PCA transformation. The only features that have not been transformed with PCA are 'Time' and 'Amount'. The feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. The feature 'Amount' is the transaction amount; this feature can be used for example-dependent cost-sensitive learning. The feature 'Class' is the response variable and takes the value 1 in case of fraud and 0 otherwise.
Tensorflow Installation
We are going to use tensorflow to build the model. You can install tensorflow by running the first command below. If your machine has a GPU you can use the second command (note that from TensorFlow 2.1 onwards the standard tensorflow package already includes GPU support).
!pip install tensorflow
!pip install tensorflow-gpu
The necessary Python libraries are imported here:
- tensorflow is used to build the neural network, and the required layers are imported from keras.
- numpy is used to perform basic array operations.
- pandas is used for loading and manipulating the data.
- pyplot from matplotlib is used to visualize the results.
- train_test_split is used to split the data into training and testing datasets.
- StandardScaler is used to scale the values in the data.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Flatten, Dense, Dropout, BatchNormalization
from tensorflow.keras.layers import Conv1D, MaxPool1D
from tensorflow.keras.optimizers import Adam
print(tf.__version__)
2.1.0
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
Now we will read the dataset into a pandas dataframe using read_csv().
data = pd.read_csv('creditcard.csv')
data.head()
| | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.0 | -1.359807 | -0.072781 | 2.536347 | 1.378155 | -0.338321 | 0.462388 | 0.239599 | 0.098698 | 0.363787 | ... | -0.018307 | 0.277838 | -0.110474 | 0.066928 | 0.128539 | -0.189115 | 0.133558 | -0.021053 | 149.62 | 0 |
1 | 0.0 | 1.191857 | 0.266151 | 0.166480 | 0.448154 | 0.060018 | -0.082361 | -0.078803 | 0.085102 | -0.255425 | ... | -0.225775 | -0.638672 | 0.101288 | -0.339846 | 0.167170 | 0.125895 | -0.008983 | 0.014724 | 2.69 | 0 |
2 | 1.0 | -1.358354 | -1.340163 | 1.773209 | 0.379780 | -0.503198 | 1.800499 | 0.791461 | 0.247676 | -1.514654 | ... | 0.247998 | 0.771679 | 0.909412 | -0.689281 | -0.327642 | -0.139097 | -0.055353 | -0.059752 | 378.66 | 0 |
3 | 1.0 | -0.966272 | -0.185226 | 1.792993 | -0.863291 | -0.010309 | 1.247203 | 0.237609 | 0.377436 | -1.387024 | ... | -0.108300 | 0.005274 | -0.190321 | -1.175575 | 0.647376 | -0.221929 | 0.062723 | 0.061458 | 123.50 | 0 |
4 | 2.0 | -1.158233 | 0.877737 | 1.548718 | 0.403034 | -0.407193 | 0.095921 | 0.592941 | -0.270533 | 0.817739 | ... | -0.009431 | 0.798278 | -0.137458 | 0.141267 | -0.206010 | 0.502292 | 0.219422 | 0.215153 | 69.99 | 0 |
5 rows × 31 columns
The dataset has 284807 rows and 31 columns.
data.shape
(284807, 31)
Now we will see if any null values are present in the data.
data.isnull().sum()
Time      0
V1        0
V2        0
V3        0
V4        0
V5        0
V6        0
V7        0
V8        0
V9        0
V10       0
V11       0
V12       0
V13       0
V14       0
V15       0
V16       0
V17       0
V18       0
V19       0
V20       0
V21       0
V22       0
V23       0
V24       0
V25       0
V26       0
V27       0
V28       0
Amount    0
Class     0
dtype: int64
As no null values are present, we can go ahead and get more information about the data from data.info(). We can see that the values are either float or int.
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 284807 entries, 0 to 284806
Data columns (total 31 columns):
Time      284807 non-null float64
V1        284807 non-null float64
V2        284807 non-null float64
V3        284807 non-null float64
V4        284807 non-null float64
V5        284807 non-null float64
V6        284807 non-null float64
V7        284807 non-null float64
V8        284807 non-null float64
V9        284807 non-null float64
V10       284807 non-null float64
V11       284807 non-null float64
V12       284807 non-null float64
V13       284807 non-null float64
V14       284807 non-null float64
V15       284807 non-null float64
V16       284807 non-null float64
V17       284807 non-null float64
V18       284807 non-null float64
V19       284807 non-null float64
V20       284807 non-null float64
V21       284807 non-null float64
V22       284807 non-null float64
V23       284807 non-null float64
V24       284807 non-null float64
V25       284807 non-null float64
V26       284807 non-null float64
V27       284807 non-null float64
V28       284807 non-null float64
Amount    284807 non-null float64
Class     284807 non-null int64
dtypes: float64(30), int64(1)
memory usage: 67.4 MB
value_counts() returns a Series containing counts of unique values. This data has two classes, 0 and 1. We can see that the number of transactions with label 0 is far higher than the number with label 1. Hence this data is highly unbalanced.
data['Class'].value_counts()
0    284315
1       492
Name: Class, dtype: int64
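To see the imbalance as a fraction rather than raw counts, value_counts() also accepts normalize=True; this quick (optional) check reproduces the 0.172% figure quoted earlier.

# Class distribution as fractions: frauds make up roughly 0.00172 (0.172%) of all rows
data['Class'].value_counts(normalize=True)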
Balance Dataset
Here we will create a variable non_fraud which will contain the data of all the genuine transactions, i.e. the transactions with ['Class']==0. fraud will contain the data of all the fraudulent transactions, i.e. the transactions with ['Class']==1. The shape attribute tells us that non_fraud has 284315 rows and 31 columns and fraud has 492 rows and 31 columns.
non_fraud = data[data['Class']==0]
fraud = data[data['Class']==1]
non_fraud.shape, fraud.shape
((284315, 31), (492, 31))
To balance the data we will randomly select 492 transactions from non_fraud, i.e. we undersample the majority class. Now you can see that non_fraud has 492 rows.
non_fraud = non_fraud.sample(fraud.shape[0])
non_fraud.shape
(492, 31)
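Note that sample() draws a fresh random subset on every run, so the balanced dataset (and the results further below) will vary slightly between runs. A minimal tweak to the sampling line above, assuming you want reproducible results, is to fix random_state (42 is an arbitrary choice):

# Alternative to the cell above: fixing random_state makes the same 492 rows
# get picked on every run.
non_fraud = non_fraud.sample(fraud.shape[0], random_state=42)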
Now we will create the new balanced dataset by appending non_fraud to fraud. As ignore_index=True, the resulting axis will be labeled 0, 1, …, n - 1.
data = fraud.append(non_fraud, ignore_index=True)
data.head()
| | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 406.0 | -2.312227 | 1.951992 | -1.609851 | 3.997906 | -0.522188 | -1.426545 | -2.537387 | 1.391657 | -2.770089 | ... | 0.517232 | -0.035049 | -0.465211 | 0.320198 | 0.044519 | 0.177840 | 0.261145 | -0.143276 | 0.00 | 1 |
1 | 472.0 | -3.043541 | -3.157307 | 1.088463 | 2.288644 | 1.359805 | -1.064823 | 0.325574 | -0.067794 | -0.270953 | ... | 0.661696 | 0.435477 | 1.375966 | -0.293803 | 0.279798 | -0.145362 | -0.252773 | 0.035764 | 529.00 | 1 |
2 | 4462.0 | -2.303350 | 1.759247 | -0.359745 | 2.330243 | -0.821628 | -0.075788 | 0.562320 | -0.399147 | -0.238253 | ... | -0.294166 | -0.932391 | 0.172726 | -0.087330 | -0.156114 | -0.542628 | 0.039566 | -0.153029 | 239.93 | 1 |
3 | 6986.0 | -4.397974 | 1.358367 | -2.592844 | 2.679787 | -1.128131 | -1.706536 | -3.496197 | -0.248778 | -0.247768 | ... | 0.573574 | 0.176968 | -0.436207 | -0.053502 | 0.252405 | -0.657488 | -0.827136 | 0.849573 | 59.00 | 1 |
4 | 7519.0 | 1.234235 | 3.019740 | -4.304597 | 4.732795 | 3.624201 | -1.357746 | 1.713445 | -0.496358 | -1.282858 | ... | -0.379068 | -0.704181 | -0.656805 | -1.632653 | 1.488901 | 0.566797 | -0.010016 | 0.146793 | 1.00 | 1 |
5 rows × 31 columns
data['Class'].value_counts()
1    492
0    492
Name: Class, dtype: int64
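A small portability note about the fraud.append(...) call above: DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0. On a newer pandas release, the equivalent pd.concat() call is a drop-in replacement:

# Equivalent to fraud.append(non_fraud, ignore_index=True) on newer pandas versions
data = pd.concat([fraud, non_fraud], ignore_index=True)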
Now we will separate the feature space and the class. X will contain the feature space and y will contain the class label.
X = data.drop('Class', axis = 1)
y = data['Class']
Now we will split the data into training and testing sets with the help of train_test_split(). test_size = 0.2 keeps 20% of the data for testing, while 80% is used for training the model. random_state controls the shuffling applied to the data before the split. stratify = y means that the data is split in a stratified fashion, using y as the class labels.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0, stratify = y)
We can see that there are 787 samples for training and 197 samples for testing.
X_train.shape, X_test.shape
((787, 30), (197, 30))
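Because we passed stratify = y, both splits keep the 50/50 class balance of the balanced dataset. A quick optional check, assuming y_train and y_test are the Series returned above:

# Both splits should contain roughly equal numbers of class 0 and class 1
print(y_train.value_counts())
print(y_test.value_counts())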
Now we will bring the features onto the same scale. StandardScaler() standardizes the features by removing the mean and scaling to unit variance. We will fit the scaler only on the training dataset, but we will transform both the training and the testing dataset.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
y_train = y_train.to_numpy()
y_test = y_test.to_numpy()
X_train.shape
(787, 30)
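As a quick optional sanity check (not part of the original notebook), the scaled training features should now have a mean close to 0 and a standard deviation close to 1:

# Per-feature mean and standard deviation after scaling (first 5 features shown)
print(X_train.mean(axis=0)[:5].round(2))
print(X_train.std(axis=0)[:5].round(2))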
Our data is 2-dimensional, but the Conv1D layers we are about to use expect 3-dimensional input of shape (samples, timesteps, channels). So we have to reshape() the data, treating each of the 30 features as a timestep with a single channel.
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)
X_train.shape, X_test.shape
((787, 30, 1), (197, 30, 1))
Build CNN
A Sequential() model is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor.
Conv1D() is a 1D convolution layer. This layer is very effective for deriving features from a fixed-length segment of the overall dataset, where it is not so important where in the segment the feature is located. In the first Conv1D() layer we learn a total of 32 filters with a convolutional window of size 2. input_shape specifies the shape of the input; it is a necessary parameter for the first layer of any neural network. We will be using the ReLU activation function. The rectified linear activation function, or ReLU for short, is a piecewise linear function that outputs the input directly if it is positive and zero otherwise.
BatchNormalization() allows each layer of a network to learn a little more independently of the other layers. To increase the stability of a neural network, batch normalization normalizes the output of the previous activation layer by subtracting the batch mean and dividing by the batch standard deviation. It applies a transformation that keeps the mean output close to 0 and the output standard deviation close to 1.
Dropout() randomly sets the outgoing edges of hidden units to 0 at each update of the training phase. The value passed to Dropout specifies the probability at which outputs of the layer are dropped out.
Flatten() is used to convert the data into a 1-dimensional array for inputting it to the next layer.
Dense() is the regular densely connected neural network layer. The output layer is also a dense layer with 1 neuron, because we are predicting a single value for this binary classification problem. The sigmoid activation is used because its output lies between 0 and 1, which lets us interpret it as the probability of the positive class.
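For reference, here is a minimal NumPy sketch (an aside, not part of the model code) of the two activation functions described above:

# ReLU: passes positive values through unchanged, clamps negatives to 0
def relu(x):
    return np.maximum(0, x)

# Sigmoid: squashes any real number into the (0, 1) range
def sigmoid(x):
    return 1 / (1 + np.exp(-x))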
epochs = 20
model = Sequential()
model.add(Conv1D(32, 2, activation='relu', input_shape = X_train[0].shape))
model.add(BatchNormalization())
model.add(Dropout(0.2))
model.add(Conv1D(64, 2, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.summary()
Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= conv1d (Conv1D) (None, 29, 32) 96 _________________________________________________________________ batch_normalization (BatchNo (None, 29, 32) 128 _________________________________________________________________ dropout (Dropout) (None, 29, 32) 0 _________________________________________________________________ conv1d_1 (Conv1D) (None, 28, 64) 4160 _________________________________________________________________ batch_normalization_1 (Batch (None, 28, 64) 256 _________________________________________________________________ dropout_1 (Dropout) (None, 28, 64) 0 _________________________________________________________________ flatten (Flatten) (None, 1792) 0 _________________________________________________________________ dense (Dense) (None, 64) 114752 _________________________________________________________________ dropout_2 (Dropout) (None, 64) 0 _________________________________________________________________ dense_1 (Dense) (None, 1) 65 ================================================================= Total params: 119,457 Trainable params: 119,265 Non-trainable params: 192 _________________________________________________________________
Now we will compile and fit the model. We are using the Adam optimizer with a learning rate of 0.0001. We will train the model for 20 epochs; an epoch is one iteration over the entire training data. validation_data is the data on which to evaluate the loss and any model metrics at the end of each epoch. As metrics = ['accuracy'], the model will be evaluated based on accuracy.
model.compile(optimizer=Adam(lr=0.0001), loss = 'binary_crossentropy', metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=epochs, validation_data=(X_test, y_test), verbose=1)
Train on 787 samples, validate on 197 samples
Epoch 15/20
787/787 [==============================] - 0s 397us/sample - loss: 0.2179 - accuracy: 0.9365 - val_loss: 0.2355 - val_accuracy: 0.8985
Epoch 16/20
787/787 [==============================] - 0s 359us/sample - loss: 0.2070 - accuracy: 0.9276 - val_loss: 0.2271 - val_accuracy: 0.8985
Epoch 17/20
787/787 [==============================] - 0s 379us/sample - loss: 0.2030 - accuracy: 0.9314 - val_loss: 0.2206 - val_accuracy: 0.8985
Epoch 18/20
787/787 [==============================] - 0s 329us/sample - loss: 0.2192 - accuracy: 0.9276 - val_loss: 0.2189 - val_accuracy: 0.9036
Epoch 19/20
787/787 [==============================] - 0s 368us/sample - loss: 0.1896 - accuracy: 0.9352 - val_loss: 0.2180 - val_accuracy: 0.8985
Epoch 20/20
787/787 [==============================] - 0s 399us/sample - loss: 0.2067 - accuracy: 0.9199 - val_loss: 0.2183 - val_accuracy: 0.8934
Now we will visualize the results.
def plot_learningCurve(history, epoch):
    # Plot training & validation accuracy values
    epoch_range = range(1, epoch+1)
    plt.plot(epoch_range, history.history['accuracy'])
    plt.plot(epoch_range, history.history['val_accuracy'])
    plt.title('Model accuracy')
    plt.ylabel('Accuracy')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Val'], loc='upper left')
    plt.show()

    # Plot training & validation loss values
    plt.plot(epoch_range, history.history['loss'])
    plt.plot(epoch_range, history.history['val_loss'])
    plt.title('Model loss')
    plt.ylabel('Loss')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Val'], loc='upper left')
    plt.show()
plot_learningCurve(history, epochs)
We can see that the training accuracy is higher than the validation accuracy, so we can say that the model is overfitting. We can add MaxPool layers and increase the number of epochs to improve our results.
Adding MaxPool
epochs = 50
model = Sequential()
model.add(Conv1D(32, 2, activation='relu', input_shape = X_train[0].shape))
model.add(BatchNormalization())
model.add(MaxPool1D(2))
model.add(Dropout(0.2))
model.add(Conv1D(64, 2, activation='relu'))
model.add(BatchNormalization())
model.add(MaxPool1D(2))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer=Adam(lr=0.0001), loss = 'binary_crossentropy', metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=epochs, validation_data=(X_test, y_test), verbose=1)
Train on 787 samples, validate on 197 samples
Epoch 45/50
787/787 [==============================] - 0s 211us/sample - loss: 0.2494 - accuracy: 0.9187 - val_loss: 0.2509 - val_accuracy: 0.9137
Epoch 46/50
787/787 [==============================] - 0s 212us/sample - loss: 0.2390 - accuracy: 0.9136 - val_loss: 0.2498 - val_accuracy: 0.9137
Epoch 47/50
787/787 [==============================] - 0s 225us/sample - loss: 0.2490 - accuracy: 0.9111 - val_loss: 0.2466 - val_accuracy: 0.9137
Epoch 48/50
787/787 [==============================] - 0s 210us/sample - loss: 0.2435 - accuracy: 0.9149 - val_loss: 0.2443 - val_accuracy: 0.9137
Epoch 49/50
787/787 [==============================] - 0s 192us/sample - loss: 0.2413 - accuracy: 0.9136 - val_loss: 0.2453 - val_accuracy: 0.9137
Epoch 50/50
787/787 [==============================] - 0s 194us/sample - loss: 0.2445 - accuracy: 0.9123 - val_loss: 0.2449 - val_accuracy: 0.9137
Now we will again visualize the results.
plot_learningCurve(history, epochs)
We can see that after re-training the model with these changes, the validation accuracy improves from roughly 0.90 to about 0.91, and the gap between training and validation accuracy is much smaller, so the model overfits less and generalizes better.
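Accuracy alone can be a blunt metric for fraud detection, even on this balanced subset. As a final optional check, not part of the original write-up, you could look at the confusion matrix and per-class precision and recall on the test set; this sketch assumes model, X_test and y_test are still in scope:

from sklearn.metrics import confusion_matrix, classification_report

# Convert the sigmoid outputs into hard 0/1 predictions with a 0.5 threshold
y_pred = (model.predict(X_test) > 0.5).astype(int).ravel()

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=['genuine', 'fraud']))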