Deep learning with TensorFlow
Artificial Neural Networks (ANNs) learn patterns from data by propagating activations through layers of weighted neurons, then adjusting weights via backpropagation to minimize a loss function. This tutorial builds a binary classification ANN in TensorFlow 2.0 and Keras, covering activation functions, optimizers, and evaluation metrics.
Installing libraries
# pip install tensorflow==2.0.0-rc0
# pip install tensorflow-gpu==2.0.0-rc0
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Flatten, Dense
print(tf.__version__)
2.2.0
Importing necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
dataset = pd.read_csv('Customer_Churn_Modelling.csv')
dataset.head()
| | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 15634602 | Hargrave | 619 | France | Female | 42 | 2 | 0.00 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 2 | 15647311 | Hill | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 3 | 15619304 | Onio | 502 | France | Female | 42 | 8 | 159660.80 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 4 | 15701354 | Boni | 699 | France | Female | 39 | 1 | 0.00 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 5 | 15737888 | Mitchell | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.10 | 0 |
X = dataset.drop(labels=['CustomerId', 'Surname', 'RowNumber', 'Exited'], axis = 1)
y = dataset['Exited']
X.head()
| | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 619 | France | Female | 42 | 2 | 0.00 | 1 | 1 | 1 | 101348.88 |
| 1 | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 |
| 2 | 502 | France | Female | 42 | 8 | 159660.80 | 3 | 1 | 0 | 113931.57 |
| 3 | 699 | France | Female | 39 | 1 | 0.00 | 2 | 0 | 0 | 93826.63 |
| 4 | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.10 |
y.head()
0 1
1 0
2 1
3 0
4 0
Name: Exited, dtype: int64
Using LabelEncoder, we convert the categorical features (Geography and Gender) to integer codes.
from sklearn.preprocessing import LabelEncoder
label1 = LabelEncoder()
X['Geography'] = label1.fit_transform(X['Geography'])
label = LabelEncoder()
X['Gender'] = label.fit_transform(X['Gender'])
X.head()
| | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 619 | 0 | 0 | 42 | 2 | 0.00 | 1 | 1 | 1 | 101348.88 |
| 1 | 608 | 2 | 0 | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 |
| 2 | 502 | 0 | 0 | 42 | 8 | 159660.80 | 3 | 1 | 0 | 113931.57 |
| 3 | 699 | 0 | 0 | 39 | 1 | 0.00 | 2 | 0 | 0 | 93826.63 |
| 4 | 850 | 2 | 0 | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.10 |
X = pd.get_dummies(X, drop_first=True, columns=['Geography'])
X.head()
| | CreditScore | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Geography_1 | Geography_2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 619 | 0 | 42 | 2 | 0.00 | 1 | 1 | 1 | 101348.88 | 0 | 0 |
| 1 | 608 | 0 | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 | 1 |
| 2 | 502 | 0 | 42 | 8 | 159660.80 | 3 | 1 | 0 | 113931.57 | 0 | 0 |
| 3 | 699 | 0 | 39 | 1 | 0.00 | 2 | 0 | 0 | 93826.63 | 0 | 0 |
| 4 | 850 | 0 | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.10 | 0 | 1 |
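The two encoding steps above can be reproduced by hand to see what they do. The NumPy sketch below is only an illustration, not the code the tutorial runs: the sample values are made up, and the third country (Germany) is an assumption, since the rows shown above only contain France and Spain.

```python
import numpy as np

# Hypothetical sample of the Geography column (illustrative values only).
geography = np.array(["France", "Spain", "France", "Germany", "Spain"])

# Step 1: integer codes in sorted category order, the same ordering
# LabelEncoder uses (France -> 0, Germany -> 1, Spain -> 2).
categories, codes = np.unique(geography, return_inverse=True)

# Step 2: one-hot encode the codes, one column per category.
one_hot = np.eye(len(categories), dtype=int)[codes]

# Step 3: drop the first column (France), mirroring drop_first=True.
# A row of all zeros then means "France".
dummies = one_hot[:, 1:]

print(categories)
print(codes)
print(dummies)
```

Dropping the first column avoids a redundant feature: once Geography_1 and Geography_2 are known, the France indicator carries no extra information.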
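- Here we use StandardScaler to rescale each feature to a mean of 0 and a variance of 1. The scaler is fit on the training split only and then applied to the test split, so test-set statistics do not leak into training.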
from sklearn.preprocessing import StandardScaler
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0, stratify = y)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
X_train
array([[-1.24021723, -1.09665089, 0.77986083, ..., 1.64099027,
-0.57812007, -0.57504086],
[ 0.75974873, 0.91186722, -0.27382717, ..., -1.55587522,
1.72974448, -0.57504086],
[-1.72725557, -1.09665089, -0.9443559 , ..., 1.1038111 ,
-0.57812007, -0.57504086],
...,
[-0.51484098, 0.91186722, 0.87565065, ..., -1.01507508,
1.72974448, -0.57504086],
[ 0.73902369, -1.09665089, -0.36961699, ..., -1.47887193,
-0.57812007, -0.57504086],
[ 0.95663657, 0.91186722, -1.32751517, ..., 0.50945854,
-0.57812007, 1.73900686]])
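To make the scaling step concrete, here is a minimal NumPy sketch of the same fit-on-train, transform-both pattern. The toy numbers are hypothetical and only stand in for columns such as CreditScore and EstimatedSalary.

```python
import numpy as np

# Hypothetical toy data: two features on very different scales.
train = np.array([[600.0, 20000.0], [700.0, 80000.0], [800.0, 140000.0]])
test = np.array([[650.0, 50000.0]])

# "fit": learn per-column mean and standard deviation from the training split only.
mean = train.mean(axis=0)
std = train.std(axis=0)  # population std (ddof=0), as StandardScaler computes it

# "transform": apply the same training statistics to both splits.
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.mean(axis=0))  # approximately 0 for each column
print(train_scaled.std(axis=0))   # approximately 1 for each column
print(test_scaled)
```

Note that the scaled test data will generally not have exactly zero mean or unit variance, because it is scaled with the training statistics.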
Build ANN
- Here we are building the ANN model.
- The first Dense layer has 11 units and takes the 11 input features (input_dim = X.shape[1]).
- The second Dense layer is a hidden layer with 128 units.
- The output layer has a single unit with sigmoid activation, since we predict only one binary output.
model = Sequential()
model.add(Dense(X.shape[1], activation='relu', input_dim = X.shape[1]))
model.add(Dense(128, activation='relu'))
model.add(Dense(1, activation = 'sigmoid'))
X.shape[1]
11
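As a sanity check on the architecture above, the number of trainable parameters can be counted by hand: each Dense layer has one weight per input-unit pair plus one bias per unit. The helper name dense_params below is made up for this sketch.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases for one fully connected (Dense) layer."""
    return n_inputs * n_units + n_units


n_features = 11                      # X.shape[1] in the tutorial
layer_units = [n_features, 128, 1]   # units of the three Dense layers

total = 0
n_inputs = n_features
for units in layer_units:
    params = dense_params(n_inputs, units)
    print(f"Dense({units}): {params} parameters")
    total += params
    n_inputs = units  # the next layer sees this layer's outputs

print("Total trainable parameters:", total)
```

These counts should match what model.summary() reports for the same model.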
- Here we compile the model with the Adam optimizer, binary cross-entropy as the loss, and accuracy as the metric.
model.compile(optimizer='adam', loss = 'binary_crossentropy', metrics=['accuracy'])
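For intuition about the loss and metric chosen above, the sketch below computes a sigmoid, binary cross-entropy, and accuracy in plain NumPy. It is a simplified illustration, not the Keras implementation. The labels and logits are hypothetical, and the eps clipping is just one common way to guard against log(0).

```python
import numpy as np


def sigmoid(z):
    """Squash raw scores (logits) into probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of the true labels."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(losses))


# Hypothetical labels and raw output-layer scores for four customers.
y_true = np.array([1.0, 0.0, 1.0, 0.0])
logits = np.array([2.0, -1.0, -0.5, 0.3])

y_prob = sigmoid(logits)
loss = binary_crossentropy(y_true, y_prob)
accuracy = float(np.mean((y_prob > 0.5) == (y_true == 1.0)))

print("probabilities:", y_prob)
print("loss:", loss)
print("accuracy:", accuracy)
```

The loss punishes confident wrong predictions much more than uncertain ones, which is why it can keep falling even when accuracy has stopped improving.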
- Here we fit the model on the training dataset with a batch size of 10 for 10 epochs.
model.fit(X_train, y_train.to_numpy(), batch_size = 10, epochs = 10, verbose = 1)
Epoch 1/10
800/800 [==============================] - 1s 2ms/step - loss: 0.4516 - accuracy: 0.8116
Epoch 2/10
800/800 [==============================] - 1s 1ms/step - loss: 0.3948 - accuracy: 0.8372
Epoch 3/10
800/800 [==============================] - 1s 1ms/step - loss: 0.3597 - accuracy: 0.8543
Epoch 4/10
800/800 [==============================] - 1s 2ms/step - loss: 0.3475 - accuracy: 0.8576
Epoch 5/10
800/800 [==============================] - 1s 1ms/step - loss: 0.3426 - accuracy: 0.8611
Epoch 6/10
800/800 [==============================] - 1s 1ms/step - loss: 0.3389 - accuracy: 0.8619
Epoch 7/10
800/800 [==============================] - 1s 1ms/step - loss: 0.3366 - accuracy: 0.8625
Epoch 8/10
800/800 [==============================] - 1s 1ms/step - loss: 0.3350 - accuracy: 0.8629
Epoch 9/10
800/800 [==============================] - 1s 2ms/step - loss: 0.3333 - accuracy: 0.8635
Epoch 10/10
800/800 [==============================] - 1s 1ms/step - loss: 0.3311 - accuracy: 0.8634
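The 800/800 counter in the log above is the number of batches per epoch. A quick sketch of the arithmetic, assuming the training split has 8,000 rows (inferred from the 2,000-row test split at test_size = 0.2):

```python
import math


def steps_per_epoch(n_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(n_samples / batch_size)


n_train = 8000  # assumed: 80% of 10,000 rows
n_test = 2000   # length of y_test in the tutorial

train_steps = steps_per_epoch(n_train, batch_size=10)
print("training steps per epoch:", train_steps)
print("weight updates over 10 epochs:", train_steps * 10)

# model.evaluate uses a default batch size of 32 when none is given.
print("evaluation steps:", steps_per_epoch(n_test, batch_size=32))
```

A small batch size like 10 gives many noisy weight updates per epoch; a larger batch would train faster per epoch but update the weights less often.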
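- Using model.predict we get a churn probability for each test row, then threshold it at 0.5 to get class labels. (model.predict_classes did the same thing in one call, but it was removed in later TensorFlow releases.)
y_pred = (model.predict(X_test) > 0.5).astype('int32')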
y_pred
array([[0],
[0],
[0],
...,
[0],
[1],
[0]])
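The class labels shown above come from thresholding the sigmoid output at 0.5. The sketch below shows that step in isolation; the probabilities are hypothetical stand-ins for what the model would return.

```python
import numpy as np

# Hypothetical sigmoid outputs, shape (n_samples, 1) like the model's predictions.
proba = np.array([[0.08], [0.47], [0.50], [0.91]])

# Label a customer as churned (1) only if the probability is strictly above 0.5.
y_pred = (proba > 0.5).astype("int32")

# Flatten to 1-D if a metric function expects a flat label vector.
y_pred_flat = y_pred.ravel()

print(y_pred)
print(y_pred_flat)
```

Lowering the threshold below 0.5 is one simple way to trade precision for recall on the churn class.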
y_test
1344 1
8167 0
4747 0
5004 1
3124 1
..
9107 0
8249 0
8337 0
6279 1
412 0
Name: Exited, Length: 2000, dtype: int64
model.evaluate(X_test, y_test.to_numpy())
63/63 [==============================] - 0s 2ms/step - loss: 0.3489 - accuracy: 0.8520
[0.34891313314437866, 0.8519999980926514]
from sklearn.metrics import confusion_matrix, accuracy_score
Confusion matrix
confusion_matrix(y_test, y_pred)
array([[1546, 47],
[ 249, 158]], dtype=int64)
accuracy_score(y_test, y_pred)
0.852
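Accuracy alone hides how the errors are distributed. As a rough follow-up, precision, recall, and F1 for the churn class can be derived directly from the confusion matrix above, which scikit-learn lays out as [[TN, FP], [FN, TP]].

```python
# Values copied from the confusion matrix above.
tn, fp = 1546, 47
fn, tp = 249, 158

accuracy = (tp + tn) / (tn + fp + fn + tp)
precision = tp / (tp + fp)  # of predicted churners, how many really churned
recall = tp / (tp + fn)     # of actual churners, how many were caught
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy:  {accuracy:.3f}")
print(f"precision: {precision:.3f}")
print(f"recall:    {recall:.3f}")
print(f"f1:        {f1:.3f}")
```

The same numbers are available from sklearn.metrics via precision_score, recall_score, and f1_score, or all at once from classification_report.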
Conclusion
In this tutorial you built a three-layer ANN in TensorFlow 2.0 to predict customer churn from the Bank Customer Churn dataset. After encoding categorical features with LabelEncoder and get_dummies, and standardizing with StandardScaler, the model trained for 10 epochs and reached 85.2% test accuracy — confirmed by both model.evaluate() and the confusion_matrix.
Key takeaways:
- Categorical features like geography and gender must be encoded before feeding into a neural network; LabelEncoder plus one-hot encoding via get_dummies is a common preprocessing pipeline.
- A simple three-layer ANN (input → 128-unit hidden → sigmoid output) is a strong baseline for binary classification before moving to deeper or convolutional architectures.
- The confusion matrix reveals that the model identifies most non-churners correctly but misclassifies 249 of the 407 churners; class imbalance handling could improve recall on the minority class.
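One common way to handle the class imbalance mentioned above is to weight the loss by inverse class frequency. The sketch below computes "balanced" weights by hand. The helper name is made up, and the counts are taken from the test split's confusion matrix as a stand-in for the training split, whose exact counts are not shown in this tutorial.

```python
def balanced_class_weights(counts):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Assumed class counts (0 = stayed, 1 = churned), borrowed from the test split.
counts = {0: 1593, 1: 407}

class_weight = balanced_class_weights(counts)
print(class_weight)
```

A dictionary like this can be passed to model.fit through its class_weight argument, so that mistakes on churners cost more during training. Whether it actually improves recall here would need to be checked by retraining.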
Next steps:
- Extend to a 1D CNN for the same dataset type in Credit Card Fraud Detection using CNN to compare performance.
- Explore deeper ANN designs and regularization in TensorFlow 2.0 Getting Started.
- Add Dropout layers between the hidden and output layers to reduce overfitting and improve generalization.
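For a feel of what a Dropout layer does during training, here is a minimal NumPy sketch of inverted dropout. It is a simplified illustration rather than the Keras layer itself; the function name, the rate of 0.2, and the all-ones activations are assumptions chosen for the example.

```python
import numpy as np


def dropout(activations, rate, rng):
    """Inverted dropout: zero a random fraction `rate` of units, rescale the rest."""
    keep_mask = rng.random(activations.shape) >= rate
    # Dividing by (1 - rate) keeps the expected activation unchanged.
    return activations * keep_mask / (1.0 - rate)


rng = np.random.default_rng(0)
hidden = np.ones((4, 128))  # hypothetical output of the 128-unit hidden layer

dropped = dropout(hidden, rate=0.2, rng=rng)

print("fraction zeroed:", float(np.mean(dropped == 0.0)))
print("mean activation:", float(dropped.mean()))
```

At inference time dropout is switched off and the activations pass through unchanged, which is why the rescaling is done during training.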
