Customer Satisfaction Prediction with CNN

Predict bank customer satisfaction using a 1D CNN in TensorFlow. Covers feature selection, StandardScaler, Conv1D layers, and binary classification training.

Aug 29, 2020 | Updated May 16, 2026 | 27 min read

Topics You Will Master

Exploratory data analysis and feature importance ranking
StandardScaler normalization for tabular feature inputs
Conv1D layer design for 1-D structured data classification
Binary cross-entropy loss and Adam optimizer configuration

Feature Selection and CNN

This project builds a convolutional neural network (CNN) to predict whether a bank customer is satisfied. The raw dataset has 371 columns: an ID, 369 candidate features, and the TARGET label. Install TensorFlow with pip install tensorflow; from TensorFlow 2.1 onward this package also includes GPU support, so a separate tensorflow-gpu install is not needed.

PYTHON
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import VarianceThreshold

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv1D, MaxPool1D, Flatten, Dense, Dropout, BatchNormalization
from tensorflow.keras.optimizers import Adam
print(tf.__version__)
OUTPUT
2.1.0

You can clone the data directly from GitHub with this command. The leading ! runs it as a shell command from a notebook cell.

PLAINTEXT
!git clone https://github.com/laxmimerit/Data-Files-for-Feature-Selection.git

After downloading the data, read it using read_csv(). To see the first 5 rows of the data use data.head().

PYTHON
data = pd.read_csv('train.csv')
data.head()
OUTPUT
   ID  var3  var15  imp_ent_var16_ult1  imp_op_var39_comer_ult1  imp_op_var39_comer_ult3
0   1     2     23                 0.0                      0.0                      0.0
1   3     2     34                 0.0                      0.0                      0.0
2   4     2     23                 0.0                      0.0                      0.0
3   8     2     37                 0.0                    195.0                    195.0
4  10     2     39                 0.0                      0.0                      0.0

   imp_op_var40_comer_ult1  imp_op_var40_comer_ult3  imp_op_var40_efect_ult1  imp_op_var40_efect_ult3  ...
0                      0.0                      0.0                      0.0                      0.0  ...
1                      0.0                      0.0                      0.0                      0.0  ...
2                      0.0                      0.0                      0.0                      0.0  ...
3                      0.0                      0.0                      0.0                      0.0  ...
4                      0.0                      0.0                      0.0                      0.0  ...

   saldo_medio_var33_hace2  saldo_medio_var33_hace3  saldo_medio_var33_ult1  saldo_medio_var33_ult3
0                      0.0                      0.0                     0.0                     0.0
1                      0.0                      0.0                     0.0                     0.0
2                      0.0                      0.0                     0.0                     0.0
3                      0.0                      0.0                     0.0                     0.0
4                      0.0                      0.0                     0.0                     0.0

   saldo_medio_var44_hace2  saldo_medio_var44_hace3  saldo_medio_var44_ult1  saldo_medio_var44_ult3          var38  TARGET
0                      0.0                      0.0                     0.0                     0.0   39205.170000       0
1                      0.0                      0.0                     0.0                     0.0   49278.030000       0
2                      0.0                      0.0                     0.0                     0.0   67333.770000       0
3                      0.0                      0.0                     0.0                     0.0   64007.970000       0
4                      0.0                      0.0                     0.0                     0.0  117310.979016       0

5 rows x 371 columns

The dataset has 76020 rows and 371 columns.

PYTHON
data.shape
OUTPUT
(76020, 371)

Create a feature space X containing only the columns that provide information for prediction. ID and TARGET do not contribute to prediction, so they are removed using drop(). After dropping these 2 columns, the column count reduces to 369.

PYTHON
X = data.drop(labels=['ID', 'TARGET'], axis = 1)
X.shape
OUTPUT
(76020, 369)

Create a variable y containing the values to predict, i.e. TARGET.

PYTHON
y = data['TARGET']

Split the data into training and testing sets with train_test_split(). test_size = 0.2 reserves 20% for testing and 80% for training. random_state controls the shuffling applied before the split. stratify = y means the split is done in a stratified fashion, using y as the class labels.

PYTHON
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.2, random_state = 0, stratify = y)
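One way to confirm that stratification preserved the class balance is to compare the class fractions in each split. The helper below is a minimal sketch using only the standard library; the name class_fractions and the toy label lists are hypothetical and not part of the original tutorial.

```python
from collections import Counter


def class_fractions(labels):
    """Return {class: fraction of samples} for an iterable of labels."""
    labels = list(labels)
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}


# Toy labels standing in for y_train and y_test (made-up values).
toy_train = [0, 0, 0, 1, 0, 0, 0, 1]
toy_test = [0, 0, 0, 1]

print(class_fractions(toy_train))  # {0: 0.75, 1: 0.25}
print(class_fractions(toy_test))   # {0: 0.75, 1: 0.25}
```

Calling class_fractions(y_train) and class_fractions(y_test) on the real split should give nearly identical fractions when stratify = y is used.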

The training dataset consists of 60816 rows (80%) and the testing dataset consists of 15204 rows (20%).

PYTHON
X_train.shape, X_test.shape
OUTPUT
((60816, 369), (15204, 369))

Remove Constant, Quasi Constant and Duplicate Features

Feature selection is the process of reducing the number of input variables when developing a predictive model.

  • Constant features show a single value across all observations in the dataset. They provide no information that allows ML models to predict the target.
  • Quasi-constant features, as the name suggests, are almost constant: they take the same value for a very large share of the observations, so their variance is low. Such features are rarely useful for making predictions.
  • Duplicate features are exact copies of another feature in the dataset.

The variance threshold is set to 0.01: VarianceThreshold() removes every column whose training-set variance falls below this value, which discards constant and quasi-constant features. The threshold is an absolute variance, not a percentage. VarianceThreshold() is fit to the training data only, and the test data is only transformed.

PYTHON
# Named variance_filter to avoid shadowing Python's built-in filter().
variance_filter = VarianceThreshold(0.01)
X_train = variance_filter.fit_transform(X_train)
X_test = variance_filter.transform(X_test)

X_train.shape, X_test.shape
OUTPUT
((60816, 273), (15204, 273))
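To make the thresholding rule concrete, here is a small NumPy sketch of the same idea: compute each column's variance on the training data and keep only the columns above the threshold. It is an illustration on made-up data, not scikit-learn's implementation; the function name variance_mask is hypothetical.

```python
import numpy as np


def variance_mask(X, threshold=0.01):
    """Boolean mask of columns whose population variance exceeds threshold."""
    variances = np.var(X, axis=0)  # ddof=0, i.e. population variance
    return variances > threshold


# Toy training matrix with 200 rows (made-up values).
X_toy = np.zeros((200, 3))
X_toy[:, 0] = 1.0             # constant column: variance 0
X_toy[0, 1] = 1.0             # quasi-constant: a single 1 among 199 zeros
X_toy[:, 2] = np.arange(200)  # clearly varying column

mask = variance_mask(X_toy, threshold=0.01)
print(mask)                   # [False False  True]
print(X_toy[:, mask].shape)   # (200, 1)
```

The quasi-constant column has variance 0.005 * 0.995, which is below 0.01, so it is dropped together with the constant column.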

Removing the constant and quasi-constant features drops 96 columns from the dataset.

PYTHON
369 - 273
OUTPUT
96

To remove duplicate features, the data is transposed using .T, because pandas provides DataFrame.duplicated() for finding duplicate rows, and after transposing each feature becomes a row. The shape of X_train_T is therefore the reverse of the shape of X_train.

PYTHON
X_train_T = X_train.T
X_test_T = X_test.T

X_train_T = pd.DataFrame(X_train_T)
X_test_T = pd.DataFrame(X_test_T)

X_train_T.shape
OUTPUT
(273, 60816)

.duplicated() returns a boolean Series denoting duplicate rows. 17 features are duplicated.

PYTHON
X_train_T.duplicated().sum()
OUTPUT
17
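The transpose-then-duplicated() trick is easier to see on a tiny frame. The sketch below uses made-up data with one repeated column; the variable names are illustrative only.

```python
import pandas as pd

# Toy frame (made-up values): column "c" repeats column "a".
toy = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [1, 2, 3]})

# Transpose so each original column becomes a row, then flag repeated rows.
# The first occurrence stays False; only later copies are marked True.
is_duplicate = toy.T.duplicated()

# Keep the columns that are not flagged.
deduplicated = toy.loc[:, ~is_duplicate.to_numpy()]

print(is_duplicate.tolist())       # [False, False, True]
print(list(deduplicated.columns))  # ['a', 'b']
```

Because only the later copies are flagged, one representative of every duplicated feature survives the filtering.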

In the slice of duplicated_features below, entries with the value True are duplicates.

PYTHON
duplicated_features = X_train_T.duplicated()
duplicated_features[70:90]
OUTPUT
70    False
71    False
72     True
73    False
74     True
75    False
76    False
77    False
78    False
79    False
80    False
81    False
82    False
83    False
84    False
85    False
86    False
87    False
88    False
89    False
dtype: bool

The features with False values are not duplicated and should be retained. Inverting the boolean list changes False to True and vice versa.

PYTHON
features_to_keep = [not index for index in duplicated_features]
features_to_keep[70:90]
OUTPUT
[True, True, False, True, False, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True]

With the values inverted, the features marked True are retained. The data is transposed again to restore the original shape. Applied to X_train:

PYTHON
X_train = X_train_T[features_to_keep].T
X_train.shape
OUTPUT
(60816, 256)

Applied to X_test:

PYTHON
X_test = X_test_T[features_to_keep].T
X_test.shape
OUTPUT
(15204, 256)
PYTHON
X_train.head()
OUTPUT
     0     1    2    3    4    5    6    7    8    9  ...  263  264  265  266  267  268  269  270  271            272
0  2.0  26.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  117310.979016
1  2.0  23.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   85472.340000
2  2.0  23.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  317769.240000
3  2.0  30.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   76209.960000
4  2.0  23.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  302754.000000

5 rows x 256 columns

Next, put the features on a comparable scale. StandardScaler() standardizes each feature by removing its mean and scaling to unit variance. It is fit on the training data and then applied unchanged to the test data.

PYTHON
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
X_train
OUTPUT
array([[ 3.80478472e-02, -5.56029626e-01, -5.27331414e-02, ...,
        -1.87046327e-02, -1.97720391e-02,  3.12133758e-03],
       [ 3.80478472e-02, -7.87181903e-01, -5.27331414e-02, ...,
        -1.87046327e-02, -1.97720391e-02, -1.83006062e-01],
       [ 3.80478472e-02, -7.87181903e-01, -5.27331414e-02, ...,
        -1.87046327e-02, -1.97720391e-02,  1.17499225e+00],
       ...,
       [ 3.80478472e-02,  5.99731758e-01, -5.27331414e-02, ...,
        -1.87046327e-02, -1.97720391e-02, -2.41865113e-01],
       [ 3.80478472e-02, -1.70775831e-01, -5.27331414e-02, ...,
        -1.87046327e-02, -1.97720391e-02,  3.12133758e-03],
       [ 3.80478472e-02,  2.91528722e-01,  7.65192053e+00, ...,
        -1.87046327e-02, -1.97720391e-02,  3.12133758e-03]])
PYTHON
X_train.shape, X_test.shape
OUTPUT
((60816, 256), (15204, 256))
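The arithmetic behind standardization is short enough to write out by hand. The following NumPy sketch illustrates it on made-up numbers; fit_standardizer and standardize are hypothetical helper names, and the zero-variance guard is an assumption added to keep the toy code safe.

```python
import numpy as np


def fit_standardizer(X_fit):
    """Learn per-column mean and standard deviation from the training data."""
    mean = X_fit.mean(axis=0)
    std = X_fit.std(axis=0)
    std[std == 0] = 1.0  # guard: leave constant columns unscaled
    return mean, std


def standardize(X, mean, std):
    """Apply z = (x - mean) / std with previously learned statistics."""
    return (X - mean) / std


# Made-up training and test rows.
train = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
test = np.array([[3.0, 70.0]])

mean, std = fit_standardizer(train)
train_scaled = standardize(train, mean, std)
test_scaled = standardize(test, mean, std)  # reuses the training statistics

print(train_scaled.mean(axis=0))  # approximately [0. 0.]
print(train_scaled.std(axis=0))   # approximately [1. 1.]
```

Reusing the training mean and standard deviation for the test rows mirrors the fit_transform / transform split used above and keeps test-set information out of preprocessing.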

The data is 2-dimensional, but Conv1D expects 3-dimensional input of shape (samples, steps, channels), so reshape() adds a trailing channel dimension of size 1.

PYTHON
X_train = X_train.reshape(60816, 256,1)
X_test = X_test.reshape(15204, 256, 1)
X_train.shape, X_test.shape
OUTPUT
((60816, 256, 1), (15204, 256, 1))
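The same reshape can be written without hard-coding the row and column counts, which keeps the code valid if the number of retained features changes. A minimal sketch on a made-up array:

```python
import numpy as np

# Made-up 2-D feature matrix: 5 samples, 4 features.
features = np.zeros((5, 4))

# Derive the sizes from the array instead of typing them in.
as_sequences = features.reshape(features.shape[0], features.shape[1], 1)

# Equivalent: append a new axis of length 1 at the end.
same_result = features[..., np.newaxis]

print(as_sequences.shape)  # (5, 4, 1)
print(same_result.shape)   # (5, 4, 1)
```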
PYTHON
y_train = y_train.to_numpy()
y_test = y_test.to_numpy()

Building the CNN

A Sequential() model is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor.

Conv1D() is a 1D convolution layer, effective for deriving features from a fixed-length segment of the overall dataset, where the location of the feature in the segment is less important. In the first Conv1D() layer, the model learns 32 filters with a convolutional window size of 3. The input_shape gives the shape of a single input sample, here (256, 1), and is passed to the first layer so Keras can build the model. The ReLU activation function outputs the input directly if positive, otherwise zero.

ReLU activation function graph showing zero output for negative inputs and linear output for positive values
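To show what a single Conv1D filter followed by ReLU computes, here is a NumPy sketch with one hand-picked kernel and no padding. It is a simplified illustration (one input channel, one filter, made-up numbers), not the Keras implementation; conv1d_valid is a hypothetical helper name.

```python
import numpy as np


def relu(x):
    """Pass positive values through and replace negatives with zero."""
    return np.maximum(x, 0.0)


def conv1d_valid(signal, kernel, bias=0.0):
    """Slide the kernel over the signal without padding.

    Each output is the dot product of the kernel with one window, so the
    output has len(signal) - len(kernel) + 1 entries.
    """
    window = len(kernel)
    n_out = len(signal) - window + 1
    return np.array(
        [np.dot(signal[i:i + window], kernel) + bias for i in range(n_out)]
    )


signal = np.array([1.0, 2.0, 0.0, -1.0, 3.0])  # made-up feature values
kernel = np.array([1.0, 0.0, -1.0])            # made-up filter weights

features = relu(conv1d_valid(signal, kernel))
print(features)  # [1. 3. 0.]
```

Five inputs and a window of 3 give 5 - 3 + 1 = 3 outputs, the same rule that turns 256 inputs into 254 in the first layer of the model.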

BatchNormalization() allows each layer of a network to learn a little more independently of other layers. It normalizes the output of a previous activation layer by subtracting the batch mean and dividing by the batch standard deviation, keeping the mean output close to 0 and the standard deviation close to 1.

MaxPool1D() downsamples the input representation by taking the maximum value over the window defined by pool_size, which is 2 in the first Max Pool layer.
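As a rough illustration of max pooling with non-overlapping windows, the sketch below pools a made-up 1-D array. It assumes the stride equals pool_size and that leftover values at the end are dropped; max_pool1d is a hypothetical helper name.

```python
import numpy as np


def max_pool1d(values, pool_size=2):
    """Take the maximum of each non-overlapping window of pool_size values."""
    n_windows = len(values) // pool_size
    trimmed = np.asarray(values)[: n_windows * pool_size]  # drop the remainder
    return trimmed.reshape(n_windows, pool_size).max(axis=1)


pooled = max_pool1d([1.0, 3.0, 0.0, 2.0, 5.0], pool_size=2)
print(pooled)  # [3. 2.]
```

Five values with a window of 2 produce two outputs; the trailing 5.0 has no partner and is discarded, which is why an odd length such as 125 pools down to 62.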

Dropout() randomly sets the outgoing edges of hidden units to 0 at each update of the training phase. The value passed to Dropout() is the probability with which each output of the layer is dropped.
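A small NumPy sketch of dropout at training time follows. It assumes the common "inverted dropout" formulation, in which the surviving activations are scaled by 1 / (1 - rate) so their expected sum is unchanged; inverted_dropout is a hypothetical helper name and the seed is arbitrary.

```python
import numpy as np


def inverted_dropout(activations, rate, rng):
    """Zero each activation with probability `rate`, rescale the survivors."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)


rng = np.random.default_rng(0)  # arbitrary seed for repeatability
activations = np.ones(10)       # made-up layer outputs

dropped = inverted_dropout(activations, rate=0.3, rng=rng)
print(dropped)  # every entry is either 0 or 1 / 0.7
```

At inference time no units are dropped and no rescaling is applied.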

Flatten() converts the data into a 1-dimensional array for inputting it to the next layer.

Dense() is the regular deeply connected neural network layer. The output layer has 1 neuron because a single value is predicted. The Sigmoid function is used because it outputs values between 0 and 1, which facilitates binary prediction.
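The sigmoid output can be read as a probability for class 1 and turned into a label with a cutoff. The sketch below assumes a 0.5 cutoff, which is a common default but still a choice; predict_label is a hypothetical helper name.

```python
import math


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_label(z, threshold=0.5):
    """Turn a raw score into a 0/1 label using an assumed cutoff."""
    return int(sigmoid(z) >= threshold)


print(sigmoid(0.0))         # 0.5
print(predict_label(2.0))   # 1
print(predict_label(-2.0))  # 0
```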

PYTHON
model = Sequential()
model.add(Conv1D(32, 3, activation='relu', input_shape = (256,1)))
model.add(BatchNormalization())
model.add(MaxPool1D(2))
model.add(Dropout(0.3))

model.add(Conv1D(64, 3, activation='relu'))
model.add(BatchNormalization())
model.add(MaxPool1D(2))
model.add(Dropout(0.5))

model.add(Conv1D(128, 3, activation='relu'))
model.add(BatchNormalization())
model.add(MaxPool1D(2))
model.add(Dropout(0.5))

model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))

model.add(Dense(1, activation='sigmoid'))
PYTHON
model.summary()
OUTPUT
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv1d (Conv1D)              (None, 254, 32)           128
_________________________________________________________________
batch_normalization (BatchNo (None, 254, 32)           128
_________________________________________________________________
max_pooling1d (MaxPooling1D) (None, 127, 32)           0
_________________________________________________________________
dropout (Dropout)            (None, 127, 32)           0
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 125, 64)           6208
_________________________________________________________________
batch_normalization_1 (Batch (None, 125, 64)           256
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 62, 64)            0
_________________________________________________________________
dropout_1 (Dropout)          (None, 62, 64)            0
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 60, 128)           24704
_________________________________________________________________
batch_normalization_2 (Batch (None, 60, 128)           512
_________________________________________________________________
max_pooling1d_2 (MaxPooling1 (None, 30, 128)           0
_________________________________________________________________
dropout_2 (Dropout)          (None, 30, 128)           0
_________________________________________________________________
flatten (Flatten)            (None, 3840)              0
_________________________________________________________________
dense (Dense)                (None, 256)               983296
_________________________________________________________________
dropout_3 (Dropout)          (None, 256)               0
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 257
=================================================================
Total params: 1,015,489
Trainable params: 1,015,041
Non-trainable params: 448
_________________________________________________________________
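The output shapes and parameter counts in the summary can be reproduced with a few lines of arithmetic. This sketch assumes unpadded convolutions, pooling with stride equal to the pool size, and four parameters per channel for batch normalization (scale, offset, moving mean, moving variance); the helper names are hypothetical.

```python
def conv_out_len(length, kernel_size):
    """Length after an unpadded convolution."""
    return length - kernel_size + 1


def pool_out_len(length, pool_size):
    """Length after non-overlapping pooling (remainder dropped)."""
    return length // pool_size


def conv_params(kernel_size, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel_size * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return inputs * units + units


length, channels = 256, 1
total = 0
for filters in (32, 64, 128):
    length = conv_out_len(length, 3)
    total += conv_params(3, channels, filters)
    total += 4 * filters  # batch normalization parameters
    length = pool_out_len(length, 2)
    channels = filters

flat = length * channels
total += dense_params(flat, 256) + dense_params(256, 1)

print(flat, total)  # 3840 1015489
```

The 448 non-trainable parameters are the moving mean and moving variance of the three batch normalization layers: 2 * (32 + 64 + 128).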

The model is compiled with the Adam optimizer at a learning rate of 0.00005 and trained for 10 epochs. validation_data evaluates loss and metrics at the end of each epoch without training on that data; here the test set doubles as the validation set. With metrics = ['accuracy'] the model is evaluated on accuracy.

PYTHON
# learning_rate is the current argument name; lr is a deprecated alias.
model.compile(optimizer=Adam(learning_rate=0.00005), loss='binary_crossentropy', metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test), verbose=1)
OUTPUT
Train on 60816 samples, validate on 15204 samples

Epoch 5/10
60816/60816 [==============================] - 111s 2ms/sample - loss: 0.1630 - accuracy: 0.9604 - val_loss: 0.1641 - val_accuracy: 0.9605
Epoch 6/10
60816/60816 [==============================] - 111s 2ms/sample - loss: 0.1599 - accuracy: 0.9603 - val_loss: 0.1595 - val_accuracy: 0.9605
Epoch 7/10
60816/60816 [==============================] - 111s 2ms/sample - loss: 0.1576 - accuracy: 0.9604 - val_loss: 0.1590 - val_accuracy: 0.9604
Epoch 8/10
60816/60816 [==============================] - 111s 2ms/sample - loss: 0.1556 - accuracy: 0.9604 - val_loss: 0.1610 - val_accuracy: 0.9605
Epoch 9/10
60816/60816 [==============================] - 111s 2ms/sample - loss: 0.1536 - accuracy: 0.9604 - val_loss: 0.1558 - val_accuracy: 0.9603
Epoch 10/10
60816/60816 [==============================] - 111s 2ms/sample - loss: 0.1542 - accuracy: 0.9604 - val_loss: 0.1602 - val_accuracy: 0.9599
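The loss values in the log are averages of binary cross-entropy over the samples. A plain-Python sketch of that formula is shown below on made-up labels and probabilities; the clipping constant eps is an assumption added to avoid log(0).

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # keep p away from 0 and 1
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Made-up labels with confident and hesitant predictions.
confident = binary_cross_entropy([0, 0, 1], [0.1, 0.1, 0.9])
hesitant = binary_cross_entropy([0, 0, 1], [0.4, 0.4, 0.6])

print(round(confident, 4))  # 0.1054
print(round(hesitant, 4))   # 0.5108
```

Predictions that are both correct and confident give a lower loss, even though both sets of predictions would score 100% accuracy at a 0.5 cutoff.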

history.history is a dictionary holding the accuracy and loss recorded after each epoch, for both the training and the validation data.

PYTHON
history.history
OUTPUT
{'accuracy': [0.95417327, 0.9592706, 0.95992833, 0.96033937, 0.96037227, 0.9603065, 0.9604052, 0.960438, 0.9603887, 0.9604052], 'loss': [0.21693714527215763, 0.17656464240582592, 0.16882949567384484, 0.16588703954582057, 0.16303560407957227, 0.15994301885150822, 0.15763013028843298, 0.15563193596928912, 0.1535658989747522, 0.1542411554370529], 'val_accuracy': [0.9600763, 0.9600763, 0.96033937, 0.9604052, 0.9604709, 0.9604709, 0.9604052, 0.9604709, 0.9602736, 0.959879], 'val_loss': [0.17092196812710614, 0.1765108920851371, 0.16735200087523436, 0.1662461552617033, 0.16413307644895303, 0.1594827836499469, 0.15897791552088097, 0.16101698756464938, 0.15578439738331923, 0.16016060526129197]}
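A dictionary like this makes it easy to find the epoch with the best validation score. The sketch below copies the val_loss list from the output above; best_epoch is a hypothetical helper name.

```python
# val_loss values copied from the history output above.
val_loss = [0.17092196812710614, 0.1765108920851371, 0.16735200087523436,
            0.1662461552617033, 0.16413307644895303, 0.1594827836499469,
            0.15897791552088097, 0.16101698756464938, 0.15578439738331923,
            0.16016060526129197]


def best_epoch(values, mode="min"):
    """Return (1-based epoch, value) of the best entry in a metric list."""
    pick = min if mode == "min" else max
    index = pick(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]


epoch, loss = best_epoch(val_loss)
print(epoch, round(loss, 4))  # 9 0.1558
```

With the real training run, the same call would be best_epoch(history.history['val_loss']).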

The charts below plot model accuracy and model loss: training accuracy vs validation accuracy, and training loss vs validation loss.

PYTHON
def plot_learningCurve(history, epoch):
  # Plot training & validation accuracy values
  epoch_range = range(1, epoch+1)
  plt.plot(epoch_range, history.history['accuracy'])
  plt.plot(epoch_range, history.history['val_accuracy'])
  plt.title('Model accuracy')
  plt.ylabel('Accuracy')
  plt.xlabel('Epoch')
  plt.legend(['Train', 'Val'], loc='upper left')
  plt.show()

  # Plot training & validation loss values
  plt.plot(epoch_range, history.history['loss'])
  plt.plot(epoch_range, history.history['val_loss'])
  plt.title('Model loss')
  plt.ylabel('Loss')
  plt.xlabel('Epoch')
  plt.legend(['Train', 'Val'], loc='upper left')
  plt.show()

plot_learningCurve(history, 10)

Line chart showing training and validation accuracy converging near 96% over 10 epochs

The loss plot shows the same trend: training loss falls steadily, validation loss drifts down with small fluctuations, and the two curves stay close together:

Line chart showing training and validation loss both decreasing from ~0.22 to ~0.16 over 10 epochs

The model reached 96% accuracy. Convolutional neural networks with appropriate feature selection can build an effective model for this dataset. Feature selection enables the machine learning algorithm to train faster, reduces model complexity, and can improve accuracy when the right subset is chosen.
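An accuracy figure is easier to interpret next to a simple baseline, such as always predicting the most frequent class. The tutorial does not report the class balance of TARGET, so the labels below are made up; majority_baseline_accuracy is a hypothetical helper name.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    labels = list(labels)
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


# Made-up, heavily imbalanced labels: 96 zeros and 4 ones.
toy_labels = [0] * 96 + [1] * 4
print(majority_baseline_accuracy(toy_labels))  # 0.96
```

Running majority_baseline_accuracy(y_test) on the real labels shows how much of the reported accuracy goes beyond this baseline. If the two numbers are close, metrics such as recall or ROC AUC give a clearer picture than accuracy.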

Conclusion

In this tutorial you built a 1D CNN to predict bank customer satisfaction from 369 raw features. After removing constant, quasi-constant, and duplicate features, the dataset shrank to 256 columns. Trained on 60,816 samples for 10 epochs, the model achieved ~96% accuracy on the held-out test set, with training and validation curves tracking closely throughout.

Key takeaways:

  • Feature selection (variance thresholding and duplicate removal) cut 369 features to 256 by discarding columns that carry little or no information. Smaller inputs mean faster training and less overfitting risk.
  • 1D CNNs can classify structured tabular data by treating each row as a length-256 sequence of feature values with a single channel; recurrent layers are not required for this task.
  • StandardScaler is an important step before feeding tabular data to a CNN. Unnormalized large-magnitude features would dominate the convolutional filters.
