#aarya tadvalkar#bank customer satisfaction#cnn#Deep Learning#Feature selection#kgptalkie#Machine Learning#Tensorflow

Customer Satisfaction Prediction with CNN

Predict bank customer satisfaction using a 1D CNN in TensorFlow. Covers feature selection, StandardScaler, Conv1D layers, and binary classification training.

May 16, 2026 at 9:45 AM11 min readFollowFollow (Hindi)

Topics You Will Master

Exploratory data analysis and feature importance ranking
StandardScaler normalization for tabular feature inputs
Conv1D layer design for 1-D structured data classification
Binary cross-entropy loss and Adam optimizer configuration
Evaluating precision, recall, and F1-score on imbalanced data
Best For

Data scientists applying CNNs to tabular classification problems.

Expected Outcome

A 1D CNN model predicting bank customer satisfaction with high accuracy.

Feature Selection and CNN

In this project we are going to build a neural network to predict if a particular bank customer is satisfies or not. To do this we are going to use Convolutional Neural Networks. The dataset which we are going to use contains 370 features. Install TensorFlow with pip install tensorflow (or pip install tensorflow-gpu for GPU).

PYTHON
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import VarianceThreshold

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv1D, MaxPool1D, Flatten, Dense, Dropout, BatchNormalization
from tensorflow.keras.optimizers import Adam
print(tf.__version__)
OUTPUT
2.1.0

You can use this command to directly get the data from github.

PLAINTEXT
!git clone https://github.com/laxmimerit/Data-Files-for-Feature-Selection.git

After downloading the data, we will now read the data using read_csv(). To see the first 5 rows of the data we can use data.head().

PYTHON
data = pd.read_csv('train.csv')
data.head()
OUTPUT
IDvar3var15imp_ent_var16_ult1imp_op_var39_comer_ult1imp_op_var39_comer_ult3imp_op_var40_comer_ult1imp_op_var40_comer_ult3imp_op_var40_efect_ult1imp_op_var40_efect_ult3...saldo_medio_var33_hace2saldo_medio_var33_hace3saldo_medio_var33_ult1saldo_medio_var33_ult3saldo_medio_var44_hace2saldo_medio_var44_hace3saldo_medio_var44_ult1saldo_medio_var44_ult3var38TARGET
012230.00.00.00.00.00.00.0...0.00.00.00.00.00.00.00.039205.1700000
132340.00.00.00.00.00.00.0...0.00.00.00.00.00.00.00.049278.0300000
242230.00.00.00.00.00.00.0...0.00.00.00.00.00.00.00.067333.7700000
382370.0195.0195.00.00.00.00.0...0.00.00.00.00.00.00.00.064007.9700000
4102390.00.00.00.00.00.00.0...0.00.00.00.00.00.00.00.0117310.9790160

5 rows × 371 columns

We have 76020 rows in the dataset and 371 columns.

PYTHON
data.shape
OUTPUT
(76020, 371)

Now we are going to create a feature space X. Feature space will only contain the column which provide information necessary for prediction. ID and TARGET do not play any role in prediction, so we are going to remove them using drop(). After droppring the 2 columns you can see that the number of columns have reduced to 369.

PYTHON
X = data.drop(labels=['ID', 'TARGET'], axis = 1)
X.shape
OUTPUT
(76020, 369)

Lets create a variable y which contains the values which have to be predicted i.e. TARGET.

PYTHON
y = data['TARGET']

Now we will split the data into training and testing set with the help of train_test_split()test_size = 0.2 will keep 20% data for testing and 80% data will be used for training the model. random_state controls the shuffling applied to the data before applying the split. stratify = y means that the data is split in a stratified fashion, using y as the class labels.

PYTHON
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.2, random_state = 0, stratify = y)

As we can see, the training dataset consists of 60816 rows i.e. 80% of the data and the testing dataset consists of 15204 rows i.e 20% of the data.

PYTHON
X_train.shape, X_test.shape
OUTPUT
((60816, 369), (15204, 369))

Remove Constant, Quasi Constant and Duplicate Features

Feature selection is the process of reducing the number of input variables when developing a predictive model.

  • Constant Features are the features that show single values in all the observations in the dataset. These features provide no information that allows ML models to predict the target.
  • Quasi constant features, as the name suggests, are the features that are almost constant. In other words, these features have the same values for a very large subset of the outputs. They have less variance. Such features are not very useful for making predictions.
  • Duplicate Features as the name suggests are duplicated in the dataset.

Here we have set the variance threshold to 1% i.e. if any column has variance less than 1% it will be removed. In other words only the columns having variance greater than 99% will be retained. We are fitting VarianceThreshold() to the training data and not the test data. We are only transforming the test data.

PYTHON
filter = VarianceThreshold(0.01)
X_train = filter.fit_transform(X_train)
X_test = filter.transform(X_test)

X_train.shape, X_test.shape
OUTPUT
((60816, 273), (15204, 273))

After removing the Quasi constant features we can see that 96 features are removed from the dataset.

PLAINTEXT
369-273
PLAINTEXT
96

Now we will remove the duplicate features. We don't have any direct function to remove duplicate features but we have functions to check for duplicate rows. Hence we are taking transpose of the data set using .T. As we can see after taking transpose the shape of X_train_T is exactly opposite to that of X_train.

PYTHON
X_train_T = X_train.T
X_test_T = X_test.T

X_train_T = pd.DataFrame(X_train_T)
X_test_T = pd.DataFrame(X_test_T)

X_train_T.shape
OUTPUT
(273, 60816)

.duplicated() returns a boolean Series denoting duplicate rows. We can see that 17 features are duplicated.

PYTHON
X_train_T.duplicated().sum()
OUTPUT
17

Now we will see the list of dupicated features. The features having index True are duplicated.

PYTHON
duplicated_features = X_train_T.duplicated()
duplicated_features[70:90]
OUTPUT
70    False
71    False
72     True
73    False
74     True
75    False
76    False
77    False
78    False
79    False
80    False
81    False
82    False
83    False
84    False
85    False
86    False
87    False
88    False
89    False
dtype: bool

We have to retain the features with False value because they are not duplicated. So here we are going to use inversion i.e. we are going to change the False value to True and viceversa.

PYTHON
features_to_keep = [not index for index in duplicated_features]
features_to_keep[70:90]
OUTPUT
[True, True, False, True, False, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True]

Now as we have inverted the values, we have to retain the features with value True. We also have to take transpose once again to get the data back in original shape. Here we have done it for X_train.

PYTHON
X_train = X_train_T[features_to_keep].T
X_train.shape
OUTPUT
(60816, 256)

Here we have done it for X_test.

PYTHON
X_test = X_test_T[features_to_keep].T
X_test.shape
OUTPUT
(15204, 256)
PYTHON
X_train.head()
OUTPUT
0123456789...263264265266267268269270271272
02.026.00.00.00.00.00.00.00.00.0...0.00.00.00.00.00.00.00.00.0117310.979016
12.023.00.00.00.00.00.00.00.00.0...0.00.00.00.00.00.00.00.00.085472.340000
22.023.00.00.00.00.00.00.00.00.0...0.00.00.00.00.00.00.00.00.0317769.240000
32.030.00.00.00.00.00.00.00.00.0...0.00.00.00.00.00.00.00.00.076209.960000
42.023.00.00.00.00.00.00.00.00.0...0.00.00.00.00.00.00.00.00.0302754.000000

5 rows × 256 columns

Now we are going to get the bring the data into the same range. StandardScaler() standardizes the features by removing the mean and scaling to unit variance.

PYTHON
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
X_train
OUTPUT
array([[ 3.80478472e-02, -5.56029626e-01, -5.27331414e-02, ...,
        -1.87046327e-02, -1.97720391e-02,  3.12133758e-03],
       [ 3.80478472e-02, -7.87181903e-01, -5.27331414e-02, ...,
        -1.87046327e-02, -1.97720391e-02, -1.83006062e-01],
       [ 3.80478472e-02, -7.87181903e-01, -5.27331414e-02, ...,
        -1.87046327e-02, -1.97720391e-02,  1.17499225e+00],
       ...,
       [ 3.80478472e-02,  5.99731758e-01, -5.27331414e-02, ...,
        -1.87046327e-02, -1.97720391e-02, -2.41865113e-01],
       [ 3.80478472e-02, -1.70775831e-01, -5.27331414e-02, ...,
        -1.87046327e-02, -1.97720391e-02,  3.12133758e-03],
       [ 3.80478472e-02,  2.91528722e-01,  7.65192053e+00, ...,
        -1.87046327e-02, -1.97720391e-02,  3.12133758e-03]])
PYTHON
X_train.shape, X_test.shape
OUTPUT
((60816, 256), (15204, 256))

Our data is 2 dimensional but neural networks accept 3 dimensional data. So we have to reshape() the data.

PYTHON
X_train = X_train.reshape(60816, 256,1)
X_test = X_test.reshape(15204, 256, 1)
X_train.shape, X_test.shape
OUTPUT
((60816, 256, 1), (15204, 256, 1))
PYTHON
y_train = y_train.to_numpy()
y_test = y_test.to_numpy()

Building the CNN

Sequential() model is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor.

Conv1D() is a 1D Convolution Layer, this layer is very effective for deriving features from a fixed-length segment of the overall dataset, where it is not so important where the feature is located in the segment. In the first Conv1D() layer, we are learning a total of 36 filters with size of the convolutional window as 3. The input_shape specifies the shape of the input. It is a necessary parameter for the first layer in any neural network. We will be using the ReLu activation function. The rectified linear activation function or ReLU  for short is a piecewise linear function that will output the input directly if it is positive, otherwise, it will output zero.

ReLU activation function graph showing zero output for negative inputs and linear output for positive values

BatchNormalization() allows each layer of a network to learn by itself a little bit more independently of other layers. To increase the stability of a neural network, batch normalization normalizes the output of a previous activation layer by subtracting the batch mean and dividing by the batch standard deviation. It applies a transformation that maintains the mean output close to 0 and the output standard deviation close to 1.

MaxPool1D() downsamples the input representation by taking the maximum value over the window defined by pool_size which is 2 in case of the first Max Pool layer of this neural network.

Dropout() is used to randomly set the outgoing edges of hidden units to 0 at each update of the training phase. The value passed in dropout specifies the probability at which outputs of the layer are dropped out.

Flatten() is used to convert the data into a 1-dimensional array for inputting it to the next layer.

Dense() is the regular deeply connected neural network layer. The output layer is a dense layer with 1 neuron because we are predicting a single value. Sigmoid function is used because it exists between (0 to 1) and this facilitates us to predict a binary input.

PYTHON
model = Sequential()
model.add(Conv1D(32, 3, activation='relu', input_shape = (256,1)))
model.add(BatchNormalization())
model.add(MaxPool1D(2))
model.add(Dropout(0.3))

model.add(Conv1D(64, 3, activation='relu'))
model.add(BatchNormalization())
model.add(MaxPool1D(2))
model.add(Dropout(0.5))

model.add(Conv1D(128, 3, activation='relu'))
model.add(BatchNormalization())
model.add(MaxPool1D(2))
model.add(Dropout(0.5))

model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))

model.add(Dense(1, activation='sigmoid'))
PYTHON
model.summary()
PYTHON
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv1d (Conv1D)              (None, 254, 32)           128
_________________________________________________________________
batch_normalization (BatchNo (None, 254, 32)           128
_________________________________________________________________
max_pooling1d (MaxPooling1D) (None, 127, 32)           0
_________________________________________________________________
dropout (Dropout)            (None, 127, 32)           0
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 125, 64)           6208
_________________________________________________________________
batch_normalization_1 (Batch (None, 125, 64)           256
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 62, 64)            0
_________________________________________________________________
dropout_1 (Dropout)          (None, 62, 64)            0
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 60, 128)           24704
_________________________________________________________________
batch_normalization_2 (Batch (None, 60, 128)           512
_________________________________________________________________
max_pooling1d_2 (MaxPooling1 (None, 30, 128)           0
_________________________________________________________________
dropout_2 (Dropout)          (None, 30, 128)           0
_________________________________________________________________
flatten (Flatten)            (None, 3840)              0
_________________________________________________________________
dense (Dense)                (None, 256)               983296
_________________________________________________________________
dropout_3 (Dropout)          (None, 256)               0
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 257
=================================================================
Total params: 1,015,489
Trainable params: 1,015,041
Non-trainable params: 448
_________________________________________________________________

Now we will compile and fit the model. We are using Adam optimizer with 0.00005 learning rate. We will use 10 epochs to train the model. An epoch is an iteration over the entire data provided. validation_data is the data on which to evaluate the loss and any model metrics at the end of each epoch. The model will not be trained on this data. As metrics = ['accuracy'] the model will be evaluated based on the accuracy.

PLAINTEXT
model.compile(optimizer=Adam(lr=0.00005), loss='binary_crossentropy', metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test), verbose=1)
PLAINTEXT
Train on 60816 samples, validate on 15204 samples

Epoch 5/10
60816/60816 [==============================] - 111s 2ms/sample - loss: 0.1630 - accuracy: 0.9604 - val_loss: 0.1641 - val_accuracy: 0.9605
Epoch 6/10
60816/60816 [==============================] - 111s 2ms/sample - loss: 0.1599 - accuracy: 0.9603 - val_loss: 0.1595 - val_accuracy: 0.9605
Epoch 7/10
60816/60816 [==============================] - 111s 2ms/sample - loss: 0.1576 - accuracy: 0.9604 - val_loss: 0.1590 - val_accuracy: 0.9604
Epoch 8/10
60816/60816 [==============================] - 111s 2ms/sample - loss: 0.1556 - accuracy: 0.9604 - val_loss: 0.1610 - val_accuracy: 0.9605
Epoch 9/10
60816/60816 [==============================] - 111s 2ms/sample - loss: 0.1536 - accuracy: 0.9604 - val_loss: 0.1558 - val_accuracy: 0.9603
Epoch 10/10
60816/60816 [==============================] - 111s 2ms/sample - loss: 0.1542 - accuracy: 0.9604 - val_loss: 0.1602 - val_accuracy: 0.9599

history gives us the summary of all the accuracies and losses calculated after each epoch.

PYTHON
history.history
OUTPUT
{'accuracy': [0.95417327, 0.9592706, 0.95992833, 0.96033937, 0.96037227, 0.9603065, 0.9604052, 0.960438, 0.9603887, 0.9604052], 'loss': [0.21693714527215763, 0.17656464240582592, 0.16882949567384484, 0.16588703954582057, 0.16303560407957227, 0.15994301885150822, 0.15763013028843298, 0.15563193596928912, 0.1535658989747522, 0.1542411554370529], 'val_accuracy': [0.9600763, 0.9600763, 0.96033937, 0.9604052, 0.9604709, 0.9604709, 0.9604052, 0.9604709, 0.9602736, 0.959879], 'val_loss': [0.17092196812710614, 0.1765108920851371, 0.16735200087523436, 0.1662461552617033, 0.16413307644895303, 0.1594827836499469, 0.15897791552088097, 0.16101698756464938, 0.15578439738331923, 0.16016060526129197]}

We will now plot the model accuracy and model loss. In model accuracy we will plot the training accuracy and validation accuracy and in model loss we will plot the training loss and validation loss.

PYTHON
def plot_learningCurve(history, epoch):
  # Plot training & validation accuracy values
  epoch_range = range(1, epoch+1)
  plt.plot(epoch_range, history.history['accuracy'])
  plt.plot(epoch_range, history.history['val_accuracy'])
  plt.title('Model accuracy')
  plt.ylabel('Accuracy')
  plt.xlabel('Epoch')
  plt.legend(['Train', 'Val'], loc='upper left')
  plt.show()

  # Plot training & validation loss values
  plt.plot(epoch_range, history.history['loss'])
  plt.plot(epoch_range, history.history['val_loss'])
  plt.title('Model loss')
  plt.ylabel('Loss')
  plt.xlabel('Epoch')
  plt.legend(['Train', 'Val'], loc='upper left')
  plt.show()

plot_learningCurve(history, 10)

Line chart showing training and validation accuracy converging near 96% over 10 epochs

The loss plot confirms the same trend — both curves fall steadily with no sign of divergence:

Line chart showing training and validation loss both decreasing from ~0.22 to ~0.16 over 10 epochs

We have got an accuracy of 96%. Hence we can conclude that Convolutional neural networks with appropriate feature selection can build a very powerful model for this dataset. Feature selection enables the machine learning algorithm to train faster. It reduces the complexity of a model and makes it easier to interpret. It also improves the accuracy of a model if the right subset is chosen.

Conclusion

In this tutorial you built a 1D CNN to predict bank customer satisfaction from 370 raw features. After removing constant, quasi-constant, and duplicate features, the dataset shrank to 256 informative columns. Trained on 60,816 samples for 10 epochs, the model achieved ~96% accuracy on the held-out test set, with training and validation curves tracking closely throughout.

Key takeaways:

  • Feature selection (variance thresholding and duplicate removal) cut 370 features to 256 without sacrificing predictive power — smaller inputs mean faster training and less overfitting risk.
  • 1D CNNs can classify structured tabular data by treating each feature as a channel in a sequence; you do not need recurrent layers for this task.
  • StandardScaler is essential before feeding tabular data to a CNN — unnormalized large-magnitude features would dominate the convolutional filters.

Next steps:

Find this tutorial useful?

Subscribe to our YouTube channels for more practical production walk-throughs.

Discussion & Comments