Deep Learning with TensorFlow 2.0 Tutorial – Building Your First ANN with TensorFlow 2.0




Objective

  • Our objective is to build an artificial neural network for a classification problem using the TensorFlow and Keras libraries. We will learn how to build a neural network model with TensorFlow and Keras, and then analyse the model using different accuracy metrics.

What is ANN?

An Artificial Neural Network (ANN) is a supervised learning system built from a large number of simple elements, called neurons or perceptrons. Each neuron makes simple decisions and feeds those decisions to other neurons, organized in interconnected layers.

[Figures: biological neuron vs. artificial neuron; shallow vs. deep network architectures]

What is Activation Function?

  • In artificial neural networks, the activation function of a node defines the output of that node given an input or set of inputs. A standard integrated circuit can be seen as a digital network of activation functions that can be "ON" (1) or "OFF" (0), depending on input. This is similar to the behavior of the linear perceptron in neural networks.
  • If we do not apply an activation function, the output signal would simply be a linear function, i.e. a polynomial of degree one. However many such layers we stack, their composition remains linear, so the network could never learn non-linear decision boundaries.

Types of Activation Function

  • Sigmoid
  • Tanh
  • ReLU
  • Leaky ReLU
  • Softmax

Sigmoid, Tanh, ReLU, and Leaky ReLU

\begin{align*}
\text{Sigmoid: } \sigma(x) &= \frac{1}{1 + e^{-x}} \\
\text{Tanh: } \tanh(x) &= \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \\
\text{ReLU: } f(x) &= \max(0, x) \\
\text{Leaky ReLU: } f(x) &= \max(\alpha x, x), \quad 0 < \alpha < 1
\end{align*}

Softmax Function

\begin{align*}
\text{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{j} e^{z_j}}
\end{align*}
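
To make these concrete, here is a minimal NumPy sketch of the five activations listed above (the alpha = 0.01 slope for leaky ReLU is an assumed, commonly used default):

import numpy as np

def sigmoid(x):
    # squashes inputs into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # squashes inputs into (-1, 1)
    return np.tanh(x)

def relu(x):
    # zero for negative inputs, identity for positive inputs
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # small non-zero slope for negative inputs (alpha is an assumed default)
    return np.where(x > 0, x, alpha * x)

def softmax(z):
    # subtract the max for numerical stability; outputs sum to 1
    e = np.exp(z - np.max(z))
    return e / e.sum()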

What is Back Propagation?

  • In backpropagation we update the parameters of the model with respect to the loss function. The loss function can be cross-entropy for classification problems and (root) mean squared error for regression problems.
  • Our objective is to minimize the loss of our model. To do so, we calculate the gradient of the loss with respect to the parameters of the model and move the weights a small step in the opposite direction of that gradient. This process of propagating the error backwards through the layers and updating the weights is known as backpropagation. A minimal worked example follows below.
[Figure: backpropagation of the loss through the network's layers]
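
As a minimal worked example (a single sigmoid neuron on toy data; every name here is illustrative), one training loop with backpropagation looks like this. For a sigmoid output with cross-entropy loss, the gradient of the loss with respect to the pre-activation z simplifies to p - y:

import numpy as np

# toy binary problem: label is 1 when the two features sum to a positive value
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w, b, lr = np.zeros(2), 0.0, 0.5
for step in range(100):
    z = X @ w + b
    p = 1.0 / (1.0 + np.exp(-z))      # forward pass: sigmoid output
    grad_z = (p - y) / len(y)         # backward pass: dLoss/dz = p - y
    w -= lr * (X.T @ grad_z)          # gradient of the loss w.r.t. the weights
    b -= lr * grad_z.sum()            # gradient of the loss w.r.t. the bias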

Steps for building your first ANN

  • Data preprocessing
  • Add the input layer
  • Random weight initialization
  • Add hidden layers
  • Select the optimizer, loss, and performance metrics
  • Compile the model
  • Use model.fit to train the model
  • Evaluate the model
  • Adjust the optimization parameters or the model if needed

Data Preprocessing

  • It is better to preprocess data before feeding it to any neural network model. Data should be roughly normally distributed (Gaussian), which helps the model perform well.
  • If our data is not normally distributed, it is skewed. We can reduce that skewness by taking the logarithm of the data.
  • After removing the skewness, it is better to scale the data so that all features are on the same scale.
  • We can use either MinMaxScaler or StandardScaler.
  • StandardScaler is often preferable: it transforms each feature to have mean 0 and variance 1, so the data is brought close to N(0, 1), a Gaussian with mean 0 and variance 1. A short sketch of this pipeline follows below.
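
A minimal sketch of this pipeline, assuming a hypothetical DataFrame with one skewed, positive-valued column named amount:

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# hypothetical right-skewed, positive-valued feature
df = pd.DataFrame({'amount': np.random.lognormal(mean=3.0, sigma=1.0, size=1000)})

# log1p computes log(1 + x): reduces right skew and handles zeros safely
df['amount_log'] = np.log1p(df['amount'])

# StandardScaler: each feature ends up with mean 0 and variance 1
X_scaled = StandardScaler().fit_transform(df[['amount_log']])
print(X_scaled.mean(), X_scaled.std())  # approximately 0 and 1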

Layers

Adding input layer

  • The size of the input layer is determined by the number of input features.

Adding hidden layers

  • We can add as many hidden layers as we like. If we want the model to capture more complex patterns, we add more (and larger) hidden layers; for a simple model, a small number of hidden layers is enough.

Adding output layer

  • In a classification problem, the size of the output layer depends on the number of classes.
  • In a regression problem, the output layer typically has size one. Both cases are sketched below.
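
A minimal Keras sketch of the two cases (the layer sizes and the 10-feature input are assumptions for illustration):

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# hypothetical: 10 input features, 3-class classification
clf = Sequential([
    Dense(64, activation='relu', input_dim=10),  # hidden layer
    Dense(3, activation='softmax'),              # one output unit per class
])

# hypothetical: same inputs, a single continuous regression target
reg = Sequential([
    Dense(64, activation='relu', input_dim=10),
    Dense(1),                                    # single linear output unit
])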

Weight initialization

  • The mean of the weights should be zero.
  • The variance of the weights should stay the same across every layer, so that the scale of the activations neither explodes nor vanishes; this motivates schemes such as Glorot (Xavier) initialization, the Keras default for Dense layers. See the snippet below.
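
In Keras the initializer can be chosen per layer; a short sketch ('he_normal' is a common alternative for ReLU layers):

from tensorflow.keras.layers import Dense

# Dense layers default to 'glorot_uniform' (Xavier) initialization;
# 'he_normal' is often preferred when the activation is ReLU
layer = Dense(128, activation='relu', kernel_initializer='he_normal')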

Optimizers

Gradient Descent

  • Gradient descent is a first-order optimization algorithm which depends on the first-order derivative of the loss function. It calculates which way the weights should be altered so that the function can reach a minimum. Through backpropagation, the loss is propagated from one layer to another, and the model's parameters, also known as weights, are modified so that the loss is minimized.

Stochastic Gradient Descent

  • It’s a variant of gradient descent that updates the model’s parameters more frequently: the parameters are altered after computing the loss on each individual training example. So if the dataset contains 1000 rows, SGD updates the model parameters 1000 times in one pass over the dataset (one epoch), instead of once as in gradient descent.

Mini-Batch Gradient Descent

  • It is often the best trade-off among the variants of gradient descent, improving on both SGD and standard (batch) gradient descent. It updates the model parameters after every batch: the dataset is divided into batches, and after every batch the parameters are updated. A sketch covering all three variants follows below.
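
A minimal NumPy sketch of mini-batch gradient descent on a linear model (synthetic data; setting batch_size = 1 gives SGD, and batch_size = len(X) gives full-batch gradient descent):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=1000)

w = np.zeros(3)
lr, batch_size = 0.1, 32
for epoch in range(20):
    idx = rng.permutation(len(X))                        # shuffle each epoch
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]
        grad = 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)   # gradient of MSE on the batch
        w -= lr * grad                                   # parameter update
print(w)  # converges to approximately [2.0, -1.0, 0.5]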

Adagrad

  • It is gradient descent with an adaptive learning rate.
  • The learning rate decays per parameter in proportion to its update history: parameters that have received more (larger) updates get a smaller effective learning rate. The update sketch below shows this.
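
A minimal sketch of the Adagrad update rule (names illustrative):

import numpy as np

def adagrad_step(w, grad, cache, lr=0.1, eps=1e-8):
    # accumulate squared gradients per parameter
    cache += grad ** 2
    # parameters with a larger accumulated history get a smaller step
    w -= lr * grad / (np.sqrt(cache) + eps)
    return w, cache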

Losses

  • Cross-entropy for classification problems.
  • Root mean squared error for regression problems.
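
For binary classification and for regression respectively, these take the familiar forms:

\begin{align*}
\text{CrossEntropy} &= -\frac{1}{N}\sum_{i=1}^{N}\left[\,y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\,\right] \\
\text{RMSE} &= \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}
\end{align*}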

Accuracy metrics

\begin{align*}
\text{Accuracy} &= \frac{TP + TN}{TP + FP + TN + FN} \\
\text{Recall} &= \frac{TP}{TP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP}
\end{align*}
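
These metrics can also be computed directly with scikit-learn; a quick sketch on made-up labels:

from sklearn.metrics import accuracy_score, precision_score, recall_score

# hypothetical true and predicted labels
y_true = [1, 0, 1, 1, 0, 0]
y_hat  = [1, 0, 0, 1, 0, 1]

print(accuracy_score(y_true, y_hat))   # (TP + TN) / (TP + FP + TN + FN)
print(precision_score(y_true, y_hat))  # TP / (TP + FP)
print(recall_score(y_true, y_hat))     # TP / (TP + FN)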

Installing libraries

# pip install tensorflow==2.0.0-rc0
# pip install tensorflow-gpu==2.0.0-rc0
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Flatten, Dense
print(tf.__version__)
2.2.0

Importing necessary libraries

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
dataset = pd.read_csv('Customer_Churn_Modelling.csv')
dataset.head()
   RowNumber  CustomerId   Surname  CreditScore Geography  Gender  Age  Tenure    Balance  NumOfProducts  HasCrCard  IsActiveMember  EstimatedSalary  Exited
0          1    15634602  Hargrave          619    France  Female   42       2       0.00              1          1               1        101348.88       1
1          2    15647311      Hill          608     Spain  Female   41       1   83807.86              1          0               1        112542.58       0
2          3    15619304      Onio          502    France  Female   42       8  159660.80              3          1               0        113931.57       1
3          4    15701354      Boni          699    France  Female   39       1       0.00              2          0               0         93826.63       0
4          5    15737888  Mitchell          850     Spain  Female   43       2  125510.82              1          1               1         79084.10       0
X = dataset.drop(labels=['CustomerId', 'Surname', 'RowNumber', 'Exited'], axis = 1)
y = dataset['Exited']
X.head()
   CreditScore Geography  Gender  Age  Tenure    Balance  NumOfProducts  HasCrCard  IsActiveMember  EstimatedSalary
0          619    France  Female   42       2       0.00              1          1               1        101348.88
1          608     Spain  Female   41       1   83807.86              1          0               1        112542.58
2          502    France  Female   42       8  159660.80              3          1               0        113931.57
3          699    France  Female   39       1       0.00              2          0               0         93826.63
4          850     Spain  Female   43       2  125510.82              1          1               1         79084.10
y.head()
0    1
1    0
2    1
3    0
4    0
Name: Exited, dtype: int64

Using LabelEncoder, we convert the categorical features Geography and Gender to numerical values

from sklearn.preprocessing import LabelEncoder
label1 = LabelEncoder()
X['Geography'] = label1.fit_transform(X['Geography'])
label = LabelEncoder()
X['Gender'] = label.fit_transform(X['Gender'])
X.head()
   CreditScore  Geography  Gender  Age  Tenure    Balance  NumOfProducts  HasCrCard  IsActiveMember  EstimatedSalary
0          619          0       0   42       2       0.00              1          1               1        101348.88
1          608          2       0   41       1   83807.86              1          0               1        112542.58
2          502          0       0   42       8  159660.80              3          1               0        113931.57
3          699          0       0   39       1       0.00              2          0               0         93826.63
4          850          2       0   43       2  125510.82              1          1               1         79084.10
X = pd.get_dummies(X, drop_first=True, columns=['Geography'])
X.head()
   CreditScore  Gender  Age  Tenure    Balance  NumOfProducts  HasCrCard  IsActiveMember  EstimatedSalary  Geography_1  Geography_2
0          619       0   42       2       0.00              1          1               1        101348.88            0            0
1          608       0   41       1   83807.86              1          0               1        112542.58            0            1
2          502       0   42       8  159660.80              3          1               0        113931.57            0            0
3          699       0   39       1       0.00              2          0               0         93826.63            0            0
4          850       0   43       2  125510.82              1          1               1         79084.10            0            1
  • Here, using StandardScaler, we scale our data so that each feature has mean 0 and variance 1. Note that the scaler is fit on the training set only (fit_transform) and then applied to the test set (transform), so no information from the test set leaks into training.
from sklearn.preprocessing import StandardScaler
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0, stratify = y)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
X_train
array([[-1.24021723, -1.09665089,  0.77986083, ...,  1.64099027,
        -0.57812007, -0.57504086],
       [ 0.75974873,  0.91186722, -0.27382717, ..., -1.55587522,
         1.72974448, -0.57504086],
       [-1.72725557, -1.09665089, -0.9443559 , ...,  1.1038111 ,
        -0.57812007, -0.57504086],
       ...,
       [-0.51484098,  0.91186722,  0.87565065, ..., -1.01507508,
         1.72974448, -0.57504086],
       [ 0.73902369, -1.09665089, -0.36961699, ..., -1.47887193,
        -0.57812007, -0.57504086],
       [ 0.95663657,  0.91186722, -1.32751517, ...,  0.50945854,
        -0.57812007,  1.73900686]])

Build ANN

  • Here we are building the ANN model.
  • First we add the input layer, whose size equals the number of input features (11 in this case).
  • There is one hidden layer with 128 units.
  • The output layer has size 1, since there is a single binary output.
model = Sequential()
model.add(Dense(X.shape[1], activation='relu', input_dim=X.shape[1]))  # input layer: one unit per feature
model.add(Dense(128, activation='relu'))                               # hidden layer
model.add(Dense(1, activation='sigmoid'))                              # output layer for binary classification
X.shape[1]
11
  • Here we compile the model: the optimizer is Adam, the loss is binary cross-entropy, and the metric is accuracy.
model.compile(optimizer='adam', loss = 'binary_crossentropy', metrics=['accuracy'])
  • Here we fit the model on the training dataset, with a batch size of 10 and 10 epochs.
model.fit(X_train, y_train.to_numpy(), batch_size = 10, epochs = 10, verbose = 1)
Epoch 1/10
800/800 [==============================] - 1s 2ms/step - loss: 0.4516 - accuracy: 0.8116
Epoch 2/10
800/800 [==============================] - 1s 1ms/step - loss: 0.3948 - accuracy: 0.8372
Epoch 3/10
800/800 [==============================] - 1s 1ms/step - loss: 0.3597 - accuracy: 0.8543
Epoch 4/10
800/800 [==============================] - 1s 2ms/step - loss: 0.3475 - accuracy: 0.8576
Epoch 5/10
800/800 [==============================] - 1s 1ms/step - loss: 0.3426 - accuracy: 0.8611
Epoch 6/10
800/800 [==============================] - 1s 1ms/step - loss: 0.3389 - accuracy: 0.8619
Epoch 7/10
800/800 [==============================] - 1s 1ms/step - loss: 0.3366 - accuracy: 0.8625
Epoch 8/10
800/800 [==============================] - 1s 1ms/step - loss: 0.3350 - accuracy: 0.8629
Epoch 9/10
800/800 [==============================] - 1s 2ms/step - loss: 0.3333 - accuracy: 0.8635
Epoch 10/10
800/800 [==============================] - 1s 1ms/step - loss: 0.3311 - accuracy: 0.8634
<tensorflow.python.keras.callbacks.History at 0x271d1a03580>
  • Using model.predict_classes we predict class labels for the test inputs. Note that predict_classes was removed in later TensorFlow versions; an equivalent for newer versions is shown after the output below.
y_pred = model.predict_classes(X_test)
y_pred
array([[0],
       [0],
       [0],
       ...,
       [0],
       [1],
       [0]])
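
On newer TensorFlow versions, where Sequential.predict_classes no longer exists, an equivalent for a sigmoid output is to threshold the predicted probabilities:

y_pred = (model.predict(X_test) > 0.5).astype('int32')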
y_test
1344    1
8167    0
4747    0
5004    1
3124    1
       ..
9107    0
8249    0
8337    0
6279    1
412     0
Name: Exited, Length: 2000, dtype: int64
model.evaluate(X_test, y_test.to_numpy())
63/63 [==============================] - 0s 2ms/step - loss: 0.3489 - accuracy: 0.8520
[0.34891313314437866, 0.8519999980926514]
from sklearn.metrics import confusion_matrix, accuracy_score

Confusion matrix

confusion_matrix(y_test, y_pred)
array([[1546,   47],
       [ 249,  158]], dtype=int64)
accuracy_score(y_test, y_pred)
0.852

Summary

  • In this notebook we implemented a classifier using an artificial neural network. We built the model using TensorFlow and Keras, and evaluated it using accuracy metrics and a confusion matrix. The model's accuracy on the test data was 85.2%.