Google Stock Price Prediction using RNN – LSTM

Published by georgiannacambel

Prediction of Google Stock Price using RNN

In this notebook we are going to predict the opening price of the stock using an RNN-LSTM, given the opening, highest, lowest and closing prices and the traded volume for the previous 60 days.

Ref- https://colah.github.io/posts/2015-08-Understanding-LSTMs/

What is RNN?

  • Recurrent Neural Networks are state-of-the-art algorithms that can memorize/remember previous inputs when a large amount of sequential data is given to them.
  • A Recurrent Neural Network is a generalization of the feedforward neural network that has an internal memory.
  • An RNN is recurrent in nature because it performs the same function for every element of the input, while the output for the current input depends on the computation done for the previous ones.
  • After producing the output, it is copied and sent back into the recurrent network. To make a decision, the network considers the current input and the output that it has learned from the previous input.
  • In other neural networks, all the inputs are independent of each other. But in an RNN, all the inputs are related to each other.
  • These loops make recurrent neural networks seem kind of mysterious. However, if you think a bit more, it turns out that they aren't all that different from a normal neural network.
  • A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor (a minimal sketch of one recurrent step is given after this list).
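
To make the recurrence concrete, here is a minimal numpy sketch of a single vanilla RNN step (the weights, shapes and names are hypothetical, purely for illustration): the new hidden state is computed from the current input and the previous hidden state, and that hidden state is passed along to the next step.

import numpy as np

# One vanilla RNN step: h_t = tanh(W_xh @ x_t + W_hh @ h_prev + b_h)
# (illustrative only; weights are random and shapes are made up)
input_size, hidden_size = 5, 8
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(hidden_size, input_size))   # input-to-hidden weights
W_hh = rng.normal(size=(hidden_size, hidden_size))  # hidden-to-hidden (recurrent) weights
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    # combine the current input with the previous hidden state
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Unroll over a short sequence: the same weights are reused at every time step.
h = np.zeros(hidden_size)
for x_t in rng.normal(size=(4, input_size)):  # 4 time steps of 5 features each
    h = rnn_step(x_t, h)
print(h.shape)  # (8,)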

Different types of RNNs


Some examples are:

  • One to one: used for Image Classification. Here the input is a single image and the output is a single label for the category the image belongs to.
  • One to many (sequence output): used for Image Captioning. Here the input is an image and the output is a group of words which form the caption for the image.
  • Many to one (sequence input): used for Sentiment Analysis. Here a given sentence, which is a group of words, is classified as expressing positive or negative sentiment, which is a single output.
  • Many to many (sequence input and sequence output): used for Machine Translation. An RNN reads a sentence in English and then outputs a sentence in French.
  • Synced many to many (synced sequence input and output): used for Video Classification, where we wish to label each frame of the video.

The Problem of Long-Term Dependencies

Vanishing Gradient
  • Information travels through the neural network from input neurons to the output neurons, while the error is calculated and propagated back through the network to update the weights.
  • During training, the cost function compares your outcomes to the desired output and gives the error, which is propagated back through the network.
  • If the partial derivative of the error is less than 1, then multiplying it by the learning rate, which is also small, produces only a tiny change in the weights compared with the previous iteration.
  • With the vanishing gradient problem, the further back you go through the network, the smaller your gradient becomes and the harder it is to train the weights, which has a domino effect on all of the earlier weights throughout the network.
Exploding Gradient
  • We speak of exploding gradients when the algorithm assigns an unreasonably high importance to the weights, without any good reason.
  • Exploding gradients are a problem where large error gradients accumulate and result in very large updates to neural network model weights during training.
  • This has the effect of your model being unstable and unable to learn from your training data.
  • But fortunately, this problem can be easily solved if you truncate or squash (clip) the gradients. The small calculation after this list illustrates both the vanishing and the exploding case.
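
A quick way to see both effects is to multiply a per-step gradient factor across many time steps. This toy calculation (not part of the original notebook; the constant factors are made up) shows how a factor below 1 collapses towards zero while a factor above 1 blows up:

# Backpropagating through T time steps roughly multiplies T per-step factors together.
T = 60  # same length as the 60-day windows used later in this notebook

for factor in (0.9, 1.1):
    grad = factor ** T
    print(f"per-step factor {factor}: gradient after {T} steps ~ {grad:.2e}")

# per-step factor 0.9: gradient after 60 steps ~ 1.80e-03  (vanishing)
# per-step factor 1.1: gradient after 60 steps ~ 3.04e+02  (exploding)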

Long Short Term Memory (LSTM) Networks

  • Long Short Term Memory networks – usually just called “LSTMs” – are a special kind of RNN, capable of learning long-term dependencies.
  • Generally LSTM is composed of a cell (the memory part of the LSTM unit) and three "regulators", usually called gates, of the flow of information inside the LSTM unit: an input gate, an output gate and a forget gate.
  • Intuitively, the cell is responsible for keeping track of the dependencies between the elements in the input sequence.
  • The input gate controls the extent to which a new value flows into the cell, the forget gate controls the extent to which a value remains in the cell and the output gate controls the extent to which the value in the cell is used to compute the output activation of the LSTM unit (a minimal sketch of one LSTM step is given after this list).
  • The activation function of the LSTM gates is often the logistic sigmoid function.
  • There are connections into and out of the LSTM gates, a few of which are recurrent. The weights of these connections, which need to be learned during training, determine how the gates operate.
  • LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long periods of time is practically their default behavior, not something they struggle to learn!
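
As a companion to the description above, here is a minimal numpy sketch of a single LSTM step (the weights, shapes and names are hypothetical and biases are omitted; this is not the Keras implementation), showing how the forget, input and output gates regulate the cell state:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

input_size, hidden_size = 5, 8
rng = np.random.default_rng(1)

# One weight matrix per gate plus one for the candidate cell value; each acts on
# the concatenation of the previous hidden state and the current input.
W_f = rng.normal(size=(hidden_size, hidden_size + input_size))  # forget gate
W_i = rng.normal(size=(hidden_size, hidden_size + input_size))  # input gate
W_o = rng.normal(size=(hidden_size, hidden_size + input_size))  # output gate
W_c = rng.normal(size=(hidden_size, hidden_size + input_size))  # candidate value

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W_f @ z)           # forget gate: how much of the old cell state to keep
    i = sigmoid(W_i @ z)           # input gate: how much of the new candidate to write
    o = sigmoid(W_o @ z)           # output gate: how much of the cell state to expose
    c_tilde = np.tanh(W_c @ z)     # candidate cell value
    c = f * c_prev + i * c_tilde   # updated cell state
    h = o * np.tanh(c)             # new hidden state / output activation
    return h, c

h, c = np.zeros(hidden_size), np.zeros(hidden_size)
h, c = lstm_step(rng.normal(size=input_size), h, c)
print(h.shape, c.shape)  # (8,) (8,)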

Dataset

You can download the dataset from here

The data used in this notebook is from 19th August, 2004 to 7th October, 2019. The dataset consists of 7 columns which contain the date, opening price, highest price, lowest price, closing price, adjusted closing price and volume of shares traded for each day.

Steps to build stock prediction model

  • Data Preprocessing
  • Building the RNN
  • Making the prediction and visualization

We will read the data for the first 60 days and then predict the price for the 61st day. Then we will hop ahead by one day and read the next chunk of data for the next 60 days, and so on.
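
Before working with the real data, the windowing idea can be illustrated on a toy series (the numbers and names below are made up, with a window of 3 instead of 60):

# Toy illustration of the sliding-window idea.
prices = [10, 11, 12, 13, 14, 15]
window = 3

samples = [(prices[i - window:i], prices[i]) for i in range(window, len(prices))]
for past, target in samples:
    print(past, '->', target)
# [10, 11, 12] -> 13
# [11, 12, 13] -> 14
# [12, 13, 14] -> 15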

The necessary python libraries are imported here-

  • numpy is used to perform basic array operations
  • pyplot from matplotlib is used to visualize the results
  • pandas is used to read the dataset
  • MinMaxScaler from sklearn is used to scale the data
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

read_csv is used to load the data into a dataframe. We can see the last 5 rows of the dataset using data.tail(). Similarly data.head() can be used to see the first 5 rows of the dataset. parse_dates is used to convert the Date column from strings to datetime instances.

data = pd.read_csv('GOOG.csv', parse_dates = ['Date'])
data.tail()
      Date        Open         High         Low          Close        Adj Close    Volume
3804  2019-09-30  1220.969971  1226.000000  1212.300049  1219.000000  1219.000000  1404100
3805  2019-10-01  1219.000000  1231.229980  1203.579956  1205.099976  1205.099976  1273500
3806  2019-10-02  1196.979980  1196.979980  1171.290039  1176.630005  1176.630005  1615100
3807  2019-10-03  1180.000000  1189.060059  1162.430054  1187.829956  1187.829956  1621200
3808  2019-10-04  1191.890015  1211.439941  1189.170044  1209.000000  1209.000000  1021092

Here we are splitting the data into a training and a testing dataset. We are going to take the data from 2004 to 2018 as training data, and the data from 2019 as testing data.

data_training = data[data['Date']<'2019-01-01'].copy()
data_test = data[data['Date']>='2019-01-01'].copy()

We are dropping the columns Date and Adj Close from the training data and storing the result in training_data. We keep data_training itself unchanged because we will need its last 60 rows later when preparing the test set.

training_data = data_training.drop(['Date', 'Adj Close'], axis = 1)

The values in the training data are not all in the same range. To bring all the values into the range 0 to 1 we are going to use MinMaxScaler(). This makes training more stable and generally improves the prediction.

scaler = MinMaxScaler()
training_data = scaler.fit_transform(training_data)
training_data
array([[3.30294890e-04, 9.44785459e-04, 0.00000000e+00, 1.34908021e-04,
        5.43577404e-01],
       [7.42148227e-04, 2.98909923e-03, 1.88269054e-03, 3.39307537e-03,
        2.77885613e-01],
       [4.71386886e-03, 4.78092896e-03, 5.42828241e-03, 3.83867225e-03,
        2.22150736e-01],
       ...,
       [7.92197108e-01, 8.11970141e-01, 7.90196475e-01, 8.15799920e-01,
        2.54672037e-02],
       [8.18777193e-01, 8.21510648e-01, 8.20249255e-01, 8.10219301e-01,
        1.70463908e-02],
       [8.19874096e-01, 8.19172449e-01, 8.12332341e-01, 8.09012935e-01,
        1.79975186e-02]])

As mentioned above, we are going to train the model on chunks of 60 days of data at a time. So the code below divides the data into overlapping chunks of 60 rows. training_data.shape[0] is equal to 3617, which is the number of rows in training_data. After dividing, we convert X_train and y_train into numpy arrays.

X_train = []
y_train = []

for i in range(60, training_data.shape[0]):
    X_train.append(training_data[i-60:i])   # previous 60 days of all 5 features
    y_train.append(training_data[i, 0])     # opening price (column 0) of day i

X_train, y_train = np.array(X_train), np.array(y_train)

As we can see, X_train now consists of 3557 samples, each containing 60 rows of 5 values which correspond to the 5 columns in the training data.

X_train.shape
(3557, 60, 5)

Building LSTM

Here we are importing the necessary layers to build our neural network.

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, LSTM, Dropout
  • The first layer is an LSTM layer with 60 units.
  • We will be using the ReLU activation function.
  • The rectified linear activation function, or ReLU for short, is a piecewise linear function that outputs the input directly if it is positive and outputs zero otherwise.
  • return_sequences, when set to True, returns the full sequence of hidden states as the output, which is needed because the next LSTM layer expects a sequence as its input.
  • input_shape is set to (X_train.shape[1], 5), which is (60, 5).
  • The Dropout layer randomly sets the outgoing edges of hidden units to 0 at each update during training.
  • The value passed to Dropout specifies the fraction of the units that are dropped.
  • The last layer is a Dense layer, the regular densely connected neural network layer.
  • As we are predicting a single value, units in the last layer is set to 1.
regressor = Sequential()

regressor.add(LSTM(units = 60, activation = 'relu', return_sequences = True, input_shape = (X_train.shape[1], 5)))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units = 60, activation = 'relu', return_sequences = True))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units = 80, activation = 'relu', return_sequences = True))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units = 120, activation = 'relu'))
regressor.add(Dropout(0.2))

regressor.add(Dense(units = 1))
regressor.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm (LSTM)                  (None, 60, 60)            15840     
_________________________________________________________________
dropout (Dropout)            (None, 60, 60)            0         
_________________________________________________________________
lstm_1 (LSTM)                (None, 60, 60)            29040     
_________________________________________________________________
dropout_1 (Dropout)          (None, 60, 60)            0         
_________________________________________________________________
lstm_2 (LSTM)                (None, 60, 80)            45120     
_________________________________________________________________
dropout_2 (Dropout)          (None, 60, 80)            0         
_________________________________________________________________
lstm_3 (LSTM)                (None, 120)               96480     
_________________________________________________________________
dropout_3 (Dropout)          (None, 120)               0         
_________________________________________________________________
dense (Dense)                (None, 1)                 121       
=================================================================
Total params: 186,601
Trainable params: 186,601
Non-trainable params: 0
_________________________________________________________________

Here we are compiling the model and fitting it to the training data. We will use 50 epochs to train the model. An epoch is one iteration over the entire training data. batch_size is the number of samples per gradient update, i.e. here the weights will be updated after every 32 training examples.

regressor.compile(optimizer='adam', loss = 'mean_squared_error')
regressor.fit(X_train, y_train, epochs=50, batch_size=32)
Train on 3557 samples

Epoch 45/50
3557/3557 [==============================] - 26s 7ms/sample - loss: 6.8088e-04
Epoch 46/50
3557/3557 [==============================] - 25s 7ms/sample - loss: 6.0968e-04
Epoch 47/50
3557/3557 [==============================] - 25s 7ms/sample - loss: 6.6604e-04
Epoch 48/50
3557/3557 [==============================] - 25s 7ms/sample - loss: 6.2150e-04
Epoch 49/50
3557/3557 [==============================] - 25s 7ms/sample - loss: 6.4292e-04
Epoch 50/50
3557/3557 [==============================] - 25s 7ms/sample - loss: 6.3066e-04

Prepare test dataset

These are the first 5 entries in the test dataset. To predict the opening price on any day we need the data of the previous 60 days.

data_test.head()
      Date        Open         High         Low          Close        Adj Close    Volume
3617  2019-01-02  1016.570007  1052.319946  1015.710022  1045.849976  1045.849976  1532600
3618  2019-01-03  1041.000000  1056.979980  1014.070007  1016.059998  1016.059998  1841100
3619  2019-01-04  1032.589966  1070.839966  1027.417969  1070.709961  1070.709961  2093900
3620  2019-01-07  1071.500000  1074.000000  1054.760010  1068.390015  1068.390015  1981900
3621  2019-01-08  1076.109985  1084.560059  1060.530029  1076.280029  1076.280029  1764900

past_60_days contains the data of the past 60 days required to predict the opening price of the 1st day in the test dataset.

past_60_days = data_training.tail(60)

We are going to concatenate past_60_days and data_test, ignoring the original index, and then drop Date and Adj Close.

df = pd.concat([past_60_days, data_test], ignore_index = True)
df = df.drop(['Date', 'Adj Close'], axis = 1)
df.head()
   Open         High         Low          Close        Volume
0  1195.329956  1197.510010  1155.576050  1168.189941  2209500
1  1167.500000  1173.500000  1145.119995  1157.349976  1184300
2  1150.109985  1168.000000  1127.364014  1148.969971  1932400
3  1146.150024  1154.349976  1137.572021  1138.819946  1308700
4  1131.079956  1132.170044  1081.130005  1081.219971  2675700

Similar to the training dataset, we have to scale the test data, using the scaler that was fitted on the training data, so that all the values lie in roughly the range 0 to 1.

inputs = scaler.transform(df)
inputs
array([[0.93805611, 0.93755773, 0.92220906, 0.91781776, 0.0266752 ],
       [0.91527437, 0.91792904, 0.91350452, 0.90892169, 0.01425359],
       [0.90103881, 0.91343268, 0.89872289, 0.90204445, 0.02331778],
       ...,
       [0.93940683, 0.93712442, 0.93529076, 0.9247443 , 0.01947328],
       [0.92550693, 0.93064972, 0.92791493, 0.9339358 , 0.01954719],
       [0.93524016, 0.94894575, 0.95017564, 0.95130949, 0.01227612]])

We have to prepare the test data like the training data.

X_test = []
y_test = []

for i in range(60, inputs.shape[0]):
    X_test.append(inputs[i-60:i])
    y_test.append(inputs[i, 0])

X_test, y_test = np.array(X_test), np.array(y_test)
X_test.shape, y_test.shape
((192, 60, 5), (192,))

We are now going to predict the opening price for X_test using predict().

y_pred = regressor.predict(X_test)

As we had scaled all the values down, we now have to bring them back to the original scale. scaler.scale_ gives the scaling factor for each column.

scaler.scale_
array([8.18605127e-04, 8.17521128e-04, 8.32487534e-04, 8.20673293e-04,
       1.21162775e-08])

8.18605127e-04 is the first value in the array, which is the scaling factor for the opening price. We will multiply y_pred and y_test by the inverse of this factor to bring the values back towards the original scale.

scale = 1/8.18605127e-04
scale
1221.5901990069017
y_pred = y_pred*scale
y_test = y_test*scale
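
A small caveat (this observation is an addition, not part of the original notebook): dividing by scaler.scale_ undoes the scaling factor but not the minimum-value offset that MinMaxScaler also subtracts, so both series end up shifted down by the same constant, roughly the minimum opening price in the training data. The comparison plot below is unaffected because both series share the shift, but to recover prices in actual dollars you could add the offset back, for example:

# MinMaxScaler stores X_scaled = (X - data_min_) * scale_, so after dividing by
# scale_ (as above) the series are still missing the data_min_ offset of the
# 'Open' column (index 0). Adding it back gives prices on the original dollar scale.
open_min = scaler.data_min_[0]

y_pred_dollars = y_pred + open_min
y_test_dollars = y_test + open_min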

Visualization

# Visualising the results
plt.figure(figsize=(14,5))
plt.plot(y_test, color = 'red', label = 'Real Google Stock Price')
plt.plot(y_pred, color = 'blue', label = 'Predicted Google Stock Price')
plt.title('Google Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('Google Stock Price')
plt.legend()
plt.show()

As we can see from the graph, the predicted opening price tracks the real opening price quite closely over the test period.
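
To put a number on how close the two curves are, one could also compute an error metric on the rescaled series, for example (a small addition, not part of the original notebook):

# Quantify the prediction error over the test period (in the rescaled units above).
rmse = np.sqrt(np.mean((y_test - y_pred.flatten()) ** 2))
print(f'Test RMSE: {rmse:.2f}')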