
Google Stock Price Prediction using RNN - LSTM

Predict Google stock prices using a stacked LSTM in TensorFlow. Covers RNN concepts, MinMaxScaler, data windowing, LSTM layers, and time-series visualization.

May 22, 2026 at 8:15 PM · 8 min read

Topics You Will Master

RNN fundamentals: vanishing gradients and LSTM gating mechanisms
MinMaxScaler normalization and 60-day look-back window construction
Stacked LSTM architecture with Dropout between layers
Training on historical price and volume data and predicting future trends
Matplotlib visualization of predicted vs. actual stock prices

Best For

Developers learning LSTM for financial time-series forecasting.

Expected Outcome

A stacked LSTM model that visualizes and predicts Google stock price trends.

Prediction of Google Stock Price using RNN

Stacked LSTM networks capture long-range temporal patterns across sequences, handling vanishing gradients through learned input, forget, and output gates. This tutorial builds a multi-layer LSTM in TensorFlow using a 60-day look-back window to predict Google stock opening prices.
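
To make the gating idea concrete, here is a minimal NumPy sketch of a single LSTM time step in its common textbook form. It is an illustration only, not how TensorFlow implements the layer: the function name `lstm_step`, the weight layout and the random weights are all assumptions made for this example, and it uses tanh where the model later in this tutorial passes `activation = 'relu'`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step (hypothetical helper, textbook formulation).

    W: (4*units, n_features) input weights
    U: (4*units, units) recurrent weights
    b: (4*units,) biases
    """
    units = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0 * units:1 * units])   # input gate: how much new information to write
    f = sigmoid(z[1 * units:2 * units])   # forget gate: how much old cell state to keep
    g = np.tanh(z[2 * units:3 * units])   # candidate values for the cell state
    o = sigmoid(z[3 * units:4 * units])   # output gate: how much of the cell state to expose
    c_t = f * c_prev + i * g              # additive update of the cell state
    h_t = o * np.tanh(c_t)                # hidden state passed to the next step
    return h_t, c_t

# Run the cell over one made-up window of 60 days with 5 features each
rng = np.random.default_rng(0)
n_features, units = 5, 3
W = rng.normal(size=(4 * units, n_features))
U = rng.normal(size=(4 * units, units))
b = np.zeros(4 * units)

h, c = np.zeros(units), np.zeros(units)
for x_t in rng.normal(size=(60, n_features)):
    h, c = lstm_step(x_t, h, c, W, U, b)

print(h.shape, c.shape)
```

The cell state is updated additively (`f * c_prev + i * g`), which is the part of the design usually credited with easing the vanishing-gradient problem of plain RNNs.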

Dataset

You can download the dataset from here

The data used in this notebook runs from 19th August, 2004 to 7th October, 2019. The dataset has 7 columns: the date, opening price, highest price, lowest price, closing price, adjusted closing price and trading volume for each day.

Steps to build stock prediction model

  • Data Preprocessing
  • Building the RNN
  • Making the prediction and visualization

Diagram showing a 60-day sliding window over Google Stock Prices timeline used to construct training sequences

We will read the data for the first 60 days and then predict the 61st day. Then we will hop ahead by one day and read the next chunk of 60 days.

PYTHON
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

read_csv loads the data into a DataFrame. data.tail() shows the last 5 rows of the dataset; similarly, data.head() shows the first 5 rows. parse_dates tells pandas which columns to convert from strings to datetime values.

PYTHON
# Load the price history and parse the Date column as datetimes
data = pd.read_csv('GOOG.csv', parse_dates = ['Date'])
data.tail()
OUTPUT
            Date         Open         High          Low        Close    Adj Close   Volume
3804  2019-09-30  1220.969971  1226.000000  1212.300049  1219.000000  1219.000000  1404100
3805  2019-10-01  1219.000000  1231.229980  1203.579956  1205.099976  1205.099976  1273500
3806  2019-10-02  1196.979980  1196.979980  1171.290039  1176.630005  1176.630005  1615100
3807  2019-10-03  1180.000000  1189.060059  1162.430054  1187.829956  1187.829956  1621200
3808  2019-10-04  1191.890015  1211.439941  1189.170044  1209.000000  1209.000000  1021092

Here we split the data into training and testing datasets. We take the data from 2004 to 2018 as training data and the data from 2019 as testing data.

PYTHON
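# Rows before 2019 are used for training, rows from 2019 onward for testing
data_training = data[data['Date'] < '2019-01-01'].copy()
data_test = data[data['Date'] >= '2019-01-01'].copy()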
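
As a quick illustration of a date-based split, the sketch below applies the same kind of comparison to a tiny made-up table. The DataFrame `toy`, its prices and the names `toy_training` and `toy_test` are hypothetical and only stand in for the real GOOG.csv data.

```python
import pandas as pd

# Made-up miniature price table (values are illustrative, not real quotes)
toy = pd.DataFrame({
    'Date': pd.to_datetime(['2018-12-28', '2018-12-31', '2019-01-02', '2019-01-03']),
    'Open': [10.0, 11.0, 12.0, 13.0],
})

cutoff = '2019-01-01'
toy_training = toy[toy['Date'] < cutoff].copy()   # rows strictly before the cutoff
toy_test = toy[toy['Date'] >= cutoff].copy()      # rows on or after the cutoff

print(len(toy_training), len(toy_test))
```

Splitting by date rather than at random keeps every test row later in time than every training row, which matters for time-series data.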

We drop the Date and Adj Close columns from the training dataset.

PYTHON
data_training = data_training.drop(['Date', 'Adj Close'], axis = 1)

The columns in the training data are on very different scales (prices versus volume). To bring every value into the range 0 to 1 we use MinMaxScaler(). Scaling the inputs this way generally helps the network train more stably.

PYTHON
scaler = MinMaxScaler()
data_training = scaler.fit_transform(data_training)
data_training
OUTPUT
array([[3.30294890e-04, 9.44785459e-04, 0.00000000e+00, 1.34908021e-04,
        5.43577404e-01],
       [7.42148227e-04, 2.98909923e-03, 1.88269054e-03, 3.39307537e-03,
        2.77885613e-01],
       [4.71386886e-03, 4.78092896e-03, 5.42828241e-03, 3.83867225e-03,
        2.22150736e-01],
       ...,
       [7.92197108e-01, 8.11970141e-01, 7.90196475e-01, 8.15799920e-01,
        2.54672037e-02],
       [8.18777193e-01, 8.21510648e-01, 8.20249255e-01, 8.10219301e-01,
        1.70463908e-02],
       [8.19874096e-01, 8.19172449e-01, 8.12332341e-01, 8.09012935e-01,
        1.79975186e-02]])
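
For intuition, the sketch below reproduces by hand the per-column formula that min-max scaling to the range 0 to 1 applies, x_scaled = (x - min) / (max - min). The small array `X` is made up for the example.

```python
import numpy as np

# Made-up table: a price-like column and a volume-like column
X = np.array([[10.0, 1000.0],
              [20.0, 3000.0],
              [15.0, 2000.0]])

col_min = X.min(axis=0)   # per-column minimum
col_max = X.max(axis=0)   # per-column maximum

# Each column is rescaled independently so its minimum maps to 0 and its maximum to 1
X_scaled = (X - col_min) / (col_max - col_min)
print(X_scaled)
```

Both columns end up as 0, 1 and 0.5 even though their original magnitudes differ by a factor of 100.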

As mentioned above, we train the model on 60 days of data at a time. The code below builds overlapping windows of 60 consecutive rows and pairs each window with the scaled opening price (column 0) of the following day. data_training.shape[0] is 3617, the number of rows in data_training. After the loop we convert X_train and y_train into NumPy arrays.

PYTHON
X_train = []
y_train = []

for i in range(60, data_training.shape[0]):
    X_train.append(data_training[i-60:i])
    y_train.append(data_training[i, 0])

X_train, y_train = np.array(X_train), np.array(y_train)

As we can see, X_train now consists of 3557 windows (3617 - 60), each holding 60 rows of 5 values that correspond to the 5 features in the dataset (Open, High, Low, Close and Volume).

PYTHON
X_train.shape
OUTPUT
(3557, 60, 5)
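
The window arithmetic is easier to check on a small array. The sketch below wraps the same loop in a hypothetical helper, `make_windows`, and runs it on made-up data with a look-back of 3 instead of 60.

```python
import numpy as np

def make_windows(values, look_back, target_col=0):
    """Build overlapping windows and next-step targets (hypothetical helper)."""
    X, y = [], []
    for i in range(look_back, values.shape[0]):
        X.append(values[i - look_back:i])   # the previous `look_back` rows
        y.append(values[i, target_col])     # the target column of the following row
    return np.array(X), np.array(y)

# Made-up data: 10 rows, 2 features; row i is [2*i, 2*i + 1]
toy = np.arange(20, dtype=float).reshape(10, 2)
X_toy, y_toy = make_windows(toy, look_back=3)

print(X_toy.shape, y_toy.shape)   # 10 - 3 = 7 windows
print(X_toy[0])                   # rows 0, 1 and 2
print(y_toy[0])                   # column 0 of row 3
```

With n rows and a look-back of k, the loop always produces n - k windows, which is where 3617 - 60 = 3557 comes from.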

Building LSTM

Here we import the layers needed to build our neural network.

PYTHON
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, LSTM, Dropout

  • The first layer is an LSTM layer with 60 units.

  • We use the relu activation function.

  • The rectified linear activation function, or ReLU for short, is a piecewise linear function that outputs its input directly if it is positive and outputs zero otherwise.

  • return_sequences, when set to True, makes the layer return its output for every time step rather than only the last one, so that the next LSTM layer receives a full sequence.

  • input_shape is set to (X_train.shape[1], 5), which is (60, 5).

  • The Dropout layer randomly sets a fraction of its inputs to 0 at each update during training, which helps reduce overfitting.

  • The value passed to Dropout is the probability with which each input to the layer is dropped.

  • The last layer is a Dense layer, the regular fully connected neural network layer.

  • As we are predicting a single value, the number of units in the last layer is set to 1.
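
The two ideas in the list above, ReLU and dropout, can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the Keras implementation: the helper names `relu` and `dropout` are made up here, and the dropout variant shown is the common "inverted" form, which rescales the surviving values by 1 / (1 - rate) during training.

```python
import numpy as np

def relu(x):
    # Pass positive values through unchanged, clamp everything else to 0
    return np.maximum(0.0, x)

def dropout(x, rate, rng):
    """Inverted dropout for training time (hypothetical helper).

    Each element is zeroed with probability `rate`; survivors are scaled
    by 1 / (1 - rate) so the expected sum stays the same.
    """
    keep = rng.random(x.shape) >= rate
    return np.where(keep, x / (1.0 - rate), 0.0)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))   # negative inputs become 0, positive inputs are unchanged

activations = np.ones(10)
dropped = dropout(activations, rate=0.2, rng=np.random.default_rng(0))
print(dropped)   # every entry is either 0 or 1.25
```

At prediction time dropout is switched off, so the layer passes its input through unchanged.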

PYTHON
regressor = Sequential()

regressor.add(LSTM(units = 60, activation = 'relu', return_sequences = True, input_shape = (X_train.shape[1], 5)))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units = 60, activation = 'relu', return_sequences = True))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units = 80, activation = 'relu', return_sequences = True))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units = 120, activation = 'relu'))
regressor.add(Dropout(0.2))

regressor.add(Dense(units = 1))
PYTHON
regressor.summary()
OUTPUT
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm (LSTM)                  (None, 60, 60)            15840
_________________________________________________________________
dropout (Dropout)            (None, 60, 60)            0
_________________________________________________________________
lstm_1 (LSTM)                (None, 60, 60)            29040
_________________________________________________________________
dropout_1 (Dropout)          (None, 60, 60)            0
_________________________________________________________________
lstm_2 (LSTM)                (None, 60, 80)            45120
_________________________________________________________________
dropout_2 (Dropout)          (None, 60, 80)            0
_________________________________________________________________
lstm_3 (LSTM)                (None, 120)               96480
_________________________________________________________________
dropout_3 (Dropout)          (None, 120)               0
_________________________________________________________________
dense (Dense)                (None, 1)                 121
=================================================================
Total params: 186,601
Trainable params: 186,601
Non-trainable params: 0
_________________________________________________________________
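
The parameter counts in the summary can be reproduced by hand. An LSTM layer has four blocks of weights (three gates plus the candidate state), each with input weights, recurrent weights and a bias, which gives 4 * (units * (input_dim + units) + units). The helper names below are made up for this sketch.

```python
def lstm_params(units, input_dim):
    # Four weight blocks, each with input weights, recurrent weights and a bias
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    # One weight per input for every unit, plus one bias per unit
    return units * input_dim + units

layer_params = [
    lstm_params(60, 5),     # lstm:   5 input features
    lstm_params(60, 60),    # lstm_1: fed by 60 units
    lstm_params(80, 60),    # lstm_2: fed by 60 units
    lstm_params(120, 80),   # lstm_3: fed by 80 units
    dense_params(1, 120),   # dense:  fed by 120 units
]

print(layer_params)        # [15840, 29040, 45120, 96480, 121]
print(sum(layer_params))   # 186601
```

The Dropout layers contribute no parameters, so the total matches the 186,601 reported above.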

Here we compile the model and fit it to the training data. We train for 50 epochs. An epoch is one pass over the entire training set. batch_size is the number of samples per gradient update, i.e. here the weights are updated after every 32 training examples.

PYTHON
regressor.compile(optimizer='adam', loss = 'mean_squared_error')
regressor.fit(X_train, y_train, epochs=50, batch_size=32)
OUTPUT
Train on 3557 samples

Epoch 45/50
3557/3557 [==============================] - 26s 7ms/sample - loss: 6.8088e-04
Epoch 46/50
3557/3557 [==============================] - 25s 7ms/sample - loss: 6.0968e-04
Epoch 47/50
3557/3557 [==============================] - 25s 7ms/sample - loss: 6.6604e-04
Epoch 48/50
3557/3557 [==============================] - 25s 7ms/sample - loss: 6.2150e-04
Epoch 49/50
3557/3557 [==============================] - 25s 7ms/sample - loss: 6.4292e-04
Epoch 50/50
3557/3557 [==============================] - 25s 7ms/sample - loss: 6.3066e-04
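
As a rough check on what these settings imply, the sketch below counts the gradient updates for 3557 samples, a batch size of 32 and 50 epochs. It assumes the final, smaller batch of each epoch is still used for an update.

```python
import math

n_samples, batch_size, epochs = 3557, 32, 50

# Number of batches (and therefore weight updates) in one pass over the data
updates_per_epoch = math.ceil(n_samples / batch_size)

# Size of the last, partial batch in each epoch
last_batch_size = n_samples - (updates_per_epoch - 1) * batch_size

total_updates = updates_per_epoch * epochs

print(updates_per_epoch)   # 112
print(last_batch_size)     # 5
print(total_updates)       # 5600
```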

Prepare test dataset

These are the first 5 entries in the test data set. To predict the opening price on any day we need the data from the previous 60 days.

PYTHON
data_test.head()
OUTPUT
            Date         Open         High          Low        Close    Adj Close   Volume
3617  2019-01-02  1016.570007  1052.319946  1015.710022  1045.849976  1045.849976  1532600
3618  2019-01-03  1041.000000  1056.979980  1014.070007  1016.059998  1016.059998  1841100
3619  2019-01-04  1032.589966  1070.839966  1027.417969  1070.709961  1070.709961  2093900
3620  2019-01-07  1071.500000  1074.000000  1054.760010  1068.390015  1068.390015  1981900
3621  2019-01-08  1076.109985  1084.560059  1060.530029  1076.280029  1076.280029  1764900

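past_60_days contains the last 60 rows of the training period, which are required to predict the opening price on the 1st day of the test data set. Because data_training was overwritten with the scaled NumPy array above, we select these rows from the original data DataFrame.

PYTHON
# Last 60 unscaled rows before the test period, with all original columns
past_60_days = data[data['Date'] < '2019-01-01'].tail(60)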

We concatenate past_60_days and data_test, ignoring the original index, and then drop Date and Adj Close. pd.concat is the supported way to stack DataFrames in current pandas versions.

PYTHON
df = pd.concat([past_60_days, data_test], ignore_index = True)
df = df.drop(['Date', 'Adj Close'], axis = 1)
df.head()
OUTPUT
          Open         High          Low        Close   Volume
0  1195.329956  1197.510010  1155.576050  1168.189941  2209500
1  1167.500000  1173.500000  1145.119995  1157.349976  1184300
2  1150.109985  1168.000000  1127.364014  1148.969971  1932400
3  1146.150024  1154.349976  1137.572021  1138.819946  1308700
4  1131.079956  1132.170044  1081.130005  1081.219971  2675700

Similar to the training data set we have to scale the test data so that all the values are in the range 0 to 1.

PYTHON
inputs = scaler.transform(df)
inputs
OUTPUT
array([[0.93805611, 0.93755773, 0.92220906, 0.91781776, 0.0266752 ],
       [0.91527437, 0.91792904, 0.91350452, 0.90892169, 0.01425359],
       [0.90103881, 0.91343268, 0.89872289, 0.90204445, 0.02331778],
       ...,
       [0.93940683, 0.93712442, 0.93529076, 0.9247443 , 0.01947328],
       [0.92550693, 0.93064972, 0.92791493, 0.9339358 , 0.01954719],
       [0.93524016, 0.94894575, 0.95017564, 0.95130949, 0.01227612]])

We have to prepare the test data like the training data.

PYTHON
X_test = []
y_test = []

for i in range(60, inputs.shape[0]):
    X_test.append(inputs[i-60:i])
    y_test.append(inputs[i, 0])

X_test, y_test = np.array(X_test), np.array(y_test)
X_test.shape, y_test.shape
OUTPUT
((192, 60, 5), (192,))

We are now going to predict the opening for X_test using predict()

PYTHON
y_pred = regressor.predict(X_test)

As we scaled all the values down, we now have to bring them back to the original scale. scaler.scale_ gives the scaling factor applied to each column.

PYTHON
scaler.scale_
OUTPUT
array([8.18605127e-04, 8.17521128e-04, 8.32487534e-04, 8.20673293e-04,
       1.21162775e-08])

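8.186e-04 is the first value in the array; it is the scaling factor for the opening price. We multiply y_pred and y_test by its inverse to bring the values back to the price range. Note that MinMaxScaler also applies an offset (scaler.min_) which this step does not undo, so the rescaled values are shifted down by the minimum opening price in the training data. The shift is identical for y_pred and y_test, so the comparison between the two curves is unaffected.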

PYTHON
scale = 1/8.18605127e-04
scale
OUTPUT
1221.5901990069017
PYTHON
y_pred = y_pred*scale
y_test = y_test*scale
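
The difference between multiplying by the inverse scale and fully undoing min-max scaling can be seen on a small made-up column. In this sketch `scale_` and `min_` are computed by hand to play the same role as the first entries of scaler.scale_ and scaler.min_; the prices are illustrative only.

```python
import numpy as np

# Made-up opening prices standing in for the training column
train_open = np.array([50.0, 100.0, 250.0])

data_min, data_max = train_open.min(), train_open.max()
scale_ = 1.0 / (data_max - data_min)   # same role as scaler.scale_[0]
min_ = -data_min * scale_              # same role as scaler.min_[0]

scaled = train_open * scale_ + min_    # forward transform into [0, 1]

multiply_only = scaled / scale_          # equals the original price minus data_min
full_inverse = (scaled - min_) / scale_  # recovers the original price exactly

print(multiply_only)
print(full_inverse)
```

If exact dollar values are wanted on the chart's y-axis, the same idea applied to the tutorial's variables would be (y_pred - scaler.min_[0]) / scaler.scale_[0], and likewise for y_test.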

Visualization

PYTHON
# Visualising the results
plt.figure(figsize=(14,5))
plt.plot(y_test, color = 'red', label = 'Real Google Stock Price')
plt.plot(y_pred, color = 'blue', label = 'Predicted Google Stock Price')
plt.title('Google Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('Google Stock Price')
plt.legend()
plt.show()

Line chart comparing real Google stock prices in red against LSTM predicted prices in blue over 192 test days

As the graph shows, the predicted curve follows the overall trend of the real opening price, although it does not track every sharp move.
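
A visual comparison can be complemented with an error metric. The sketch below computes the mean absolute error (MAE) and root mean squared error (RMSE) on small made-up arrays; `y_true_demo` and `y_pred_demo` are hypothetical stand-ins, and in the tutorial the same formulas would be applied to y_test and the flattened y_pred.

```python
import numpy as np

# Made-up real and predicted prices, for illustration only
y_true_demo = np.array([100.0, 102.0, 101.0, 105.0])
y_pred_demo = np.array([99.0, 103.0, 103.0, 104.0])

errors = y_pred_demo - y_true_demo

mae = np.mean(np.abs(errors))          # average size of the error
rmse = np.sqrt(np.mean(errors ** 2))   # penalises large errors more heavily

print(mae)    # 1.25
print(rmse)
```

Both metrics are in the same units as the prices, which makes them easier to interpret than the scaled training loss.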

Conclusion

In this tutorial you built a four-layer stacked LSTM in TensorFlow to predict Google's daily opening stock price using a 60-day look-back window on data from 2004 to 2018. The model successfully captured the general upward trend in 2019 prices, though sharp intra-month volatility remained difficult to predict precisely.

Key takeaways:

  • A 60-day look-back window encodes enough historical context for the LSTM to detect medium-term trends; shorter windows lose context while longer ones require proportionally more training data.
  • Stacking four LSTM layers with Dropout(0.2) between them allows the model to learn increasingly abstract temporal representations while regularizing against overfitting.
  • MinMaxScaler must be fit on training data only and then used to transform test data — leaking test statistics into the scaler creates an unrealistically optimistic evaluation.
