Poetry Generation Using Tensorflow, Keras, and LSTM


What is an RNN?

Recurrent Neural Networks (RNNs) are among the first state-of-the-art algorithms that can memorize/remember previous inputs when given a large amount of sequential data.


These loops make recurrent neural networks seem kind of mysterious. However, if you think a bit more, it turns out that they aren't all that different from a normal neural network. A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor.
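To make that concrete, here is a minimal sketch (plain NumPy, not part of the course code) of the recurrence a simple RNN cell applies: the same weights are reused at every time step, and the hidden state h is the "message" each copy of the network passes to the next.

import numpy as np

hidden_size, input_size = 4, 3
W_xh = np.random.randn(hidden_size, input_size) * 0.1   # input-to-hidden weights
W_hh = np.random.randn(hidden_size, hidden_size) * 0.1  # hidden-to-hidden weights
b_h = np.zeros(hidden_size)

h = np.zeros(hidden_size)                 # initial hidden state ("memory")
inputs = np.random.randn(5, input_size)   # a toy sequence of 5 time steps
for x_t in inputs:                        # unrolling the loop = copies of the same network
  h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)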

Different Types of RNNs

Recurrent Neural Networks can be arranged in several input/output configurations; a short Keras sketch of two of them follows this list.

  • Fixed-size input to fixed-size output (e.g. image classification, which involves no recurrence).
  • Sequence output (e.g. image captioning takes an image and outputs a sentence of words).
  • Sequence input (e.g. sentiment analysis where a given sentence is classified as expressing positive or negative sentiment).
  • Sequence input and sequence output (e.g. Machine Translation: an RNN reads a sentence in English and then outputs a sentence in French).
  • Synced sequence input and output (e.g. video classification where we wish to label each frame of the video)
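As a hedged illustration (not from the course), the many-to-one and synced many-to-many patterns look like this in Keras; the layer sizes here are arbitrary choices.

import tensorflow as tf

# Sequence input -> single output (e.g. sentiment analysis):
# the RNN returns only its final hidden state.
many_to_one = tf.keras.Sequential([
  tf.keras.layers.Embedding(10000, 32),               # sequence of word ids -> vectors
  tf.keras.layers.SimpleRNN(64),                      # return_sequences=False by default
  tf.keras.layers.Dense(1, activation='sigmoid'),     # single score for the whole sequence
])

# Synced sequence input and output (e.g. a label for every video frame):
# return_sequences=True keeps one output per time step.
many_to_many = tf.keras.Sequential([
  tf.keras.layers.SimpleRNN(64, return_sequences=True, input_shape=(None, 128)),
  tf.keras.layers.Dense(5, activation='softmax'),     # a label for every step
])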

The Problem of RNNs: Long-Term Dependencies

  • Vanishing Gradient
  • Exploding Gradient

Vanishing Gradient

If the partial derivative of the error is less than 1, then multiplying it by the learning rate (which is itself a small number) produces only a tiny update; compared with the previous iteration the weights barely change, so the contribution of the earliest time steps effectively vanishes.
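A tiny numerical illustration (not from the course) of why this happens: when each backpropagation step multiplies the gradient by a factor below 1, the signal shrinks towards zero over many time steps.

factor = 0.9            # per-step gradient factor, less than 1
gradient = 1.0
for step in range(50):  # backpropagate through 50 time steps
  gradient *= factor
print(gradient)         # ~0.005 -- early inputs barely influence the weight update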


Exploding Gradient

We speak of exploding gradients when the algorithm assigns an unreasonably high importance to the weights without much justification. Fortunately, this problem can be solved fairly easily by truncating or squashing (clipping) the gradients.
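In Keras this squashing is usually done with gradient clipping on the optimizer; a minimal sketch (the thresholds here are arbitrary choices):

import tensorflow as tf

# Clip the global gradient norm (or individual values) before the update is applied
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001, clipnorm=1.0)
# optimizer = tf.keras.optimizers.Adam(clipvalue=0.5)   # alternative: clip element-wise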


Long Short Term Memory (LSTM) Networks

Long Short Term Memory networks – usually just called “LSTMs” – are a special kind of RNN, capable of learning long-term dependencies.

LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long periods of time is practically their default behavior, not something they struggle to learn!
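For reference, the standard LSTM cell equations (the ones the usual gate diagrams depict) are:

f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)          (forget gate)
i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)          (input gate)
\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)   (candidate cell state)
C_t = f_t * C_{t-1} + i_t * \tilde{C}_t               (cell state update)
o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)          (output gate)
h_t = o_t * \tanh(C_t)                                (hidden state)

The cell state C_t flows from step to step with only element-wise interactions, which is why carrying information across long spans is the default behavior rather than something the network has to fight for.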


Sequence Generation Scheme

Each line of text is expanded into prefix sequences whose last word becomes the prediction target; at generation time the predicted word is appended to the seed text and fed back into the model, one word at a time.

Let's Code

import tensorflow as tf
import string
import requests
import pandas as pd

# Download the Adele lyrics dataset used as the training corpus
response = requests.get('https://raw.githubusercontent.com/laxmimerit/poetry-data/master/adele.txt')
response.text
'Looking for some education\nMade my way into the night\nAll that bullshit conversation\nBaby, can\'t you read the signs? I won\'t bore you with the details, baby\nI don\'t even wanna waste your time\nLet\'s just say that maybe\nYou could help me ease my mind\nI ain\'t Mr. Right But if you\'re looking for fast love\nIf that\'s love in your eyes\nIt\'s more than enough\nHad some bad love\nSo fast love is all that I\'ve got on my mind Ooh, 
# Split the raw text into individual lyric lines
data = response.text.splitlines()
len(data)
2400
len(" ".join(data))
91330
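The string module imported above is not used in this excerpt. One common cleaning step it could support looks like the sketch below; it is optional here, because the Keras Tokenizer lowercases the text and strips punctuation by default.

# Optional sketch (not part of the shown pipeline): lowercase and strip punctuation
clean_data = [
  line.lower().translate(str.maketrans('', '', string.punctuation))
  for line in data
]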

Build LSTM Model and Prepare X and y

import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Embedding
from tensorflow.keras.preprocessing.sequence import pad_sequences
# Build the word -> index vocabulary from the lyric lines
token = Tokenizer()
token.fit_on_texts(data)
# token.word_counts   # word frequencies, if you want to inspect them
# help(token)         # full Tokenizer API
token.word_index
{'i': 1,
 'you': 2,
 'the': 3,
 'me': 4,
 'to': 5,
 ...}
# Convert every line into its sequence of word indices
encoded_text = token.texts_to_sequences(data)
encoded_text
[[254, 21, 219, 725],
 [117, 8, 80, 153, 3, 133],
 [14, 10, 726, 727],
 ...]
x = ['i love you']
token.texts_to_sequences(x)
[[1, 11, 2]]
vocab_size = len(token.word_counts) + 1   # +1 because index 0 is reserved for padding
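The tokenizer also keeps the reverse mapping, which is what we will need later to turn predicted indices back into words; for the 'i love you' example above:

token.index_word[1], token.index_word[11], token.index_word[2]
('i', 'love', 'you')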

Prepare Training Data

# Expand every encoded line into its prefix sequences: each prefix will later be
# split into inputs (all words but the last) and a target (the last word)
datalist = []
for d in encoded_text:
  if len(d) > 1:
    for i in range(2, len(d)):
      datalist.append(d[:i])
      print(d[:i])
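For the first encoded line this expansion looks like the following (note that range(2, len(d)) stops before the full line, so the final word of each line is never used as a target):

encoded_text[0]
[254, 21, 219, 725]
datalist[:2]
[[254, 21], [254, 21, 219]]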

Padding

# Pad every prefix on the left ('pre') so all sequences share the same length
max_length = 20
sequences = pad_sequences(datalist, maxlen=max_length, padding='pre')
sequences
array([[  0,   0,   0, ...,   0, 254,  21],
       [  0,   0,   0, ..., 254,  21, 219],
       [  0,   0,   0, ...,   0, 117,   8],
       ...,
       [  0,   0,   0, ...,  17, 198,  17],
       [  0,   0,   0, ..., 198,  17, 198],
       [  0,   0,   0, ...,  17, 198,   6]], dtype=int32)
X = sequences[:, :-1]                          # inputs: all tokens except the last
y = sequences[:, -1]                           # target: the last token of each sequence
y = to_categorical(y, num_classes=vocab_size)  # one-hot encode the target word
seq_length = X.shape[1]                        # 19 input tokens per example
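A quick sanity check on the shapes (the exact number of sequences depends on the corpus): each input row holds seq_length = 19 token ids, and each target is a one-hot vector over the vocabulary.

print(X.shape, y.shape, seq_length)
# e.g. (num_sequences, 19) (num_sequences, 1396) 19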

LSTM Model Training

model = Sequential()
model.add(Embedding(vocab_size, 50, input_length=seq_length))  # 50-dimensional word embeddings
model.add(LSTM(100, return_sequences=True))                    # first LSTM feeds its full sequence to the next one
model.add(LSTM(100))                                           # second LSTM returns only its final state
model.add(Dense(100, activation='relu'))
model.add(Dense(vocab_size, activation='softmax'))             # probability distribution over the next word
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding (Embedding)        (None, 19, 50)            69800     
_________________________________________________________________
lstm (LSTM)                  (None, 19, 100)           60400     
_________________________________________________________________
lstm_1 (LSTM)                (None, 100)               80400     
_________________________________________________________________
dense (Dense)                (None, 100)               10100     
_________________________________________________________________
dense_1 (Dense)              (None, 1396)              140996    
=================================================================
Total params: 361,696
Trainable params: 361,696
Non-trainable params: 0
_________________________________________________________________
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, batch_size=32, epochs=50)
Epoch 49/50
445/445 [==============================] - 3s 6ms/step - loss: 0.5386 - accuracy: 0.8388
Epoch 50/50
445/445 [==============================] - 3s 6ms/step - loss: 0.5385 - accuracy: 0.8371
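Training takes a few minutes. Optionally (not shown in the original walkthrough) you can persist the model and the tokenizer so generation can be run later without retraining; the file names here are arbitrary.

import pickle

model.save('poetry_model.h5')            # or the newer .keras format
with open('tokenizer.pkl', 'wb') as f:
  pickle.dump(token, f)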

Poetry Generation

poetry_length = 10   # words per generated line
def generate_poetry(seed_text, n_lines):
  for i in range(n_lines):
    text = []
    for _ in range(poetry_length):
      # Encode the current seed text and pad it to the training sequence length
      encoded = token.texts_to_sequences([seed_text])
      encoded = pad_sequences(encoded, maxlen=seq_length, padding='pre')

      # Pick the most probable next word (greedy / argmax decoding)
      y_pred = np.argmax(model.predict(encoded), axis=-1)

      # Map the predicted index back to a word (token.index_word[int(y_pred)] would also work)
      predicted_word = ""
      for word, index in token.word_index.items():
        if index == y_pred:
          predicted_word = word
          break

      seed_text = seed_text + ' ' + predicted_word
      text.append(predicted_word)

    # Start the next line from the last predicted word
    seed_text = text[-1]
    text = ' '.join(text)
    print(text)
seed_text = 'i love you'
generate_poetry(seed_text, 5)
is no and i want to do is wash your
name i set fire to the beat tears are gonna
understand last night she let the sky fall when it
was just like a song i was so scared to
make us grow from the arms of your love to
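Because the function above always takes the argmax, the same seed always produces the same lines. A hedged variation (not part of the course code) is to sample the next word from the predicted distribution with a temperature, which makes the generated poetry less repetitive.

def sample_next_word(probs, temperature=0.8):
  # Rescale the softmax output: temperature < 1 is conservative, > 1 more adventurous
  probs = np.log(np.asarray(probs, dtype='float64') + 1e-9) / temperature
  probs = np.exp(probs) / np.sum(np.exp(probs))
  return np.random.choice(len(probs), p=probs)

# Inside the generation loop, replace the argmax line with:
# probs = model.predict(encoded)[0]
# y_pred = sample_next_word(probs)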

Watch Full Course Here: http://bitly.com/nlp_intro