Poetry Generation Using Tensorflow, Keras, and LSTM


What is an RNN?

Recurrent Neural Networks (RNNs) are among the first state-of-the-art algorithms that can memorize/remember previous inputs when given a large amount of sequential data.


These loops make recurrent neural networks seem kind of mysterious. However, if you think a bit more, it turns out that they aren't all that different from a normal neural network. A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor.
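To make that concrete, here is a minimal sketch (plain NumPy, not part of the course code) of the recurrence a simple RNN cell applies: the same weights are reused at every time step, and the hidden state h is the "message" each copy of the network passes to the next.

import numpy as np

hidden_size, input_size = 4, 3
W_xh = np.random.randn(hidden_size, input_size) * 0.1   # input-to-hidden weights
W_hh = np.random.randn(hidden_size, hidden_size) * 0.1  # hidden-to-hidden weights
b_h = np.zeros(hidden_size)

h = np.zeros(hidden_size)                 # initial hidden state ("memory")
inputs = np.random.randn(5, input_size)   # a toy sequence of 5 time steps
for x_t in inputs:                        # unrolling the loop = copies of the same network
  h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)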

Different Types of RNNs

Recurrent Neural Networks can be arranged in several input/output configurations; a short Keras sketch of two of them follows this list.

  • Fixed-size input to fixed-size output (e.g. image classification, which involves no recurrence).
  • Sequence output (e.g. image captioning takes an image and outputs a sentence of words).
  • Sequence input (e.g. sentiment analysis where a given sentence is classified as expressing positive or negative sentiment).
  • Sequence input and sequence output (e.g. Machine Translation: an RNN reads a sentence in English and then outputs a sentence in French).
  • Synced sequence input and output (e.g. video classification where we wish to label each frame of the video)
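As a hedged illustration (not from the course), the many-to-one and synced many-to-many patterns look like this in Keras; the layer sizes here are arbitrary choices.

import tensorflow as tf

# Sequence input -> single output (e.g. sentiment analysis):
# the RNN returns only its final hidden state.
many_to_one = tf.keras.Sequential([
  tf.keras.layers.Embedding(10000, 32),               # sequence of word ids -> vectors
  tf.keras.layers.SimpleRNN(64),                      # return_sequences=False by default
  tf.keras.layers.Dense(1, activation='sigmoid'),     # single score for the whole sequence
])

# Synced sequence input and output (e.g. a label for every video frame):
# return_sequences=True keeps one output per time step.
many_to_many = tf.keras.Sequential([
  tf.keras.layers.SimpleRNN(64, return_sequences=True, input_shape=(None, 128)),
  tf.keras.layers.Dense(5, activation='softmax'),     # a label for every step
])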

The Problem of RNNs: Long-Term Dependencies

  • Vanishing Gradient
  • Exploding Gradient

Vanishing Gradient

If the partial derivative of the error is less than 1, then multiplying it by the learning rate (which is itself a small number) produces only a tiny update; compared with the previous iteration the weights barely change, so the contribution of the earliest time steps effectively vanishes.
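A tiny numerical illustration (not from the course) of why this happens: when each backpropagation step multiplies the gradient by a factor below 1, the signal shrinks towards zero over many time steps.

factor = 0.9            # per-step gradient factor, less than 1
gradient = 1.0
for step in range(50):  # backpropagate through 50 time steps
  gradient *= factor
print(gradient)         # ~0.005 -- early inputs barely influence the weight update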


Exploding Gradient

We speak of exploding gradients when the algorithm assigns an unreasonably high importance to the weights without much justification. Fortunately, this problem can be solved fairly easily by truncating or squashing (clipping) the gradients.
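In Keras this squashing is usually done with gradient clipping on the optimizer; a minimal sketch (the thresholds here are arbitrary choices):

import tensorflow as tf

# Clip the global gradient norm (or individual values) before the update is applied
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001, clipnorm=1.0)
# optimizer = tf.keras.optimizers.Adam(clipvalue=0.5)   # alternative: clip element-wise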


Long Short Term Memory (LSTM) Networks

Long Short Term Memory networks – usually just called “LSTMs” – are a special kind of RNN, capable of learning long-term dependencies.

LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long periods of time is practically their default behavior, not something they struggle to learn!
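For reference, the standard LSTM cell equations (the ones the usual gate diagrams depict) are:

f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)          (forget gate)
i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)          (input gate)
\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)   (candidate cell state)
C_t = f_t * C_{t-1} + i_t * \tilde{C}_t               (cell state update)
o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)          (output gate)
h_t = o_t * \tanh(C_t)                                (hidden state)

The cell state C_t flows from step to step with only element-wise interactions, which is why carrying information across long spans is the default behavior rather than something the network has to fight for.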


Sequence Generation Scheme

Each line of text is expanded into prefix sequences whose last word becomes the prediction target; at generation time the predicted word is appended to the seed text and fed back into the model, one word at a time.

Let's Code

import tensorflow as tf
import string
import requests
import pandas as pd

# Download the Adele lyrics dataset used as the training corpus
response = requests.get('https://raw.githubusercontent.com/laxmimerit/poetry-data/master/adele.txt')
response.text
'Looking for some education\nMade my way into the night\nAll that bullshit conversation\nBaby, can\'t you read the signs? I won\'t bore you with the details, baby\nI don\'t even wanna waste your time\nLet\'s just say that maybe\nYou could help me ease my mind\nI ain\'t Mr. Right But if you\'re looking for fast love\nIf that\'s love in your eyes\nIt\'s more than enough\nHad some bad love\nSo fast love is all that I\'ve got on my mind Ooh, 
# Split the raw text into individual lyric lines
data = response.text.splitlines()
len(data)
2400
len(" ".join(data))
91330
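The string module imported above is not used in this excerpt. One common cleaning step it could support looks like the sketch below; it is optional here, because the Keras Tokenizer lowercases the text and strips punctuation by default.

# Optional sketch (not part of the shown pipeline): lowercase and strip punctuation
clean_data = [
  line.lower().translate(str.maketrans('', '', string.punctuation))
  for line in data
]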

Build LSTM Model and Prepare X and y

import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Embedding
from tensorflow.keras.preprocessing.sequence import pad_sequences
# Build the word -> index vocabulary from the lyric lines
token = Tokenizer()
token.fit_on_texts(data)
# token.word_counts   # word frequencies, if you want to inspect them
# help(token)         # full Tokenizer API
token.word_index
{'i': 1,
 'you': 2,
 'the': 3,
 'me': 4,
 'to': 5,
 ...}
# Convert every line into its sequence of word indices
encoded_text = token.texts_to_sequences(data)
encoded_text
[[254, 21, 219, 725],
 [117, 8, 80, 153, 3, 133],
 [14, 10, 726, 727],
 ...]
x = ['i love you']
token.texts_to_sequences(x)
[[1, 11, 2]]
vocab_size = len(token.word_counts) + 1   # +1 because index 0 is reserved for padding
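The tokenizer also keeps the reverse mapping, which is what we will need later to turn predicted indices back into words; for the 'i love you' example above:

token.index_word[1], token.index_word[11], token.index_word[2]
('i', 'love', 'you')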

Prepare Training Data

# Expand every encoded line into its prefix sequences: each prefix will later be
# split into inputs (all words but the last) and a target (the last word)
datalist = []
for d in encoded_text:
  if len(d) > 1:
    for i in range(2, len(d)):
      datalist.append(d[:i])
      print(d[:i])
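For the first encoded line this expansion looks like the following (note that range(2, len(d)) stops before the full line, so the final word of each line is never used as a target):

encoded_text[0]
[254, 21, 219, 725]
datalist[:2]
[[254, 21], [254, 21, 219]]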

Padding

# Pad every prefix on the left ('pre') so all sequences share the same length
max_length = 20
sequences = pad_sequences(datalist, maxlen=max_length, padding='pre')
sequences
array([[  0,   0,   0, ...,   0, 254,  21],
       [  0,   0,   0, ..., 254,  21, 219],
       [  0,   0,   0, ...,   0, 117,   8],
       ...,
       [  0,   0,   0, ...,  17, 198,  17],
       [  0,   0,   0, ..., 198,  17, 198],
       [  0,   0,   0, ...,  17, 198,   6]], dtype=int32)
X = sequences[:, :-1]                          # inputs: all tokens except the last
y = sequences[:, -1]                           # target: the last token of each sequence
y = to_categorical(y, num_classes=vocab_size)  # one-hot encode the target word
seq_length = X.shape[1]                        # 19 input tokens per example
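A quick sanity check on the shapes (the exact number of sequences depends on the corpus): each input row holds seq_length = 19 token ids, and each target is a one-hot vector over the vocabulary.

print(X.shape, y.shape, seq_length)
# e.g. (num_sequences, 19) (num_sequences, 1396) 19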

LSTM Model Training

model = Sequential()
model.add(Embedding(vocab_size, 50, input_length=seq_length))  # 50-dimensional word embeddings
model.add(LSTM(100, return_sequences=True))                    # first LSTM feeds its full sequence to the next one
model.add(LSTM(100))                                           # second LSTM returns only its final state
model.add(Dense(100, activation='relu'))
model.add(Dense(vocab_size, activation='softmax'))             # probability distribution over the next word
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding (Embedding)        (None, 19, 50)            69800     
_________________________________________________________________
lstm (LSTM)                  (None, 19, 100)           60400     
_________________________________________________________________
lstm_1 (LSTM)                (None, 100)               80400     
_________________________________________________________________
dense (Dense)                (None, 100)               10100     
_________________________________________________________________
dense_1 (Dense)              (None, 1396)              140996    
=================================================================
Total params: 361,696
Trainable params: 361,696
Non-trainable params: 0
_________________________________________________________________
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, batch_size=32, epochs=50)
Epoch 49/50
445/445 [==============================] - 3s 6ms/step - loss: 0.5386 - accuracy: 0.8388
Epoch 50/50
445/445 [==============================] - 3s 6ms/step - loss: 0.5385 - accuracy: 0.8371
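Training takes a few minutes. Optionally (not shown in the original walkthrough) you can persist the model and the tokenizer so generation can be run later without retraining; the file names here are arbitrary.

import pickle

model.save('poetry_model.h5')            # or the newer .keras format
with open('tokenizer.pkl', 'wb') as f:
  pickle.dump(token, f)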

Poetry Generation

poetry_length = 10   # words per generated line
def generate_poetry(seed_text, n_lines):
  for i in range(n_lines):
    text = []
    for _ in range(poetry_length):
      # Encode the current seed text and pad it to the training sequence length
      encoded = token.texts_to_sequences([seed_text])
      encoded = pad_sequences(encoded, maxlen=seq_length, padding='pre')

      # Pick the most probable next word (greedy / argmax decoding)
      y_pred = np.argmax(model.predict(encoded), axis=-1)

      # Map the predicted index back to a word (token.index_word[int(y_pred)] would also work)
      predicted_word = ""
      for word, index in token.word_index.items():
        if index == y_pred:
          predicted_word = word
          break

      seed_text = seed_text + ' ' + predicted_word
      text.append(predicted_word)

    # Start the next line from the last predicted word
    seed_text = text[-1]
    text = ' '.join(text)
    print(text)
seed_text = 'i love you'
generate_poetry(seed_text, 5)
is no and i want to do is wash your
name i set fire to the beat tears are gonna
understand last night she let the sky fall when it
was just like a song i was so scared to
make us grow from the arms of your love to
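Because the function above always takes the argmax, the same seed always produces the same lines. A hedged variation (not part of the course code) is to sample the next word from the predicted distribution with a temperature, which makes the generated poetry less repetitive.

def sample_next_word(probs, temperature=0.8):
  # Rescale the softmax output: temperature < 1 is conservative, > 1 more adventurous
  probs = np.log(np.asarray(probs, dtype='float64') + 1e-9) / temperature
  probs = np.exp(probs) / np.sum(np.exp(probs))
  return np.random.choice(len(probs), p=probs)

# Inside the generation loop, replace the argmax line with:
# probs = model.predict(encoded)[0]
# y_pred = sample_next_word(probs)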

Watch Full Course Here: http://bitly.com/nlp_intro