Poetry Generation with TensorFlow and LSTM

In this post, we train a stacked LSTM in TensorFlow on a small corpus of Adele lyrics to generate new verses word by word. LSTM networks learn sequence patterns by carrying a memory state across timesteps, which lets them predict each next word from the words that came before it.

Sequence Generation Scheme

Code

PYTHON

import tensorflow as tf
import string
import requests
import pandas as pd

PYTHON

response = requests.get('https://raw.githubusercontent.com/laxmimerit/poetry-data/master/adele.txt')
response.raise_for_status()  # fail early if the download did not succeed

PYTHON

response.text

OUTPUT

'Looking for some education\nMade my way into the night\nAll that bullshit conversation\nBaby, can\'t you read the signs? I won\'t bore you with the details, baby\nI don\'t even wanna waste your time\nLet\'s just say that maybe\nYou could help me ease my mind\nI ain\'t Mr. Right But if you\'re looking for fast love\nIf that\'s love in your eyes\nIt\'s more than enough\nHad some bad love\nSo fast love is all that I\'ve got on my mind Ooh,

PYTHON

data = response.text.splitlines()
len(data)

PYTHON

len(" ".join(data))

Build LSTM Model and Prepare X and y

PYTHON

import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Embedding
from tensorflow.keras.preprocessing.sequence import pad_sequences

PYTHON

token = Tokenizer()
token.fit_on_texts(data)

PYTHON

# token.word_counts

PYTHON

help(token)

PYTHON

token.word_index

OUTPUT

{'i': 1, 'you': 2, 'the': 3, 'me': 4, 'to': 5, ...}

PYTHON

encoded_text = token.texts_to_sequences(data)
encoded_text

OUTPUT

[[254, 21, 219, 725], [117, 8, 80, 153, 3, 133], [14, 10, 726, 727], ...]

PYTHON

x = ['i love you']
token.texts_to_sequences(x)

OUTPUT

[[1, 11, 2]]
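
The Tokenizer assigns index 1 to the most frequent word and leaves index 0 free for padding. To make that mapping concrete, here is a simplified pure-Python sketch of a frequency-ranked word index. It is not the Keras implementation: `split_words`, `build_word_index`, `encode`, and the toy `lines` list are hypothetical names for illustration, and the `FILTERS` string is an assumption meant to resemble the Tokenizer's default punctuation filter.

```python
from collections import Counter

# Assumed punctuation filter; apostrophes are kept so "can't" stays one word.
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'


def split_words(line):
    """Lowercase a line, replace filtered characters with spaces, and split."""
    table = str.maketrans({ch: " " for ch in FILTERS})
    return line.lower().translate(table).split()


def build_word_index(lines):
    """Rank words by descending frequency; index 0 stays reserved for padding."""
    counts = Counter(word for line in lines for word in split_words(line))
    # most_common() sorts by count; ties keep first-seen order.
    ranked = counts.most_common()
    return {word: rank for rank, (word, _) in enumerate(ranked, start=1)}


def encode(lines, word_index):
    """Map each line to a list of word ids, skipping unknown words."""
    return [[word_index[w] for w in split_words(line) if w in word_index]
            for line in lines]


lines = ["I love you", "you love me", "You and I"]
word_index = build_word_index(lines)
encoded = encode(lines, word_index)

print(word_index)
print(encoded)
```

Because index 0 is never assigned to a word, the vocabulary size passed to the model is the number of distinct words plus one, which is what `len(token.word_counts) + 1` computes.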

PYTHON

vocab_size = len(token.word_counts) + 1

Prepare Training Data

PYTHON

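datalist = []
for d in encoded_text:
    # Lines with fewer than two words yield no (context, next word) pair.
    if len(d) > 1:
        # Collect the prefixes d[:2], d[:3], ...; the last id of each prefix becomes the label.
        # Note: range(2, len(d)) stops before the full line, so a line's final word
        # is never used as a label; range(2, len(d) + 1) would include it.
        for i in range(2, len(d)):
            datalist.append(d[:i])
            # print(d[:i])  # uncomment to inspect each prefix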
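
The prefix expansion is easier to see on a single line. The sketch below uses a hypothetical helper, `make_prefixes`, that is not part of the original code, and applies it to the first encoded line from the output shown earlier. The `include_full_line` flag is an illustrative addition: set to `False` it reproduces the `range(2, len(d))` loop above, and set to `True` it also emits the full line so that the line's last word appears as a label.

```python
def make_prefixes(encoded_line, include_full_line=True):
    """Return every prefix of length >= 2; the last id of each prefix is the label."""
    stop = len(encoded_line) + 1 if include_full_line else len(encoded_line)
    return [encoded_line[:i] for i in range(2, stop)]


line = [254, 21, 219, 725]  # first encoded line from the earlier output

for prefix in make_prefixes(line):
    context, label = prefix[:-1], prefix[-1]
    print(context, "->", label)
```

With `include_full_line=False` the same line yields only `[254, 21]` and `[254, 21, 219]`, which matches the first two rows of the padded array in the next section.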

Padding

PYTHON

max_length = 20
sequences = pad_sequences(datalist, maxlen=max_length, padding='pre')
sequences

OUTPUT

array([[  0,   0,   0, ...,   0, 254,  21],
       [  0,   0,   0, ..., 254,  21, 219],
       [  0,   0,   0, ...,   0, 117,   8],
       ...,
       [  0,   0,   0, ...,  17, 198,  17],
       [  0,   0,   0, ..., 198,  17, 198],
       [  0,   0,   0, ...,  17, 198,   6]], dtype=int32)
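
As a rough illustration of what `padding='pre'` does, the following pure-Python sketch left-pads short sequences with zeros and keeps only the last `maxlen` ids of long ones (`pad_sequences` also truncates from the front by default). The helper `pad_pre` and the toy input are hypothetical and exist only for this example; the real function returns a NumPy array rather than nested lists.

```python
def pad_pre(seqs, maxlen, value=0):
    """Left-pad each sequence to maxlen; keep only the last maxlen ids if longer."""
    padded = []
    for seq in seqs:
        tail = list(seq)[-maxlen:]                      # truncate from the front
        padded.append([value] * (maxlen - len(tail)) + tail)
    return padded


toy = [[254, 21], [254, 21, 219], [1, 2, 3, 4, 5, 6, 7]]
for row in pad_pre(toy, maxlen=5):
    print(row)
```

Padding on the left keeps the most recent words next to the label column at the right edge, which is why the split into `X` and `y` that follows can simply slice off the last column.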

PYTHON

X = sequences[:, :-1]
y = sequences[:, -1]

PYTHON

y = to_categorical(y, num_classes=vocab_size)
seq_length = X.shape[1]
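
The split and the one-hot step can be checked on a toy array. This is a NumPy-only sketch under stated assumptions: the `sequences` array and `vocab_size` below are made-up values, and `np.eye(...)[y]` stands in for `to_categorical` to show the shape of its result.

```python
import numpy as np

# Toy padded sequences (length 5) and a toy vocabulary size; both are illustrative.
sequences = np.array([[0, 0, 0, 3, 1],
                      [0, 0, 3, 1, 4],
                      [0, 0, 0, 2, 5]])
vocab_size = 6

X = sequences[:, :-1]   # every column except the last: the context
y = sequences[:, -1]    # last column: the word id to predict

# One-hot encode the labels; row i has a single 1 at column y[i].
y_onehot = np.eye(vocab_size, dtype="float32")[y]

print(X.shape, y_onehot.shape)
print(y_onehot[0])
```

In the tutorial itself the padded length is 20, so `X` has 19 columns, which is the `seq_length` the Embedding layer receives.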

LSTM Model Training

PYTHON

model = Sequential()
model.add(Embedding(vocab_size, 50, input_length=seq_length))
model.add(LSTM(100, return_sequences=True))
model.add(LSTM(100))
model.add(Dense(100, activation='relu'))
model.add(Dense(vocab_size, activation='softmax'))

PYTHON

model.summary()

OUTPUT

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding (Embedding)        (None, 19, 50)            69800
_________________________________________________________________
lstm (LSTM)                  (None, 19, 100)           60400
_________________________________________________________________
lstm_1 (LSTM)                (None, 100)               80400
_________________________________________________________________
dense (Dense)                (None, 100)               10100
_________________________________________________________________
dense_1 (Dense)              (None, 1396)              140996
=================================================================
Total params: 361,696
Trainable params: 361,696
Non-trainable params: 0
_________________________________________________________________
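
The parameter counts in the summary can be reproduced by hand, which is a useful sanity check on the layer sizes. The sketch below uses the standard formulas for Embedding, LSTM, and Dense layers; the helper function names are hypothetical, and the sizes are read off the summary above.

```python
def embedding_params(vocab_size, dim):
    # One dim-sized vector per vocabulary entry.
    return vocab_size * dim


def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights, and a bias.
    return 4 * (input_dim * units + units * units + units)


def dense_params(input_dim, units):
    # Weight matrix plus one bias per unit.
    return input_dim * units + units


vocab_size, embed_dim, units = 1396, 50, 100

counts = {
    "embedding": embedding_params(vocab_size, embed_dim),
    "lstm": lstm_params(embed_dim, units),
    "lstm_1": lstm_params(units, units),
    "dense": dense_params(units, 100),
    "dense_1": dense_params(100, vocab_size),
}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name:10s} {n:>8,d}")
print(f"{'total':10s} {total:>8,d}")
```

The final Dense layer is the largest single block because its size scales with the vocabulary, so a bigger corpus mostly grows the first and last layers.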

PYTHON

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

PYTHON

model.fit(X, y, batch_size=32, epochs=50)

OUTPUT

Epoch 49/50
445/445 [==============================] - 3s 6ms/step - loss: 0.5386 - accuracy: 0.8388
Epoch 50/50
445/445 [==============================] - 3s 6ms/step - loss: 0.5385 - accuracy: 0.8371

Poetry Generation

PYTHON

poetry_length = 10  # words per generated line

def generate_poetry(seed_text, n_lines):
    for _ in range(n_lines):
        text = []
        for _ in range(poetry_length):
            # Encode the running text and left-pad (or truncate) it to the model's input length.
            encoded = token.texts_to_sequences([seed_text])
            encoded = pad_sequences(encoded, maxlen=seq_length, padding='pre')

            # Greedy decoding: take the single most probable next-word id.
            y_pred = int(np.argmax(model.predict(encoded, verbose=0), axis=-1)[0])

            # index_word is the reverse of word_index; id 0 (padding) maps to no word.
            predicted_word = token.index_word.get(y_pred, "")

            seed_text = seed_text + ' ' + predicted_word
            text.append(predicted_word)

        # Start the next line from the last word of the line just generated.
        seed_text = text[-1]
        print(' '.join(text))

PYTHON

seed_text = 'i love you'
generate_poetry(seed_text, 5)

OUTPUT

is no and i want to do is wash your
name i set fire to the beat tears are gonna
understand last night she let the sky fall when it
was just like a song i was so scared to
make us grow from the arms of your love to

Conclusion

In this post, we trained a stacked two-layer LSTM on a corpus of Adele lyrics to generate new verses word by word. We tokenized 2,400 lines, built n-gram sequences padded to length 20, and trained for 50 epochs with categorical cross-entropy. The model reached 83.7% accuracy on the training data. Because no validation split was held out, that figure measures how closely the model fits the lyrics, not how well it generalizes. The generated lines stay on theme, drawing on recurring patterns such as "i set fire to the beat" and "the arms of your love."

Key takeaways:

N-gram sequence preparation turns free-form text into a supervised next-word task. Each input is a partial sequence, and the label is the next word. This gives the model thousands of training examples from a small corpus.
pad_sequences with padding='pre' left-pads shorter n-grams with zeros. This makes all inputs the same fixed length, which batch training needs.
Stacking two LSTM layers (the first with return_sequences=True) lets the second layer learn higher-level patterns on top of the first layer's per-timestep features. This can improve the generated text, although this post does not compare against a single-layer baseline.
softmax on the final dense layer outputs a probability distribution over the full vocabulary. The generate_poetry function here always takes the argmax (greedy decoding); sampling from the top-k tokens instead adds variety and reduces repetitive output.
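
Top-k sampling, mentioned in the last takeaway, is not implemented in this post. A minimal NumPy sketch of the idea follows; `sample_top_k` is a hypothetical helper, and `toy_probs` is a made-up stand-in for one row of `model.predict(encoded)`.

```python
import numpy as np


def sample_top_k(probs, k, rng):
    """Sample a word id from the k most probable entries of a probability vector."""
    probs = np.asarray(probs, dtype=np.float64)
    top_ids = np.argsort(probs)[-k:]                    # ids of the k largest probabilities
    top_probs = probs[top_ids] / probs[top_ids].sum()   # renormalize over the top k
    return int(rng.choice(top_ids, p=top_probs))


rng = np.random.default_rng(seed=0)
toy_probs = [0.05, 0.40, 0.30, 0.20, 0.05]  # stand-in for one softmax output row

greedy = int(np.argmax(toy_probs))
samples = [sample_top_k(toy_probs, k=3, rng=rng) for _ in range(10)]

print("greedy:", greedy)
print("top-3 samples:", samples)
```

Greedy decoding returns the same id every time for the same input, while the sampled ids vary among the three most likely words. In real use it would be sensible to exclude id 0, since it represents padding and has no word attached.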

Next steps:

Replace the word-level model with a character-level LSTM for finer-grained control over spelling and punctuation, as covered in Text Generation using TensorFlow, Keras and LSTM.
Use pre-trained GloVe vectors in the Embedding layer instead of learning from scratch. This helps the model generalize on small poetry datasets. See Words Embedding using GloVe Vectors.
Apply temperature scaling during inference. Dividing logits by a value below 1 makes the model more confident and less creative. A value above 1 makes it more diverse but less coherent.
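
Temperature scaling can be sketched in a few lines of NumPy. The model in this post ends in a softmax, so `model.predict` returns probabilities, not logits; the sketch therefore assumes we take the log of the probabilities first, which recovers the logits up to a constant that the softmax cancels. `apply_temperature` and `toy_probs` are hypothetical names used only for illustration.

```python
import numpy as np


def apply_temperature(probs, temperature):
    """Rescale a probability vector: softmax(log(probs) / temperature)."""
    # Small epsilon avoids log(0); log-probabilities equal logits up to a constant.
    logits = np.log(np.asarray(probs, dtype=np.float64) + 1e-12)
    scaled = logits / temperature
    scaled -= scaled.max()          # subtract the max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()


toy_probs = np.array([0.5, 0.3, 0.2])
for t in (0.5, 1.0, 2.0):
    print(t, np.round(apply_temperature(toy_probs, t), 3))
```

A temperature of 1.0 leaves the distribution unchanged, a value below 1 pushes more mass onto the top word, and a value above 1 flattens the distribution. The rescaled vector can then be passed to a sampler in place of the raw softmax output.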
