Poetry Generation with TensorFlow and LSTM

LSTM networks learn sequential patterns by maintaining a memory state across timesteps — making them capable of generating coherent text sequences. This tutorial trains a stacked LSTM in TensorFlow on a poetry corpus to generate new verses word-by-word through next-word prediction.

Sequence Generation Scheme

Let's Code

PYTHON

import tensorflow as tf
import string
import requests
import pandas as pd

PYTHON

response = requests.get('https://raw.githubusercontent.com/laxmimerit/poetry-data/master/adele.txt')

PYTHON

response.text

OUTPUT

'Looking for some education\nMade my way into the night\nAll that bullshit conversation\nBaby, can\'t you read the signs? I won\'t bore you with the details, baby\nI don\'t even wanna waste your time\nLet\'s just say that maybe\nYou could help me ease my mind\nI ain\'t Mr. Right But if you\'re looking for fast love\nIf that\'s love in your eyes\nIt\'s more than enough\nHad some bad love\nSo fast love is all that I\'ve got on my mind Ooh,

PYTHON

data = response.text.splitlines()
len(data)

OUTPUT

PYTHON

len(" ".join(data))

OUTPUT

Build LSTM Model and Prepare X and y

PYTHON

import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Embedding
from tensorflow.keras.preprocessing.sequence import pad_sequences

PYTHON

token = Tokenizer()
token.fit_on_texts(data)

PYTHON

# token.word_counts

PYTHON

help(token)

PYTHON

token.word_index

OUTPUT

{'i': 1, 'you': 2, 'the': 3, 'me': 4, 'to': 5, ...}

PYTHON

encoded_text = token.texts_to_sequences(data)
encoded_text

OUTPUT

[[254, 21, 219, 725], [117, 8, 80, 153, 3, 133], [14, 10, 726, 727], ...]

PYTHON

x = ['i love you']
token.texts_to_sequences(x)

OUTPUT

[[1, 11, 2]]

PYTHON

vocab_size = len(token.word_counts) + 1
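To see where the word indices and the `+ 1` come from, here is a simplified plain-Python sketch of the ranking idea: words are numbered from 1 in order of descending frequency, and index 0 is left free for padding. The helper names `build_word_index` and `encode` are hypothetical, and the real Keras `Tokenizer` does more (for example, it strips punctuation by default), so treat this as an approximation rather than its implementation.

```python
from collections import Counter

def build_word_index(lines):
    """Rank words by descending frequency, starting at 1 (0 is kept for padding)."""
    counts = Counter(word for line in lines for word in line.lower().split())
    # sorted() is stable, so ties keep their first-seen order
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return {word: rank for rank, (word, _) in enumerate(ranked, start=1)}

def encode(line, word_index):
    """Map each known word to its index, skipping unknown words."""
    return [word_index[w] for w in line.lower().split() if w in word_index]

toy_lines = ["i love you", "you love me", "i love i"]
word_index = build_word_index(toy_lines)
toy_vocab_size = len(word_index) + 1  # +1 because index 0 is never assigned to a word

print(word_index)
print(encode("i love you", word_index))
print(toy_vocab_size)
```

Because no word ever receives index 0, the output layer needs `len(word_index) + 1` units so that the largest word index is still a valid class.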

Prepare Training Data

PYTHON

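datalist = []
for d in encoded_text:
  if len(d) > 1:
    # Collect every prefix of at least two tokens; the last token of each
    # prefix later becomes the label. Note that range(2, len(d)) stops before
    # the full line, so a line's final word is never used as a label. Use
    # range(2, len(d) + 1) if you want to include it.
    for i in range(2, len(d)):
      datalist.append(d[:i])
      # print(d[:i])  # uncomment to inspect each prefix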
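To make the prefix loop concrete, here is a minimal sketch on a single encoded line (the ids are those of the first line in the encoded output shown earlier). The helper `build_prefixes` and its `include_full_line` flag are illustrative names, not part of the tutorial's code.

```python
def build_prefixes(encoded_line, include_full_line=False):
    """Return the prefixes of one encoded line that have at least two tokens.

    With include_full_line=False this mirrors range(2, len(d)), which stops
    one short of the complete line.
    """
    stop = len(encoded_line) + 1 if include_full_line else len(encoded_line)
    return [encoded_line[:i] for i in range(2, stop)]

line = [254, 21, 219, 725]
for prefix in build_prefixes(line):
    # Everything but the last token is the input; the last token is the label
    print(prefix[:-1], "->", prefix[-1])
```

Each prefix yields one training example, which is how a small corpus turns into thousands of input/label pairs.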

Padding

PYTHON

max_length = 20
sequences = pad_sequences(datalist, maxlen=max_length, padding='pre')
sequences

OUTPUT

array([[  0,   0,   0, ...,   0, 254,  21],
       [  0,   0,   0, ..., 254,  21, 219],
       [  0,   0,   0, ...,   0, 117,   8],
       ...,
       [  0,   0,   0, ...,  17, 198,  17],
       [  0,   0,   0, ..., 198,  17, 198],
       [  0,   0,   0, ...,  17, 198,   6]], dtype=int32)
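The effect of `padding='pre'` can be approximated in a few lines of plain Python. This is a sketch of the behavior, not the Keras implementation; `pad_pre` is a hypothetical helper, and it assumes `maxlen >= 1`. It also reproduces the default truncation rule of `pad_sequences`, which drops tokens from the front when a sequence is longer than `maxlen`.

```python
def pad_pre(seq, maxlen, value=0):
    """Left-pad seq with value, or keep only its last maxlen items if too long."""
    seq = list(seq)[-maxlen:]  # truncate from the front, keeping the most recent tokens
    return [value] * (maxlen - len(seq)) + seq

print(pad_pre([254, 21], 6))                  # short sequence gets leading zeros
print(pad_pre([1, 2, 3, 4, 5, 6, 7, 8], 6))   # long sequence loses its oldest tokens
```

Padding on the left keeps the most recent words next to the label position at the end of each row.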

PYTHON

X = sequences[:, :-1]
y = sequences[:, -1]

PYTHON

y = to_categorical(y, num_classes=vocab_size)
seq_length = X.shape[1]
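The split and the one-hot step can be illustrated on a single padded row. The helpers `split_row` and `one_hot` are hypothetical stand-ins for the array slicing and `to_categorical` calls above, and the toy vocabulary size of 6 is an assumption chosen to keep the output readable.

```python
def split_row(row):
    """All tokens but the last form the input; the last token is the label."""
    return row[:-1], row[-1]

def one_hot(index, num_classes):
    """Vector of zeros with a single 1.0 at position index."""
    vec = [0.0] * num_classes
    vec[index] = 1.0
    return vec

row = [0, 0, 3, 1, 4]          # a padded sequence of length 5
x_row, y_label = split_row(row)
y_vec = one_hot(y_label, 6)

print(x_row, y_label)
print(y_vec)
```

In the tutorial each label becomes a vector with one entry per vocabulary word. Keras also provides the `sparse_categorical_crossentropy` loss, which accepts integer labels directly and avoids building these vectors.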

LSTM Model Training

PYTHON

model = Sequential()
model.add(Embedding(vocab_size, 50, input_length=seq_length))
model.add(LSTM(100, return_sequences=True))
model.add(LSTM(100))
model.add(Dense(100, activation='relu'))
model.add(Dense(vocab_size, activation='softmax'))

PYTHON

model.summary()

OUTPUT

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding (Embedding)        (None, 19, 50)            69800
_________________________________________________________________
lstm (LSTM)                  (None, 19, 100)           60400
_________________________________________________________________
lstm_1 (LSTM)                (None, 100)               80400
_________________________________________________________________
dense (Dense)                (None, 100)               10100
_________________________________________________________________
dense_1 (Dense)              (None, 1396)              140996
=================================================================
Total params: 361,696
Trainable params: 361,696
Non-trainable params: 0
_________________________________________________________________
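The parameter counts in the summary can be checked by hand. The sketch below uses the standard formulas for embedding, LSTM, and dense layers with biases; the function names are illustrative, and the sizes (vocabulary 1396, embedding 50, 100 units) are read off the summary above.

```python
def embedding_params(vocab_size, dim):
    # One dim-sized vector per vocabulary index
    return vocab_size * dim

def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights, and a bias
    return 4 * (input_dim + units + 1) * units

def dense_params(input_dim, units):
    # Weight matrix plus one bias per unit
    return (input_dim + 1) * units

vocab_size, embed_dim, units = 1396, 50, 100
counts = [
    embedding_params(vocab_size, embed_dim),  # embedding
    lstm_params(embed_dim, units),            # lstm
    lstm_params(units, units),                # lstm_1
    dense_params(units, 100),                 # dense
    dense_params(100, vocab_size),            # dense_1
]
print(counts, sum(counts))
```

The final dense layer is the largest single layer because its size scales with the vocabulary.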

PYTHON

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

PYTHON

model.fit(X, y, batch_size=32, epochs=50)

OUTPUT

Epoch 49/50
445/445 [==============================] - 3s 6ms/step - loss: 0.5386 - accuracy: 0.8388
Epoch 50/50
445/445 [==============================] - 3s 6ms/step - loss: 0.5385 - accuracy: 0.8371

Poetry Generation

PYTHON

poetry_length = 10  # words per generated line

def generate_poetry(seed_text, n_lines):
  for i in range(n_lines):
    text = []
    for _ in range(poetry_length):
      # Encode the running text and left-pad it to the model's input length
      encoded = token.texts_to_sequences([seed_text])
      encoded = pad_sequences(encoded, maxlen=seq_length, padding='pre')

      # Greedy decoding: take the single most probable next word
      y_pred = int(np.argmax(model.predict(encoded), axis=-1)[0])

      # index_word is the reverse of word_index; index 0 (padding) has no word
      predicted_word = token.index_word.get(y_pred, "")

      seed_text = seed_text + ' ' + predicted_word
      text.append(predicted_word)

    # Start the next line from the last word of this one
    seed_text = text[-1]
    text = ' '.join(text)
    print(text)

PYTHON

seed_text = 'i love you'
generate_poetry(seed_text, 5)

OUTPUT

is no and i want to do is wash your
name i set fire to the beat tears are gonna
understand last night she let the sky fall when it
was just like a song i was so scared to
make us grow from the arms of your love to


Conclusion

In this tutorial you trained a stacked two-layer LSTM on an Adele poetry corpus to generate new verses word-by-word. After tokenizing 2,400 lines, building n-gram sequences padded to length 20, and training for 50 epochs with categorical cross-entropy, the model reached 83.7% training accuracy and produced thematically consistent lines — drawing on repeated lyrical patterns like "i set fire to the beat" and "the arms of your love."

Key takeaways:

N-gram sequence preparation converts free-form text into a supervised next-word prediction task: each input is a partial sequence and the label is the next word, giving the model thousands of training examples from a small corpus.
pad_sequences with padding='pre' left-pads shorter n-grams with zeros so all inputs are the same fixed length, which is required for batch training.
Stacking two LSTM layers (first with return_sequences=True) allows the second layer to learn higher-level temporal patterns on top of the first layer's features, improving language modeling quality.
softmax on the final dense layer outputs a probability distribution over the full vocabulary. The generation code in this tutorial always takes the argmax (greedy decoding); sampling from the top-k tokens instead would introduce diversity and reduce repetitive output.
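The top-k sampling mentioned in the last takeaway is not implemented in the tutorial, so here is a minimal sketch of one way it could look. `sample_top_k` is a hypothetical helper, and the probability list stands in for one row of the model's softmax output.

```python
import random

def sample_top_k(probs, k, rng=random):
    """Pick an index among the k most probable entries, weighted by probability."""
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    weights = [probs[i] for i in top]
    return rng.choices(top, weights=weights, k=1)[0]

probs = [0.05, 0.50, 0.30, 0.10, 0.05]  # toy distribution over 5 words
rng = random.Random(0)                   # seeded for repeatable runs
picks = [sample_top_k(probs, k=2, rng=rng) for _ in range(10)]
print(picks)  # only indices 1 and 2 can appear
```

With `k=1` this reduces to argmax. Inside `generate_poetry`, one option would be to pass the first row of `model.predict(encoded)` to such a helper in place of the `np.argmax` call.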

Next steps:

Replace the word-level model with a character-level LSTM for finer-grained control over spelling and punctuation; see Text Generation using TensorFlow, Keras and LSTM.
Use pre-trained GloVe vectors in the Embedding layer instead of learning embeddings from scratch to improve generalization on small poetry datasets; see Words Embedding using GloVe Vectors.
Apply temperature scaling during inference: dividing logits by a value < 1 makes the model more confident (less creative), while > 1 makes it more diverse (less coherent).
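Temperature scaling can be sketched as follows. The model in this tutorial outputs softmax probabilities rather than logits, so this sketch takes the logarithm first, which recovers the logits up to a constant that softmax ignores. `apply_temperature` is a hypothetical helper, and the three-word distribution is a made-up example.

```python
import math

def apply_temperature(probs, temperature):
    """Rescale a distribution: temperature < 1 sharpens it, > 1 flattens it."""
    # Clamp to avoid log(0), then divide the log-probabilities by the temperature
    scaled = [math.log(max(p, 1e-12)) / temperature for p in probs]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

probs = [0.6, 0.3, 0.1]
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in apply_temperature(probs, t)])
```

A temperature of 1.0 leaves the distribution unchanged. The rescaled probabilities are only useful if the next word is then sampled from them, since argmax picks the same word at any temperature.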
