LSTM networks learn sequential patterns by maintaining a memory state across timesteps — making them capable of generating coherent text sequences. This tutorial trains a stacked LSTM in TensorFlow on a poetry corpus to generate new verses word-by-word through next-word prediction.
Sequence Generation Scheme
Let's Code
import tensorflow as tf
import string
import requests
import pandas as pd
response = requests.get('https://raw.githubusercontent.com/laxmimerit/poetry-data/master/adele.txt')
response.text
'Looking for some education\nMade my way into the night\nAll that bullshit conversation\nBaby, can\'t you read the signs? I won\'t bore you with the details, baby\nI don\'t even wanna waste your time\nLet\'s just say that maybe\nYou could help me ease my mind\nI ain\'t Mr. Right But if you\'re looking for fast love\nIf that\'s love in your eyes\nIt\'s more than enough\nHad some bad love\nSo fast love is all that I\'ve got on my mind Ooh,
data = response.text.splitlines()
len(data)
2400
len(" ".join(data))
91330
Build LSTM Model and Prepare X and y
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Embedding
from tensorflow.keras.preprocessing.sequence import pad_sequences
token = Tokenizer()
token.fit_on_texts(data)
# token.word_counts
help(token)
token.word_index
{'i': 1, 'you': 2, 'the': 3, 'me': 4, 'to': 5, ...}
encoded_text = token.texts_to_sequences(data)
encoded_text
[[254, 21, 219, 725], [117, 8, 80, 153, 3, 133], [14, 10, 726, 727], ...]
x = ['i love you']
token.texts_to_sequences(x)
[[1, 11, 2]]
vocab_size = len(token.word_counts) + 1
Prepare Training Data
datalist = []
for d in encoded_text:
    if len(d) > 1:
        for i in range(2, len(d)):
            datalist.append(d[:i])
            print(d[:i])
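To see what this loop produces, here is a minimal sketch on a single encoded line (the token ids are the first line shown in the tutorial's output). Note that `range(2, len(d))` stops one short of the full line, so the line's last word never ends a prefix. The second variant, which includes the full line, is an assumption for comparison and not the tutorial's code.

```python
# One encoded line; ids copied from the tutorial's first encoded line.
d = [254, 21, 219, 725]

# Same bounds as the tutorial loop: prefixes of length 2 .. len(d) - 1.
prefixes = [d[:i] for i in range(2, len(d))]
print(prefixes)       # [[254, 21], [254, 21, 219]]

# Variant (an assumption, not the tutorial's code): include the full line
# so the last word can also become a prediction target.
prefixes_full = [d[:i] for i in range(2, len(d) + 1)]
print(prefixes_full)  # [[254, 21], [254, 21, 219], [254, 21, 219, 725]]
```

The last token of each prefix later becomes the label, so with the tutorial's bounds the id 725 is only ever dropped, never predicted.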
Padding
max_length = 20
sequences = pad_sequences(datalist, maxlen=max_length, padding='pre')
sequences
array([[ 0, 0, 0, ..., 0, 254, 21],
[ 0, 0, 0, ..., 254, 21, 219],
[ 0, 0, 0, ..., 0, 117, 8],
...,
[ 0, 0, 0, ..., 17, 198, 17],
[ 0, 0, 0, ..., 198, 17, 198],
[ 0, 0, 0, ..., 17, 198, 6]], dtype=int32)
X = sequences[:, :-1]
y = sequences[:, -1]
y = to_categorical(y, num_classes=vocab_size)
seq_length = X.shape[1]
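The padding and the X/y split can be sketched on toy data without TensorFlow. The helper `pad_pre` below is a hypothetical, simplified stand-in for `pad_sequences(..., padding='pre')` with its default behavior of truncating from the front; it is only meant to show the shapes involved.

```python
def pad_pre(seqs, maxlen, value=0):
    """Left-pad (and left-truncate) each sequence to exactly maxlen items."""
    padded = []
    for s in seqs:
        s = s[-maxlen:]                               # keep the last maxlen tokens
        padded.append([value] * (maxlen - len(s)) + s)
    return padded

# Three prefixes like the ones collected in datalist.
datalist_demo = [[254, 21], [254, 21, 219], [117, 8]]
padded = pad_pre(datalist_demo, maxlen=5)

# Everything except the last column is the input; the last column is the label.
X_demo = [row[:-1] for row in padded]
y_demo = [row[-1] for row in padded]
print(X_demo)  # [[0, 0, 0, 254], [0, 0, 254, 21], [0, 0, 0, 117]]
print(y_demo)  # [21, 219, 8]
```

Pre-padding keeps the most recent words right next to the label position, which is why the zeros go on the left.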
LSTM Model Training
model = Sequential()
model.add(Embedding(vocab_size, 50, input_length=seq_length))
model.add(LSTM(100, return_sequences=True))
model.add(LSTM(100))
model.add(Dense(100, activation='relu'))
model.add(Dense(vocab_size, activation='softmax'))
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding (Embedding) (None, 19, 50) 69800
_________________________________________________________________
lstm (LSTM) (None, 19, 100) 60400
_________________________________________________________________
lstm_1 (LSTM) (None, 100) 80400
_________________________________________________________________
dense (Dense) (None, 100) 10100
_________________________________________________________________
dense_1 (Dense) (None, 1396) 140996
=================================================================
Total params: 361,696
Trainable params: 361,696
Non-trainable params: 0
_________________________________________________________________
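The parameter counts in the summary can be checked by hand. The sketch below uses the standard formulas (an LSTM has four gates, each with input weights, recurrent weights, and a bias); the helper names are made up for illustration.

```python
vocab_size, embed_dim, lstm_units, dense_units = 1396, 50, 100, 100

def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * (input_dim + units + 1) * units

def dense_params(input_dim, units):
    # Weight matrix plus one bias per output unit.
    return input_dim * units + units

counts = {
    "embedding": vocab_size * embed_dim,
    "lstm": lstm_params(embed_dim, lstm_units),
    "lstm_1": lstm_params(lstm_units, lstm_units),
    "dense": dense_params(lstm_units, dense_units),
    "dense_1": dense_params(dense_units, vocab_size),
}
print(counts)
print(sum(counts.values()))  # 361696
```

More than a third of the weights sit in the final softmax layer, because it needs one output per vocabulary word.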
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, batch_size=32, epochs=50)
Epoch 49/50
445/445 [==============================] - 3s 6ms/step - loss: 0.5386 - accuracy: 0.8388
Epoch 50/50
445/445 [==============================] - 3s 6ms/step - loss: 0.5385 - accuracy: 0.8371
Poetry Generation
poetry_length = 10
def generate_poetry(seed_text, n_lines):
    for i in range(n_lines):
        text = []
        for _ in range(poetry_length):
            # Encode the running seed and left-pad it to the model's input length
            encoded = token.texts_to_sequences([seed_text])
            encoded = pad_sequences(encoded, maxlen=seq_length, padding='pre')
            # Greedy decoding: pick the most probable next word index
            y_pred = np.argmax(model.predict(encoded), axis=-1)
            # Map the predicted index back to its word
            predicted_word = ""
            for word, index in token.word_index.items():
                if index == y_pred:
                    predicted_word = word
                    break
            seed_text = seed_text + ' ' + predicted_word
            text.append(predicted_word)
        # Start the next line from the last word of this one
        seed_text = text[-1]
        text = ' '.join(text)
        print(text)
seed_text = 'i love you'
generate_poetry(seed_text, 5)
is no and i want to do is wash your
name i set fire to the beat tears are gonna
understand last night she let the sky fall when it
was just like a song i was so scared to
make us grow from the arms of your love to
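The generation function scans `token.word_index` for every predicted word. A minimal sketch of a cheaper lookup is to invert the mapping once. The dictionary below is a small stand-in for `token.word_index`, and `lookup` is a hypothetical helper; a fitted Keras `Tokenizer` also exposes an `index_word` attribute holding the same inverse mapping.

```python
# Stand-in for token.word_index (entries copied from the tutorial's output).
word_index = {'i': 1, 'you': 2, 'the': 3, 'me': 4, 'to': 5}

# Build the inverse mapping once instead of scanning word_index per prediction.
index_word = {index: word for word, index in word_index.items()}

def lookup(predicted_index):
    # Index 0 is the padding id and has no word, so fall back to "".
    return index_word.get(int(predicted_index), "")

print(lookup(2))        # you
print(repr(lookup(0)))  # ''
```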
Conclusion
In this tutorial you trained a stacked two-layer LSTM on an Adele poetry corpus to generate new verses word-by-word. After tokenizing 2,400 lines, building n-gram sequences padded to length 20, and training for 50 epochs with categorical cross-entropy, the model reached 83.7% training accuracy and produced thematically consistent lines — drawing on repeated lyrical patterns like "i set fire to the beat" and "the arms of your love."
Key takeaways:
- N-gram sequence preparation converts free-form text into a supervised next-word prediction task: each input is a partial sequence and the label is the next word, giving the model thousands of training examples from a small corpus.
- pad_sequences with padding='pre' left-pads shorter n-grams with zeros so all inputs are the same fixed length, which is required for batch training.
- Stacking two LSTM layers (first with return_sequences=True) allows the second layer to learn higher-level temporal patterns on top of the first layer's features, improving language modeling quality.
- softmax on the final dense layer outputs a probability distribution over the full vocabulary; sampling from top-k tokens (rather than always taking argmax) introduces diversity and prevents repetitive output.
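The tutorial's generator always takes the argmax. As a rough sketch of the top-k alternative mentioned in the last takeaway, the hypothetical helper `sample_top_k` below draws from the k most probable indices of a made-up probability vector; in practice `probs` would be one row of `model.predict(...)`.

```python
import random

def sample_top_k(probs, k=3, rng=random):
    """Sample an index from the k most probable entries of probs."""
    # Indices of the k largest probabilities.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    weights = [probs[i] for i in top]
    # random.choices normalizes the weights internally.
    return rng.choices(top, weights=weights, k=1)[0]

probs = [0.05, 0.40, 0.30, 0.20, 0.05]   # made-up softmax output
rng = random.Random(0)                    # seeded for repeatable draws
draws = [sample_top_k(probs, k=3, rng=rng) for _ in range(1000)]
print(sorted(set(draws)))  # only indices 1, 2 and 3 can appear
```

With `k=1` this reduces to argmax, so k acts as a dial between the tutorial's greedy decoding and freer sampling.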
Next steps:
- Replace the word-level model with a character-level LSTM for finer-grained control over spelling and punctuation; see Text Generation using TensorFlow, Keras and LSTM.
- Use pre-trained GloVe vectors in the Embedding layer instead of learning from scratch to improve generalization on small poetry datasets — see Words Embedding using GloVe Vectors.
- Apply temperature scaling during inference: dividing logits by a value < 1 makes the model more confident (less creative), while > 1 makes it more diverse (less coherent).
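Temperature scaling can be sketched as follows. The model in this tutorial outputs softmax probabilities rather than raw logits, so the hypothetical helper `apply_temperature` takes the log first; this is equivalent to dividing the logits, because softmax ignores a constant offset. The probability vector is made up for illustration.

```python
import math

def apply_temperature(probs, temperature=1.0):
    """Rescale a probability vector; equivalent to dividing logits by temperature."""
    # log(p) equals the logits up to an additive constant, which softmax ignores.
    scaled = [math.log(max(p, 1e-12)) / temperature for p in probs]
    m = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

probs = [0.6, 0.3, 0.1]                      # made-up softmax output
sharp = apply_temperature(probs, 0.5)        # temperature < 1: more confident
flat = apply_temperature(probs, 2.0)         # temperature > 1: more diverse
print([round(p, 3) for p in sharp])  # [0.783, 0.196, 0.022]
print([round(p, 3) for p in flat])   # [0.473, 0.334, 0.193]
```

The rescaled vector only changes the output if the next word is then sampled from it; with argmax the top word stays the same at any temperature.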
