Summarization is a generation task: the model writes new text rather than picking a label. T5 (Text-to-Text Transfer Transformer) is well suited to it because it treats every NLP problem as text-in, text-out, using the full encoder-decoder transformer. In this tutorial you fine-tune t5-small to summarize chat-style dialogues.
Unlike the encoder-only BERT tutorials, this one uses a sequence-to-sequence setup: a seq2seq model class and a dedicated data collator, driven by the standard Trainer.
Prerequisites: Familiarity with the transformer encoder-decoder architecture and a Python environment with transformers, datasets, and torch. The length histograms also use pandas, which needs matplotlib for plotting.
Extractive vs. Abstractive Summarization
Summarization in NLP automatically generates a concise version of a longer text.
There are two flavors:
- Extractive — copies the most important sentences verbatim from the source.
- Abstractive — generates new sentences that capture the meaning, the way a human would paraphrase.
T5 and BART produce abstractive summaries, which read more naturally. The CNN/DailyMail dataset — around 300,000 news article/summary pairs — is the classic benchmark, and its summaries are abstractive.

Abstractive summarization encodes a long document and decodes a short, reworded summary.
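To make the contrast concrete, here is a minimal sketch of the extractive flavor in plain Python. It is a toy word-frequency heuristic chosen for illustration, not a method used elsewhere in this tutorial, and the function name and example text are hypothetical. Note that every sentence it returns is copied verbatim from the input, which is exactly what an abstractive model such as T5 does not do.

```python
import re
from collections import Counter


def extractive_summary(text: str, num_sentences: int = 1) -> str:
    """Return the highest-scoring sentences, copied verbatim and kept in source order."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word frequencies over the whole document (lowercased).
    freqs = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        words = re.findall(r"[a-z']+", sentence.lower())
        # Average frequency, so long sentences are not favored automatically.
        return sum(freqs[w] for w in words) / max(len(words), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])  # restore original order
    return " ".join(sentences[i] for i in keep)


text = (
    "The cat sat on the mat. "
    "The cat chased the mouse around the mat. "
    "Dinner was served at six."
)
print(extractive_summary(text))  # The cat chased the mouse around the mat.
```

A heuristic like this can only select sentences; it cannot merge or rephrase them.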
Trying Pretrained Summarizers
Load a few examples from CNN/DailyMail and compare two pretrained models:
from datasets import load_dataset
dataset = load_dataset("cnn_dailymail", '3.0.0', split="train[:10]")
print(dataset[0]['article'])
print("\nSummary:\n")
print(dataset[0]['highlights'])
LONDON, England (Reuters) -- Harry Potter star Daniel Radcliffe gains access to a reported £20 million ($41.1 million) fortune as he turns 18 on Monday...
Summary:
Harry Potter star Daniel Radcliffe gets £20M fortune as he turns 18 Monday .
Young actor says he has no plans to fritter his cash away .
Radcliffe's earnings from first five Potter films have been held in trust fund .
Run both ubikpt/t5-small-finetuned-cnn and facebook/bart-large-cnn on the same article:
from transformers import pipeline
import torch
device = torch.device("cuda" if torch.cuda.is_available() else 'cpu')
summary = {}
pipe = pipeline('summarization', model='ubikpt/t5-small-finetuned-cnn', device=device)
summary['t5-small'] = pipe(dataset[0]['article'])[0]['summary_text']
pipe = pipeline('summarization', model='facebook/bart-large-cnn', device=device)
summary['bart-large'] = pipe(dataset[0]['article'])[0]['summary_text']
for model in summary:
    print(f"\n{model}\n{summary[model]}")
t5-small
Harry Potter star Daniel Radcliffe says he has no plans to fritter his cash away . The actor has filmed a TV movie about author Rudyard Kipling
bart-large
Harry Potter star Daniel Radcliffe turns 18 on Monday. He gains access to a reported £20 million ($41.1 million) fortune. Radcliffe's earnings from the first five Potter films have been held in a trust fund. Details of how he'll mark his landmark birthday are under wraps.
Note
BART produces a longer, more detailed summary here; T5-small is more terse. Which is "better" depends on your use case — that is exactly why fine-tuning on your own data matters.
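If you want a rough number to accompany that judgment, one option is to measure word overlap between a generated summary and the reference highlights. The sketch below is a simplified, ROUGE-1-style unigram F1 written in plain Python. It is an illustrative assumption rather than part of this tutorial's workflow: the function name is hypothetical, and a real evaluation would normally use an established ROUGE implementation with proper tokenization.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Clipped unigram-overlap F1 between two texts (a simplified ROUGE-1-style score)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # per-word minimum of the two counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


reference = "the cat sat on the mat"
print(unigram_f1("the cat sat", reference))   # precision 1.0, recall 0.5
print(unigram_f1("a dog barked", reference))  # no shared words
```

A terse summary tends to score high on precision and low on recall, while a long one does the opposite, so a single score still does not settle which style suits your use case.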
The SAMSum Dataset
To customize summarization, fine-tune on the SAMSum dataset — messenger-style dialogues paired with human-written summaries:
samsum = load_dataset('samsum', trust_remote_code=True)
samsum['train'][0]
{'id': '13818513', 'dialogue': "Amanda: I baked cookies. Do you want some?\r\nJerry: Sure!\r\nAmanda: I'll bring you tomorrow :-)", 'summary': 'Amanda baked cookies and will bring Jerry some tomorrow.'}
Inspect dialogue and summary lengths, counted here in whitespace-separated words as a rough proxy for token counts, to pick a sensible maximum token length:
import pandas as pd
dialogue_len = [len(x['dialogue'].split()) for x in samsum['train']]
summary_len = [len(x['summary'].split()) for x in samsum['train']]
data = pd.DataFrame([dialogue_len, summary_len]).T
data.columns = ['Dialogue Length', 'Summary Length']
data.hist(figsize=(10, 3))
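A histogram gives a visual impression; a percentile turns it into a concrete cutoff. The following sketch uses the nearest-rank method in plain Python. The helper name and the example lengths are hypothetical stand-ins for dialogue_len and summary_len. Because subword tokenizers usually produce more tokens than there are words, treat the result as a lower bound and leave some headroom.

```python
import math


def length_at_percentile(lengths, percentile):
    """Smallest length covering at least `percentile` percent of examples (nearest-rank)."""
    ordered = sorted(lengths)
    rank = max(math.ceil(percentile * len(ordered) / 100), 1)
    return ordered[rank - 1]


# Hypothetical word counts; in the tutorial these would come from dialogue_len.
example_lengths = [12, 18, 25, 31, 40, 47, 58, 73, 96, 210]
for p in (50, 90, 100):
    print(p, length_at_percentile(example_lengths, p))
# 50 40
# 90 96
# 100 210
```

Choosing a high percentile rather than the maximum keeps a few very long outliers from inflating every padded batch, at the cost of truncating those outliers.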
Loading and Tokenizing T5
Load the t5-small tokenizer and the sequence-to-sequence model:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
model_ckpt = 't5-small'
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
model = AutoModelForSeq2SeqLM.from_pretrained(model_ckpt).to(device)
The key to seq2seq tokenization is text_target, which tokenizes the labels (summaries) separately from the inputs (dialogues):
def tokenize(batch):
    # Padding is left to DataCollatorForSeq2Seq, which pads each batch and masks label padding with -100.
    return tokenizer(batch['dialogue'], text_target=batch['summary'], max_length=200, truncation=True)
samsum_pt = samsum.map(tokenize, batched=True, batch_size=None)
The tokenized dataset now carries input_ids, attention_mask, and labels for the train, test, and validation splits.

Seq2seq tokenization encodes the dialogue as input_ids and the summary as labels via text_target.
Training
Sequence-to-sequence models need DataCollatorForSeq2Seq, which dynamically pads inputs and labels and prepares decoder inputs:
from transformers import DataCollatorForSeq2Seq, TrainingArguments, Trainer
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)
args = TrainingArguments(
    output_dir="train_dir",
    num_train_epochs=2,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    eval_strategy='epoch',
    save_strategy='epoch',
    weight_decay=0.01,
    learning_rate=2e-5,
    gradient_accumulation_steps=500
)
trainer = Trainer(
    model=model,
    args=args,
    tokenizer=tokenizer,
    data_collator=data_collator,
    train_dataset=samsum_pt['train'],
    eval_dataset=samsum_pt['validation']
)
trainer.train()
{'eval_loss': 14.6737, 'epoch': 0.95}
{'eval_loss': 13.8082, 'epoch': 1.9}
{'train_runtime': 550.3847, 'train_loss': 14.087, 'epoch': 1.9}
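To see what the collator contributes for the label side of a batch, here is a conceptual sketch in plain Python. It is not the library's implementation. It assumes T5's conventions (pad token id 0, which also serves as the decoder start token) and the usual label padding value of -100, which the loss ignores. The function name and the token ids are made up for illustration.

```python
def pad_labels_and_shift(label_batch, label_pad_id=-100, pad_token_id=0, decoder_start_id=0):
    """Pad label sequences to equal length and derive decoder inputs by shifting right."""
    max_len = max(len(labels) for labels in label_batch)
    padded_labels, decoder_inputs = [], []
    for labels in label_batch:
        # Pad labels with -100 so padded positions do not contribute to the loss.
        padded = labels + [label_pad_id] * (max_len - len(labels))
        # Decoder input: start token, then every label except the last one.
        shifted = [decoder_start_id] + padded[:-1]
        # The decoder cannot embed -100, so masked positions become the pad token.
        shifted = [pad_token_id if tok == label_pad_id else tok for tok in shifted]
        padded_labels.append(padded)
        decoder_inputs.append(shifted)
    return padded_labels, decoder_inputs


labels, decoder_input_ids = pad_labels_and_shift([[71, 12, 9, 1], [5, 1]])
print(labels)             # [[71, 12, 9, 1], [5, 1, -100, -100]]
print(decoder_input_ids)  # [[0, 71, 12, 9], [0, 5, 1, 0]]
```

The shift is what makes teacher forcing work: at each position the decoder sees the previous target token and is trained to predict the current one.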
Warning
The large gradient_accumulation_steps=500 means the optimizer only updates 14 times over the whole run, so this is a quick workflow demonstration, not a fully converged model. For real training, lower gradient_accumulation_steps (e.g. 1–8) and increase epochs so the loss drops meaningfully.
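The figure of 14 updates can be reproduced with a little arithmetic. The sketch below assumes a SAMSum training split of 14,732 dialogues (a number not stated in this tutorial), a single device, and floor-division rounding of batches per update. The exact rounding may differ between library versions, so treat it as an estimate rather than the Trainer's definitive logic.

```python
import math


def optimizer_updates(num_examples, batch_size, accumulation_steps, epochs):
    """Estimate how many optimizer updates a training run performs."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    # One update per `accumulation_steps` batches; leftover batches are ignored here.
    updates_per_epoch = max(batches_per_epoch // accumulation_steps, 1)
    return updates_per_epoch * epochs


# Assumed split size: 14,732 training dialogues; other values come from TrainingArguments.
print(optimizer_updates(14732, batch_size=4, accumulation_steps=500, epochs=2))  # 14
print(optimizer_updates(14732, batch_size=4, accumulation_steps=8, epochs=2))    # 920
```

Under the same assumptions, 7 updates of 500 batches cover 3,500 of the 3,683 batches in an epoch, which is consistent with the logged epoch value of 0.95.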
Prediction
Save the model and summarize a brand-new dialogue:
from transformers import pipeline
trainer.save_model("t5_samsum_summarization")
pipe = pipeline('summarization', model='t5_samsum_summarization', device=device)
custom_dialogue = """
Laxmi Kant: what work you planning to give Tom?
Juli: i was hoping to send him on a business trip first.
Laxmi Kant: cool. is there any suitable work for him?
Juli: he did excellent in last quarter. i will assign new project, once he is back.
"""
output = pipe(custom_dialogue)
output
[{'summary_text': 'laxmi Kant: i was hoping to send him on a business trip first . i will assign new project once he is back .'}]
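Even after the short demo run, the output picks out the two key points: the business trip and the new project. It still copies phrases from the dialogue almost verbatim and attributes Juli's lines to Laxmi Kant, which is what you would expect from a model that has received only a handful of optimizer updates.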

The fine-tuned T5 pipeline condenses a multi-turn dialogue into a short summary.
Summary
You fine-tuned t5-small for abstractive dialogue summarization on SAMSum. The new pieces compared to classification are the encoder-decoder model (AutoModelForSeq2SeqLM), the text_target tokenization that handles labels, and the DataCollatorForSeq2Seq that prepares decoder inputs.
Next, you leave text behind and apply the same fine-tuning recipe to images — fine-tuning a Vision Transformer for image classification.