Module 06: SFT, PEFT and Preference Alignment

Module Overview

This is the core fine-tuning module. It covers parameter-efficient methods that make adaptation feasible on modest GPUs, the supervised fine-tuning stage common to all post-training pipelines, and the preference-alignment techniques that shape model behaviour toward human preferences.

Learning Objectives

Explain the intrinsic-dimensionality rationale for low-rank adaptation.
Compare LoRA, QLoRA, DoRA, AdaLoRA, and LoRA+ by mechanism and use case.
Describe SFT as stage one of every post-training pipeline.
Articulate why SFT alone does not achieve alignment.
Contrast RLHF-with-PPO against DPO in complexity and components.

Topics Covered

Parameter-Efficient Fine-Tuning (PEFT)

The intrinsic dimensionality insight
LoRA: rank, alpha, and target modules
QLoRA: 4-bit NF4, double quantization, paged optimizers
DoRA: magnitude + direction decomposition
AdaLoRA: adaptive rank allocation
LoRA+: separate learning rates for the A and B matrices
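
The rank and alpha items above can be made concrete with a small sketch. It assumes the usual LoRA formulation, h = W x + (alpha / r) B A x, with W frozen, A given a small random initialisation, and B initialised to zero. The helper names (`matvec`, `lora_forward`, `lora_param_counts`) and the toy dimensions are illustrative choices, not part of the PEFT library.

```python
import random

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha):
    """Return W x + (alpha / r) * B (A x), where r is the adapter rank."""
    r = len(A)                         # A has shape (r, d_in)
    base = matvec(W, x)                # frozen pretrained path
    update = matvec(B, matvec(A, x))   # low-rank path; B has shape (d_out, r)
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

def lora_param_counts(d_out, d_in, r):
    """Trainable parameters: full fine-tuning vs. one rank-r adapter."""
    return d_out * d_in, r * (d_in + d_out)

# Toy sizes chosen for illustration only.
rng = random.Random(0)
d_out, d_in, r, alpha = 6, 8, 2, 4
W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]  # zero init: the adapter starts as a no-op
x = [rng.gauss(0, 1) for _ in range(d_in)]

# With B = 0 the adapted layer reproduces the frozen layer exactly.
assert lora_forward(W, A, B, x, alpha) == matvec(W, x)

full, lora = lora_param_counts(4096, 4096, 8)
print(full, lora)  # 16777216 65536
```

For a single 4096 x 4096 projection at rank 8, the adapter trains 65,536 parameters against 16,777,216 for the full matrix, which is the saving that makes adaptation feasible on modest GPUs.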

Supervised Fine-Tuning (SFT)

SFT as stage 1 of every post-training pipeline
Instruction tuning (FLAN, Alpaca, OpenHermes)
Chat / conversational fine-tuning
Chain-of-Thought fine-tuning
Domain-specific fine-tuning best practices
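
As a rough illustration of the SFT objective behind the topics above, the sketch below computes the mean negative log-likelihood of a demonstration. It assumes one common variant in which prompt tokens are masked out so that only response tokens contribute to the loss; whether this module uses that variant is not stated in the outline. The function name `sft_loss` and the per-token probabilities are hypothetical.

```python
import math

def sft_loss(token_logprobs, loss_mask):
    """Mean negative log-likelihood over positions where loss_mask is 1."""
    kept = [-lp for lp, m in zip(token_logprobs, loss_mask) if m]
    if not kept:
        raise ValueError("loss_mask selects no tokens")
    return sum(kept) / len(kept)

# Hypothetical log-probabilities the model assigns to each target token
# of one "prompt + response" training example.
logprobs = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.125)]
mask = [0, 0, 1, 1]  # assumption: first two tokens are prompt, last two are response

loss = sft_loss(logprobs, mask)
print(round(loss, 4))  # 1.3863
```

Setting every mask entry to 1 recovers plain next-token training over the whole sequence.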

Preference Alignment

Why SFT alone is not enough
RLHF with PPO: reward model, critic, and KL penalty
DPO: direct optimization without a separate reward model
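
The DPO item above can be sketched for a single preference pair. The inputs are assumed to be summed sequence log-probabilities of the chosen and rejected responses under the trainable policy and under a frozen reference model. The function name `dpo_loss`, the example numbers, and the default `beta=0.1` are illustrative assumptions rather than values taken from this module.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid(beta * margin).

    margin compares how much more the policy favours the chosen response
    than the reference model does, relative to the rejected response.
    """
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    z = -beta * margin
    # Numerically stable softplus: log(1 + exp(z)) == -log(sigmoid(-z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

# Policy identical to the reference: margin is 0, so the loss is log(2).
at_reference = dpo_loss(-10.0, -12.0, -10.0, -12.0)

# Policy raises the chosen response and lowers the rejected one: loss drops.
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)

assert abs(at_reference - math.log(2)) < 1e-12
assert improved < at_reference
```

No reward model or critic appears here: the only components are the policy, the reference model, and the preference pair, which is the contrast with RLHF-with-PPO drawn in the learning objectives. The reference log-probabilities play the role of the KL regularisation by anchoring the policy to its starting point.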

Key Concepts & Terminology

Low-rank adapters, adapter merging, reward modelling, KL regularisation, preference pairs, reference model, reward hacking.

Tools & Frameworks Referenced

PEFT (LoRA/QLoRA/DoRA/AdaLoRA), bitsandbytes (NF4), Hugging Face TRL (SFT and DPO trainers).

Prerequisites

Modules 04 (LLM Lifecycle and Pre-Training) and 05 (Datasets and Synthetic Data).
