Module 07: Evaluation, Quantization and Deployment

Module Overview

This module closes the fine-tuning loop: how to measure whether fine-tuning worked, how to compress models for efficient serving, how to deploy them (including multiple LoRA adapters from one base model), and which frameworks accelerate the whole workflow.

Learning Objectives

Justify evaluation as an integral stage of fine-tuning.
Choose benchmark categories and apply LLM-as-judge methods responsibly.
Compare quantization formats by quality, speed, and hardware.
Select an inference framework and explain multi-adapter serving.
Match a fine-tuning framework to skill level and configurability needs.

Topics Covered

Evaluation

Why evaluation is part of the fine-tuning workflow
Benchmark types: knowledge, reasoning, instruction-following
LLM-as-judge and pairwise preference evaluation: MT-Bench, Chatbot Arena
Domain-specific evaluation design
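
One known weakness of LLM-as-judge evaluation is position bias, where the judge favors whichever answer it sees first. A common mitigation is to query the judge twice with the answers swapped and count a win only when both passes agree. The sketch below illustrates that idea under stated assumptions: the `Judge` signature and the two toy judges are hypothetical stand-ins, not the API of any real judge model.

```python
from typing import Callable

# Hypothetical judge signature: (prompt, first_answer, second_answer) -> "first" or "second".
Judge = Callable[[str, str, str], str]


def debiased_verdict(judge: Judge, prompt: str, answer_a: str, answer_b: str) -> str:
    """Ask the judge twice with the answer order swapped; return "A", "B", or "tie"."""
    first_pass = judge(prompt, answer_a, answer_b)   # A is shown first
    second_pass = judge(prompt, answer_b, answer_a)  # B is shown first
    a_wins_first = first_pass == "first"
    a_wins_second = second_pass == "second"
    if a_wins_first and a_wins_second:
        return "A"
    if not a_wins_first and not a_wins_second:
        return "B"
    # The verdict flipped with the order, so it is attributed to position bias.
    return "tie"


def always_first(prompt: str, first: str, second: str) -> str:
    """Toy stand-in for a judge with total position bias."""
    return "first"


def prefers_longer(prompt: str, first: str, second: str) -> str:
    """Toy stand-in for an order-independent judge (it has a length bias instead)."""
    return "first" if len(first) >= len(second) else "second"


if __name__ == "__main__":
    print(debiased_verdict(always_first, "q", "x", "yy"))
    print(debiased_verdict(prefers_longer, "q", "short", "much longer"))
```

Swapping the order only addresses position bias. The `prefers_longer` toy judge still returns a consistent verdict while rewarding length, which is why calibration against human labels remains necessary.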

Quantization & Deployment Preparation

Quantization formats: GPTQ, AWQ, bitsandbytes (BNB) NF4, FP8
Merging LoRA adapters before serving
Inference frameworks: vLLM, SGLang
Serving multiple LoRA adapters from one base model
GGUF format and llama.cpp
Speculative decoding and inference acceleration
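
To make the idea of low-bit weight storage concrete, here is a minimal sketch of group-wise symmetric round-to-nearest quantization to 4-bit integers. It is deliberately the simplest baseline, not GPTQ, AWQ, or NF4: those methods choose codes or scales more carefully (error compensation, activation-aware scaling, non-uniform levels). The function names and the group size are illustrative assumptions.

```python
from typing import List, Tuple


def quantize_group(weights: List[float], bits: int = 4) -> Tuple[List[int], float]:
    """Symmetric absmax quantization of one weight group to signed integer codes."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit
    absmax = max(abs(w) for w in weights)
    scale = absmax / qmax if absmax > 0 else 1.0
    # Round to the nearest code and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def quantize(weights: List[float], group_size: int = 4, bits: int = 4):
    """Split the weights into groups, each with its own scale."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


def dequantize(groups) -> List[float]:
    """Reconstruct approximate weights from (codes, scale) pairs."""
    return [code * scale for codes, scale in groups for code in codes]


if __name__ == "__main__":
    original = [0.7, -0.35, 0.1, 0.0, 2.0, -1.0, 0.5, 0.25]
    groups = quantize(original, group_size=4)
    restored = dequantize(groups)
    for w, r in zip(original, restored):
        print(f"{w:+.3f} -> {r:+.3f}")
```

Using one scale per small group rather than one per tensor keeps a single large weight from inflating the rounding error of every other weight. The cost is one stored scale per group.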

Tooling & Frameworks

Hugging Face TRL: SFT, DPO, GRPO, ORPO
Unsloth: consumer-GPU optimization
Axolotl: full YAML configurability
LLaMA-Factory: no-code web UI
Managed fine-tuning: Together AI, AWS SageMaker

Key Concepts & Terminology

Post-training quantization, weight-only vs weight-and-activation quantization, activation-aware weight scaling (AWQ), adapter hot-swapping, draft/verifier speculative decoding, judge bias and calibration.
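
The draft/verifier pattern can be sketched in a few lines. This is the greedy variant only, with toy next-token functions standing in for real models: a cheap draft proposes `k` tokens, the verifier keeps the longest prefix it agrees with, and then supplies one token of its own. Sampling-based speculative decoding uses a probabilistic accept/reject rule instead. All names here (`NextToken`, `toy_verifier`, `toy_draft`) are hypothetical.

```python
from typing import Callable, List

# Hypothetical model interface: maps a token context to its greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(draft: NextToken, verifier: NextToken, prompt: List[int],
                       max_new_tokens: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding; the output matches greedy_decode(verifier, ...)."""
    tokens = list(prompt)
    target_len = len(prompt) + max_new_tokens
    while len(tokens) < target_len:
        # 1. The draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2. The verifier checks each position. A real system scores all k
        #    positions in one batched forward pass; this loop is for clarity.
        accepted: List[int] = []
        for tok in proposal:
            expected = verifier(tokens + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # replace the first mismatch, then stop
                break
        else:
            accepted.append(verifier(tokens + accepted))  # bonus token: all k accepted
        tokens.extend(accepted)
    return tokens[:target_len]


def toy_verifier(context: List[int]) -> int:
    return (context[-1] * 3 + 1) % 11


def toy_draft(context: List[int]) -> int:
    # Agrees with the verifier except when the last token is a multiple of 4.
    return 0 if context[-1] % 4 == 0 else toy_verifier(context)


if __name__ == "__main__":
    print(speculative_decode(toy_draft, toy_verifier, [1], max_new_tokens=12))
```

Because every appended token is one the verifier would have produced itself, the output matches plain greedy decoding with the verifier. The speedup comes from how many drafted tokens are accepted per verifier pass.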

Tools & Frameworks Referenced

vLLM, SGLang, llama.cpp, GPTQ, AWQ, bitsandbytes, GGUF, TRL, Unsloth, Axolotl, LLaMA-Factory, Together AI, AWS SageMaker.

Prerequisites

Module 06: SFT, PEFT and Preference Alignment.
