Module 04: LLM Lifecycle and Pre-Training

Module Overview

This module frames the full LLM development lifecycle so that every later fine-tuning technique has a place in the pipeline. It distinguishes pre-training from post-training, explains why base models require adaptation, covers continued pre-training (CPT) for domain adaptation, and introduces the multi-token prediction (MTP) objective.

Learning Objectives

Separate the pre-training and post-training phases and state what each contributes.
Explain why a base model is not directly useful for instruction-following tasks.
Recall the CLM, MLM, and Prefix-LM training objectives and where each applies.
Describe data curation and filtering challenges at pre-training scale.
Decide when Continued Pre-Training is warranted versus moving directly to SFT.

Topics Covered

Modern LLM Development Lifecycle

Pre-training vs post-training: the two-phase view
What pre-training produces (the base model)
Why base models are not directly useful for instruction following out of the box
Training objectives recap: Causal LM (CLM), Masked LM (MLM), Prefix-LM
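
As a quick recap of the three objectives listed above, the sketch below shows how they differ in what each position may attend to and where targets come from. It is a minimal pure-Python illustration rather than any library's implementation; the function names and the `[MASK]` placeholder string are assumptions chosen for this example.

```python
def causal_mask(n):
    """CLM: mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def prefix_lm_mask(n, prefix_len):
    """Prefix-LM: every position sees the whole prefix; positions after the
    prefix are otherwise causal, so the prefix is encoded bidirectionally."""
    return [[(j < prefix_len) or (j <= i) for j in range(n)] for i in range(n)]


def clm_targets(tokens):
    """CLM: the token at position t is trained to predict the token at t + 1."""
    return list(zip(tokens[:-1], tokens[1:]))


def mlm_example(tokens, masked_positions, mask_token="[MASK]"):
    """MLM: hide the chosen positions and predict only those positions,
    using context from both sides."""
    inputs = [mask_token if i in masked_positions else tok
              for i, tok in enumerate(tokens)]
    targets = {i: tokens[i] for i in sorted(masked_positions)}
    return inputs, targets


if __name__ == "__main__":
    for row in prefix_lm_mask(4, prefix_len=2):
        print(row)
    print(clm_targets(["the", "cat", "sat"]))
    print(mlm_example(["the", "cat", "sat"], {1}))
```

Decoder-only chat models are typically trained with the causal variant, which is why the rest of the pipeline in this module assumes next-token prediction unless stated otherwise.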

Pre-Training Deep Dive

Data curation and filtering at scale
Continued Pre-Training (CPT) for domain adaptation
The compute budget problem (links back to scaling laws)
When CPT is needed vs jumping straight to SFT
Multi-Token Prediction (MTP)
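
The compute budget item above can be made concrete with the common scaling-law rule of thumb that training a dense transformer costs roughly 6 FLOPs per parameter per token. The sketch below applies it to a continued pre-training run. The model size, token count, accelerator throughput, and utilization figures are hypothetical placeholders, and the 6ND rule is an approximation that ignores attention cost, so treat the result as an order-of-magnitude estimate.

```python
SECONDS_PER_DAY = 86_400


def training_flops(n_params, n_tokens):
    """Approximate training compute for a dense transformer: C ~ 6 * N * D."""
    return 6 * n_params * n_tokens


def accelerator_days(flops, peak_flops_per_second, utilization):
    """Convert a FLOP budget into single-accelerator days at a given
    fraction of peak throughput."""
    seconds = flops / (peak_flops_per_second * utilization)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical CPT run: 7B-parameter model, 20B domain tokens.
    flops = training_flops(7 * 10**9, 20 * 10**9)
    # Hypothetical accelerator: 3e14 peak FLOP/s at 40% utilization.
    days = accelerator_days(flops, 3e14, 0.4)
    print(f"{flops:.2e} FLOPs, about {days:.0f} accelerator-days")
```

Running the same arithmetic for an SFT dataset, which is usually orders of magnitude smaller in tokens, is one quick way to see why CPT is reserved for cases with substantial domain shift.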

Key Concepts & Terminology

Base model vs instruct model, domain shift, data mixture, deduplication, token budget, next-token vs multi-token objectives.
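
To illustrate the last pair of terms, the sketch below builds training targets for a multi-token objective: each position is paired with the next `k` tokens instead of only the next one. It covers target construction only. How a model produces the extra predictions (for example, additional output heads) varies by architecture and is not shown, and the function name is an assumption made for this example.

```python
def multi_token_targets(tokens, k):
    """Pair each position t with the k tokens that follow it.

    Positions that do not have k future tokens are dropped.
    With k = 1 this reduces to ordinary next-token prediction.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return [
        (tokens[t], tokens[t + 1 : t + 1 + k])
        for t in range(len(tokens) - k)
    ]


if __name__ == "__main__":
    sequence = ["a", "b", "c", "d", "e"]
    print(multi_token_targets(sequence, k=1))
    print(multi_token_targets(sequence, k=2))
```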

Tools & Frameworks Referenced

Large-scale pre-training corpora and data-filtering pipelines (conceptual); domain-adaptation workflows.
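
As a conceptual illustration of one data-filtering stage, the sketch below performs exact deduplication by hashing whitespace- and case-normalized text, followed by a simple minimum-length filter. It is a toy, single-machine example under stated assumptions: the normalization rule and the `min_words` threshold are arbitrary choices, and pipelines at pre-training scale usually add near-duplicate detection and quality classifiers on top.

```python
import hashlib
import re


def normalize(text):
    """Collapse whitespace and lowercase so trivially different copies match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def exact_dedup(docs):
    """Keep the first occurrence of each document, compared by content hash."""
    seen = set()
    kept = []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return kept


def length_filter(docs, min_words=5):
    """Drop documents shorter than min_words (an assumed, arbitrary threshold)."""
    return [doc for doc in docs if len(doc.split()) >= min_words]


if __name__ == "__main__":
    corpus = [
        "Pre-training data needs careful cleaning before use.",
        "pre-training   data needs careful cleaning before use.",
        "Too short.",
    ]
    print(length_filter(exact_dedup(corpus)))
```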

Prerequisites

Module 03 (scaling laws), general ML training fundamentals.
