Module Overview
This module frames the full LLM development lifecycle so that every later fine-tuning technique has a clear place in the pipeline. It distinguishes pre-training from post-training, explains why base models require adaptation before they can follow instructions, and examines two pre-training topics in depth: continued pre-training (CPT) for domain adaptation and the multi-token prediction (MTP) objective.
Learning Objectives
- Separate the pre-training and post-training phases and state what each contributes.
- Explain why a base model is not directly useful for instruction-following tasks.
- Recall the causal LM (CLM), masked LM (MLM), and Prefix-LM training objectives and where each applies.
- Describe data curation and filtering challenges at pre-training scale.
- Decide when Continued Pre-Training is warranted versus moving directly to SFT.
Topics Covered
Modern LLM Development Lifecycle
- Pre-training vs post-training — the two-phase view
- What pre-training produces (the base model)
- Why base models are not directly usable for instruction following out of the box
- Training objectives recap: Causal LM (CLM), Masked LM (MLM), Prefix-LM
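As a quick refresher on the three objectives, the following is a minimal sketch in plain Python, not a production implementation. The function names (`causal_mask`, `prefix_lm_mask`, `clm_pairs`, `mlm_example`) and the `[MASK]` placeholder are illustrative assumptions, and the MLM corruption is deliberately simplified to "replace with a mask token" at a fixed probability.

```python
import random

def causal_mask(n):
    """CLM attention: position i may attend to position j only if j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]

def prefix_lm_mask(n, prefix_len):
    """Prefix-LM attention: bidirectional inside the prefix, causal after it."""
    return [[j <= i or j < prefix_len for j in range(n)] for i in range(n)]

def clm_pairs(tokens):
    """CLM targets: every token is the label for the position before it."""
    return list(zip(tokens[:-1], tokens[1:]))

def mlm_example(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Simplified MLM: hide a random subset of tokens and keep them as labels.

    Returns the corrupted sequence and a dict {position: original token}.
    """
    rng = random.Random(seed)
    corrupted, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            corrupted.append(mask_token)
            targets[i] = tok  # loss is computed only at masked positions
        else:
            corrupted.append(tok)
    return corrupted, targets
```

In this sketch the CLM and Prefix-LM objectives differ only in the attention mask, while MLM changes the input itself and trains on the hidden positions alone.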
Pre-Training Deep Dive
- Data curation and filtering at scale
- Continued Pre-Training (CPT) for domain adaptation
- The compute budget problem (links back to scaling laws)
- When CPT is needed vs jumping straight to SFT
- Multi-Token Prediction (MTP)
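To illustrate how multi-token prediction differs from the next-token objective, here is a minimal sketch of target construction only. The helper name `mtp_targets` and the parameter `k` (number of future tokens predicted per position) are assumptions made for this example; published MTP variants differ in how the extra predictions are produced and weighted, and none of that is modeled here.

```python
def mtp_targets(tokens, k):
    """Pair each position with its next k tokens as prediction targets.

    Positions that do not have k future tokens are dropped.
    With k=1 this reduces to ordinary next-token prediction.
    """
    return [(tokens[t], tokens[t + 1 : t + 1 + k]) for t in range(len(tokens) - k)]


tokens = [1, 2, 3, 4, 5]
next_token = mtp_targets(tokens, k=1)   # one target per position
multi_token = mtp_targets(tokens, k=2)  # two targets per position
```

Each training position now contributes `k` prediction targets instead of one, which is the core change in the objective.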
Key Concepts & Terminology
Base model vs instruct model, domain shift, data mixture, deduplication, token budget, next-token vs multi-token objectives.
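To make two of these terms concrete, the sketch below shows exact deduplication after light normalization, followed by a rough token-budget count. It is an illustration under stated assumptions: the helper names are hypothetical, whitespace splitting stands in for a real tokenizer, and pipelines at pre-training scale typically also need near-duplicate detection, which this does not attempt.

```python
import hashlib

def normalize(text):
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return " ".join(text.lower().split())

def deduplicate(docs):
    """Keep the first occurrence of each document, compared by normalized hash."""
    seen, kept = set(), []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return kept

def token_count(docs):
    """Rough token budget: whitespace tokens (a stand-in for a real tokenizer)."""
    return sum(len(doc.split()) for doc in docs)


corpus = ["Hello  world", "hello world", "Another doc"]
unique_docs = deduplicate(corpus)
budget = token_count(unique_docs)
```

Counting tokens after deduplication rather than before gives a more honest picture of how much distinct data a token budget actually covers.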
Tools & Frameworks Referenced
Large-scale pre-training corpora and data-filtering pipelines (conceptual); domain-adaptation workflows.
Prerequisites
Module 03 (scaling laws), general ML training fundamentals.