Module 16: RAG Basics — Chunking and Retrieval

RAG foundations: build a baseline system, choose embedding models and chunking strategies, and add hybrid retrieval with BM25, SPLADE, and ColBERT.

May 28, 2026 · 1 min read

Topics You Will Master

Building a baseline (vanilla) RAG system from first principles
Choosing embedding models for retrieval
Chunking strategies and their effect on retrieval quality
Sparse retrieval with BM25 and learned-sparse SPLADE

Module Overview

This module covers the core RAG building blocks. It starts from a vanilla pipeline, then examines the two decisions that most affect quality, embedding model selection and chunking, before introducing hybrid retrieval, which combines lexical and dense signals, and multi-vector late interaction.
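
To make the vanilla pipeline concrete, here is a minimal sketch of the retrieve-then-generate flow. It is illustrative only: `embed` is a toy bag-of-words stand-in for a real embedding model, and `retrieve`, `build_prompt`, and the sample chunks are hypothetical names and data chosen for this example. The final call to a language model is left out.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Place the retrieved chunks in front of the question for the generator."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\nQuestion: {query}"
    )


# Hypothetical sample corpus, already split into chunks.
chunks = [
    "BM25 ranks documents by term frequency and inverse document frequency.",
    "ColBERT scores queries against documents with late interaction.",
    "Chunk overlap repeats text across neighbouring chunks.",
]
question = "how does BM25 rank documents"
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)
```

In a real system the toy `embed` would be replaced by an embedding model, the sorted list by a vector store lookup, and `prompt` would be sent to a language model.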

Learning Objectives

  • Describe the components of a vanilla RAG pipeline.
  • Select an embedding model appropriate to a corpus and query type.
  • Choose chunking strategies and explain their impact on context quality.
  • Compare BM25, SPLADE, and ColBERT-style retrieval.
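
As a reference point for that comparison, the following is a small from-scratch sketch of BM25 scoring. The whitespace tokenizer, the defaults `k1=1.5` and `b=0.75`, and the smoothed IDF variant are assumptions made for illustration; production implementations differ in tokenization and in details of the formula.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with a basic BM25 formula."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in tokenized for term in set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average documents are penalized.
            denom = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


# Hypothetical three-document corpus.
docs = [
    "sparse retrieval with bm25",
    "dense retrieval with embeddings",
    "late interaction with colbert",
]
scores = bm25_scores("bm25 retrieval", docs)
```

The first document matches both query terms, the second matches only the more common term, and the third matches neither, so the scores fall in that order. SPLADE keeps this sparse term-weight form but learns the weights with a neural model, while ColBERT replaces term matching with token-level embedding similarity.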

Topics Covered

Foundations of RAG

  • Vanilla RAG
  • Embedding models for retrieval
  • Chunking strategies
  • BM25 (lexical sparse retrieval)
  • SPLADE (learned sparse retrieval)
  • Multi-vector ColBERT (late-interaction retrieval)
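
Late interaction is easiest to see in code. The sketch below implements the MaxSim scoring rule on hand-written 2-dimensional vectors: each query token vector is matched to its most similar document token vector, and those maxima are summed. The vectors and the function names are hypothetical; a real ColBERT-style model would produce normalized token embeddings from a trained encoder.

```python
def dot(u: list[float], v: list[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def maxsim(query_vecs: list[list[float]], doc_vecs: list[list[float]]) -> float:
    """Sum, over query tokens, of the best similarity to any document token."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy token embeddings (unit length), one vector per token.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]  # has an exact match for each query token
doc_b = [[0.6, 0.8]]                          # one token, partially similar to both

score_a = maxsim(query, doc_a)
score_b = maxsim(query, doc_b)
```

Because every document token keeps its own vector, `doc_a` can match each query token separately and scores higher than `doc_b`, which a single pooled embedding could not distinguish as clearly.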

Key Concepts & Terminology

Retrieve-then-generate, fixed vs recursive vs semantic chunking, chunk overlap, lexical vs dense vs hybrid retrieval, MaxSim late interaction.
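
Fixed-size chunking with overlap, the simplest of these strategies, can be sketched as follows. This version counts whitespace-separated words rather than model tokens, which is an assumption made to keep the example dependency-free; `chunk_fixed` is a hypothetical helper, not a library function.

```python
def chunk_fixed(text: str, size: int, overlap: int) -> list[str]:
    """Split text into chunks of `size` words, repeating `overlap` words between neighbours."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the window already reached the end of the text
    return chunks


pieces = chunk_fixed("a b c d e f g h i j", size=4, overlap=1)
# pieces == ["a b c d", "d e f g", "g h i j"]
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice. Recursive and semantic chunking replace the fixed window with boundaries chosen from document structure or meaning.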

Tools & Frameworks Referenced

BM25, SPLADE, ColBERT-style multi-vector retrieval, vector stores.

Prerequisites

Module 14 (embeddings) and Module 15 (LangChain for orchestration).
