Module Overview
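This module establishes the representation layer of retrieval-augmented generation (RAG). It surveys the embedding taxonomy, from dense and sparse through quantized, binary, and multi-vector representations, along with the compression spectrum that runs from float32 down to one bit per dimension. It then introduces Matryoshka embeddings, whose dimensionality can be adjusted at query time, and covers fine-tuning strategies that align embeddings with a specific domain.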
Learning Objectives
- Classify embedding types from dense to binary and explain their trade-offs.
- Describe multi-vector (late-interaction) embeddings and when they help.
- Explain Matryoshka Representation Learning (MRL) and how it enables flexible query-time dimensions.
- Outline how to produce and use MRL embeddings in production.
- Describe embedding fine-tuning strategies for domain retrieval.
Topics Covered
Embedding Taxonomy & Types
- Advanced embedding taxonomy
- Dense embeddings — the baseline
- Sparse embeddings — preserving lexical signals
- Quantized embeddings — float32 to int8/uint8
- Binary embeddings — maximum compression
- Multi-vector embeddings — one document, many vectors
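To make the compression spectrum listed above concrete, here is a minimal NumPy sketch of scalar (int8) and binary quantization. The function names, the per-dimension min/max calibration, and the random stand-in vectors are illustrative assumptions, not the method of any particular library; real systems usually calibrate on a sample of actual embeddings.

```python
import numpy as np

def quantize_int8(vectors):
    """Scalar-quantize float32 vectors to int8 (assumed per-dimension min/max calibration)."""
    lo = vectors.min(axis=0)
    hi = vectors.max(axis=0)
    scale = (hi - lo) / 255.0
    scale = np.where(scale == 0, 1.0, scale)  # guard against constant dimensions
    # Map each value to one of 256 buckets, then shift into the int8 range [-128, 127].
    codes = np.round((vectors - lo) / scale) - 128
    return codes.astype(np.int8), lo, scale

def dequantize_int8(codes, lo, scale):
    """Approximate reconstruction of the original float values."""
    return (codes.astype(np.float32) + 128.0) * scale + lo

def binarize(vectors):
    """Keep only the sign of each dimension, packed 8 dimensions per byte."""
    bits = (vectors > 0).astype(np.uint8)
    return np.packbits(bits, axis=1)

def hamming_distance(a, b):
    """Number of differing bits between two packed binary codes."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

# Random vectors stand in for real embeddings (hypothetical data).
rng = np.random.default_rng(0)
emb = rng.standard_normal((1000, 384)).astype(np.float32)

codes, lo, scale = quantize_int8(emb)
packed = binarize(emb)

print("float32 bytes:", emb.nbytes)
print("int8 bytes:   ", codes.nbytes)
print("binary bytes: ", packed.nbytes)
```

The int8 codes take one byte per dimension instead of four, and the packed binary codes take one bit per dimension, which is where the storage savings at each step of the spectrum come from. The price is precision: int8 keeps an approximate value, while binary keeps only the sign.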
Matryoshka Embeddings (MRL)
- Matryoshka embeddings — flexible dimensions at query time
- How MRL training works
- Using MRL embeddings in production
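As a rough illustration of flexible dimensions at query time, the sketch below truncates full-size vectors to a prefix and re-normalizes them before scoring. It assumes the vectors come from an MRL-trained model, since only such models are trained so that prefixes remain useful on their own. The helper name `truncate_embedding`, the 768-dimensional size, and the random stand-in vectors are hypothetical.

```python
import numpy as np

def truncate_embedding(vectors, dim):
    """Keep the first `dim` dimensions and re-normalize to unit length."""
    vectors = np.asarray(vectors, dtype=np.float32)
    prefix = vectors[..., :dim]
    norms = np.linalg.norm(prefix, axis=-1, keepdims=True)
    return prefix / np.clip(norms, 1e-12, None)

# Random vectors stand in for MRL embeddings (hypothetical data).
rng = np.random.default_rng(0)
docs = rng.standard_normal((5, 768)).astype(np.float32)
query = rng.standard_normal(768).astype(np.float32)

for dim in (64, 256, 768):
    d = truncate_embedding(docs, dim)
    q = truncate_embedding(query, dim)
    scores = d @ q  # cosine similarity, because both sides are unit length
    print(dim, "dims -> best match: doc", int(scores.argmax()))
```

Queries and documents must be truncated to the same dimension before they are compared. Re-normalizing after truncation matters because a prefix of a unit vector is generally no longer unit length.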
Embedding Fine-Tuning
- Embedding fine-tuning strategies
- The embedding fine-tuning workflow
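The contrastive objective behind many fine-tuning strategies can be sketched in a few lines of NumPy. This is a minimal forward-pass illustration of an InfoNCE-style loss with in-batch negatives and optional hard negatives, similar in spirit to ranking losses offered by Sentence-Transformers-style trainers. The function names, the temperature of 0.05, and the random data are assumptions, and actual training would use an autodiff framework rather than NumPy.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.clip(np.linalg.norm(x, axis=-1, keepdims=True), 1e-12, None)

def info_nce_loss(queries, positives, hard_negatives=None, temperature=0.05):
    """Contrastive loss: query i should score highest against positive i.

    Every other positive in the batch acts as an in-batch negative.
    Optional hard negatives are appended as extra candidate columns.
    """
    q = l2_normalize(queries)
    if hard_negatives is None:
        candidates = positives
    else:
        candidates = np.concatenate([positives, hard_negatives], axis=0)
    c = l2_normalize(candidates)

    logits = (q @ c.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    idx = np.arange(len(q))
    return float(-log_probs[idx, idx].mean())  # the correct column for row i is i

# Hypothetical data: aligned pairs versus unrelated pairs.
rng = np.random.default_rng(0)
queries = rng.standard_normal((8, 32))
aligned = queries + 0.01 * rng.standard_normal((8, 32))
unrelated = rng.standard_normal((8, 32))
hard_negs = rng.standard_normal((8, 32))

loss_aligned = info_nce_loss(queries, aligned)
loss_unrelated = info_nce_loss(queries, unrelated)
loss_with_hard = info_nce_loss(queries, aligned, hard_negs)
print("aligned pairs lose less than unrelated pairs:", loss_aligned < loss_unrelated)
```

Hard negatives only add candidates to the denominator, so they can raise the loss but never lower it. That is why well-chosen hard negatives give the model a harder and more informative training signal.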
Key Concepts & Terminology
Dense vs sparse retrieval, scalar/binary quantization, late interaction, nested representation dimensions, contrastive embedding fine-tuning, hard negatives.
Tools & Frameworks Referenced
Sentence-Transformers-style embedding training, MRL-capable embedding models, multi-vector (ColBERT-style) embeddings.
Prerequisites
Modules 01–03 (Transformer foundations).