GenAI Syllabus
A structured curriculum for Production LLM Engineering — Transformer foundations, fine-tuning and alignment, multimodal and speech AI, RAG and retrieval engineering, agentic systems, and prompt/context engineering. Each module lists topics, learning objectives, and the tools and frameworks referenced — concept-first, not a coding tutorial.
Transformers and Architecture
Attention, tokenization, encoder/decoder families, fast inference, and scaling laws.
Module 01: Transformers and Tokenization
Foundational module on Transformer mechanics and tokenization — embeddings, the attention family, encoder/decoder architectures, and subword tokenization.
Module 02: Hands-On Fine-Tuning of Transformers
Hands-on fine-tuning of the three Transformer families — implement attention conceptually and adapt encoder, decoder, and encoder-decoder models.
Module 03: Fast Inference and Scaling Laws
Inference efficiency for Transformers: KV caching, FlashAttention, attention variants (MQA, GQA, MLA), PagedAttention, RoPE positional embeddings, and Chinchilla scaling laws.
LLM Fine-Tuning and Alignment
Pretraining lifecycle, SFT, PEFT, preference alignment, quantization, MoE, reasoning models, and SLMs.
Module 04: LLM Lifecycle and Pre-Training
The two-phase LLM lifecycle — pre-training vs post-training, why base models need adaptation, continued pre-training, and multi-token prediction.
Module 05: Datasets and Synthetic Data
Preparing fine-tuning data — dataset formats, chat templates, loss masking, deduplication, and synthetic data with self-instruct and LLM-as-judge.
Module 06: SFT, PEFT and Preference Alignment
Adapting and aligning LLMs — PEFT (LoRA, QLoRA, DoRA, AdaLoRA), supervised fine-tuning, and preference alignment with RLHF and DPO.
Module 07: Evaluation, Quantization and Deployment
Post-fine-tuning workflows — benchmark and LLM-as-judge evaluation, quantization (GPTQ, AWQ, NF4, FP8, GGUF), and serving with vLLM and llama.cpp.
Module 08: Mixture of Experts
Mixture of Experts — why dense models hit scaling limits, MoE routing, load balancing against expert collapse, and when MoE beats dense.
Module 09: Reasoning Models and Chain-of-Thought
Reasoning models — what sets them apart from standard LLMs, chain-of-thought training, RL-only reasoning (GRPO, DeepSeek-R1-Zero), and distillation.
Module 10: Small Language Models and Distillation
Small Language Models and distillation — why SLMs win on cost, latency, and privacy; student-teacher training, soft labels, and KL divergence.
Vision and Speech
CNNs to ViT, visual language models, and speech-to-text with Whisper.
Module 11: Vision Models — CNNs to ViT
Vision foundations for multimodal AI, from CNNs to Vision Transformers: patch embedding, the CLS token, and the CLIP, SigLIP, and DINOv2 encoders.
Module 12: Visual Language Models
Visual Language Models — the three-part VLM architecture (visual encoder, projector, LLM backbone) and vision-language alignment training.
Module 13: Speech-to-Text with Whisper
Speech AI and Speech-to-Text — the STT landscape, Whisper architecture and API, production STT pipelines, and fine-tuning Whisper on domain audio.
RAG and Retrieval
Embeddings, LangChain RAG, advanced RAG patterns, vector quantization, multimodal RAG, Graph RAG, and security.
Module 14: Embedding Models and Matryoshka Tuning
Embedding models — the taxonomy from dense to binary, multi-vector embeddings, Matryoshka Representation Learning, and domain fine-tuning.
Module 15: LangChain for Production RAG
LangChain for production RAG — LCEL, integrations, prompting and structured output, memory and retrieval, agents, observability, and security.
Module 16: RAG Basics — Chunking and Retrieval
RAG foundations: build a baseline system, choose embedding models and chunking strategies, and add hybrid retrieval with BM25, SPLADE, and ColBERT.
Module 17: Advanced RAG — Rerankers and Adaptive Retrieval
Advanced RAG — query transformations, rerankers, self-correcting and adaptive retrieval, contextual retrieval, evaluation, and agentic RAG.
Module 18: Vector Quantization and Multimodal RAG
Scale vector search with quantization (scalar, binary, product) and retrieve over visually rich documents with the ColPali multimodal RAG paradigm — no OCR required.
Module 19: Graph RAG, Caching and RAG Security
Production RAG hardening — Graph RAG retrieval, vectorless patterns, semantic caching, PII masking, guardrails, and prompt-injection defence.
Agents and Multi-Agent Systems
Function calling, MCP, LangGraph, A2A protocol, observability, and Bedrock AgentCore deployment.
Module 20: Agent Basics — Function Calling and MCP
Agent foundations — structured outputs with Pydantic, cross-provider function calling, tool executor loops, and standardized tools with MCP.
Module 21: LangGraph for Multi-Agent Workflows
LangGraph — graph-based stateful agent workflows, the ReAct pattern, human-in-the-loop checkpoints, memory, and multi-agent orchestration.
Module 22: Agent Observability and A2A Protocol
Make agents production-grade — observability with LangSmith and Logfire, plus the A2A (Agent-to-Agent) protocol for cross-framework interoperability.
Module 23: Deploying Agents with Bedrock AgentCore
Deploy secure, isolated agents with Amazon Bedrock AgentCore — Runtime, Memory, Gateway, Identity, Browser, Observability, and Cedar guardrails.
Prompting, Context and Evaluation
Prompt engineering, context engineering, and evaluation harnesses with agent CI/CD.
Module 24: Prompt Engineering
Prompt engineering: prompt anatomy, few-shot and chain-of-thought prompting, system prompt design, structured output, robustness testing, and self-refinement.
Module 25: Context Engineering
Context engineering — context window anatomy, RAG as context construction, memory architectures, and compression with LLMLingua and RECOMP.
Module 26: Evaluation Harnesses and Agent CI/CD
Evaluation and LLMOps CI/CD — eval harnesses, benchmarking, agent-native evaluation with Inspect AI, LLM-as-judge, and eval-gated agent CI/CD.
Capstone Projects
End-to-end projects integrating fine-tuning, distillation, RAG, multi-agent systems, speech, and LLMOps.
Project 01: ClinicLLM — Medical LLM Fine-Tuning Pipeline
Build a domain-specific medical language model with QLoRA SFT and DPO preference alignment, then serve it with multiple hot-swappable LoRA adapters.
Project 02: TinyReason — Distilling Reasoning to a CPU Model
Compress a larger reasoning teacher into a small student using KL divergence and attention transfer, then quantize to GGUF for cost-efficient CPU inference.
Project 03: LegalRAG — Multi-Modal + Graph RAG
Build a legal-document intelligence system: ColPali multimodal indexing (no OCR), a Neo4j knowledge graph, hybrid retrieval with RRF, and RAGAS evals.
Project 04: DevOpsCrew — Multi-Agent DevOps with HITL and A2A
Build a production DevOps assistant with a LangGraph supervisor delegating to specialised sub-agents over MCP and A2A, with human-in-the-loop gating every write.
Project 05: EvalShip — Eval-Gated CI/CD with Auto-Rollback
Wrap all prior projects in a production LLMOps shell — every change passes eval-gated CI stages, with blue/green deploys and auto-rollback.
Project 06: VoiceTrack — Whisper STT Pipeline
Fine-tune Whisper on domain-specific audio and ship a production STT service with streaming transcription, diarisation hooks, and an evaluation gate on WER.