Project 02: TinyReason - Distilling Reasoning to a CPU Model

Project Overview

TinyReason takes a larger teacher model with strong reasoning behavior and distills its knowledge into a much smaller student that is cheap enough to serve on CPU in production.

Objective

Train a small student model using a combined distillation loss against a larger teacher on a math-reasoning corpus, then ship it as a quantized CPU-served endpoint.

Scope

Teacher soft-label generation across the training set.
Student architecture choice and a custom combined distillation loss.
Ablation study across loss configurations vs. the teacher baseline.
Conversion to GGUF and CPU inference benchmarking.
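
One way the combined distillation loss could be written down is sketched below. It is an assumption, not the project's prescribed formula: the brief names KL divergence and attention transfer as components, but the hard-label cross-entropy term, the weights \alpha, \beta, \gamma, the temperature T, and the layer pairing P are illustrative choices. Here z_t and z_s are teacher and student logits, \sigma is softmax, and A_t^{(i)}, A_s^{(j)} are head-averaged attention maps from paired teacher and student layers.

```latex
\mathcal{L}_{\text{total}}
  = \alpha \, \mathcal{L}_{\text{CE}}\bigl(y, \sigma(z_s)\bigr)
  + \beta \, T^{2} \, \mathrm{KL}\bigl(\sigma(z_t / T) \,\|\, \sigma(z_s / T)\bigr)
  + \gamma \, \mathcal{L}_{\text{AT}},
\qquad
\mathcal{L}_{\text{AT}}
  = \frac{1}{|P|} \sum_{(i,j) \in P}
    \bigl\lVert A_t^{(i)} - A_s^{(j)} \bigr\rVert_F^{2}
```

Under this formulation, setting one weight to zero at a time gives a natural set of loss configurations for the ablation study.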

Datasets

A mathematical-reasoning problem set (e.g., GSM8K-style) for primary distillation and evaluation.
An extended-reasoning set for stress testing.

Stack

transformers + PyTorch for logit extraction from the teacher.
A custom training loop with loss-curve tracking.
KL divergence and attention transfer implemented from scratch.
llama.cpp with GGUF conversion for CPU serving.
llama-server for an OpenAI-compatible CPU inference API.
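
As a rough illustration of the "from scratch" KL divergence term, the sketch below computes a temperature-softened KL between teacher and student logits for a single token position using only the standard library. It is a framework-free reference that a PyTorch version could be checked against, not the project's training code. The function names, the default temperature of 2.0, and the T squared scaling are assumptions.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Temperature-scaled log-softmax, shifted by the max for stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def kd_kl_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over one token's vocabulary, scaled by T^2.

    Hypothetical helper: the T^2 factor is a common convention that keeps
    gradient magnitudes comparable across temperatures.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher soft labels
    log_q = log_softmax(student_logits, temperature)  # student predictions
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * kl
```

In a training loop this quantity would be averaged over token positions and batch items. The loss is zero when the student's softened distribution matches the teacher's and positive otherwise.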

Evaluation

Quality gap vs. the teacher on held-out problems.
Tokens-per-second and first-token-latency benchmarks.
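
The two benchmark metrics above can be derived from per-token arrival times recorded while streaming a response. The sketch below is one possible way to compute them. The function name and the choice to exclude the first token from the throughput figure are assumptions, and how the timestamps are collected (for example, with `time.perf_counter()` around a streaming client) is left out.

```python
def summarize_generation(request_start, token_times):
    """Compute first-token latency and decode throughput for one request.

    request_start: time the request was sent, in seconds.
    token_times: arrival time of each generated token, in order.
    """
    if not token_times:
        raise ValueError("no tokens were generated")

    first_token_latency = token_times[0] - request_start

    # Throughput counts only tokens after the first, so prompt
    # processing time does not dilute the decode rate.
    decode_tokens = len(token_times) - 1
    decode_seconds = token_times[-1] - token_times[0]
    if decode_tokens > 0 and decode_seconds > 0:
        tokens_per_second = decode_tokens / decode_seconds
    else:
        tokens_per_second = None  # not measurable from a single token

    return {
        "first_token_latency_s": first_token_latency,
        "tokens_per_second": tokens_per_second,
        "generated_tokens": len(token_times),
    }
```

Reporting the median and a high percentile of these values over many held-out prompts tends to be more informative than a single run.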

Deliverables

Saved soft labels for all training problems.
Trained student checkpoint with the combined loss.
Ablation comparison table across loss configurations.
A quantized GGUF model file.
A CPU-served endpoint with a benchmark report.

Prerequisites

Modules 09-10 (reasoning models, SLMs and distillation), Module 07 (quantization and serving).
