Project 06: VoiceTrack: Whisper STT Pipeline

Project Overview

VoiceTrack takes the speech-to-text track from concept to production. The project covers building a domain-specific audio dataset, fine-tuning Whisper on it, and serving a streaming STT API behind a WER-based evaluation gate.

Objective

Fine-tune Whisper on a domain-specific audio corpus and deploy a streaming STT service with an evaluation harness that gates new checkpoints on WER and domain quality.
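The gating rule can be sketched as a small decision function. This is a minimal sketch, assuming a hypothetical `EvalResult` record and illustrative thresholds (a 5% relative WER reduction over the baseline and a rubric floor of 0.8). The project does not specify these values, so treat them as placeholders.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """Scores for one checkpoint on the held-out slice (hypothetical schema)."""
    wer: float           # word error rate; 0.0 is a perfect transcript
    rubric_score: float  # domain-quality rubric, assumed to lie in [0, 1]


def passes_gate(candidate: EvalResult,
                baseline: EvalResult,
                min_rel_wer_gain: float = 0.05,
                min_rubric: float = 0.8) -> bool:
    """Promote a checkpoint only if it beats the baseline WER by a relative
    margin AND clears an absolute domain-rubric floor.

    Both thresholds are illustrative assumptions, not project requirements.
    """
    if baseline.wer <= 0.0:
        # Baseline is already perfect: demand a perfect candidate as well.
        return candidate.wer == 0.0 and candidate.rubric_score >= min_rubric
    rel_gain = (baseline.wer - candidate.wer) / baseline.wer
    return rel_gain >= min_rel_wer_gain and candidate.rubric_score >= min_rubric
```

A relative margin keeps the gate meaningful whether the base WER is high or already low. The separate rubric check stops a checkpoint from passing on WER alone while it mangles domain terminology.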

Scope

Audio data collection, transcription cleanup, and alignment.
PEFT-style fine-tuning of Whisper on the domain set.
A streaming STT pipeline with chunked decoding.
A WER-based evaluation gate plus a domain-quality rubric.
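Whisper decodes fixed 30-second windows of 16 kHz audio, so the service has to cut longer or live streams into chunks. The sketch below shows one common approach. It splits a sample buffer into windows that overlap by a stride, so a word cut at one boundary still appears whole in a neighbouring window. The parameter names and the 30 s / 5 s defaults are assumptions for illustration, similar in spirit to the `chunk_length_s` and `stride_length_s` options of the transformers ASR pipeline. Merging the overlapping transcripts is left out.

```python
from typing import Iterator, List, Sequence, Tuple

SAMPLE_RATE = 16_000  # Whisper's feature extractor expects 16 kHz mono audio


def chunk_audio(samples: Sequence[float],
                chunk_s: float = 30.0,
                stride_s: float = 5.0,
                sr: int = SAMPLE_RATE) -> Iterator[Tuple[int, List[float]]]:
    """Yield (start_sample, window) pairs of at most chunk_s seconds.

    Consecutive windows overlap by stride_s seconds so that words cut at a
    boundary appear whole in at least one window.
    """
    chunk = int(chunk_s * sr)
    overlap = int(stride_s * sr)
    step = chunk - overlap
    if step <= 0:
        raise ValueError("stride_s must be smaller than chunk_s")
    start = 0
    while start < len(samples):
        yield start, list(samples[start:start + chunk])
        if start + chunk >= len(samples):
            break  # this window already reaches the end of the buffer
        start += step
```

In a streaming deployment the same stepping logic would run over a growing buffer, with each window passed to the model as soon as it fills.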

Datasets

A domain-specific audio set with cleaned transcriptions.
A held-out evaluation slice with golden transcripts.
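Transcripts usually need to be canonicalised before alignment and scoring. Otherwise case and punctuation differences count as word errors. The function below is a simple, hypothetical normaliser: Unicode NFKC, lowercase, punctuation (except apostrophes) replaced by spaces, whitespace collapsed. It is not Whisper's own reference text normaliser. Domain formats such as units or codes may need extra rules, and formatting quality is better judged by the rubric than by WER.

```python
import re
import unicodedata

_PUNCT = re.compile(r"[^\w\s']")  # anything that is not a word char, space, or apostrophe
_SPACES = re.compile(r"\s+")


def normalize_transcript(text: str) -> str:
    """Canonicalise a transcript before alignment or WER scoring."""
    text = unicodedata.normalize("NFKC", text)  # fold compatibility characters
    text = text.lower()
    text = _PUNCT.sub(" ", text)                # drop punctuation, keep word breaks
    text = _SPACES.sub(" ", text)               # collapse runs of whitespace
    return text.strip()
```

Applying the same function to both golden transcripts and model output keeps the comparison fair.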

Stack

Hugging Face transformers and datasets for fine-tuning.
PEFT for parameter-efficient Whisper adaptation.
An audio-processing pipeline for normalisation and chunking.
An OpenAI-compatible STT API for serving.
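One part of the audio-processing step, amplitude normalisation, can be sketched in the standard library alone. This assumes audio arrives as signed 16-bit PCM that has already been downmixed to mono and resampled to 16 kHz (resampling is not shown). The 0.95 peak target is an arbitrary illustrative choice.

```python
from typing import List, Sequence


def pcm16_to_float(pcm: Sequence[int]) -> List[float]:
    """Map signed 16-bit PCM samples to floats in [-1.0, 1.0)."""
    return [s / 32768.0 for s in pcm]


def peak_normalize(samples: Sequence[float], target: float = 0.95) -> List[float]:
    """Scale so the loudest sample has magnitude `target`.

    All-zero (silent) input is returned unchanged to avoid dividing by zero.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    gain = target / peak
    return [s * gain for s in samples]
```

Peak normalisation is the simplest option. Loudness-based (RMS or LUFS) normalisation is a common alternative when recordings vary widely in level.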

Evaluation

Word Error Rate (WER) on the held-out set.
A rubric for domain-specific terminology and formatting.
Comparison against the base Whisper checkpoint.
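WER is the word-level edit distance between reference and hypothesis divided by the number of reference words: WER = (S + D + I) / N. The sketch below computes it with a dynamic-programming edit distance and aggregates at corpus level. Corpus-level aggregation means total errors over total reference words, not an average of per-utterance rates. It assumes transcripts are already normalised and whitespace-tokenised. Libraries such as `jiwer` or Hugging Face `evaluate` provide equivalent metrics.

```python
from typing import List


def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    # prev[j] = edit distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of ref[i-1]
                          curr[j - 1] + 1,     # insertion of hyp[j-1]
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1]


def corpus_wer(refs: List[str], hyps: List[str]) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must be aligned one-to-one")
    words = sum(len(r.split()) for r in refs)
    if words == 0:
        raise ValueError("reference set contains no words")
    errors = sum(word_errors(r, h) for r, h in zip(refs, hyps))
    return errors / words
```

Running `corpus_wer` on the same held-out references with base and fine-tuned outputs gives the two numbers the comparison needs. The relative reduction is `(base_wer - tuned_wer) / base_wer`.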

Deliverables

A cleaned, aligned domain audio dataset.
A fine-tuned Whisper checkpoint with a measurable WER improvement.
A streaming STT endpoint with an OpenAI-compatible API.
An evaluation report comparing the fine-tuned model to the base.

Prerequisites

Module 13 (Speech-to-Text with Whisper), Modules 03-05 (fine-tuning fundamentals and datasets).
