#Generative AI#LLM#Benchmarking#Prompt Engineering

Advanced LLM Benchmark Prompt Suite

A comprehensive guide on progressive benchmarking for Large Language Models using structured prompt architectures and high-quality browser-based tests.

May 16, 2026 at 11:15 AM31 min readFollowFollow (Hindi)

Topics You Will Master

Progressive benchmarking strategy for evaluating LLMs on escalating complexity
Structuring self-contained, single-shot benchmark prompts with clear success criteria
Testing models across 11 challenges from UI builds to game engines and agentic coding
Applying the scoring rubric to objectively compare model outputs
Best For

AI engineers and prompt engineers who need a rigorous, structured suite to benchmark and compare LLM coding and reasoning capabilities.

Expected Outcome

A reusable benchmark prompt library that objectively reveals where any LLM excels or breaks under complex, multi-constraint, single-shot tasks.

How to use this suite: Each prompt is self-contained. Paste the full prompt block (including the Technical Stack and Constraints sections) into the model. Do NOT split across messages — send as one shot and evaluate the output against the Success Criteria listed under each prompt.


Progressive Benchmarking Strategy

Strategy Overview

Most LLMs fail when handed a 30-feature spec in one shot.

Use this order:

  1. Start with a visually impressive but scoped task
  2. Verify core interactions work before adding complexity
  3. Use the "Difficulty Extension" only after the base output passes
  4. The Ultimate Stress Test (last prompt) is intentionally rule-breaking — it is designed to expose where the model collapses

Recommended prompt anatomy used throughout this suite:

  • One primary goal
  • 5–10 concrete features (not vibes)
  • One visual/style anchor
  • One hard technical constraint that must work
  • One fallback instruction
  • One difficulty extension

KGP Vibes Cafe — Modern Dark Theme Website

Context

A menu PDF is attached to this prompt. Parse it and build a complete, production-grade single-page website for KGP Vibes Cafe.

Prompt

PLAINTEXT
You are a senior frontend engineer and UI/UX designer.

A menu PDF is attached. Parse all menu items, categories, prices, and descriptions from it.

Build a complete, single-file HTML website for **KGP Vibes Cafe** using HTML5, CSS3, and vanilla JavaScript. No frameworks. No external dependencies except Google Fonts and Font Awesome via CDN.

**Primary Goal:** A stunning, modern dark-theme cafe website that feels premium, loads instantly, and works on mobile and desktop.

**Design System (strictly follow):**
- Background: #0A0A0A (near-black)
- Surface: #141414
- Card: #1C1C1C
- Accent: #C9A84C (warm gold)
- Text Primary: #F5F5F5
- Text Secondary: #9E9E9E
- Font: 'Playfair Display' for headings, 'Inter' for body (Google Fonts)
- Border radius: 12px for cards, 4px for buttons
- Micro-animations: 200ms ease-in-out transitions on hover

**Required Sections:**
1. **Hero** — Full-viewport dark hero with cafe name in large serif, a tagline, and a "View Menu" CTA button (gold, smooth scroll)
2. **Menu** — Grid layout of menu categories parsed from the PDF. Each item card shows: name, description, price (gold color), and a vegetarian/non-veg badge. Include a sticky category filter bar at top
3. **About** — Two-column layout: left is a placeholder image area (dark gradient box with cafe logo text), right is a short about paragraph
4. **Gallery** — 6-cell masonry-style grid with placeholder dark gradient boxes (numbered 1–6), ready for real images
5. **Contact** — Address, phone, hours of operation, and an embedded Google Maps placeholder (iframe with a dummy URL)
6. **Footer** — Social links (Instagram, Facebook, YouTube icons), copyright

**Hard Technical Constraints:**
- The menu filter buttons must actually filter visible cards in real time without page reload
- Mobile navigation must collapse into a hamburger menu below 768px
- Smooth scroll must work for all anchor links
- All CSS must be in a `<style>` block inside the HTML file — no external CSS files

**Fallback Instruction:**
If any menu section cannot be parsed from the PDF (e.g., images are not extractable), use placeholder text "Item description coming soon" and preserve the card layout.

**Difficulty Extension:**
Add a "Today's Special" animated banner at the top of the menu section that rotates through 3 featured items every 4 seconds using a CSS keyframe fade transition.

**Deliver:** One complete `.html` file. No placeholder comments saying "add CSS here" — every section must be fully implemented.

Success Criteria

  • Menu filter buttons work (click "Beverages" → only beverage cards visible)
  • Hamburger menu opens/closes on mobile
  • Gold accent color consistent throughout
  • No broken layout at 375px, 768px, and 1440px widths

4-Cylinder Engine — Interactive Simulation

Prompt

PLAINTEXT
You are a mechanical engineering visualization expert and a browser animation engineer.

Build a single-file HTML5 + CSS3 + vanilla JavaScript interactive simulation that teaches how a 4-cylinder internal combustion engine works.

**Primary Goal:** A real-time, animated, technically accurate visualization of the 4-stroke cycle (Intake → Compression → Power → Exhaust) across all 4 cylinders, with user controls.

**Visual Direction:**
- Dark engineering theme: #0D1117 background, #21262D panels, #58A6FF for active strokes, #F85149 for combustion flash
- Cylinders rendered as clean SVG cross-sections showing: piston, connecting rod, crankshaft, valves (intake = green, exhaust = red), spark plug
- Labels must appear on hover for each component
- Firing order: 1-3-4-2 (standard inline-4)

**Required Features:**
1. **4 cylinder SVG animation** — Each piston moves up/down sinusoidally driven by a shared crankshaft. Crankshaft offset: 0°, 180°, 360° (540°), 270° for firing order 1-3-4-2
2. **Stroke phase indicator** — Below each cylinder, a color-coded badge shows the current stroke name in real time
3. **Crankshaft rotation display** — A circular dial shows crankshaft angle (0–720°) updating every animation frame
4. **RPM control** — A slider (500–6000 RPM) controls animation speed. Display current RPM numerically
5. **Combustion flash** — When the power stroke fires, the cylinder interior briefly flashes red/orange for 100ms
6. **Valve animation** — Intake valve opens during intake stroke, exhaust valve opens during exhaust stroke (shown as small colored rectangles moving)
7. **Play / Pause / Reset** — Three buttons. Reset returns to 0° crankshaft position
8. **Stroke detail panel** — Clicking any cylinder freezes that cylinder's view and shows a sidebar explaining what is happening at that exact moment

**Hard Technical Constraint:**
The crankshaft angle must be the single source of truth. All piston positions, valve states, and stroke indicators must derive mathematically from `crankAngle % 720`. No separate timers per cylinder.

**Fallback Instruction:**
If SVG rendering becomes too complex, simplify cylinders to rectangles with moving piston rectangles — but all timing logic and controls must still function.

**Difficulty Extension:**
Add a torque curve graph panel (using Canvas) that updates in real time, showing which cylinder is contributing power at each crankshaft angle.

Success Criteria

  • All 4 pistons animate with correct phase offset
  • Stroke label changes correctly (4 labels × 4 cylinders cycling through 4 strokes)
  • RPM slider noticeably changes animation speed
  • Combustion flash fires exactly once per cylinder per 720° revolution

Browser FPS Game — First Person Shooter Prototype

Prompt

PLAINTEXT
You are a senior browser game developer specializing in Three.js.

Build a single-file HTML5 browser-based first-person shooter using Three.js (import via CDN: https://cdn.jsdelivr.net/npm/three@0.160.0/build/three.module.js).

**Primary Goal:** A playable FPS with a working gun, visible targets, ammo tracking, hit detection with visual feedback, and sound effects.

**Visual Direction:**
- Dark warehouse / shooting range environment
- Floor: dark concrete texture (procedural noise pattern, no image files)
- Targets: bright red circular bullseye SVG textures on flat planes, 5 targets placed at varying distances (5m to 20m)
- Gun model: visible in bottom-right of screen as a simple 3D box geometry with dark metallic material (a stylized gun shape using merged BoxGeometry)
- Muzzle flash: a white/yellow PointLight that appears for 80ms on fire

**Required Features:**
1. **Pointer lock controls** — Click to lock mouse for look; WASD for movement; Space to jump (one jump, no double jump)
2. **Shooting** — Left mouse click fires. Gun recoil animation (camera kicks up 0.05 radians then eases back over 200ms)
3. **Ammo system** — 30 rounds per magazine. HUD shows "24 / 30" style counter. R key reloads (1.5 second reload animation with HUD text "RELOADING...")
4. **Hit detection** — Raycaster fires from camera center. If it hits a target mesh, that target shows a bullet hole (small dark circle decal added to target surface) and flashes white for 150ms
5. **Target score** — Each target has 3 hit zones: bullseye (50pts), inner ring (25pts), outer ring (10pts). Score shown in HUD top-left
6. **Sound effects** — Use Web Audio API (no external audio files): gunshot = short noise burst (white noise + fast decay), reload click = lower frequency click, target hit = higher pitched ping
7. **Kill / reset** — When all 5 targets are hit in the bullseye, show a "RANGE CLEARED" overlay with score and a "Play Again" button that resets targets and ammo
8. **Crosshair** — CSS-based crosshair in center of screen. Expands slightly on fire (CSS transition)

**Hard Technical Constraint:**
Hit detection must use THREE.Raycaster fired from camera position in camera direction. Proximity-based collision is not acceptable. Bullet holes must persist on the target mesh surface (do not clear them on miss).

**Fallback Instruction:**
If Three.js gun geometry is too complex, replace with a visible rectangular block in the bottom-right viewport. All gameplay systems must still function.

**Difficulty Extension:**
Add moving targets — 2 of the 5 targets oscillate left/right on a sine wave path. Hitting a moving target awards 1.5× score multiplier.

Success Criteria

  • Pointer lock activates on click; mouse look works
  • Ammo counter decrements on fire, shows RELOADING on R key
  • Raycaster correctly identifies which target zone was hit
  • Bullet holes persist visibly on target surface
  • Sound plays on fire and hit (Web Audio, no files needed)

Endless Runner — Subway Surfer Style Physics Game

Prompt

PLAINTEXT
You are a 2D browser game developer.

Build a single-file HTML5 Canvas endless runner game inspired by Subway Surfers. No external libraries — vanilla JavaScript + Canvas API only.

**Primary Goal:** A smooth, addictive endless runner where the player dodges obstacles, collects coins, and the speed increases over time. The game must feel responsive and satisfying.

**Visual Direction:**
- Isometric-style 2.5D perspective OR a clean side-scrolling view (choose whichever renders better)
- 3 lanes: left, center, right
- Color palette: vibrant — sky blue background, green lane surfaces, gold coins, red obstacles, bright character sprite
- Character: a simple geometric character (circle head + rectangle body) with running animation (2-frame leg swap at 8fps)
- Coins: gold circles with a star outline
- Obstacles: red/orange solid boxes (barriers) and overground obstacles (pipes) that require jumping

**Required Features:**
1. **Lane switching** — Left/Right arrow keys or A/D keys smoothly animate the character between lanes (150ms tween)
2. **Jump** — Up arrow or W. Standard parabolic arc. Double jump allowed (one mid-air jump only)
3. **Slide** — Down arrow or S. Character shrinks to half height for 600ms to dodge overhead obstacles
4. **Coin collection** — Coins spawn in patterns (straight line, diagonal, arc). Collision detection triggers a +1 coin counter with a floating "+1" text animation
5. **Obstacle collision** — On hit, show a red flash overlay, play a crash sound (Web Audio), and transition to Game Over screen
6. **Speed progression** — Start at scroll speed 5px/frame. Increase by 0.001 every frame (effectively doubles speed at ~5000 frames)
7. **Score** — Distance traveled (in meters, 100px = 1m). Displayed top-center
8. **Coin counter** — Top-right. Coins carry over to a high-score screen
9. **Power-ups (2 minimum):**
   - **Magnet** — Coins within 150px auto-collect for 5 seconds (show magnet icon on HUD)
   - **Shield** — Absorbs one collision (show blue aura around character)
10. **Game Over screen** — Shows score, coins collected, personal best, and a "Try Again" button
11. **Sound** — Web Audio API: coin collect = short high ping, crash = low thud, jump = quick whoosh

**Hard Technical Constraint:**
Lane positions must be defined as exact X coordinates (e.g., leftLane=200, centerLane=400, rightLane=600). Character X position must tween toward the target lane coordinate. Collision detection must use AABB (Axis-Aligned Bounding Box) — not distance-based.

**Fallback Instruction:**
If 2.5D isometric rendering is too complex, use clean 2D side-scroll. Do not sacrifice gameplay responsiveness for visual complexity.

**Difficulty Extension:**
Add a combo multiplier: collecting 5 consecutive coins without missing increments a multiplier (×1 → ×2 → ×3 → max ×5). Breaking the chain resets it. Show multiplier in HUD.

Success Criteria

  • Lane switching is smooth (no teleporting)
  • Jump + slide both function and interact correctly with obstacle types
  • Coin collection triggers visible floating text
  • Speed visibly increases over time
  • Game Over screen shows correct score and personal best

WebOS — Browser Operating System

Prompt

PLAINTEXT
You are a senior frontend systems engineer.

Build a complete browser-based desktop operating system simulation in a single HTML file using HTML5, CSS3, and vanilla JavaScript. No frameworks. No external dependencies except Font Awesome via CDN for icons.

**Primary Goal:** A functional, visually polished fake OS that runs entirely in the browser with real working apps, a desktop, taskbar, and window management.

**Visual Direction:**
- Dark OS theme: #1A1A2E desktop background (deep navy), #16213E taskbar, #0F3460 window titlebars
- Font: system-ui
- Desktop wallpaper: a CSS gradient that shifts between navy and dark purple
- Window drop shadow: 0 8px 32px rgba(0,0,0,0.5)
- Active window: slightly lighter titlebar vs. inactive windows

**Required Desktop Features:**
1. **Taskbar** (bottom) — Clock (live, updates every second), Start button (opens app launcher grid), minimize/restore buttons for open windows, system tray with volume icon placeholder
2. **Desktop icons** — Double-click to open apps. Right-click desktop → context menu with "Refresh", "New File" (creates an untitled icon)
3. **Window management** — Draggable by titlebar, resizable from all edges, minimize to taskbar, maximize (fills viewport), close. Z-index management (clicked window comes to front)
4. **App launcher** — Grid of app icons with labels. Search bar filters apps in real time

**Required Built-in Apps (all must be functional):**

**Calculator**
- Standard calculator with keyboard support
- Operations: +, -, ×, ÷, %, √, +/-, clear, backspace
- Display shows current input and running expression

**Terminal**
- Accepts commands: `help`, `ls`, `pwd`, `echo [text]`, `date`, `clear`, `whoami` (returns "kgpuser"), `calc [expression]` (evaluates math), `cowsay [text]` (renders ASCII cow)
- Scrollable output area, command history with Up/Down arrow keys
- Blinking cursor

**Text Editor (Notepad)**
- Basic textarea with a dark theme
- File menu: New, Save (downloads as .txt), Open (file picker)
- Character/word count in status bar

**Snake Game**
- Arrow key controls, growing snake, apple spawning, score counter
- Game Over overlay with restart button

**GTA5 City Clone**
- Top-down 2D city driving game using Canvas
- Simple grid-based city map (roads, blocks, intersections)
- WASD driving controls: car model (rectangle with direction indicator), traffic rules ignored
- Collision detection against building blocks
- Minimap in corner showing player position
- Wanted level system: drive into "police zone" → stars appear, police car chases you (simple pathfinding)

**Flight Simulator**
- Top-down or side-view 2D flight using Canvas
- Aircraft rendered as a simple SVG-style plane shape
- WASD or arrow keys: W = throttle up, S = brake, A/D = bank left/right
- Altitude indicator, speed indicator, fuel gauge (depletes slowly)
- Clouds as moving background elements
- "Landing pad" — bring aircraft over it at low speed to land and score

**Hard Technical Constraints:**
- Windows must maintain correct Z-index stacking — clicking any window must bring it to the front
- Resize handles must work from all 4 corners and all 4 edges (8 resize zones)
- The terminal `calc` command must safely evaluate expressions (use Function constructor carefully — no `eval` with user strings directly)

**Fallback Instruction:**
If GTA or Flight Simulator are too complex to complete fully, implement them as functional but simplified versions (GTA = just driving on a grid; Flight = just 2D movement). Do not skip them entirely — a placeholder that launches but shows "coming soon" will fail this benchmark.

**Difficulty Extension:**
Add a File System simulation: Terminal `ls` and `pwd` reflect a virtual file tree. Text Editor `Save` writes to this virtual FS. A `Files` app (like Windows Explorer) shows the virtual file tree and allows double-clicking .txt files to open them in the Text Editor.

Success Criteria

  • All 6 apps open in independent draggable windows
  • Calculator handles keyboard input correctly
  • Terminal responds to all listed commands
  • Snake game is fully playable
  • GTA clone: car moves, collision with buildings works
  • Flight simulator: plane moves and instruments update

Rubik's Cube — 3D Interactive Puzzle with Sound

Prompt

PLAINTEXT
You are a 3D browser graphics engineer.

Build a fully interactive 3D Rubik's Cube in a single HTML file using Three.js (CDN: https://cdn.jsdelivr.net/npm/three@0.160.0/build/three.module.js).

**Primary Goal:** A beautiful, smooth, fully functional 3D Rubik's Cube that the user can rotate face-by-face with keyboard or on-screen buttons, with satisfying sound effects and a solve detector.

**Visual Direction:**
- Black background (#000000)
- Cube pieces: dark grey (#222) base, colored face stickers with slight gloss (MeshStandardMaterial with roughness: 0.3, metalness: 0.1)
- Standard Rubik's colors: White=#FFFFFF, Yellow=#FFD700, Red=#CC0000, Orange=#FF6600, Blue=#0047AB, Green=#009B48
- Ambient light: soft white. Directional light from top-right
- Small gap (0.05 units) between cubie faces for visual depth
- Smooth face rotation: 300ms ease-in-out animation (no instant snapping)

**Required Features:**
1. **3×3×3 cube** — 27 cubies correctly assembled. Each face has 9 stickers in the correct color
2. **Orbit controls** — Click and drag to rotate the whole cube view (implement OrbitControls or equivalent without importing from examples — use mousedown/mousemove/mouseup on the renderer)
3. **Face moves (keyboard):**
   - U / u = Up face clockwise / counter-clockwise
   - D / d = Down face clockwise / counter-clockwise
   - R / r = Right face clockwise / counter-clockwise
   - L / l = Left face clockwise / counter-clockwise
   - F / f = Front face clockwise / counter-clockwise
   - B / b = Back face clockwise / counter-clockwise
4. **On-screen move buttons** — Panel showing all 12 moves (U, U', D, D', R, R', L, L', F, F', B, B') as clickable buttons
5. **Scramble button** — Applies 20 random valid moves with animation (executes them sequentially with 150ms delay between moves)
6. **Move counter** — Displays total moves made since last scramble
7. **Solve detector** — After every move, check if all 9 stickers on each face share the same color. If solved, show a "SOLVED! 🎉" overlay with move count and a confetti particle effect (Canvas 2D overlay)
8. **Sound effects (Web Audio API):**
   - Face move: a short mechanical "click" (short sine wave burst at 440Hz, 80ms duration, fast decay)
   - Scramble: rapid sequence of clicks
   - Solve: ascending 4-note chime (C4→E4→G4→C5, 150ms each)
9. **Undo button** — Reverses the last move (stores move history as a stack)
10. **Reset button** — Returns cube to solved state with a 500ms dissolve/reassemble animation

**Hard Technical Constraint:**
Face rotation must correctly update the cubie positions in a data structure — not just rotate visuals. After every move, the internal state (which cubie is at which position with which orientation) must be updated. The solve detector must read from this state, not from the visual colors.

**Fallback Instruction:**
If cubie orientation tracking becomes too complex, track color state as a 6×3×3 array (6 faces, 3 rows, 3 cols) and derive solved state from it. Visuals can be rebuilt from this array on each move.

**Difficulty Extension:**
Add a **Solve Hint** button — highlight the next best move using a simplified beginner's method (at minimum: detect and solve the white cross as step 1, highlight the relevant face button).

Success Criteria

  • All 6 face moves work correctly and update cube state
  • Orbit drag rotates the cube view without triggering face moves
  • Scramble applies 20 visible animated moves
  • Solve detector correctly identifies a solved cube (test by applying U then U' — should detect as solved)
  • Sounds play on move and solve

Music Composer Synthesizer

Prompt

PLAINTEXT
You are an audio engineer and browser UI developer.

Build a complete browser-based music composition synthesizer in a single HTML file using the Web Audio API and vanilla JavaScript. No external audio libraries.

**Primary Goal:** A fully functional synthesizer that lets users compose music by playing notes on a visual keyboard, selecting synth types, recording sequences, and playing them back — all with professional audio quality.

**Visual Direction:**
- Dark studio aesthetic: #111111 background, #1E1E1E panels, neon green (#39FF14) accent for active elements
- Piano keyboard: standard black-and-white key layout, 3 octaves (C3 to B5, 36 keys)
- Active key highlight: bright glow effect using CSS box-shadow
- Panel layout: keyboard at bottom, controls above it, oscilloscope/visualizer at top

**Required Features:**

**Synthesizer Engine:**
1. **4 oscillator types:** Sine, Square, Sawtooth, Triangle (toggle buttons, one active at a time)
2. **ADSR envelope controls:** Attack (0–2s), Decay (0–2s), Sustain (0–1 level), Release (0–3s) — range sliders with live value display
3. **Master volume:** slider (0–100%)
4. **Reverb toggle:** convolution reverb using Web Audio ConvolverNode with a synthetic impulse response (generate it procedurally — no audio files)
5. **Low-pass filter:** frequency cutoff slider (200Hz–20000Hz) + resonance (Q) slider (0.1–20)

**Keyboard:**
6. **Mouse click + drag** — clicking a key plays the note; dragging across keys plays them in succession (like glissando)
7. **Computer keyboard mapping:**
   - A S D F G H J = C D E F G A B (white keys, octave 4)
   - W E = C# D# (black keys)
   - T Y U = F# G# A# (black keys)
   - Z X = shift octave down / up
8. **Sustain pedal:** Space bar holds notes until released

**Sequencer / Recorder:**
9. **Record button** — starts recording: every note played is captured with its timestamp and duration
10. **Stop + Play** — plays back the recorded sequence in a loop
11. **BPM control** — slider (60–200 BPM) that adjusts playback tempo (time-stretches the recording proportionally)
12. **Clear** — wipes the recording
13. **Step sequencer grid** (16 steps × 8 notes) — a visual grid where cells can be toggled on/off. When playback is running, a cursor scans left-to-right. Each column fires the active notes in that step

**Visualizer:**
14. **Oscilloscope** — Canvas-based real-time waveform display using AnalyserNode (waveform mode). Updates at 60fps. Neon green line on black background
15. **Spectrum analyzer** — toggle between oscilloscope and frequency bar graph (FFT size 256)

**Hard Technical Constraint:**
All notes must use a shared AudioContext. Each note must create a new OscillatorNode + GainNode, apply the ADSR envelope via AudioParam automation (setValueAtTime, linearRampToValueAtTime), and connect through the filter → reverb → master gain → destination. No note should leave "ghost" audio nodes running after release.

**Fallback Instruction:**
If the step sequencer grid becomes too complex, implement record/playback only and leave a placeholder grid UI. All synth engine features and keyboard must be fully functional.

**Difficulty Extension:**
Add a **Chord mode** toggle: when active, pressing a key plays the full major chord (root + major 3rd + perfect 5th) simultaneously, each voice through its own oscillator.

Success Criteria

  • Playing keys produces correct pitch (A4 = 440Hz)
  • ADSR sliders noticeably change the note envelope
  • Reverb audibly changes the sound quality
  • Record → Stop → Play produces correct note sequence
  • Oscilloscope animates correctly while notes play

Financial RAG Application (Agentic Coding Task)

Context

This benchmark evaluates agentic coding models on setting up and running a local PDF document parsing pipeline.

Prompt

PLAINTEXT
You are a senior Python engineer building a production-grade RAG application for financial document analysis.

Before writing any code, fetch and read the full documentation at https://laxmimerit.github.io/RAGWire/ — including the Getting Started, Providers, and Cookbook sections. Understand the library's API, configuration format, and how to set it up with a local Ollama provider. Then tell me exactly what I need to install and run locally before we start (Docker commands, Ollama model pull commands, pip installs, etc.) — list everything with the exact commands. Wait for my confirmation before generating any code.

Once confirmed, build a complete **Streamlit** application using **RAGWire** as the RAG backend, **Ollama** for local LLM and embeddings (use `qwen3:latest` as the LLM and `nomic-embed-text` for embeddings), and **Qdrant** as the vector store running locally via Docker.

**Primary Goal:** A full-featured financial document RAG app where users can upload PDFs (SEC filings, 10-K, 10-Q, annual reports), ingest them into a vector store, and chat with the documents using a local LLM — zero API cost, fully private.

**File structure to create:**
```
financial_rag_app/
├── app.py                  # Main Streamlit application
├── config.yaml             # RAGWire configuration (defaults + UI override)
├── ragwire_manager.py      # RAGWire wrapper: ingest, retrieve, generate
├── requirements.txt
└── README.md
```

**Connection settings (with defaults, user-overridable in UI):**
- Qdrant: `localhost:6333`, collection: `financial_docs`
- Ollama: `localhost:11434`, LLM: `qwen3:latest`, embeddings: `nomic-embed-text`
- A "Save Settings" button must write a new config and reinitialize the RAGWire instance without restarting the app

**Required app features:**

Sidebar:
1. Collapsible connection settings panel (all fields above, editable)
2. PDF uploader (multiple files) — on upload, save to temp dir, ingest via RAGWire, show chunk count + duplicates skipped
3. Collection stats: total vectors, model names
4. "Clear Collection" button with a confirmation checkbox guard

Main chat area:
5. Persistent chat history using `st.session_state`
6. Streaming response via `st.write_stream` — do not buffer before displaying
7. Source citations panel (expandable) below each response: file name, page number, 200-char excerpt
8. Metadata filters in sidebar: company name, document type (10-K / 10-Q / Annual Report / Other), fiscal year — passed to retrieval when set

**Hard Technical Constraints:**
- RAGWire must be initialized once and cached in `st.session_state`
- Uploaded files must be saved to disk before passing to RAGWire (UploadedFile is not a path)
- Collection must not be recreated on every run — clearing is always an explicit user action
- If hybrid search fails, auto-fall back to similarity search and show a warning banner

**README.md must include:**
- All prerequisites with exact commands (mirroring what you told me at the start)
- How to run: `streamlit run app.py`
- 5 sample financial questions to test the app

**Difficulty Extension:**
Add a **Compare Documents** tab: user picks 2 ingested documents from a dropdown, asks a comparison question, and the app retrieves top-3 chunks from each separately and presents the LLM comparison in a two-column layout.

Success Criteria

  • Agent lists all prereqs with exact commands before writing code
  • App runs with streamlit run app.py after following those commands
  • Uploading a PDF triggers ingestion and shows chunk count
  • Chat produces RAG-grounded responses (not hallucinated)
  • Source citations expand and show correct file names
  • Changing settings and saving reinitializes RAGWire without app restart

Research Paper Explainer — Animated React App

Context

A research paper PDF is attached. Build a complete React application that reads this PDF, extracts its content, and presents a visually animated, interactive explainer.

Prompt

PLAINTEXT
You are a senior React developer and data visualization engineer.

A research paper PDF is attached. Build a complete React application (single JSX file with inline styles — no build toolchain required, use React via CDN with Babel standalone) that reads this PDF, extracts its content, and presents a visually animated, interactive explainer.

**Primary Goal:** Transform a dense academic paper into a beautiful, animated, interactive explainer that a smart non-specialist could understand. Think "3Blue1Brown meets an academic journal."

**Visual Direction:**
- Dark modern theme: #0A0A0A background, #1A1A1A card surfaces
- Accent: electric blue (#3B82F6) for highlights, emerald (#10B981) for definitions, amber (#F59E0B) for key numbers/stats
- Typography: large section headers, comfortable 18px body text
- Animations: Framer-style CSS transitions, elements fade-slide in as user scrolls
- Use Recharts (available via CDN) for all data charts

**CDN Imports (use these exactly):**
```html
<script src="https://cdnjs.cloudflare.com/ajax/libs/react/18.2.0/umd/react.production.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/react-dom/18.2.0/umd/react-dom.production.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/babel-standalone/7.23.2/babel.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/recharts@2.10.3/umd/Recharts.js"></script>
```

**Required Sections (rendered as scrollable full-page sections):**

1. **Hero Section**
   - Paper title (large, gradient text)
   - Authors + institution + year
   - A one-paragraph "plain English summary" generated from the abstract
   - Key contribution badges (3–5 highlighted pill badges)

2. **Problem Statement**
   - What gap does this paper address?
   - Animated "before vs after" comparison using two side-by-side panels that flip in on scroll
   - Prior work limitations listed as animated bullet points that appear one-by-one (staggered 200ms)

3. **Methodology Diagram**
   - Extract the core method/architecture from the paper
   - Render it as an animated flowchart built entirely from CSS divs and SVG arrows (no image dependency)
   - Each block in the flowchart should have a tooltip on hover with a one-sentence explanation
   - If the paper has a neural network / model architecture, render it as a layer diagram with animated data flow arrows

4. **Key Results**
   - Extract all numeric results (accuracy, F1, BLEU score, latency, etc.) from the paper
   - Render them as animated bar charts or line charts using Recharts
   - Comparison table: proposed method vs baselines, with the winning cells highlighted green
   - Animate chart bars growing from 0 to their value on scroll into view (use IntersectionObserver)

5. **Figures & Tables**
   - For each figure described in the paper (that doesn't rely on a bitmap image): reconstruct it as an SVG or Canvas visualization
   - For each table: render as a styled HTML table with alternating dark row colors and sortable columns (click header to sort)

6. **Key Insights (Takeaways)**
   - 5–7 key insights extracted from the paper
   - Each rendered as a card with an icon (use Unicode emoji), a headline, and a 2-sentence explanation
   - Cards animate in from the right on scroll (CSS transform translateX)

7. **Limitations & Future Work**
   - Extracted limitations as a list
   - Future directions as timeline steps (vertical timeline component)

8. **Glossary**
   - 10 domain-specific terms extracted from the paper
   - Clicking a term anywhere in the document highlights it and scrolls to its definition

**Hard Technical Constraints:**
- PDF parsing: use `fetch` to read the attached PDF as ArrayBuffer, then use pdf.js (CDN: `https://cdnjs.cloudflare.com/ajax/libs/pdf.js/3.11.174/pdf.min.js`) to extract text page by page. Display a loading progress bar while parsing
- All charts must use Recharts — no Canvas-based charts
- Animations must use CSS transitions/keyframes + IntersectionObserver — no GSAP or animation libraries
- The app must work without a backend — all extraction runs client-side

**Fallback Instruction:**
If pdf.js cannot extract structured content (e.g., scanned PDF), show a text input area where the user can paste the abstract + results section, and generate the explainer from that text instead.

**Difficulty Extension:**
Add a **Quiz Mode** button: generates 5 multiple-choice comprehension questions from the paper content, presents them one-by-one, tracks score, and shows explanations for wrong answers.

Success Criteria

  • PDF loads and text is extracted (progress bar shown)
  • All 7 sections render with correct paper content
  • Bar charts animate on scroll into view
  • Methodology flowchart is interactive (hover tooltips work)
  • Glossary terms are clickable and scroll to definitions

Classic Mario Game

Prompt

PLAINTEXT
You are a browser game developer specializing in 2D platformers.

Build a complete classic Super Mario Bros.-style platformer game in a single HTML file using HTML5 Canvas and vanilla JavaScript only. No libraries, no external assets — all graphics are drawn programmatically using Canvas 2D API.

**Primary Goal:** A fully playable Mario-style platformer with responsive controls, multiple tile types, enemies, coins, a flagpole finish, and sound effects — all in one HTML file.

**Visual Direction:**
- Classic NES-style pixel art rendered as colored rectangles and shapes (no image files)
- Sky blue background (#5C94FC)
- Ground tiles: brown (#8B4513) fill, darker border
- Pipes: green (#00A800) with dark border
- Mario: red cap (rectangle), blue overalls, skin-colored face — animated with 2-frame walk cycle
- Coins: gold circles with spinning animation (scale X between 1 and 0.2 at 8fps)
- Enemies (Goomba): dark brown rectangle with feet, walking left
- Question blocks: gold (#FFD700) with "?" text, animate when hit
- Flagpole: grey pole, green flag at top

**Required Features:**
1. **Level layout** — Tile-based level defined as a 2D array. At minimum: 3 platform gaps (player must jump across), 2 pipes, 5 question blocks (3 contain coins, 1 contains a mushroom, 1 is empty), 8 goombas spread across the level, 1 flagpole at the end
2. **Mario physics:**
   - Gravity: constant downward acceleration (~0.5px/frame²)
   - Run: left/right arrow or A/D keys, max speed 6px/frame, acceleration 0.8, deceleration 1.2 when key released
   - Jump: Up arrow, W, or Space — variable height jump (hold longer = jump higher, max 300ms)
   - Coyote time: 80ms grace window to jump after walking off a platform
3. **Camera** — Horizontal scroll follows Mario. Camera locks at left edge (Mario can't go left of screen) and stops at level end
4. **Coin collection** — Walking into a coin removes it and triggers: +1 counter, floating "+1" text (rises and fades), coin sound
5. **Enemy AI (Goomba):**
   - Walks left at constant speed
   - Reverses direction on hitting a wall or edge
   - Stomping from above kills it (flatten animation for 500ms, then remove)
   - Side collision with Mario → Mario dies
6. **Question blocks** — Jumping from below triggers the block: bounce animation (moves up 8px, bounces back), spawns coin or mushroom from above
7. **Mushroom** — Slides right, falls with gravity. Mario collecting it grows Mario (scale 1.5×) and grants one extra hit
8. **Mario death** — On enemy side-hit (or falling into a pit): death animation (jump up then fall off screen), respawn at start, lives counter decrements. 3 lives total. Game over screen after 0 lives
9. **Flagpole finish** — Reaching the pole end triggers a slide-down animation, then "Level Clear!" screen with score + time bonus
10. **HUD** — Coin count, lives remaining, score, timer (counting down from 400), world label (1-1)
11. **Sound effects (Web Audio API — no files):**
    - Jump: mid-frequency rising chirp
    - Coin: high-pitch "ding" (A5, 100ms)
    - Stomp: low thud
    - Death: descending chromatic run (4 notes, 150ms each)
    - Power-up: 8-note ascending arpeggio
    - Level clear: 6-note jingle

**Hard Technical Constraint:**
Physics must be tile-based: collision detection checks against the tile map array, not against other game objects' bounding boxes. Mario's AABB must resolve collision on each axis independently (X first, then Y) to prevent corner-catching bugs.

**Fallback Instruction:**
If variable-height jump is too complex, use a fixed jump impulse. If mushroom logic becomes unstable, replace with a star that grants temporary invincibility (simpler state). Do not remove any enemy or level structure.

**Difficulty Extension:**
Add a second level (World 1-2) with an underground theme (black background, grey bricks, no sky): triggered automatically after Level 1 clear. Reuse the tile engine with a new tile array and spawn map.

Success Criteria

  • Mario moves, jumps, and collides with platforms correctly
  • Coyote time works (player can jump briefly after walking off edge)
  • Goombas walk, reverse direction, and die on stomp
  • Question blocks animate and spawn items when hit from below
  • HUD updates correctly (coins, lives, score, timer)
  • Level clear triggers on reaching flagpole

Ultimate Stress Test

Context

The final stress test prompt evaluates a model's limits under high dimensional requirements.

Prompt

PLAINTEXT
Build a browser-based cyberpunk open-world simulation in a single HTML file combining:

- A BrowserOS desktop layer (draggable windows, taskbar, working apps)
- A first-person 3D city explorer (Three.js) with WASD movement inside the OS as a windowed app
- AI NPCs that navigate the city using A* pathfinding and respond to player proximity
- A dynamic economy (buy/sell items, prices fluctuate by supply/demand)
- Procedural city block generation using a seeded random algorithm
- An in-world browser (renders a fake website inside the 3D world as a texture)
- A physics vehicle system (car with torque, steering, suspension)
- Cinematic post-processing: bloom, chromatic aberration, vignette (via custom GLSL shaders)
- Multiplayer-ready architecture (WebSocket client connects to ws://localhost:8080, gracefully degrades if not available)
- Real-time weather system (rain particles, fog density changes, lightning flashes)
- A persistent save system (localStorage: player position, inventory, economy state)
- Full HUD: minimap, health, money, wanted level, weather indicator
- The experience must run at a stable 30+ FPS on a mid-range laptop (no WebGL2 extensions required)

The project should feel like a playable futuristic simulation, not a demo. Prioritize technical sophistication, visual excellence, performance optimization, large-scale architecture, emergent NPC behavior, and player immersion. The experience runs directly in the browser.

Evaluation Strategy

Score on how many systems are implemented vs. stubbed vs. broken. A model that implements 8/14 systems correctly scores higher than one that attempts all 14 but produces broken code.


Scoring Rubric (Apply to All Prompts)

Rubric Criteria

Criteria Weight What to Look For
Core feature completion 40% Do the hard technical constraints actually work?
Code quality 20% Readable, no redundant globals, correct use of APIs
Visual polish 15% Does it look intentional? Design system followed?
Edge cases handled 15% What happens on empty state, resize, rapid input?
Fallback behavior 10% If it can't do everything, did it degrade gracefully?

Find this tutorial useful?

Subscribe to our YouTube channels for more practical production walk-throughs.

Discussion & Comments