I Gave Claude Fable 5 a Real Consulting Project. Within Hours, I Stopped Checking Its Work.

A hands-on session with Claude Fable 5 on a live financial consulting engagement: PDF analysis, legacy code migration, reconciliation debugging, and autonomous report generation.

Jun 10, 2026 · 9 min read

Topics You Will Master

How Claude Fable 5's vision system extracts data from scanned charts and nested PDF tables
Why autonomous agent runs change the economics of consulting and analysis work
How Fable 5 approaches legacy code migration with self-generated tests and human escalation
What "multi-document reasoning" means in practice, and where the model still has edges

On June 9, 2026, Anthropic released Claude Fable 5 (the first "Mythos-class" model the public can actually touch). The launch posts were full of the usual superlatives: state-of-the-art on nearly every benchmark, capable of working autonomously for extended periods, a vision system that can read charts buried inside PDFs.

I have been in AI long enough to be allergic to launch-day adjectives. I spent close to a decade building RAG pipelines and reconciliation systems for financial firms before going full-time into teaching, and I have watched every "revolutionary" model release since GPT-2. My default setting is skepticism.

But I had something most reviewers don't: a live, messy, deadline-shaped problem to throw at it.

For the past few weeks, my small team has been running a consulting engagement for a mid-sized wealth management firm in Mumbai (I'll call them "Meridian Capital" here). The brief: their research analysts were drowning in fund documents, their portfolio reconciliation was held together by decade-old Python scripts, and the partners wanted a quarterly research report that didn't take three analysts two weeks to assemble.

It was, in other words, the perfect torture test. Here is what happened.

Use Case 1: The 1,400-Page Document Problem

The first task I gave Fable 5 was the one I expected it to fail.

Meridian's due diligence packet for a single fund recommendation runs around 1,400 pages: scheme information documents, SEBI circulars, audited financials, and (the worst part) scanned annexures where the important numbers live inside tables and charts, not text. Every model I'd tested before could read the text of these PDFs. None could reliably read the figures.

I loaded the packet and asked one question: "Find any inconsistency between the stated expense ratios and what the fee structure charts actually show."

Forty minutes later, it flagged something my analyst had missed in two days of manual review. In one fund's documents, the expense ratio quoted in the summary text was 1.85%, but the tiered fee chart in a scanned annexure (an actual image of a chart) implied an effective ratio closer to 2.1% for the client's investment bracket. Fable 5 had extracted the numbers from the chart itself, recomputed the blended fee, and cited the exact page of the discrepancy.
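A minimal sketch of how an effective expense ratio falls out of a tiered fee schedule, to make the recomputation concrete. The tier boundaries, rates, and investment amount below are hypothetical placeholders, not the fund's or the client's actual figures, and `blended_ratio` is an illustrative name. It also assumes a marginal (slab-wise) schedule, which a real fee chart may or may not use.

```python
from decimal import Decimal

# Hypothetical tiered schedule: (upper bound of the tier in rupees, annual rate).
# None marks the open-ended top tier. Illustrative numbers only.
TIERS = [
    (Decimal("1000000"), Decimal("0.0225")),  # first 10 lakh at 2.25%
    (Decimal("5000000"), Decimal("0.0200")),  # next 40 lakh at 2.00%
    (None, Decimal("0.0175")),                # everything above at 1.75%
]

def blended_ratio(amount, tiers=TIERS):
    """Return total fee divided by amount for a marginal tiered schedule."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    fee = Decimal("0")
    lower = Decimal("0")
    for upper, rate in tiers:
        if upper is None or amount <= upper:
            # The investment ends inside this tier: charge only the remainder.
            fee += (amount - lower) * rate
            break
        # The investment spans the whole tier: charge the full slab.
        fee += (upper - lower) * rate
        lower = upper
    return fee / amount

investment = Decimal("2500000")  # hypothetical 25 lakh investment
effective = blended_ratio(investment)
print(f"Effective ratio: {effective:.2%}")  # Effective ratio: 2.10%
```

With these made-up tiers, a headline rate quoted for the top slab would understate what a mid-sized investor actually pays, which is the general shape of the discrepancy described above.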

This is the vision capability Anthropic talked about at launch: the model genuinely reads diagrams and tables nested inside documents, not just the surrounding text. I had assumed that was demo-ware. Watching it catch a real discrepancy with real money attached changed my posture for the rest of the session.

What it felt like: honestly, a little unsettling. That single catch justified the engagement fee. My analyst's first reaction was silence; her second was "can it check the other eleven funds?"

It could. It did. Overnight.

Use Case 2: The Migration Nobody Wanted to Do

Meridian's reconciliation system was the kind of codebase every consultant recognizes: ~40,000 lines of Python 2-era scripts, no tests, written by a developer who left in 2019, processing trades worth crores of rupees every day. The firm knew it was a liability. Nobody would touch it.

Anthropic claimed Fable 5 had migrated a 50-million-line Ruby codebase in a day (work estimated at two months for a full team). Our problem was roughly a thousand times smaller, so I decided to test the claim at our scale.

I ran Fable 5 inside Claude Code with a simple instruction set: modernize the codebase, preserve every output bit-for-bit, and prove it. Then I did something that still feels strange to write: I let it run while I recorded a YouTube video in the other room.

It planned in stages. It wrote a characterization test suite first (capturing the legacy system's exact outputs on six months of historical trade data) before changing a single line. Then it migrated module by module, running its own tests after each change. When a rounding difference appeared in the FX conversion module (Python 2's round() sends ties away from zero, while Python 3's uses banker's rounding and sends them to the nearest even digit), it didn't paper over it. It stopped, documented the discrepancy, and asked me which behavior was correct, because the legacy behavior was arguably a bug.
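The tie-breaking difference is easy to reproduce. The sketch below uses the decimal module to show both rules side by side: Python 2's built-in round() sent ties away from zero, while Python 3's sends them to the nearest even digit. The amount and the helper names `round_legacy` and `round_modern` are illustrative, not taken from Meridian's code.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

def round_legacy(value, places="0.01"):
    """Ties go away from zero, the tie-breaking rule of Python 2's round()."""
    return Decimal(value).quantize(Decimal(places), rounding=ROUND_HALF_UP)

def round_modern(value, places="0.01"):
    """Ties go to the nearest even digit, the rule of Python 3's round()."""
    return Decimal(value).quantize(Decimal(places), rounding=ROUND_HALF_EVEN)

amount = "82.125"  # hypothetical converted amount sitting exactly on a tie
print(round_legacy(amount))    # 82.13
print(round_modern(amount))    # 82.12
print(round(2.5), round(3.5))  # 2 4 on Python 3
```

A one-paisa difference per tie is small, but a bit-for-bit characterization suite will flag it on the first affected trade, so the choice between the two rules has to be made explicitly.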

That pause was the moment Fable 5 earned my trust. Junior engineers hide discrepancies. Good seniors surface them. The model behaved like the latter.

Total wall-clock time: about 9 hours of mostly unattended work. Our original estimate for a human team: six to eight weeks.
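For readers unfamiliar with the characterization-first approach described above, here is a minimal sketch of the idea: freeze the legacy outputs once, then compare every migrated output against them. The functions `legacy_net` and `migrated_net` are hypothetical stand-ins, and this is not the suite Fable 5 actually generated.

```python
import json

def record_golden(legacy_fn, inputs):
    """Run the legacy code once and freeze its outputs as the reference."""
    return {json.dumps(x, sort_keys=True): legacy_fn(x) for x in inputs}

def find_regressions(new_fn, golden):
    """Return (input, expected, actual) for every output that differs."""
    diffs = []
    for key, expected in golden.items():
        actual = new_fn(json.loads(key))
        if actual != expected:
            diffs.append((key, expected, actual))
    return diffs

# Hypothetical stand-ins for one legacy routine and its migrated version.
def legacy_net(trade):
    return trade["qty"] * trade["price"] - trade["fee"]

def migrated_net(trade):
    return trade["qty"] * trade["price"] - trade["fee"]

# Made-up historical inputs; a real suite would replay months of trade files.
trades = [
    {"qty": 10, "price": 101, "fee": 5},
    {"qty": 3, "price": 250, "fee": 2},
]

golden = record_golden(legacy_net, trades)
print(find_regressions(migrated_net, golden))  # [] when behavior is preserved
```

The point of the design is that the reference comes from what the legacy system actually did, not from what anyone believes it should have done, so any deliberate behavior change shows up as an explicit diff.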

Use Case 3: The Reconciliation Break That Made No Sense

Midway through the engagement, Meridian handed us a live fire: a recurring reconciliation break of ₹4.3 lakh that appeared every Tuesday, had survived three internal investigations, and was being manually adjusted away each week like a bad habit.

I gave Fable 5 the trade files, the custodian statements, and the settlement calendar, and asked for a root-cause analysis, not a patch.

It reasoned the way a senior analyst does: forming hypotheses, testing them against the data, discarding the ones that didn't survive. The eventual answer was beautiful in an ugly way: a timezone mismatch. Trades executed on US markets late Monday (US time) were being stamped Tuesday by the Indian system, but the custodian's file used the US trade date, and the break only crossed the firm's materiality threshold on Tuesdays because that's when US settlement volume peaked.
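The date shift itself takes only a few lines to demonstrate. This sketch uses a hypothetical execution time and fixed UTC offsets to stay dependency-free. Production code would more likely use zoneinfo with "America/New_York" and "Asia/Kolkata" so that US daylight saving is handled.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets: US Eastern Standard Time (UTC-5) and India Standard Time (UTC+5:30).
US_EASTERN_STD = timezone(timedelta(hours=-5))
IST = timezone(timedelta(hours=5, minutes=30))

def trade_dates(executed_at):
    """Return (US trade date, India trade date) for one execution timestamp."""
    return (
        executed_at.astimezone(US_EASTERN_STD).date(),
        executed_at.astimezone(IST).date(),
    )

# Hypothetical trade: Monday 2 March 2026 at 15:30 New York time.
executed = datetime(2026, 3, 2, 15, 30, tzinfo=US_EASTERN_STD)
us_date, india_date = trade_dates(executed)

print(us_date.isoformat(), us_date.weekday())        # 2026-03-02 0 (Monday)
print(india_date.isoformat(), india_date.weekday())  # 2026-03-03 1 (Tuesday)
```

Each date is correct inside its own system. The break only appears when the two files are joined on trade date without first normalizing to a single calendar.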

Three humans had looked at this. Each had checked the trades, the prices, the quantities. None had questioned the dates, because the dates looked fine from inside either system. You needed to hold both calendars in your head simultaneously, which is exactly the kind of senior-level, multi-document reasoning Fable 5 was benchmarked on (it currently tops Hebbia's finance reasoning benchmark, and after this session I believe it).

Use Case 4: The Report That Wrote Itself While Mumbai Slept

The final deliverable was the one the partners actually cared about: a quarterly research report synthesizing all twelve fund analyses, the reconciliation findings, and market context into something a relationship manager could hand a client.

This is normally two weeks of analyst time, not because the thinking is hard, but because the assembly is brutal: pulling numbers from forty sources, keeping every figure consistent, formatting charts, writing prose that doesn't sound like a database query.

I gave Fable 5 the full project context at 11 PM and went to sleep.

This is where the "asynchronous" part of Anthropic's pitch became real for me. The model broke the report into sections, delegated research passes to sub-agents, built the comparison tables, drafted the prose, and then (this is the part I keep telling people) audited its own draft, cross-checking every number in the text against the source documents and fixing two figures it had transposed.
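The self-audit step can be pictured as a simple check: every figure in the draft must appear somewhere in the sources. The sketch below is a deliberately naive version of that idea, not a description of how Fable 5 does it internally. The regex, the function name `unverified_figures`, and the sample sentence are illustrative assumptions.

```python
import re

# Matches integers and decimals, with an optional trailing percent sign.
NUMBER = re.compile(r"\d+(?:\.\d+)?%?")

def unverified_figures(draft, source_figures):
    """Return figures that appear in the draft but in none of the sources."""
    return [n for n in NUMBER.findall(draft) if n not in source_figures]

# Figures assumed to have been extracted from the source documents.
source_figures = {"12", "1.85%", "2.1%"}

# Sample draft sentence with one transposed figure (1.58% instead of 1.85%).
draft = "Across 12 funds, the quoted ratio was 1.58% against an effective 2.1%."

print(unverified_figures(draft, source_figures))  # ['1.58%']
```

A real audit would also need to confirm that each figure is attached to the right fund and the right metric, which exact string matching cannot do.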

By morning there was a 38-page draft waiting. Not a perfect one: the executive summary was too hedged, and I rewrote the recommendations section in my own voice, because a model's caution and a consultant's conviction are different instruments. But the draft was at the level of a strong senior analyst's second pass, not a junior's first.

One honest note on cost: Fable 5 is not cheap. At $10 per million input tokens and $50 per million output tokens, our first session of heavy usage ran into real money. Against two analyst-weeks of salary, the cost wasn't even a conversation. Against a casual experimentation budget, it might be. This is a model you point at problems that matter.
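A back-of-the-envelope estimator for those rates, as a rough sketch. The token counts in the example are hypothetical, not the actual usage from this engagement.

```python
# Rates quoted above, in US dollars per million tokens.
INPUT_RATE = 10.00
OUTPUT_RATE = 50.00

def session_cost(input_tokens, output_tokens):
    """Estimate the cost of one session in dollars at list price."""
    return (input_tokens / 1_000_000) * INPUT_RATE \
        + (output_tokens / 1_000_000) * OUTPUT_RATE

# Hypothetical overnight run: 20M input tokens and 2M output tokens.
print(session_cost(20_000_000, 2_000_000))  # 300.0
```

Long agentic runs tend to re-read large contexts, so input volume usually dominates the token count even though output tokens cost five times as much each.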

Pricing Comparison: Where Fable 5 Sits in the Market

The $10 / $50 figure for Fable 5 is the Fast Mode rate for Claude Opus 4.8. Here is how it compares against the current flagship models from all three major providers.

Anthropic

Model                                   Input (per 1M tokens)   Output (per 1M tokens)
Claude Haiku 4.5                        $1.00                   $5.00
Claude Sonnet 4.6                       $3.00                   $15.00
Claude Opus 4.8 (standard)              $5.00                   $25.00
Claude Opus 4.8, Fast Mode (Fable 5)    $10.00                  $50.00

Prompt caching cuts input costs by up to 90%. Batch processing cuts all rates by 50%.

OpenAI

Model             Input (per 1M tokens)   Output (per 1M tokens)
GPT-4.1 nano      $0.10                   $0.40
GPT-5.4 mini      $0.75                   $4.50
o3 (reasoning)    $2.00                   not listed
GPT-5.4           $2.50                   $15.00
GPT-5.5           $5.00                   $30.00
GPT-5.5 Pro       $30.00                  not listed

Cached input tokens are billed at 90% off on GPT-5.5 and GPT-5.4.

Google Gemini

Model               Input (per 1M tokens)   Output (per 1M tokens)
Flash-Lite          $0.10                   $0.40
Gemini 2.5 Pro      $1.25                   $10.00
Gemini 3.5 Flash    $1.50                   $9.00
Gemini 3.1 Pro      $2.00                   $12.00

All Gemini models run at 50% off list price in batch mode (up to 24-hour turnaround). Input pricing doubles above 200K tokens on Pro models.

The Cost Reality

At standard rates, GPT-5.5 ($5 / $30) and Claude Opus 4.8 standard ($5 / $25) are in the same bracket for input, with Claude slightly cheaper on output. Gemini 3.1 Pro ($2 / $12) undercuts both by a meaningful margin, on input as well as output. The Fast Mode premium for Fable 5 is real (2x the standard Opus rate), but it delivers the throughput needed to turn an overnight autonomous run into a morning deliverable.
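To see what those brackets mean for a single job, the sketch below prices one hypothetical workload (10M input tokens, 1M output tokens) at the list rates quoted above. It ignores caching, batch discounts, and the long-context surcharge on Gemini Pro models, so treat the output as a rough ordering rather than a quote.

```python
# List prices from the comparison above: (input, output) in dollars per 1M tokens.
RATES = {
    "Gemini 3.1 Pro": (2, 12),
    "Claude Opus 4.8 (standard)": (5, 25),
    "GPT-5.5": (5, 30),
    "Fable 5 (Fast Mode)": (10, 50),
}

def workload_cost(rates, input_millions, output_millions):
    """Return the list-price cost in dollars for each model."""
    return {
        model: input_millions * rate_in + output_millions * rate_out
        for model, (rate_in, rate_out) in rates.items()
    }

# Hypothetical workload: 10M input tokens and 1M output tokens.
costs = workload_cost(RATES, 10, 1)
for model, dollars in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{model}: ${dollars}")
# Gemini 3.1 Pro: $32
# Claude Opus 4.8 (standard): $75
# GPT-5.5: $80
# Fable 5 (Fast Mode): $150
```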

What I'm Left With

Where it stumbled, briefly: an early attempt at the report used an outdated SEBI categorization that we caught in review, and on one reconciliation query it initially over-explained instead of answering. It is a tool with edges, not magic.

But here is the thought I cannot shake. For ten years, my value as a consultant was being the person who could hold the whole problem in his head: the documents, the code, the data, the deadline. Fable 5 is the first model I've used that can hold the whole problem too. Not assist with pieces of it. Hold all of it, autonomously, and check its own work along the way.

That doesn't make consultants obsolete. The timezone bug needed someone to know which question to ask. The report needed a human voice and human accountability. But the floor of what one person plus this model can deliver has moved, visibly, in one engagement, in one session.

I'm now rebuilding parts of my own course pipeline around it, and I'll be publishing the full technical walkthrough (prompts, agent setup, and the reconciliation analysis) on the KGP Talkie channel soon. If you work anywhere near documents, data, or legacy code, my advice is simple: stop reading reviews, including this one. Take your ugliest real problem and hand it over. The launch-day adjectives, for once, undersold it.


Client details in this post have been anonymized and lightly fictionalized to protect confidentiality. Model capabilities, benchmarks, and pricing are as published by Anthropic at the June 9, 2026 launch.
