I Pointed Fable 5 at My Own Codebase. It Found 42 Bugs I Didn't Know About - Then Showed Me the Bill.

I ran Anthropic's new Fable 5 model with a dynamic workflow against my own 200'000-line codebase. It found 42 verified bugs I didn't know about - and the run cost 250.- in extra usage in three hours. An honest look at what an agentic codebase audit actually delivers, and what it costs.

Date: June 12, 2026

What a thousand commits don't catch

Shipwright is the project I've put the most care into. It's an open-source framework that runs the full software development lifecycle on Claude Code, and by this month it had grown to roughly 200'000 lines of code, around 130'000 of them Python, the rest Markdown and a few other formats. Every section gets reviewed, and many bugs have already been identified and fixed along the way.

And still, I have no illusion that it's clean. So I finally did something I'd been curious about for a while. I pointed the most capable setup I have access to at my own work and asked it one question: what did I miss?

Three hours later I had a list of 42 verified bugs I genuinely did not know were there. I also had a bill that made me put my coffee down. Both halves of that sentence are the reason I'm writing this. The instrument is expensive - but set it next to what a senior engineer would charge to read 200'000 lines this carefully, and "expensive" starts to look like the wrong word.

What I actually ran

Two pieces of technology did the work here. They're worth understanding before the numbers make sense, because the story is really about what happens when you stack them on top of each other.

Dynamic workflows

The first piece is dynamic workflows in Claude Code, which Anthropic shipped at the end of May 2026 alongside Claude Opus 4.8, initially as a research preview. The idea is simple to state and surprisingly deep in practice. Instead of one assistant working through a task in a single conversation, Claude writes a small JavaScript orchestration script, and a runtime executes that script in the background. The script fans the work out across many parallel subagents - up to 1'000 of them - and only the final, checked result comes back to your session.

Two design choices matter for what follows. First, the coordination happens outside the conversation. Intermediate results live in script variables, not in the model's context window, which is how a workflow can touch a whole repository without drowning in its own notes. Second, and this is the part I cared about, the workflow runs independent verification on every finding. In Anthropic's own words, for a codebase sweep "Claude searches a service or repo in parallel, then runs independent verification on every finding so the report surfaces real issues." There are adversarial agents whose entire job is to try to break a finding before you ever see it.
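To make that shape concrete, here is a minimal sketch of the fan-out-then-verify pattern. It is my own illustration, not the script Claude generated: the real orchestration script is JavaScript and runs in Anthropic's runtime, while this one uses Python and asyncio. The names `Finding`, `search_chunk`, `verify` and `sweep` are hypothetical, and both "agents" are trivial stand-ins for model calls.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    path: str
    claim: str


async def search_chunk(path: str) -> list[Finding]:
    """Hypothetical finder agent: flags any file with 'hook' in its path."""
    await asyncio.sleep(0)  # stands in for a model call
    return [Finding(path, "contract mismatch")] if "hook" in path else []


async def verify(finding: Finding) -> bool:
    """Hypothetical adversarial verifier: rejects findings located in tests/."""
    await asyncio.sleep(0)  # stands in for a second, independent model call
    return not finding.path.startswith("tests/")


async def sweep(paths: list[str]) -> list[Finding]:
    # Fan out: one finder per path, all running concurrently.
    per_path = await asyncio.gather(*(search_chunk(p) for p in paths))
    # Intermediate results stay in ordinary variables, not in a chat context.
    candidates = [f for found in per_path for f in found]
    # Every candidate gets its own verification pass.
    verdicts = await asyncio.gather(*(verify(f) for f in candidates))
    # Only findings that survived verification are returned to the caller.
    return [f for f, ok in zip(candidates, verdicts) if ok]


def run_sweep(paths: list[str]) -> list[Finding]:
    return asyncio.run(sweep(paths))
```

Calling `run_sweep(["src/hooks.py", "tests/test_hook.py", "src/paths.py"])` produces two candidates, and only the one in `src/hooks.py` survives the verifier. The point of the structure is that the caller never sees the rejected candidate at all.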

A codebase-wide bug hunt is one of the canonical use cases the documentation calls out by name. So this wasn't me bending the tool into a shape it wasn't built for. It was me using it exactly as intended.

There's also a line in that same documentation that I read at the time and underlined only in hindsight: dynamic workflows "can consume substantially more tokens than a typical Claude Code session." Hold that thought.

Fable 5

The second piece is the model. I ran the workflow on Fable 5, which Anthropic released on 9 June 2026. It's their most capable publicly available model to date, the first Mythos-class model they've made generally available - a tier that sits above the Opus models in capability. The headline isn't a single benchmark. It's endurance. Fable 5 is built to stay focused across millions of tokens in long-running tasks and to work autonomously for longer than any previous Claude. For a job where hundreds of agents each need to read code, reason about it, and hold a thread for hours, that endurance is exactly the property you want.

It is also priced like a frontier model: 10 US dollars per million input tokens and 50 dollars per million output tokens, which is twice the rate of Opus 4.8. That number is going to come back too.
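For a feel of what those rates mean per run, here is a small cost calculator. The Fable 5 prices are the ones quoted above. The Opus 4.8 prices are derived from the "twice the rate" statement rather than taken from a price list, and the token split in the usage note is a made-up example, not a measurement from my run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_mtok: float   # US dollars per million input tokens
    output_per_mtok: float  # US dollars per million output tokens


# Rates as quoted in the article.
FABLE_5 = Pricing(input_per_mtok=10.0, output_per_mtok=50.0)
# Assumption: derived from "twice the rate of Opus 4.8", not from a price list.
OPUS_4_8 = Pricing(FABLE_5.input_per_mtok / 2, FABLE_5.output_per_mtok / 2)


def cost_usd(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in US dollars for a given number of tokens."""
    total = (input_tokens * pricing.input_per_mtok
             + output_tokens * pricing.output_per_mtok)
    return total / 1_000_000
```

With a hypothetical agent that reads 1'000'000 input tokens and writes 200'000 output tokens, `cost_usd(FABLE_5, 1_000_000, 200_000)` comes to 20.0 dollars, and the same work on `OPUS_4_8` comes to 10.0. Output tokens are five times as expensive as input tokens, so agents that write long verification reports move the bill far more than agents that mostly read.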

The result: 56 found, 42 real

The run took about three hours. The workflow went through the full 200'000 lines, fanned out across more than 200 agent runs, and came back with 56 candidate bugs. Then the verification layer did its work - paired agents trying to break each finding - and 14 of those fell away: false positives, things that looked wrong in isolation but were fine in context, duplicates. What remained were 42 issues that survived an adversarial second look.

The remaining 42 are real. Not style nits. They cluster in exactly the places a system like this tends to drift quietly. Hook contracts that don't fully hold to the behaviour they promise. Path resolution that breaks at the edges, where a file isn't quite where the code assumed it would be. Concurrency and atomicity gaps, where two processes touch the same file at the same moment and the result depends on luck. Producer-consumer drift, where one component writes data in a shape another component no longer expects. Test-reality gaps, where a test passes while asserting something that quietly stopped being true.
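The atomicity category is the easiest one to show in a few lines. The sketch below is a generic illustration of the usual remedy, not code from Shipwright: write to a temporary file in the same directory, then swap it into place with `os.replace`. The function name `write_json_atomic` is hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path: Path, payload: dict) -> None:
    """Write payload to path so readers see the old file or the new one, never half of it."""
    path = Path(path)
    # The temp file must live in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=str(path.parent), prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before the swap
        os.replace(tmp_name, path)  # atomic rename on POSIX systems
    except BaseException:
        # Do not leave a stray temp file behind if anything went wrong.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

This only protects readers from seeing a half-written file. It does not stop two writers from overwriting each other's updates in a read-modify-write cycle. That case needs a lock or a single owner for the file.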

And the category that bothered me most: silent failures - what I think of as "degraded green," where something underneath has actually failed but the pipeline still reports success. Those are the dangerous ones, because nothing ever asks for your attention.
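Here is "degraded green" reduced to its smallest form. Again, this is an illustration I wrote for this post, not one of the 42 findings, and every name in it is hypothetical. The first runner swallows the exception and reports success. The second records the failure so the pipeline status reflects it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StepResult:
    name: str
    ok: bool
    detail: str = ""


def run_step_silently(name: str, step: Callable[[], None]) -> StepResult:
    """Anti-pattern: the error is swallowed and the step still reports success."""
    try:
        step()
    except Exception:
        pass  # nothing is logged, nothing is surfaced
    return StepResult(name, ok=True)


def run_step_honestly(name: str, step: Callable[[], None]) -> StepResult:
    """The failure is captured in the result instead of being hidden."""
    try:
        step()
    except Exception as exc:
        return StepResult(name, ok=False, detail=f"{type(exc).__name__}: {exc}")
    return StepResult(name, ok=True)


def pipeline_green(results: list[StepResult]) -> bool:
    """The pipeline is green only if every step reported success."""
    return all(r.ok for r in results)


def failing_step() -> None:
    raise FileNotFoundError("report.json")
```

With the silent runner, `pipeline_green([run_step_silently("export", failing_step)])` is `True` even though the step failed. With the honest runner the same call is `False`, and the result carries the reason.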

So on the capability question, I have no hedge. It worked. It found things I would have shipped.

The bill: 250.- in three hours

Now the other half.

I'm on the Claude Max plan, the 200.- tier. Like all the paid plans it runs on a rolling five-hour session window, with weekly caps on top. Inside that window you get a generous allowance of usage, and for normal work I almost never reach it.

I reached it in under thirty minutes.

That's not a complaint, it's arithmetic. Stack the two things I just described and the result is predictable in retrospect. Fable 5 costs twice what Opus does per token. A dynamic workflow fans out across hundreds of agents, each one reading code and writing verification, and the documentation tells you plainly it consumes substantially more tokens than a normal session. Point all of that at 200'000 lines and the included session allowance evaporates.

By the time the rate limit stopped the first run, it had fanned out about 107 agents and burned roughly 5 million tokens. The plan doesn't stop you there - it lets you keep going on extra usage, billed at standard API pricing as pay-as-you-go credits, charged separately from the subscription. So I let it resume, and it finished. The two runs together came to roughly 11.8 million subagent tokens, around 230 agent runs and some 3'300 tool calls - and a chunk of that was pure waste, because the verifier pairs that the abort had killed mid-flight had to be run, and paid for, a second time. A clean single pass would probably have landed somewhere between 9 and 12 million tokens. The session reset on its usual five-hour clock, and the additional usage on top of my 200.- subscription came to 250.- for those three hours.

250.-. For one audit. I want to be precise about what that buys, because the instinct is to read that figure as either "outrageous" or "trivial" depending on your day, and it's neither. It bought a verified defect list on a 200'000-line codebase, produced in a few hours, that would have cost considerably more and taken considerably longer from a security-and-quality review firm. It is also a cost that is easy to trigger by accident and easy to repeat without noticing, which is the part worth being careful about.

So was it worth it?

For this specific job, yes, clearly. A pre-release audit on the codebase I'm asking other people to trust and build on is exactly the moment to spend money on certainty. I'd do it again before a major release without hesitating.

But "worth it for this" is not "worth it as a habit," and that's the trade-off I'd ask anyone to sit with before they fire up the same setup. A few honest distinctions I've landed on:

This is a periodic instrument, not a daily driver. Running Fable 5 with full-fan-out workflows on every change would be like booking a structural engineer to hang a picture. The cost only makes sense against a decision that's expensive to get wrong.
Match the model to the task. Most of my day-to-day still runs perfectly well on Opus 4.8 at half the token price. The frontier model earns its premium on long, autonomous, high-stakes runs, not on routine edits.
Set a spending cap before you start, not after. The pay-as-you-go credits can be capped, and on a workflow that fans out unpredictably, a ceiling is cheap insurance against a surprise.
Budget the audit as a line item. 250.- is a sensible number for a quarterly deep audit of a serious codebase. It's an absurd number to stumble into because you didn't realise what "fan out across hundreds of agents" actually meant in tokens.
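The account-level cap on extra usage is the real safety net. As a second line of defence, the same idea can be sketched inside your own tooling. The guard below is hypothetical and not part of Claude Code or any Anthropic API: it assumes you can observe token counts per agent run, it defaults to the Fable 5 rates quoted earlier, and it refuses further work once the estimated spend passes a ceiling you chose up front.

```python
class BudgetExceeded(RuntimeError):
    """Raised when the estimated spend passes the configured ceiling."""


class SpendGuard:
    def __init__(self, ceiling_usd: float,
                 input_per_mtok: float = 10.0,
                 output_per_mtok: float = 50.0) -> None:
        self.ceiling_usd = ceiling_usd
        self.input_per_mtok = input_per_mtok    # US dollars per million input tokens
        self.output_per_mtok = output_per_mtok  # US dollars per million output tokens
        self.spent_usd = 0.0

    def can_launch(self) -> bool:
        """True while the estimated spend is still below the ceiling."""
        return self.spent_usd < self.ceiling_usd

    def charge(self, input_tokens: int, output_tokens: int) -> float:
        """Add one agent run to the running estimate and enforce the ceiling."""
        self.spent_usd += (input_tokens * self.input_per_mtok
                           + output_tokens * self.output_per_mtok) / 1_000_000
        if self.spent_usd > self.ceiling_usd:
            raise BudgetExceeded(
                f"estimated spend {self.spent_usd:.2f} USD exceeds "
                f"ceiling {self.ceiling_usd:.2f} USD"
            )
        return self.spent_usd
```

In use, an orchestrator would check `guard.can_launch()` before starting each batch of agents and call `guard.charge(...)` as each one finishes. The estimate is only as good as the token counts fed into it, so treat it as an early warning rather than a substitute for the cap in your account settings.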

The broader point is that agentic engineering at this scale has real unit economics, and we're all still building the intuition for them. For two years the mental model was "the subscription is a flat fee and usage is basically free." Dynamic workflows quietly end that model. When you can spin up a thousand agents, the question stops being "can I" and becomes "is this particular answer worth what it will cost to get it." That's a healthier question, honestly. It just takes getting used to.

Where this leaves me

The capability to find 42 hidden bugs in a morning is extraordinary.

What stays with me isn't really the bug count, though. It's what a setup like this puts within reach of one person. The current generation of these capabilities lets somebody like me ship genuinely good, well-checked work without the headcount and the cost that used to be the price of entry.

Shipwright - the codebase I pointed all of this at - is open source, if you want to see what I'm building.


Watching more than 200 agents fan out across my code felt like a glimpse of the future. Except the future was yesterday.

Sources:

Introducing dynamic workflows in Claude Code - Anthropic / Claude - 2026 - https://claude.com/blog/introducing-dynamic-workflows-in-claude-code
Orchestrate subagents at scale with dynamic workflows - Claude Code Docs - 2026 - https://code.claude.com/docs/en/workflows
Anthropic Ships Opus 4.8 with Dynamic Workflows - WinBuzzer - 29 May 2026 - https://winbuzzer.com/2026/05/29/anthropic-ships-opus-48-with-dynamic-workflows-xcxwbn/
Claude Fable 5 and Claude Mythos 5 - Anthropic - 9 June 2026 - https://www.anthropic.com/news/claude-fable-5-mythos-5
What is the Max plan? - Claude Help Center - 2026 - https://support.claude.com/en/articles/11049741-what-is-the-max-plan
Manage extra usage for paid Claude plans - Claude Help Center - 2026 - https://support.claude.com/en/articles/12429409-manage-extra-usage-for-paid-claude-plans