Thoughts Beyond the Harness

The harness debate has given the agentic-coding world a clean technical vocabulary, but it stops at the edge of the team. This is my attempt to contribute to the discussion. Using a harness is the first step. Changing the operating model around it is the actual work.

Date

April 30, 2026

"Harness engineering" is everywhere right now. Red Hat calls it the third era. OpenAI's Ryan Lopopolo gives talks on it. Anthropic ships Managed Agents that bundle the harness as a service. The framing is useful. But I have the feeling we're only solving one part of the problem.

The harness is the mechanism

A quick definition first, because the term is moving fast. The community has roughly converged on what a harness is: tool loop, sandbox, memory, evals, guardrails - everything around the model except the model itself. Mitchell Hashimoto coined the phrase on 5 February 2026, with Ryan Lopopolo at OpenAI arriving at much the same picture six days later. Birgitta Böckeler's piece on martinfowler.com then gave us the cleanest taxonomy of guides and sensors. Red Hat, Addy Osmani, Cobus Greyling and others have largely lined up behind that picture since.

Why this matters now: Anthropic's April 23 postmortem proved that "the model is fine, the harness is broken" can silently degrade output for days before anyone notices. Three configuration regressions stacked - a change to default reasoning effort on 4 March, a thinking-clearing bug on 26 March, a verbosity-reducing system prompt on 16 April - and none of them touched the model weights. The harness alone was enough to make Claude Code feel meaningfully worse.

The point I keep coming back to: a harness tells the agent how to act. It says nothing about how a team organises around it.

Three new harness moves, one missing layer

Three things landed in April that, taken together, define the current state of the art.

Anthropic Managed Agents went into public beta on 8 April 2026: a managed harness loop, a sandbox container, persistent memory exposed as a REST API, billed at $0.08 per session-hour with tokens billed separately. It's a real product and it removes a lot of plumbing.

Red Hat published "Harness Engineering: Structured Workflows for AI-Assisted Development" on 7 April 2026 and named the shift cleanly: prompt engineering, then context engineering, then harness engineering. A useful conceptual bracket, even if the "third era" framing is a little tidier than reality.

Ryan Lopopolo at OpenAI gave the operational slogan: "humans steer, agents execute". His talk is a clean statement of how the Codex world wants you to think about it.

All three are genuine progress on the technical harness. None of them describes how a team operates around the agent. That's the missing layer.

Boris Cherny said the quiet part

Boris Cherny, the creator of Claude Code, has been hinting at this operating-model layer for months. His point in interviews and in his own workflow is consistent: agents aren't a side tool, they're part of the collaboration loop. His team maintains a shared CLAUDE.md in the repo, agents participate in PR reviews, and "anytime we see Claude do something incorrectly we add it to CLAUDE.md so Claude knows not to do it next time". The agent is in the social workflow, not next to it.
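
That correction loop is easy to picture as a file. A hypothetical CLAUDE.md excerpt might look like this - the specific rules below are invented for illustration, not taken from Cherny's repo:

```markdown
# CLAUDE.md (illustrative excerpt)

## Corrections
<!-- Added whenever Claude does something incorrectly -->
- Do not add new dependencies without asking; this repo vendors everything.
- Run the linter before proposing a commit.
- Tests live in tests/, never alongside source files.
```

The point is less the contents than the mechanism: the file is versioned with the code, so every correction one person teaches the agent is a correction the whole team inherits.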

Addy Osmani's matching warning, from his March essay on "comprehension debt", lands in the same place from the opposite side. AI accelerates code output, but the human review and knowledge-transfer loop breaks down. Code exists that nobody on the team understands, and unlike technical debt, nobody made a conscious decision to take it on.

Simon Willison's 24 April follow-up to the Anthropic postmortem made the same point in plainer language: the harness was silently broken for days because not enough people were reading the output closely enough to notice.

The pattern is consistent. The harness handles execution. The team handles trust. And the trust layer is where the work hasn't been done yet.

Three roles that need to exist

Regardless of tooling - Claude Code, Codex, Cursor, or a roll-your-own setup - three roles seem to recur whenever agentic work scales beyond a single developer.

Solution Owner defines what good looks like. Not a spec writer. A person who can explain in writing why this problem is being solved, what counts as solved, and what the constraints are. Without this role, the agent does the wrong thing very efficiently.

Agent Operator runs the loop. Picks the model, picks the harness, decides when to stop, decides when output is ready for review. Closest to today's "developer with a Claude Code session", but explicitly named as a role rather than an unspoken default.

Review Gate is the most underspecified one. In practice it isn't a single gate, it's three:

  • Second pair of eyes via AI - and the emphasis is on "second". A different person from the Agent Operator runs a verification agent against the original output, ideally with a different prompt, a different setup, or a different model family. Cheap, fast, catches obvious drift, and crucially breaks the loop where the same human both writes and validates.
  • Architect with self-built review skills - a senior engineer who has built review skills tailored to this codebase and this spec, not a generic linter. This is where domain judgement lives.
  • SME validation - the person who actually understands the business outcome. Often not a developer at all.

The order matters. Each gate filters what reaches the next, and each protects against a different kind of failure: drift, design erosion, business mismatch.
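
The ordering can be sketched as a short pipeline. This is a hedged sketch of the pattern, not anything shipped by the tools named above; the gate names and the GateResult type are my own:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class GateResult:
    gate: str          # which gate produced this result
    passed: bool       # did the artifact clear the gate?
    notes: str = ""    # reviewer findings, if any

class ReviewPipeline:
    """Runs review gates in order; a failure stops the pipeline,
    so downstream gates never see work that failed upstream."""

    def __init__(self) -> None:
        self.gates: List[Tuple[str, Callable[[str], GateResult]]] = []

    def add_gate(self, name: str, check: Callable[[str], GateResult]) -> None:
        self.gates.append((name, check))

    def run(self, artifact: str) -> List[GateResult]:
        results: List[GateResult] = []
        for name, check in self.gates:
            result = check(artifact)
            results.append(result)
            if not result.passed:
                break  # e.g. design erosion is caught before SME time is spent
        return results

# The three gates from the text, as placeholder callables:
pipeline = ReviewPipeline()
pipeline.add_gate("ai_second_pair", lambda a: GateResult("ai_second_pair", True))
pipeline.add_gate("architect", lambda a: GateResult("architect", True))
pipeline.add_gate("sme", lambda a: GateResult("sme", True))
```

The short-circuit on failure is what encodes "each gate filters what reaches the next": an artifact that drifts past the AI check still never consumes architect or SME attention once a gate rejects it.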

A note on my own bias

My own framework, Shipwright, is a harness. I should be honest about that. It has hooks, sandboxes, tool loops, evals, the lot. What I tried to do differently is build it with a process in mind, not just a toolbox: phase separation between spec, plan, build and review, and an append-only event log so decisions don't evaporate at the end of a session. The "second pair of eyes" pattern from above is wired in - Shipwright runs Gemini and OpenAI review loops on plans and on code reviews, so the validating model isn't the one that wrote the work. On top of that, I regularly run an adversarial pass over my own output using the Codex plugin inside Claude Code - same principle, a different model family with a different bias.
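
An append-only event log of the kind described takes only a few lines. This is a minimal sketch of the pattern, not Shipwright's actual implementation; the JSONL format and the field names are assumptions:

```python
import json
import time
from pathlib import Path
from typing import Optional

class EventLog:
    """Append-only JSONL log: decisions are recorded once and never
    rewritten, so a session's reasoning survives past its end."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    def append(self, phase: str, event: str, detail: str) -> dict:
        record = {"ts": time.time(), "phase": phase,
                  "event": event, "detail": detail}
        # "a" mode only ever adds to the end of the file
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
        return record

    def replay(self, phase: Optional[str] = None) -> list:
        """Read the log back, optionally filtered to one phase."""
        if not self.path.exists():
            return []
        records = [json.loads(line)
                   for line in self.path.read_text(encoding="utf-8").splitlines()]
        return [r for r in records if phase is None or r["phase"] == phase]
```

The design choice worth noting is the append-only discipline itself: because nothing is ever edited in place, the log doubles as an audit trail of why a decision was made, not just what the final state was.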

But that's still just a harness. A harness matters - it forms the foundation that a target operating model gets built around. It's also just the beginning. The roles, the handoffs, the trust between people - that part isn't in the tool. It lives in the team that uses it.

Closing

The harness debate is healthy. We needed it. But buying a harness is just the beginning. The real work is the change to the operating model and to the people who run it: who owns the spec, who runs the agent, who reviews, who validates. The harness becomes important precisely because it operationalises that new operating model, not because it replaces it.

Sources:

  • "An update on recent Claude Code quality reports" - Simon Willison - 24 April 2026 - https://simonwillison.net/2026/Apr/24/recent-claude-code-quality-reports/
  • "Anthropic April 23 Postmortem" - Anthropic Engineering - 23 April 2026 - https://www.anthropic.com/engineering/april-23-postmortem
  • "Claude Managed Agents (Public Beta)" - Anthropic - 8 April 2026 - https://claude.com/blog/claude-managed-agents
  • "Harness Engineering: Structured Workflows for AI-Assisted Development" - Red Hat Developers - 7 April 2026 - https://developers.redhat.com/articles/2026/04/07/harness-engineering-structured-workflows-ai-assisted-development
  • "Harness Engineering: How to Build Software When Humans Steer, Agents Execute" - Ryan Lopopolo (OpenAI) - April 2026 - https://youtube.com/watch?v=am_oeAoUhew
  • "Comprehension Debt - the hidden cost of AI generated code" - Addy Osmani - 14 March 2026 - https://addyosmani.com/blog/comprehension-debt/
  • "Building Claude Code with Boris Cherny" - Pragmatic Engineer - 2026 - https://newsletter.pragmaticengineer.com/p/building-claude-code-with-boris-cherny
  • "My AI Adoption Journey" (introducing harness engineering) - Mitchell Hashimoto - 5 February 2026 - https://mitchellh.com/writing/my-ai-adoption-journey
  • "Harness engineering for coding agent users" - Birgitta Böckeler - martinfowler.com - https://martinfowler.com/articles/harness-engineering.html