From "it works" to "it's shipped right" - Shipwright Repos Public

Vibe coding lowered the floor, but it skips the process that gets software safely to production. Google's new whitepaper makes the case for agentic engineering as the discipline that fills the gap, and Shipwright is my open-source implementation of exactly that - the process, encoded, running on Claude Code.

Date: June 24, 2026

A year and a half ago, a software architect whose judgment I value highly recommended I try Replit. I did, and I was hooked. Building software was finally simple, and Replit taught me a lot about how a commercial product bridges the gap from chat to deployed software.

But what was missing was everything that turns "it works" into "it's shipped right" - the parts that decide whether the person responsible for delivery can sleep at night.

The process is the part I know

That gap - between something that runs and something shipped right - is the part I have spent my career on. What I care about in the end is simple: that the software does what it was actually meant to do, that it is secure, that it is documented well enough to hand to someone else, and that it can be maintained and extended later without fear.

Getting there is work: turning vague intent into testable requirements, making trade-offs explicit, insisting on tests and reviews, and leaving a trail the next person can follow. In regulated environments - a FINMA-approved digital asset exchange and private banking - that work is not optional, and being able to prove it to an auditor is just the strict version of it. But the need is universal. Every serious project wants the same thing: that what was built matches the intent, and that it survives its next change.

That is exactly what the current wave of AI tooling makes easy to forget.

What vibe coding gets right, and where it stops

Andrej Karpathy gave the fast version a name in 2025: vibe coding. You describe what you want, accept what the model returns, and when something breaks you paste the error back and ask for a fix. The floor dropped. People who could never build software before can now build something that runs.

And the tools are good. Replit, the one I started on, will take a sentence and turn it into a running app: it scaffolds the project, provisions a database, generates the APIs, runs its own tests, writes documentation, and deploys to a live URL - keeping everything in one repo so the context is not lost. That is genuinely impressive.

Two things it does not give you. First, it is a closed world: Replit's stack, Replit's architecture, built for web apps - you get what they let you have, and no more. Second, and deeper, it skips the spine that keeps software trustworthy over time: the requirements that say what you are actually building, and the traceability that proves you built that and nothing else. Without that spine you have something that runs, not something you can hand over, secure, and still change safely a year from now.
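To make "built that and nothing else" concrete, here is a minimal Python sketch of a traceability check. It is an illustration under my own assumptions, not Shipwright's implementation: the names Requirement, Change and trace, and the "REQ-1" ID scheme, are hypothetical. The idea is simply that every requirement should be covered by some change, and every change should point back to a requirement.

```python
from dataclasses import dataclass, field


@dataclass
class Requirement:
    req_id: str  # hypothetical ID scheme, e.g. "REQ-1"
    text: str


@dataclass
class Change:
    name: str
    # Requirement IDs this change claims to implement (assumed to be recorded by hand or by tooling).
    implements: set = field(default_factory=set)


def trace(requirements, changes):
    """Return (requirement IDs no change covers, change names tied to no known requirement)."""
    known = {r.req_id for r in requirements}
    covered = set()
    untraced = []
    for change in changes:
        linked = change.implements & known
        if not linked:
            # Something was built that no requirement asked for.
            untraced.append(change.name)
        covered |= linked
    uncovered = sorted(known - covered)
    return uncovered, untraced


if __name__ == "__main__":
    reqs = [Requirement("REQ-1", "Users can log in"),
            Requirement("REQ-2", "Passwords are hashed")]
    work = [Change("add-login", {"REQ-1"}), Change("misc-tweak")]
    print(trace(reqs, work))
```

An empty first list means everything asked for was built; an empty second list means nothing was built that was not asked for. A real process would also link tests and reviews to each requirement, which this sketch leaves out.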

And when nothing surrounds the model, the stakes get real. In one well-documented case from 2025, an AI coding agent deleted a production database during a code freeze. The instructive part is the fix: separating development from production and adding a staging environment - putting real engineering around the model. The apparatus, in other words. Security research points the same way - a large share of AI-generated code ships with vulnerabilities, because the model optimises for "it works," not "it is safe."
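What "engineering around the model" can look like at its smallest is a guard that sits between the agent and the database. The Python sketch below is an assumption-laden illustration, not a reconstruction of the fix in that incident and not Shipwright code: the function names, the environment labels and the keyword list are all hypothetical, and keyword matching is a deliberately naive stand-in for real permission controls.

```python
# Naive, illustrative markers of destructive SQL; a real guard would rely on
# database permissions and separate credentials, not string matching.
DESTRUCTIVE_MARKERS = ("drop ", "truncate ", "delete from")


def is_destructive(statement: str) -> bool:
    lowered = statement.lower()
    return any(marker in lowered for marker in DESTRUCTIVE_MARKERS)


def agent_may_run(statement: str, environment: str, code_freeze: bool) -> bool:
    """Decide whether an AI agent may execute a statement.

    Assumed rules for this sketch:
    - non-destructive statements are always allowed;
    - destructive statements are never allowed in production;
    - during a code freeze, destructive statements are blocked everywhere.
    """
    if not is_destructive(statement):
        return True
    if environment == "production":
        return False
    return not code_freeze


if __name__ == "__main__":
    print(agent_may_run("DROP TABLE users", "production", code_freeze=True))
    print(agent_may_run("DROP TABLE users", "development", code_freeze=False))
```

The point is not the few lines of logic but where they live: outside the model, so the rule holds even when the model is confidently wrong.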

None of this is a knock on vibe coding for prototypes, experiments, or personal projects. There it is exactly the right speed. The trouble starts when something built that way quietly turns into something people depend on.

Karpathy named the next step, and Google described it well

In early 2026 Karpathy followed his own term with a second one: agentic engineering - the disciplined end of the same spectrum, where the difference is not whether you use AI but how much structure, verification and judgment surround what it produces.

Google's recent whitepaper, "The New SDLC With Vibe Coding" (May 2026), is the clearest description I have seen of how to industrialise that discipline: the harness around the model, the "factory model" where your real output is the system that produces the software, the tests and guardrails and feedback loops. What it spends less time on is the why underneath - that someone always has to trust this software and live with it. That part I had learned the long way.

One line stuck with me: "Generation is solved. Verification, judgment, and direction are the new craft." Reading the paper felt like reading the design notes for what I had spent the last six months building.

This is why I built Shipwright

Shipwright is the harness around whatever the AI generates: the discipline layer that turns vibe coding into engineering you can trust.

It runs on Claude Code. It starts from an intent and turns that into a real spec, so what gets built is what you actually asked for - and you can trace it back. Features are documented and tested, security and documentation come with the code instead of after it, and Shipwright carries you through clear phases, from specifying what you want to a deployed result. Because the record of how it was built stays with it, the result is something you can maintain, not just demo.

It is deliberately not a coding tool. Not Cursor, not Copilot. And it is deliberately not a vibe-coding enabler. Not Replit, not Lovable. It is the layer that turns "it generated something" into "it shipped something we can stand behind."

The discipline lives in the tool, not only in your head - but it does not run itself. It is a process you pick up and grow into. That is the real shift: you can produce professional work sooner than your own experience alone would allow, because the guardrails carry the parts that usually only years on the job provide - and you get sharper at the judgment that is left.

And it holds whether you are starting fresh, inheriting an existing codebase, or making the hundredth change after launch. New project, existing repo, every change after - the same process runs each time. Every change arrives clean: its own branch, its own tests, security checked on every pull request.

Why open, and why now

The Google paper is, honestly, the best argument for Shipwright I could have asked for. When the case for disciplined agentic engineering is being made at that level, there is no better moment to put the implementation in the open.

So that is what I am doing. The Shipwright SDLC skills are open source and free to use on GitHub. Build on them, fork them, and tell me where they break. A Masterclass that walks through the workflow is coming soon.

→ Shipwright on GitHub - open source, free to use

→ Shipwright WebUI on GitHub

→ Explore Shipwright

The honest limit

I will not oversell this. Shipwright does not make AI build faster, and it does not take away your judgment. Whether the software does the right thing for real users - the hardest 20% - is still your call.

If you read the Google paper closely, it makes evaluations, not just tests, the hardest line of verification: judging the non-deterministic parts of what an agent does. Shipwright is strong on tests and review, but it does not yet have a formal eval harness. I would rather name that gap than pretend a tool removes it.
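For readers less familiar with the distinction, here is a rough Python sketch of how an evaluation differs from a test. A test asserts one deterministic outcome; an eval samples a non-deterministic agent many times and compares the pass rate to a threshold. This is my own illustration, not something taken from the Google paper or from Shipwright: evaluate, the stub agent, the grader and the threshold are all hypothetical.

```python
from itertools import cycle


def evaluate(agent, prompt, grader, runs=20, threshold=0.9):
    """Run the agent `runs` times and return (pass rate, whether it meets the threshold)."""
    passed = sum(1 for _ in range(runs) if grader(agent(prompt)))
    rate = passed / runs
    return rate, rate >= threshold


if __name__ == "__main__":
    # Stand-in for a non-deterministic agent: it cycles through varying answers.
    answers = cycle(["4", "4", "4", "four", "5"])

    def stub_agent(prompt):
        return next(answers)

    def grader(output):
        # Accept any answer judged equivalent, not one exact string.
        return output.strip() in {"4", "four"}

    print(evaluate(stub_agent, "What is 2 + 2?", grader, runs=10))
```

The grader and the threshold are where the judgment sits: deciding what counts as acceptable, and how often is often enough, is not something the harness can decide for you.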

What it does is make what the AI builds something you would put your name on, and move your attention to where it belongs: away from chasing the process, toward the decisions only a human should make.

Generation is solved. The process was always the point.

Ship right, not just fast.

Sources

The New SDLC With Vibe Coding - Addy Osmani, Shubham Saboo, Sokratis Kartakis / Google - May 2026 - kaggle.com/whitepaper-the-new-SDLC-with-vibe-coding
Vibe Coding is Passe (on Karpathy's shift to agentic engineering) - The New Stack - 2026 - thenewstack.io/vibe-coding-is-passe
Agentic Engineering - Addy Osmani - 2026 - addyosmani.com/blog/agentic-engineering
Incident 1152: an AI coding agent executed destructive commands during a code freeze, deleting production data - AI Incident Database - 2025 - incidentdatabase.ai/cite/1152
Passing the Security Vibe Check: The Dangers of Vibe Coding - Databricks - 2026 - databricks.com/blog/passing-security-vibe-check-dangers-vibe-coding