From Vibe Coding to Agentic Engineering

You don't need to be a software engineer to feel the shift from vibe coding to agentic engineering. AI tools have made generating code nearly free - but shipping bad code is as costly as ever, whether you built it on Replit or in a professional IDE.

Date: April 2, 2026

You don't need to be a software engineer to have experienced this. Maybe you built an app on Replit or Lovable over a weekend — described what you wanted, watched it come together, shipped it. Everything looked great. Then a customer's data leaked, or the app broke the moment ten people used it at the same time, or you needed to change something and realised you had no idea how any of it actually worked.

That moment — the gap between "it works in the demo" and "it works in reality" — is where the conversation around AI-assisted development is shifting right now. And it matters whether you're a seasoned engineer or a founder who just used AI to build their first product.

Where "Vibe Coding" Actually Came From

Andrej Karpathy introduced the term "vibe coding" in early 2025, and he meant it almost affectionately — the idea of describing what you want to an AI and going with the flow, barely looking at the output. For prototyping, side projects, and getting an idea off the ground, that's genuinely liberating. Tools like Replit, Lovable, Cursor, and Claude Code have made it possible for anyone — not just engineers — to go from idea to running app in hours. The friction that used to separate "I have an idea" from "I have a running product" has collapsed dramatically.

But Karpathy himself recognised the limits. By early 2026, he was proposing a new term: agentic engineering. The distinction isn't just terminological. It marks a shift in professional responsibility.

Vibe coding fails at a predictable point: when the code leaves your laptop. When it touches production traffic, payment flows, user data, compliance requirements, or a codebase that five other engineers also need to understand next quarter. The speed that makes it attractive in a proof-of-concept becomes a liability the moment you deploy it.

The evidence is getting harder to ignore. A December 2025 analysis by CodeRabbit of 470 open-source GitHub pull requests found that AI co-authored code contained approximately 1.7 times more major issues compared to human-written code. Separate research found that up to 45% of AI-generated code contains security vulnerabilities. A study of apps built on Lovable found that 10.3% had critical row-level security flaws in their database configurations — not obscure edge cases, but holes that would expose user data directly. Over 40% of junior developers admit to deploying AI-generated code they don't fully understand.

These aren't arguments against AI-assisted development. They're arguments for doing it with more discipline than "vibe coding" implies.

What Agentic Engineering Actually Means

Simon Willison — whose thinking on this I find consistently useful — published his Agentic Engineering Patterns guide in February 2026. His framing cuts through a lot of noise: agentic engineering is what happens when professional software engineers use coding agents to amplify their existing expertise, not replace it.

The key word is amplify. The agent can both generate and execute code. It can run tests, iterate, and operate over many steps without hand-holding. That's genuinely new. But what it amplifies is whatever foundation the engineer brings. If that foundation is clear thinking about architecture, test coverage, security boundaries, and maintainability — the agent makes you faster at building something solid. If the foundation is "let's see what it generates" — the agent makes you faster at accumulating problems you'll pay for later.

One of Willison's first patterns is blunt: writing code is cheap now. The implication isn't that code quality doesn't matter. It's that the bottleneck has shifted. When generating initial code costs almost nothing, the expensive part becomes ensuring it's correct, secure, and maintainable. That's where human expertise concentrates. The economics of software have changed, but the engineering responsibilities haven't.

His second published pattern is test-first development. Not because it's a new idea — it isn't — but because it turns out that agents produce more succinct and reliable code when they're working against a clear specification of what "correct" looks like. Write the test first, let the agent fill in the implementation, then verify. The structure that TDD always promised but often struggled to deliver at human coding speed becomes more natural when an agent handles the implementation.
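
As a minimal sketch of that loop, assuming Python's standard unittest module: the test class is what a human would write first, and the body of slugify stands in for what an agent would then be asked to fill in. The function name and the three cases are hypothetical, chosen only for illustration; they are not taken from Willison's guide.

```python
import unittest


def slugify(title: str) -> str:
    """Implementation slot: in a test-first loop, this body is what the agent writes."""
    # Replace every non-alphanumeric character with a space, then join the words.
    words = "".join(ch.lower() if ch.isalnum() else " " for ch in title).split()
    return "-".join(words)


class SlugifyTest(unittest.TestCase):
    """Written first, by the human: this is the definition of 'correct'."""

    def test_lowercases_and_joins_words(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_drops_punctuation(self):
        self.assertEqual(slugify("Vibe coding, explained!"), "vibe-coding-explained")

    def test_empty_title_gives_empty_slug(self):
        self.assertEqual(slugify(""), "")


def run_spec() -> bool:
    """Run the tests and report whether the implementation meets them."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()


if __name__ == "__main__":
    print("spec satisfied:", run_spec())
```

The verification step is the call to run_spec: a generated implementation is accepted only when it returns True, and a failing case tells you which expectation was missed.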

The Cognitive Debt Problem

There's a specific failure mode that I think gets underestimated: what some researchers are now calling cognitive debt. It's not quite the same as technical debt, though they compound.

Technical debt is the accumulated cost of shortcuts in the code itself. Cognitive debt is the cost of accumulated gaps in your understanding of what the code does. When you've accepted a hundred AI-generated suggestions without reading them carefully, you have a codebase that runs — until it doesn't — and you have no clear mental model of why it makes the decisions it makes. Debugging becomes archaeology.

In 2026, with 84% of developers using AI tools that now write around 41% of all code, the scale of this problem is no longer theoretical. Depending on the survey, only about 29% to 46% of developers say they actually trust AI-generated output. That gap between adoption and trust is where cognitive debt accumulates. We use these tools anyway, we know we shouldn't fully trust them, and we don't always do the work to verify their output either.

The engineering discipline that agentic engineering demands is, in part, the discipline of staying inside your own understanding. If you can't explain what a section of code does and why, that's not the agent's problem — it's yours. The agent won't get paged at 2am.

Spec-Driven Development as the Structural Answer

One of the more promising structural responses to this problem is spec-driven development. The core idea is to make specifications — not code — the primary artefact. You write a clear, detailed description of what the system should do, what its constraints are, what its security requirements are, what "done" looks like. The agent then generates code against that specification. The specification becomes the thing you review, refine, and maintain.

This inverts a habit that most developers have carried since they started. We write code and sometimes write documentation afterward. Spec-driven approaches treat the specification as the source of truth and code as a derived output. When something breaks, you ask: did the code deviate from the spec, or was the spec wrong? Both are answerable questions. "Why did the AI do this?" often isn't.
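
One way to picture "did the code deviate from the spec?" is a sketch like the following, where the specification is a short list of reviewable requirements and the implementation is checked against each one. Everything here is an assumption made for illustration: the Requirement type, the PW-1 to PW-3 identifiers, and the password rule itself are hypothetical and do not come from any particular spec-driven tool.

```python
from dataclasses import dataclass
from typing import Callable, List

# The implementation under review: takes a candidate password, returns accept/reject.
Validator = Callable[[str], bool]


@dataclass(frozen=True)
class Requirement:
    """One reviewable line of the spec: an id, a plain-language statement, a check."""
    req_id: str
    statement: str
    check: Callable[[Validator], bool]


# Hypothetical spec for a password rule, written and maintained by a human.
PASSWORD_SPEC: List[Requirement] = [
    Requirement("PW-1", "Passwords shorter than 12 characters are rejected",
                lambda impl: not impl("short1A")),
    Requirement("PW-2", "Passwords without a digit are rejected",
                lambda impl: not impl("NoDigitsInHereAtAll")),
    Requirement("PW-3", "A password of 12+ characters with a digit is accepted",
                lambda impl: impl("correcthorse42battery")),
]


def is_valid_password(candidate: str) -> bool:
    """Derived output: the part an agent would generate against the spec."""
    return len(candidate) >= 12 and any(ch.isdigit() for ch in candidate)


def deviations(impl: Validator, spec: List[Requirement]) -> List[str]:
    """Return the ids of the requirements the implementation does not meet."""
    return [req.req_id for req in spec if not req.check(impl)]


if __name__ == "__main__":
    print("deviations:", deviations(is_valid_password, PASSWORD_SPEC))
```

An empty list means the code matches the spec as written, so any remaining bug points at the spec. A non-empty list names the requirement the code broke.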

It also solves a structural problem with agentic loops: context. AI coding agents lose context across sessions, across long conversations, across large codebases. A well-maintained specification gives the agent something to work from that doesn't require it to infer your intent from 40 files of history. The discipline of writing clearly for an agent is, incidentally, also the discipline of thinking clearly about what you're building.

This is the problem Shipwright addresses. The framework, which I've been building, is organised around a spec-driven, agentic SDLC: it treats specifications as first-class artefacts that drive planning, implementation, testing, and compliance. The goal is to bring engineering structure to a development process that AI tools are making faster but also, without care, more fragile.


The Oversight Question

One of the less comfortable aspects of agentic engineering is what it asks you to accept about the current state of these tools: they are powerful, increasingly autonomous, and not reliable enough to run unsupervised on code that matters.

The failure mode has a name now: "AI slop." Code that looks reasonable on the surface, passes a quick scan, but lacks proper error handling, introduces subtle security vulnerabilities, breaks existing functionality in non-obvious ways, or creates architectural decisions that make future changes significantly harder. The slop is plausible — that's what makes it dangerous. A junior developer might catch it. A tired senior developer on a fast-moving project might not.

The practices that guard against it aren't glamorous. Review the diff before you commit — every time. Run existing tests before generating new code; if you don't have tests, that's the first problem to solve. Never merge AI-generated changes without reading them. Treat the agent as a skilled but junior collaborator who needs review, not as an oracle whose output you ship.

Addy Osmani, whose thinking on LLM workflows I follow, makes a similar point: treat the LLM as a powerful pair programmer that requires clear direction, context, and oversight, not autonomous judgment. The oversight is not optional. It's the job.

What Actually Changes (and What Doesn't)

The honest version of what's happening is this: the cost of generating code has collapsed, but the cost of shipping bad code hasn't. If anything, the gap between those two costs has widened, because you can now generate more code, faster, than any team can responsibly review under pressure.

The engineering practices that matter most — clear specifications, test coverage, security review, thoughtful architecture, version control discipline, code review — were always important. They're more important now, not because AI is making them harder to do, but because AI is making it easier to skip them and still feel productive, right up until you aren't.

The developers who will thrive in this environment aren't the ones who prompt the best. They're the ones who bring genuine engineering discipline to a toolset that amplifies whatever you put in. Expertise in what good software actually looks like — secure, maintainable, tested, understood — turns out to be more valuable, not less, when the implementation is partly automated.

Vibe coding was a useful framing for a moment when everyone was figuring out what these tools could do. Agentic engineering is the more honest description of what it takes to use them well. The shift isn't really about terminology. It's about whether you're willing to do the engineering work even when the tool is offering you a shortcut.

Some shortcuts are worth taking. The ones that leave you unable to explain your own codebase aren't.

Sources: