The Developer as Team Lead of Agents: How the Operating Model Needs to Change
With Callista I am working at a large Swiss insurance company on documenting a 20-year-old Data Warehouse. In a project meeting this week I showed live analysis chats from a 14-hour session. The realisation landed quickly: the operating model of a dev team has to change. The developer becomes a team lead running specialised agents across the full SDLC.

With Callista I am currently working on a project at a large Swiss insurance company. Callista specialises in legacy Data Warehouse migrations in the context of Agentic Engineering. The task sitting on the table is one everyone in enterprise IT knows: document a Data Warehouse that has been quietly running parts of the business for twenty years. Twenty years of SAS scripts, undocumented ETL jobs, and only a few people left who still understand parts of what is going on.
In the project meeting this week I presented how I actually work. Not slides about agentic AI. I shared my screen and walked through a real analysis session from a 14-hour day earlier that week: from the Data Mart down through every layer to the source, reading SAS scripts, cutting through the jungle, building small helpers where I needed them. All of it with GitHub Copilot and Sonnet 4.6, not even Opus. I took the room through the chats themselves, step by step, showing how I move through that jungle with the agent in front of me.
The room watched this for about an hour and a half. Then something happened I had not expected to come that quickly. One person put it plainly: "As a task worker I now need to develop skills that normally a leader has." What they saw was not a developer using a tool. They saw a team lead running an analyst. The skill on display was not code reading. It was knowing what to ask, when to trust an answer, when to make the agent go back and check, how to structure the work into steps the agent could actually complete.
I do not claim to read twenty-year-old SAS scripts fluently. I never have. What I do have is twenty years of leading delivery teams in regulated environments, and that is the muscle I was using in the room. The agent was doing the reading. I was doing the leading.
That is the whole shift, and I think the industry needs to understand it clearly. We are not just making developers more productive. We are changing the operating model of a dev team entirely. Not in one day, and not without effort, but without that shift the long-term impact stays limited.
The 0 to 20% ceiling is not a capability problem
Anthropic's 2026 Agentic Coding Trends Report contains one sentence that most commentary has underread. The Societal Impacts team writes that "while developers use AI in roughly 60% of their work, they report being able to 'fully delegate' only 0-20% of tasks."
The reflex reading is that the gap closes when the model gets smarter. One more generation, better tool use, longer context, and suddenly the 20% becomes 60%. I do not think that is what is actually happening.
The ceiling is a framing artefact. It is what you get when you imagine one developer plus one agent, both pointed at one task. In that frame, the developer is the bottleneck, and no improvement to the agent fixes the shape of the work. The way past the ceiling is not a smarter agent. It is more agents, placed at different points of the software lifecycle, coordinated by a human who has genuine delivery experience.
That is a different question entirely. It is an operating model question.
The industry is already moving this way
Look at where Anthropic itself is investing. On 17 April 2026, they launched Claude Design, a research preview powered by Claude Opus 4.7 that creates designs, prototypes, slides, and one-pagers through conversation. During onboarding, it reads your codebase and your existing design files and builds a design system for your team. When a design is ready, it produces a handoff bundle automatically. No manual spec document between designer and developer.
The handoff bundle is the tell. Anthropic is not building a faster Figma. They are closing the loop from exploration to prototype to production code, and they are building agent work into the design phase that used to be entirely human. The strategic framing in their launch is "Design plus Code plus Cowork." Three agents, not one, coordinated around a single lifecycle.
Claude Code Agent Teams, which Anthropic shipped as a first-class feature, makes the same point from the engineering side. A lead session dispatches work to teammate sessions, each with its own context window. That is not an assistant. That is a team, with a lead coordinating specialised members.
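The lead-and-teammates pattern is easy to sketch. The following is an illustrative Python sketch of the coordination shape described above, not Anthropic's actual Agent Teams API; all class and method names here are my own invention.

```python
# Illustrative sketch of a lead session dispatching to teammate
# sessions, each with its own isolated context. Names are hypothetical.

from dataclasses import dataclass, field


@dataclass
class TeammateSession:
    """A specialised worker holding its own context window."""
    role: str
    context: list = field(default_factory=list)  # isolated per teammate

    def run(self, task: str) -> str:
        # Only this teammate's context grows; the lead stays clean.
        self.context.append(task)
        return f"[{self.role}] done: {task}"


@dataclass
class LeadSession:
    """Coordinates specialised teammates without absorbing their context."""
    teammates: dict

    def dispatch(self, role: str, task: str) -> str:
        return self.teammates[role].run(task)


lead = LeadSession(teammates={
    "analyst": TeammateSession("analyst"),
    "coder": TeammateSession("coder"),
})
print(lead.dispatch("analyst", "trace the ETL lineage for mart X"))
print(lead.dispatch("coder", "write a helper to parse the SAS log"))
```

The design point is the isolation: each teammate accumulates only its own task history, which is what keeps a multi-agent team from drowning a single context window.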
Replit has been moving in the same direction for a long time. Their agent has been quietly absorbing more of the end-to-end application lifecycle, from prompt to deployed and running application. I am not quoting their numbers here because the point is directional, not promotional. The market is converging on full-SDLC automation, not on a smarter single coder.
If the frontier labs and the end-to-end platforms are all building toward multi-agent lifecycles at once, that is not coincidence. It is the shape of the next generation of work.
The skills that actually matter now
If the operating model is team-lead-plus-agent-team, the question that follows is which human skills hold up and which ones stop mattering.
The ones that hold up are the ones I have been growing for over twenty years. Breaking a vague goal into work that someone else can actually deliver. Reading a plan and noticing what is missing. Asking the right question at the right moment. Knowing when to trust an expert's answer and when to push back. Having enough scar tissue from past projects to recognise the shape of a problem before it becomes one. Running a review that catches real risk without drowning the team in paperwork.
These are leadership and delivery skills. They were valuable when the team was 80 engineers. They are arguably more valuable when the team is eight agents, because the agents will do whatever you ask with total confidence, including the wrong thing, and the human sitting in the lead seat is the only one who will notice in time.
The skills that stop mattering, at least in the way they used to, are the ones that used to define seniority in a codebase. Memorising a framework's API. Being faster at the keyboard than the person next to you. Holding an entire architecture in your head. The agents hold it now, and they do it with more recall than any human ever did. That is not a loss. It is a reallocation. The scarce resource is no longer the person who can write the code. It is the person who can lead a team of entities that can.
Why Shipwright is an SDLC framework, not a coding tool
This is the operating model I am trying to encode in Shipwright. From Specify through Deploy, each phase is a skill the agent runs. The skills do the work. The hooks, not the skills themselves, check whether the agent actually did what it should. That separation is the whole point: a skill can be wrong, and a hook catches it. The human sits at the top, not as a prompt engineer, but as the team lead who breaks the work down, sets the acceptance criteria, reviews what each skill produces, and keeps the whole thing pointed at a real outcome. Shipwright is going into early access soon.
I did not build Shipwright as a better code agent because the code agent is not the bottleneck. The bottleneck is the coordination work across specs, design, planning, implementation, validation, release, and deployment. That is where the 80% supervision time was landing in my own projects, and that is where the leverage is.
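The skill/hook separation can be shown in a few lines. This is a hypothetical sketch of that architecture, not Shipwright's actual code; the phase, skill, and hook names are made up for illustration.

```python
# Hypothetical sketch: a skill produces a phase's output, independent
# hooks verify it, and a failing hook stops the pipeline.

def run_phase(name, skill, hooks, task):
    """Run one lifecycle phase: the skill works, the hooks check."""
    output = skill(task)
    failures = [h.__name__ for h in hooks if not h(output)]
    if failures:
        # A skill can be wrong; a hook catches it before the next phase.
        raise ValueError(f"phase '{name}' failed checks: {failures}")
    return output


# Toy Specify phase: the skill drafts a spec.
def specify(task):
    return f"Spec for {task}\nAcceptance: every mart is documented"


# Toy hook: the spec must contain acceptance criteria.
def has_acceptance_criteria(output):
    return "Acceptance:" in output


spec = run_phase("specify", specify, [has_acceptance_criteria], "DWH docs")
```

The point of the shape is that the checker never lives inside the worker: the hook inspects the skill's output from the outside, which is what lets a wrong skill be caught rather than trusted.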
The closing thought
If we keep pointing a single agent at a single task and measure how far delegation goes, we will keep hitting the 0 to 20% ceiling, and we will keep concluding that the models need another generation. That conclusion is comfortable, and it is wrong.
The ceiling breaks in two places at once. We need more agents across more phases of the lifecycle, and we need to develop humans who know how to lead them. One without the other is a dead end. Smarter agents with no one leading them do confident, coordinated damage. Better leaders with only one agent hit the same wall everybody else does.
The project meeting this week was not a story about an AI tool. It was a story about a skill profile that is starting to matter more than the one we have been hiring for. 10x productivity is not a model capability. It is an operating model choice, and the organisations capable of adapting quickly enough will be the ones that thrive.
Sources:
- 2026 Agentic Coding Trends Report - Anthropic - 10.04.2026 - https://resources.anthropic.com/2026-agentic-coding-trends-report
- Claude Design, Anthropic Labs - Anthropic - 17.04.2026 - https://www.anthropic.com/news/claude-design-anthropic-labs
- Anthropic launches Claude Design, a new product for creating quick visuals - TechCrunch - 17.04.2026 - https://techcrunch.com/2026/04/17/anthropic-launches-claude-design-a-new-product-for-creating-quick-visuals/
- Claude Code Agent Teams - Anthropic - 2026 - https://code.claude.com/docs/en/agent-teams
