Twelve hours. That's how long it took to build a feature that would have been two to three weeks of work six months ago.
Not a prototype. Not a proof of concept. A report generation system that requires research and analysis spread across multiple LLM calls, plugged into an existing module with established UI patterns, data models, and conventions. Production code, reviewed, tested, and merged.
The difference wasn't a better model or a bigger context window. It was a process: a specific sequence of AI sessions, each one scoped to do one kind of thinking well.
What emerged is a clear separation between two modes of AI work: planning work and execution work. They use different tools, different configurations, and different expectations. Mixing them is where things break down.
Here's how it works.
BMAD agents handle the planning work. BMAD is an open-source framework that gives you specialized AI agents: an architect, a UX designer, a product manager, a code reviewer. Learn more in its GitHub repo.
Each agent has its own perspective and expertise baked into its definition. When I sit down with my AI architect, I'm getting an opinionated conversation partner who pushes back, asks hard questions, and thinks about tradeoffs. These aren't code generators. They're thinking partners.
Claude Code with Agent Teams and skills handles the execution work.
→ Agent Teams let you spin up multiple Claude instances that coordinate through a shared task list. One writing frontend stories, another handling backend, another reviewing, all working in parallel. As of this writing, they're still experimental and only a few weeks old.
→ Skills are focused reference cards that auto-inject based on what files an agent is touching. Edit a UI file and the component catalog appears. Work in the data layer and the model patterns show up. Nothing else. No persona instructions, no workflow context, no noise. Just the standards that matter for the code being written.
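As a concrete sketch of what one of these reference cards looks like: Claude Code skills live in files named SKILL.md, with frontmatter that tells the agent when the skill applies. The component names and paths below are illustrative, not from my actual project:

```markdown
---
name: component-catalog
description: Existing UI components and CSS conventions. Use when editing
  files under src/components or src/styles.
---

# Component catalog

- Use the existing `AppButton` and `AppCard` components; don't hand-roll new ones.
- Spacing comes from the shared `space-*` utility classes, not inline margins.
```

Because the file carries only standards, nothing else competes for the agent's attention when it loads.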
Playwright gives agents eyes. It's a browser automation tool that lets an AI agent actually open your application and see what's on screen.
Without Playwright, agents are reading code and guessing what it looks like. With it, they're looking at the same screen your users will see.
The planning agents carry full personas and broad context because judgment requires perspective. The execution agents carry minimal context and targeted skills because accuracy requires focus.
Here's how they work together.
I sat down with my AI architect to talk through the problem. What's complex, what's routine, where are the technical risks.
This isn't "generate me a design doc." It's a conversation. I'm describing the feature, the architect is asking questions, pushing back on assumptions, surfacing constraints I hadn't considered.
We came out with a technical direction and a clear picture of what needed proving before we committed to building.
Then I met with my AI UX designer to figure out how this feature fits into the existing screen flow.
This session produced the workflow design, the sequence of interactions from the user's perspective.
Back to the architect. The hard parts of this feature needed to be proven before we wrote production code: the multi-step LLM orchestration, the extraction pipeline, and the synthesis.
We explored three different solution approaches, built quick proofs of concept for each, and picked the one that handled edge cases best. This is where the risk gets retired. Everything after this is execution.
I brought my AI product manager into the existing module and the working POC. The PM analyzed the existing code, studied the proof of concept, and produced requirements with suggested epics.
This isn't the PM inventing requirements from a brief. It's the PM reading the existing system and the proven approach, then organizing what needs to be built.
This is where the process shifted. Instead of a single AI session trying to hold the entire codebase and all conventions in its head, I used Claude Agent Teams. Multiple Claude instances coordinating through a shared task list. One team of writers and reviewers produced twenty detailed stories.
The key: targeted skills made this work.
The agent writing a UI story gets the component catalog. The agent writing a data layer story gets the model patterns. Nothing else competes for attention.
The same model that used to reinvent my CSS started using existing components correctly. Not because it knew more, but because it could finally see what mattered.
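The coordination pattern itself is simple. This toy sketch is not Claude's internals, just the shape of the idea: parallel workers pull from a shared task list, and each task carries only the reference material relevant to its slice:

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Queue, Empty

# Each "skill" is the one reference card an agent needs for its area.
SKILLS = {
    "ui": "component catalog",
    "data": "model patterns",
}

tasks: Queue = Queue()
for task in [("ui", "story: report screen"), ("data", "story: report model")]:
    tasks.put(task)


def worker(name: str) -> list[str]:
    done = []
    while True:
        try:
            area, story = tasks.get_nowait()
        except Empty:
            return done
        # Only the skill for this task's area is "loaded" into context.
        context = SKILLS[area]
        done.append(f"{name} wrote '{story}' using {context}")


with ThreadPoolExecutor(max_workers=2) as pool:
    results = pool.map(worker, ["writer-1", "writer-2"])
    log = [line for r in results for line in r]
```

The shared queue is what keeps two writers from duplicating work; the per-area skill lookup is what keeps each one's context clean.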
My AI architect reviewed all twenty stories and found issues.
Those issues went back to Claude, who updated the stories. Quality gates built into the process, not bolted on after.
Back to Agent Teams. This time with developers and reviewers, multiple Claude instances writing code in parallel, each with the relevant skills loaded for their slice of the work. Frontend, backend, tests, each owned by a different agent, coordinating through the shared task list.
My AI product manager ran the code review, and here's where it gets interesting.
The PM spun up its own Agent Team of developers to review the code. One of those developers used Playwright to actually load the application in a browser and verify the UI worked as specified. Not just static analysis. Not just reading code. An AI reviewer opening the app and clicking through it.
Issues found went back to Claude for fixing.
What emerged isn't just a faster way to build features. It's a clear separation between two modes of AI work.
Planning work: BMAD agents. Architecture conversations, UX design, requirements analysis, story review, code review. The sessions where perspective and judgment matter. Where you want an architect who pushes back, a PM who asks "did you consider this," a reviewer who catches what you missed. Full personas, full context, adversarial thinking.
Execution work: Claude Agent Teams with skills. Story writing, code generation. The sessions where accuracy and adherence to standards matter more than perspective. Where you want the model to know exactly which CSS classes exist, exactly how your data layer is structured, and nothing else cluttering the context.
The handoff between these modes is what makes the process work. The planning work produces clear direction. The execution work follows it precisely. Neither one could do the other's job well.
The individual stories were solid. Skills kept each one accurate and consistent with existing patterns. But the handoffs between stories caused problems in a few cases.
Skills solve the "does this story follow our standards" problem. They don't solve the "do these twenty stories work together as a sequence" problem.
Next time, I'm adding an explicit continuity review to step 6. Not just an architect checking each story for quality — an agent reviewing across stories for handoff points, shared state, and sequencing assumptions. A different lens.
That's how this process evolves. Ship, find the seam that broke, add the gate. Twelve hours is fast. But the goal isn't just fast. It's a process that gets more reliable each cycle.
The process isn't done, but this combination of tools and process was a huge step forward.
Enjoy the journey.