AI-Native Development: How to Build Less, Ship More, and Fire Half Your Team
How to rebuild software development into an AI-Native model — the methodology, the agents, the org change, and why most companies will get it wrong
Most AI transformations and AI adoption programs follow the same script. Tools get purchased. Workshops get scheduled. A company-wide email goes out with the word “journey” in it. Six months pass. Nothing has changed – except the headcount. People get fired, performance declines, panic sets in, and in some cases, the same people who were fired are quietly rehired a couple of months later. This has happened. More than once. At companies that genuinely believed they were doing it right.
So let’s talk about what doing it right actually looks like.
The imaginary company for today’s episode: a large international fintech. Multiple financial products. Millions of users. Legacy systems held together by undocumented tribal knowledge and collective anxiety. A banking core migration that needs to move faster. Five hundred engineers. And a management team with a completely rational fear of touching anything – because the last time someone touched something, three payment adapters stopped working for six hours. You know that type.
The stated goal was straightforward: accelerate the migration, cut development times, use AI. What it becomes (in any honest version of this story) is a full restructuring of how software gets made. This episode covers both sides of that: first, the technical methodology – what the AI-agentic system looks like and how it gets built; second, the part that determines whether any of it survives contact with a real organization. That second part is where most transformation “journeys” quietly die.
The Problem Was Never the Tools
Before touching a single agent or writing a single prompt, you need a diagnosis. Not a vibes-based assessment delivered in a leadership offsite. An actual diagnosis. And in our imaginary fintech, the results are predictably grim.
Almost no documentation exists. Business logic lives in the heads of people who’ve been there since the beginning (they know it, and they like it that way). The codebase is an archaeology site where architectural decisions from five years ago sit undisturbed and unexplained. Engineering practices vary so wildly between teams that they barely share a common definition of “done.” And the culture of innovation is, to put it generously, aspirational.
The usual first step is AI tools. Giving everyone Claude or Cursor access is fine; it’s a reasonable place to start. But it’s only that – a first step. AI tooling doesn’t transform a process. It amplifies the one that already exists. In a well-organized team with clear requirements and good practices, it accelerates output. In a team with no documentation and inconsistent engineering standards, it accelerates the production of confidently written, untestable garbage. The problems don’t disappear. They just compound faster.
So the first principle, the one that everything else depends on, is this: AI-Native development begins with the specification. Code is a consequence.
Not a fashionable idea, I know. Not a popular one. Engineers want to build things. Managers want to see velocity metrics. Nobody wants to spend three weeks writing specs. But this is exactly the difference between agent-based development and what the industry has started calling “vibe coding” – where a developer rapidly produces something that looks like a solution, and nobody, including the developer, is entirely sure whether it matches the actual business logic, the architecture, the security requirements, or reality in general.
The Archaeologist
The first agent you deploy in a setup like this isn’t a code generator. It’s an archaeologist. Its job is to reconstruct the actual state of the system — moving beyond what the outdated wiki claims and documenting what the code, services, APIs, data schemas, integrations, and infrastructure connections reveal right now. The output is a real AS-IS specification: which services exist, how they communicate, where business logic is buried, which components are legacy, and which parts of the system will punish you for touching them.
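One way to picture the archaeologist's output is as a typed inventory rather than a prose document. The sketch below is only an illustration under that assumption: the `Service` class, its fields, and the sample services are all hypothetical, and a real AS-IS specification would carry far more detail (APIs, schemas, infrastructure links).

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """One entry in a hypothetical AS-IS inventory."""
    name: str
    calls: list[str] = field(default_factory=list)  # outbound dependencies
    legacy: bool = False
    documented: bool = False


def risky_to_touch(inventory: list[Service]) -> list[str]:
    """Legacy, undocumented services that other services depend on."""
    depended_on = {callee for svc in inventory for callee in svc.calls}
    return sorted(
        svc.name
        for svc in inventory
        if svc.legacy and not svc.documented and svc.name in depended_on
    )


# Made-up sample inventory, for illustration only.
inventory = [
    Service("payments-api", calls=["card-adapter", "ledger"], documented=True),
    Service("card-adapter", calls=["ledger"], legacy=True),
    Service("ledger", legacy=True, documented=True),
    Service("reporting", calls=["ledger"]),
]

print(risky_to_touch(inventory))  # prints ['card-adapter']
```

The point of the structure is that "which parts will punish you for touching them" becomes a query over data instead of a question for whoever has been around longest.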
In parallel, everything goes into a vector database for RAG (retrieval-augmented generation): code, documentation, integration specs, banking regulations, payment scheme requirements, and internal standards. In fintech, as in any industry with a meaningful level of regulation, this is effectively non-negotiable. There are tens of gigabytes of context that AI needs in order not to humiliate you in front of a regulator. Why is this important? Because if you simply ask ChatGPT later without that grounding, even with a carefully written prompt, there is a high risk that the model will hallucinate compliance rules that look and feel so trustworthy that you won’t even suspect a trap. Combine that with a general documentation gap, and guess where that leads.
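The retrieval step behind RAG can be sketched in a few lines. This is a toy, not a recommendation: word counts stand in for a real embedding model, a dictionary stands in for the vector database, and the document IDs and texts are invented for the example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: dict[str, str], k: int = 1) -> list[str]:
    """Return the IDs of the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(
        corpus,
        key=lambda doc_id: cosine(q, embed(corpus[doc_id])),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical documents; a real corpus would hold regulations, specs, and code.
corpus = {
    "reg-001": "card payments above the limit require strong customer authentication",
    "std-014": "all services expose a health endpoint for monitoring",
    "int-207": "the ledger adapter retries failed postings three times",
}

top = retrieve("which card payments require authentication", corpus)
print(top)  # prints ['reg-001']
```

The retrieved text is what gets placed in front of the model, so its answer about a compliance rule can be checked against a named source document instead of taken on faith.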
The Living Specification
After the archaeologist comes an analyst agent – and the specification stops being a document. It becomes the management center of the entire product. Everything connects through it: business requirements, architecture, tasks, tests, code, releases, metrics. Change the spec, the change propagates. Ignore the spec, and within three months you’re back to tribal knowledge with a lot of gaps in business logic.
The analyst agent works in two modes.
Greenfield: designing a new product, module, or service from scratch. The spec is generative, unconstrained, and built from intent. That’s the best way to start any brand-new feature and perfect the agentic craft.
Brownfield: making changes to an existing system with legacy constraints, dependencies, and historically accumulated logic – the spec starts from the AS-IS the archaeologist produced, and every change has to account for what's already there and what must not break.
In an established organization, the second mode is almost always the dominant one. Real companies rarely build from scratch. They inherit – and the cost of ignoring that inheritance shows up in production at the worst possible moment.
My suggestion is to adopt the Gherkin syntax (Given, When, Then) as the standard format for writing requirements.
A quick detour for the uninitiated: Gherkin is a plain-language syntax originally built for Cucumber, a test automation framework. The idea was simple – write human-readable scenarios that non-technical stakeholders can understand and that machines can execute as tests. No separate requirements document to translate, no interpretation gap. You write "Given the user has an active card, When they initiate a payment of $50, Then the system debits the account and returns a success status" – and Cucumber, once each step is bound to a step definition in code, runs that as a test. What makes it particularly well-suited for LLMs is structural predictability. Models tend to perform better when context is formatted consistently and unambiguously. Gherkin provides exactly that.
Practically speaking – that’s BDD (Behavior-Driven Development). Its actual value is that the business owner, the analyst, the engineer, the tester, and the agent all work from the same behavioral description. You eliminate interpretation drift. No more “I thought you meant something else.” No more invisible assumptions quietly growing into production bugs.
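To make the Given/When/Then idea concrete, here is the payment scenario from above executed by a tiny hand-rolled step runner. It is a stand-in written for this article, not the API of Cucumber or any other BDD framework; the `step` decorator, the `run` function, and the starting balance of 80 are all assumptions made for the example.

```python
import re

SCENARIO = """
Given the user has an active card
When they initiate a payment of $50
Then the system debits the account and returns a success status
"""

STEPS = []  # (compiled pattern, handler) pairs


def step(pattern):
    """Register a handler for step text matching the pattern."""
    def register(fn):
        STEPS.append((re.compile(pattern), fn))
        return fn
    return register


@step(r"the user has an active card")
def active_card(ctx):
    ctx["balance"] = 80  # assumed starting balance, not part of the scenario


@step(r"they initiate a payment of \$(\d+)")
def pay(ctx, amount):
    amount = int(amount)
    if amount <= ctx["balance"]:
        ctx["balance"] -= amount
        ctx["status"] = "success"
    else:
        ctx["status"] = "declined"


@step(r"the system debits the account and returns a success status")
def check(ctx):
    assert ctx["status"] == "success"


def run(scenario: str) -> dict:
    """Execute each line against the first matching step definition."""
    ctx = {}
    for line in scenario.strip().splitlines():
        text = line.strip().split(" ", 1)[1]  # drop the Given/When/Then keyword
        for pattern, handler in STEPS:
            match = pattern.fullmatch(text)
            if match:
                handler(ctx, *match.groups())
                break
        else:
            raise LookupError(f"no step definition for: {line}")
    return ctx


result = run(SCENARIO)
print(result)  # prints {'balance': 30, 'status': 'success'}
```

The scenario text is the shared artifact: the business owner reads it, the agent generates against it, and the runner fails loudly when a line has no matching behavior.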
The Product as a Management Object
The specification describes the product at several layers, and each one matters.
The product layer: what the product is, what user problem it solves, which features exist, which scenarios are supported, which metrics matter. This is necessary so that development begins with the question “what product behavior do we want?” — not “what code should we write?”
The business requirements layer: Gherkin scenarios per feature, functional requirements describing what the system must do, and non-functional requirements defining how well — performance, reliability, security, availability, scalability, observability, compliance. Every requirement gets an ID. This sounds bureaucratic, but without IDs you cannot connect a requirement to a test, a test to a task, a task to code, code to a release, a release to a metric. Without that chain, AI-Native development is just a collection of disconnected artifacts.
The technical layer: codebase, services, APIs, infrastructure, data schemas, data flows, integrations, CI/CD – and AI services. This last part tends to get skipped, and it shouldn’t. Prompts are part of the system. Context is part of the system. Prompt versioning is part of the system. Response quality evaluation is part of the system. If these aren’t described in the architecture, they can’t be managed – and in practice, they won’t be.
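Treating prompts as part of the system can start very small. The sketch below derives a version from the prompt text itself, so any edit produces a new, traceable version. The `PromptVersion` class and the sample templates are hypothetical; real setups may prefer explicit version numbers kept in source control.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptVersion:
    """A named prompt template with a content-derived version."""
    name: str
    template: str

    @property
    def version(self) -> str:
        """Short SHA-256 digest: changes whenever the template changes."""
        return hashlib.sha256(self.template.encode("utf-8")).hexdigest()[:12]


# Hypothetical templates, for illustration only.
v1 = PromptVersion("cr-summary", "Summarize the change request:\n{cr_text}")
v2 = PromptVersion("cr-summary", "Summarize the change request in 3 bullets:\n{cr_text}")

print(v1.version != v2.version)  # prints True
```

Logging the version alongside each model response is what later makes a quality evaluation attributable to a specific prompt rather than to "the AI".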
Change Requests and Decomposition
Any change in this model gets formalized as a Change Request: exactly what is changing, why, which components are affected, AS-IS versus TO-BE state, which requirements are touched, which tests are needed, which risks arise, which metrics must improve. The difference between AS-IS and TO-BE becomes the development plan.
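The claim that the AS-IS / TO-BE difference becomes the development plan can be shown mechanically. This is a minimal sketch under simplifying assumptions: each component is described by a single string, and the `ChangeRequest` class, its fields, and the sample data are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRequest:
    cr_id: str
    reason: str
    as_is: dict[str, str]  # component -> current behavior
    to_be: dict[str, str]  # component -> target behavior
    requirement_ids: list[str] = field(default_factory=list)

    def plan(self) -> list[str]:
        """Derive work items from the AS-IS / TO-BE difference."""
        items = []
        for component in sorted(self.as_is.keys() | self.to_be.keys()):
            before = self.as_is.get(component)
            after = self.to_be.get(component)
            if before == after:
                continue  # untouched component, nothing to plan
            if before is None:
                items.append(f"add {component}: {after}")
            elif after is None:
                items.append(f"remove {component}")
            else:
                items.append(f"change {component}: {before} -> {after}")
        return items


# Hypothetical change request, for illustration only.
cr = ChangeRequest(
    cr_id="CR-101",
    reason="migrate card payments to the new core",
    as_is={"card-adapter": "posts to legacy core", "ledger": "batch nightly"},
    to_be={
        "card-adapter": "posts to new core",
        "ledger": "batch nightly",
        "core-gateway": "routes postings",
    },
    requirement_ids=["REQ-12"],
)

for item in cr.plan():
    print(item)
```

Components that are identical in both states drop out of the plan automatically, which is also a cheap way to see what the change is supposed to leave alone.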
A Project Manager agent decomposes each CR into a roadmap, features, tasks, subtasks – each with explicit acceptance criteria. The rule is simple: if you cannot define what “done” looks like, the agent cannot either. A vague task fed to AI produces a vague result (just faster).
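The "no definition of done, no dispatch" rule is easy to enforce before anything reaches an agent. A minimal sketch, assuming a simple task tree; the `Task` class and the sample IDs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    task_id: str
    title: str
    acceptance_criteria: list[str] = field(default_factory=list)
    subtasks: list["Task"] = field(default_factory=list)


def undefined_done(task: Task) -> list[str]:
    """IDs of every node in the tree that has no acceptance criteria."""
    missing = [] if task.acceptance_criteria else [task.task_id]
    for sub in task.subtasks:
        missing.extend(undefined_done(sub))
    return missing


# Hypothetical decomposition of one feature.
feature = Task(
    "F-1",
    "Route card postings through the new core",
    acceptance_criteria=["all card postings reach the new core"],
    subtasks=[
        Task("T-1", "Add gateway client", ["client passes contract tests"]),
        Task(
            "T-2",
            "Switch adapter",  # no criteria: should be flagged
            subtasks=[Task("T-2.1", "Feature flag", ["flag toggles routing"])],
        ),
    ],
)

print(undefined_done(feature))  # prints ['T-2']
```

Anything the check returns goes back to a human for clarification instead of forward to an agent.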
The Test Chain
Only once you know what the system is (the AS-IS specification, produced by the archaeologist agent) and have formally defined what you want to change about it (the Change Request decomposed into tasks with acceptance criteria, produced by the analyst and PM agents) does the test engineering agent enter to build the verification strategy. The tests are written against a defined change, not in the abstract. The agent's job isn't to write tests as an afterthought. It's to build verification across every layer of the architecture: unit tests, integration tests, end-to-end scenarios, API checks, data checks, infrastructure checks, AI service checks. Every requirement ID maps to a test ID. That chain (requirement → test → task → code → release → metric) is what separates a managed production system from naive, expensive improvisation.
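The requirement → test → task → code → release → metric chain only helps if broken links are visible. The sketch below walks the chain from a requirement ID and reports the first stage it fails to reach. The ID prefixes, the `Link` class, and the sample links are assumptions for illustration, and the walk follows only one link per stage to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional

# Stage prefixes in chain order; the ID format itself is an assumption.
CHAIN = ["REQ", "TEST", "TASK", "CODE", "REL", "MET"]


@dataclass(frozen=True)
class Link:
    source: str
    target: str


def trace(start: str, links: list[Link]) -> list[str]:
    """Follow links from a requirement ID as far as the chain goes."""
    path = [start]
    for prefix in CHAIN[1:]:
        following = [
            link.target
            for link in links
            if link.source == path[-1] and link.target.startswith(prefix + "-")
        ]
        if not following:
            break
        path.append(following[0])
    return path


def first_gap(path: list[str]) -> Optional[str]:
    """Prefix of the first stage the path never reached, or None if complete."""
    return CHAIN[len(path)] if len(path) < len(CHAIN) else None


# Hypothetical links, for illustration only.
links = [
    Link("REQ-12", "TEST-40"),
    Link("TEST-40", "TASK-7"),
    Link("TASK-7", "CODE-a1f"),
    Link("REQ-13", "TEST-41"),
]

print(trace("REQ-12", links), first_gap(trace("REQ-12", links)))
print(trace("REQ-13", links), first_gap(trace("REQ-13", links)))
```

Here REQ-12 has code but no release, and REQ-13 has a test but no task, which is exactly the kind of gap that otherwise surfaces in production.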
Code gets written last. Why? You probably know the reason, but just to make it clear – code is cheap now. The full loop is:
requirement;
scenario;
tests;
code;
tests run;
product check as a user;
specification update.
Then the next cycle begins. What isn't cheap is context, judgment, and the ability to define what the system is actually supposed to do. The value has shifted entirely to task formulation, architectural decisions, and result verification. Agents handle the execution. People handle what requires thinking. Which is, frankly, what senior engineers should have been doing all along instead of spending afternoons writing boilerplate.
Five Hundred to Two Fifty
Technical methodology is necessary, but it’s also insufficient. This is precisely where most AI transformations fail – after the initial momentum has died, the dashboards have stopped looking exciting, or the CTO has moved on to their next engagement.
A company can build agents, configure RAG, describe specifications, introduce tests, and run three all-hands about the new way of working. Then six months pass, and everything quietly slides back.
This is the squirrel problem: people keep returning to where the nuts are.
Leads First, Everyone Else Second
The organizational change starts with structure. The target model is a matrix: product teams on one axis, functional verticals on the other — analysts, developers, testers, each with their own lead. Each vertical lead owns the AI transformation for their function. This matters because when everyone is vaguely responsible – nobody actually is.
Then the leads move. Team leads, tech leads, architects, strong senior engineers – they go first. If the leads don’t adopt the new model genuinely, the teams won’t either. Formally, everyone will use AI. In practice, they’ll continue working the old way: generating fragments of code, keeping context in personal notes, discussing requirements in chat threads, not updating the specification, and treating tests as something that happens at the end when there’s time (there is never time).
A lead in this model manages the team’s production system. They need to understand how to assign tasks to agents, how to keep the specification current, how to verify generated results, how to read the dashboard, how to analyze failures, and how to build requirements traceability. That’s the role. Managing agents is part of it. So is accountability for what the agents produce.
The methodology cannot be sent as a PDF document. People embed it into daily work through regular team sessions where the actual results get examined: where the agent made an error, where context was missing, which requirement was poorly formulated, which prompt returned a wrong result, and so on. This is how teams develop judgment about AI as a production component – by treating its failures as system failures worth analyzing, the same way they’d treat a deployment incident.
The Academy
An internal channel matters more than it sounds. Successful cases, failures, specification templates, agent error patterns, practical session recordings, and answered questions accumulate there over time. It becomes visible proof that the new approach lives in daily work (not in a presentation for investors). Leads need to be active contributors. If leads are silent, the whole initiative reads as another corporate experiment with a shelf life. If leads share their work, their mistakes, and their results, the culture spreads through demonstration rather than mandate.
Every webinar and practical session feeds into something larger. A session that ends at broadcast is a wasted session. Every good one becomes an internal learning module: recording, summary notes, templates, examples, checklists, a practical assignment, error analysis, self-check criteria. Over time, that accumulates into the academy. This is how the transformation stops depending on specific people. As long as knowledge is transmitted orally, the whole thing is fragile. When it's formalized into a reproducible training system, the company stops being held hostage by whoever has been there the longest.
New employees onboard through the academy – not through the phrase "talk to that guy, he knows how it works." They enter and see how work is done: how the specification is structured, how requirements are written, how Change Requests are formalized, how Gherkin scenarios are built, how generated code gets verified. Onboarding time drops. Dependence on long-tenured employees drops with it.
Resistance, Sabotage, and the People Who Move Fast
Resistance is inevitable – and most of it, to be fair, is not irrational. Some engineers see real risks in delegating architectural decisions to a system they don’t fully understand. Some believe they can write the feature faster themselves (sometimes they’re right). Some argue the product is too complex, the domain too specialized, the codebase too sensitive. These objections deserve honest engagement.
The resistance that doesn’t deserve engagement is the performative kind. Someone participates in every workshop, uses AI tools visibly enough to be seen, and quietly does everything possible to prevent the new process from taking root. This is sabotage. It exists at every seniority level (especially senior levels), and it tends to be particularly committed among people whose informal authority was built on being the only person who understood something. That informal authority is exactly what the specification and RAG system are designed to dissolve.
An interesting pattern also emerges: strong engineers move into the new model faster. The ones who struggle are often the ones whose value was tied to execution speed (e.g., writing code quickly) rather than to the thinking behind it. With agents handling the execution, the premium shifts entirely to judgment, task formulation, context management, and product understanding. That’s a genuine change in what’s valued. Some people find it liberating. Others find it threatening.
The Numbers
When the AI-transformation methodology is fully in place, you have two choices – and both are legitimate depending on what the business actually needs.
The first option is to cut headcount. Half the team, roughly, can produce what the full team produced before. Be sure to measure it on things that matter: time from task formulation to release, test coverage, defect rate after release, onboarding speed. Usually the roles that shrink are the ones that existed to manage manual processes: repetitive analyst work, boilerplate development, manual QA. The roles that remain are built on judgment, product thinking, system management, and result verification. On certain categories of work the productivity differential usually runs between two and four times. Take the conservative end of that range, a factor of two, and the maths for our fintech of five hundred people is extremely simple – five hundred to two fifty.
The second option is to keep the headcount and become twice as productive with what you already have. Same team, compounding output. Faster release cycles, better test coverage, shorter onboarding, fewer post-release incidents. If the business has more to build than it currently can, this is the more interesting option.
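Either way, the decision rests on measuring the same few things. As a rough sketch, two of them (time from task formulation to release, and defect rate after release) can be computed like this; the function names and every sample value are made up for illustration and are not results from any real team.

```python
from datetime import date
from statistics import median


def median_lead_time_days(tasks) -> float:
    """Median days from task formulation to release."""
    return median((released - formulated).days for formulated, released in tasks)


def post_release_defect_rate(defects: int, releases: int) -> float:
    """Defects found after release, per release."""
    return defects / releases if releases else 0.0


# Made-up (formulated, released) date pairs, for illustration only.
tasks = [
    (date(2025, 3, 3), date(2025, 3, 10)),
    (date(2025, 3, 4), date(2025, 3, 6)),
    (date(2025, 3, 5), date(2025, 3, 19)),
]

print(median_lead_time_days(tasks))      # prints 7
print(post_release_defect_rate(3, 40))   # prints 0.075
```

Taking these readings before the methodology goes in and again afterwards is what turns "twice as productive" from a slogan into something that can be checked.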
My humble opinion on the first option: you've looked at a productivity multiplier and decided the best use of it is paying fewer people. Fine, if that's the goal. But that's a cost-cutting exercise with an AI story attached to it – definitely not an AI transformation.
My humble opinion on the second: this is what transformation actually looks like – the same organization doing materially more.
What doesn't work is the version most companies attempt – cut the people, keep the old processes, and call it a transformation.
The Role That Comes Next
Inevitably, a new role appears inside this model: the AI Product Engineer. This is someone who understands the product, the business requirements, and the architecture (and knows how to manage development through AI agents). They connect user needs, business logic, specification, requirements, architecture, data, tests, code, and metrics into a coherent system. Previously that required a cross-functional team of several specialists coordinating imperfectly across handoffs. Now a significant portion of that coordination happens through agents, while the responsibility for outcomes remains with a person.
Responsibility is the key word. In the old model, people were often responsible for completing a task. In this model, they’re responsible for the product. That’s a larger surface area, and not everyone wants it.
In practice, that means managing several agents at once. One agent reconstructs the AS-IS specification, another prepares the Change Request, a third writes the test strategy, a fourth checks architectural risks, a fifth analyzes compliance constraints, and a sixth updates dashboards. AI Product Engineers orchestrate the production system and build the pipelines – deciding which agent works first, which output becomes input for the next one, where human judgment is required, and where the process is allowed to move automatically.
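The orchestration described above (ordering, output feeding the next input, human gates) can be sketched as a small pipeline. The agents here are stub functions standing in for real model calls, and `Stage`, `run_pipeline`, and the `approve` callback are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stage:
    name: str
    agent: Callable[[dict], dict]  # reads accumulated context, returns new fields
    needs_human: bool = False      # gate the output behind a human decision


def run_pipeline(stages, approve, context=None):
    """Run stages in order; stop at the first output a human rejects."""
    context = dict(context or {})
    log = []
    for stage in stages:
        output = stage.agent(context)
        if stage.needs_human and not approve(stage.name, output):
            log.append(f"{stage.name}: rejected")
            break
        context.update(output)  # this stage's output feeds the next one
        log.append(f"{stage.name}: done")
    return context, log


# Stub agents standing in for real model calls.
stages = [
    Stage("as-is", lambda ctx: {"as_is": "card-adapter posts to legacy core"}),
    Stage(
        "change-request",
        lambda ctx: {"cr": f"replace: {ctx['as_is']}"},
        needs_human=True,
    ),
    Stage("test-strategy", lambda ctx: {"tests": ["TEST-40"]}),
]

context, log = run_pipeline(stages, approve=lambda name, output: True)
print(log)

_, rejected_log = run_pipeline(stages, approve=lambda name, output: False)
print(rejected_log)
```

The `needs_human` flag is where the responsibility stays with a person: the pipeline moves automatically only through the stages someone has decided it may.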
Ideally, every engineer in the company becomes one. Not a specialized AI-team sitting next to the “normal” developers – every engineer, operating at this level. That’s the actual target state. Whether most companies get there is a different question.
The companies that build this correctly – specification before code, tests before implementation, agents embedded in the process, every requirement traceable, every change formalized, every engineer operating at a level that used to require a team – will not look like they “adopted AI”. They will look like they rebuilt how software gets made. The ones that didn’t will still be running workshops, nodding seriously while discussing how AI cannot really be trusted for production-grade systems.
AI transformation is not complicated per se. The discipline to follow through on it is. A smaller group of companies will do the harder work and end up with something that actually compounds.
That’s the difference between an AI transformation and an AI story. Most companies will have the story, and if you’ve read this far, I hope you’re in the other group. See you in the next one!



