TL;DR

A recent Google whitepaper argues that in AI-driven software development, the model itself accounts for only about 10% of system behavior. Success depends on harness design and context engineering, which shifts the focus from models to configuration and verification.

A new whitepaper from Google, authored by Addy Osmani, Shubham Saboo, and Sokratis Kartakis, states that the AI model accounts for only about 10% of the behavior in AI-driven systems. The report emphasizes that the real value comes from the harness (the tools, prompts, rules, and context that surround the model), which shifts the focus of AI development and deployment.

The whitepaper, titled “The New SDLC With Vibe Coding,” argues that the dominant challenge in AI-assisted software engineering is not the model itself but how developers configure, verify, and guide its outputs. It cites experiments in which changing only the harness or context improved agent performance significantly. For example, one team moved a coding agent from outside the Top 30 to the Top 5 on Terminal Bench 2.0 by changing the harness alone, with the same model.

The authors also distinguish “vibe coding” (quick prompts with minimal oversight) from “agentic engineering,” which involves structured, verified, and monitored AI workflows. They stress that unstructured, prompt-based AI use carries high costs and risks, including token waste, security vulnerabilities, and maintenance burdens. The report suggests that investing in harness and context engineering is the more sustainable and cost-effective approach.

At a glance
Report · When: published May 2026
The development: The Google whitepaper argues that the core of AI-assisted development is not the model but the surrounding harness and context, which changes software engineering strategy.
The Model Is Only 10% — The New SDLC With Vibe Coding
AI Dispatch · Field Notes
Google · Osmani, Saboo & Kartakis · May 2026

The model is only 10%

A Google whitepaper argues software’s biggest shift is from writing code to expressing intent. Its sharpest claim: the model you obsess over is the smallest part of the system — the scaffolding around it does the real work.

A spectrum, not a binary: the differentiator is how outputs get verified
Vibe Coding: casual prompts · “does it seem to work?” · disposable code · high risk
Structured AI-Assisted: detailed prompts + constraints · manual testing · features in real codebases
Agentic Engineering: formal specs · automated tests + evals + CI gates · production scale · low risk
Tests verify the deterministic; evals verify the rest. Without both, it’s vibe coding, however clever the prompt.
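
The split between tests and evals can be sketched in a few lines of Python. Everything here is a hypothetical illustration rather than anything taken from the whitepaper: `slugify` stands in for a deterministic piece of generated code, `sampled_summaries` stands in for non-deterministic agent outputs, and the 0.8 pass-rate threshold is an arbitrary choice.

```python
import re


def slugify(title: str) -> str:
    """Deterministic code under test (hypothetical example)."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def test_slugify() -> None:
    # A test: one exact expected output, so it either passes or fails.
    assert slugify("The Model Is Only 10%") == "the-model-is-only-10"


def eval_pass_rate(outputs: list[str], required_terms: list[str]) -> float:
    """An eval: score many sampled outputs against a rubric, return a pass rate."""
    def passes(text: str) -> bool:
        lowered = text.lower()
        return all(term in lowered for term in required_terms)

    return sum(passes(output) for output in outputs) / len(outputs)


def ci_gate(pass_rate: float, threshold: float = 0.8) -> bool:
    """CI gate: block the merge when the eval pass rate drops below the threshold."""
    return pass_rate >= threshold


# Hypothetical sampled outputs for the task "summarize the whitepaper's thesis".
sampled_summaries = [
    "The harness matters more than the model.",
    "Most behavior comes from the harness, not the model.",
    "Agent = model + harness, and the harness dominates.",
    "Invest in the harness; the model is the smaller part.",
    "Use bigger prompts.",  # misses the rubric
]

test_slugify()
rate = eval_pass_rate(sampled_summaries, ["harness", "model"])
print(rate, ci_gate(rate))
```

The test has a single right answer, while the eval tolerates variation and reports a rate, which is why a gate on the eval needs a threshold rather than an equality check.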
The idea worth building your strategy around
Agent = Model + Harness
Model: ~10%
Harness: prompts · tools · context · hooks · sandboxes · observability
~90% is your surface area, not the provider’s
Outside Top 30 → Top 5 on Terminal Bench 2.0 by changing only the harness — same model.
“Most agent failures, examined honestly, are configuration failures” — a missing tool, a vague rule, a noisy context.
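
To make “Agent = Model + Harness” concrete, here is a minimal Python sketch in which the model is held fixed and only the harness changes. All names (`Harness`, `stub_model`, `run_agent`, the `run_tests` tool) are hypothetical, and the stub model is a toy stand-in for a real LLM call, not any provider’s API.

```python
from dataclasses import dataclass, field
from typing import Callable

Tool = Callable[[str], str]


@dataclass
class Harness:
    """Everything around the model: prompt, tools, and context (hypothetical shape)."""
    system_prompt: str
    tools: dict[str, Tool] = field(default_factory=dict)
    context: list[str] = field(default_factory=list)


def stub_model(prompt: str, tool_names: list[str]) -> str:
    """Toy stand-in for a fixed model; it ignores the prompt text for simplicity.

    It verifies its work when a 'run_tests' tool exists and guesses otherwise.
    """
    if "run_tests" in tool_names:
        return "CALL run_tests"
    return "ANSWER looks fine to me"


def run_agent(model: Callable[[str, list[str]], str], harness: Harness, task: str) -> str:
    """Assemble the prompt from the harness, call the model, and execute any tool call."""
    prompt = "\n".join([harness.system_prompt, *harness.context, task])
    reply = model(prompt, sorted(harness.tools))
    if reply.startswith("CALL "):
        tool_name = reply.removeprefix("CALL ")
        return harness.tools[tool_name](task)
    return reply.removeprefix("ANSWER ")


# Same model, two harnesses: one is missing a tool.
bare = Harness(system_prompt="You are a coding agent.")
equipped = Harness(
    system_prompt="You are a coding agent. Verify before answering.",
    tools={"run_tests": lambda task: "2 tests failed"},
)

print(run_agent(stub_model, bare, "Fix the failing build"))
print(run_agent(stub_model, equipped, "Fix the failing build"))
```

With the bare harness the agent returns an unverified guess; with the tool wired in, the same model surfaces the failing tests. That is the “missing tool” kind of configuration failure the quote describes.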
The economics: it’s a token-cost problem (CapEx vs OpEx)
Vibe Coding: Low CapEx · High OpEx
Looks free, hides debt: token burn (fix-it loops), maintenance tax (AI spaghetti), security remediation. Crosses over to 3–10× more per feature.
Agentic Engineering: High CapEx · Low OpEx
Pay upfront (specs, evals, context), then ship cheaply. Levers: context engineering for first-pass success + intelligent model routing (cheap models for the easy work).
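
The CapEx-versus-OpEx trade-off can be worked through with a small Python sketch. The dollar figures, the difficulty score, and the model names below are invented for illustration and are not from the whitepaper; the only link to the source is that the assumed per-feature cost ratio (5×) falls inside the 3–10× range quoted above.

```python
import math


def cumulative_cost(upfront: float, per_feature: float, features: int) -> float:
    """Total spend after shipping `features` features."""
    return upfront + per_feature * features


def break_even_features(
    upfront_low: float,
    per_feature_high: float,
    upfront_high: float,
    per_feature_low: float,
) -> int:
    """Smallest feature count at which the high-upfront option costs no more in total."""
    if per_feature_high <= per_feature_low:
        raise ValueError("the high-upfront option never catches up")
    extra_upfront = upfront_high - upfront_low
    saving_per_feature = per_feature_high - per_feature_low
    return max(0, math.ceil(extra_upfront / saving_per_feature))


def route_model(difficulty: float, threshold: float = 0.5) -> str:
    """Toy model router: send easy work to a cheaper model (names are placeholders)."""
    return "cheap-model" if difficulty < threshold else "frontier-model"


# Hypothetical numbers: vibe coding has no upfront cost but costs 5x more per feature.
VIBE = {"upfront": 0.0, "per_feature": 500.0}
AGENTIC = {"upfront": 4000.0, "per_feature": 100.0}

n = break_even_features(
    VIBE["upfront"], VIBE["per_feature"], AGENTIC["upfront"], AGENTIC["per_feature"]
)
print(n)
print(cumulative_cost(**VIBE, features=20), cumulative_cost(**AGENTIC, features=20))
print(route_model(0.2), route_model(0.9))
```

Under these assumed numbers the two approaches cost the same after 10 features, and every feature after that favors the approach that paid upfront. Where the crossover actually falls depends on an organization’s own figures.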
85% of devs use AI coding agents (51% daily)
41% of all new code is AI-generated
~90% of agent behavior is the harness, not the model
+19% longer on some tasks (METR): verification is the cost
The read

The clearest map yet of how serious AI development works — and mostly tool-agnostic. But it’s a Google funnel: the concepts are neutral, the on-ramps point to Gemini, Jules & the ADK. If the harness is 90% and it’s yours, your moat and your costs both live there — so own your scaffolding, route across models, and remember: AI amplifies whatever engineering culture it lands in.

Source: Osmani, Saboo & Kartakis, “The New SDLC With Vibe Coding,” Google (May 2026). Figures are the paper’s own, incl. METR & LangChain. Analysis is the author’s.
thorstenmeyerai.com

Impact of Harness and Context on AI Development Success

This shift has major implications for organizations adopting AI. It suggests that robust harnesses and well-managed context, rather than access to the latest models alone, will determine the quality, reliability, and cost-efficiency of AI systems. Companies that master configuration and verification can gain a durable competitive advantage, while those fixated on model improvements may see diminishing returns.

Evolution of AI Coding Practices and Industry Insights

The whitepaper builds on an ongoing trend of integrating AI into software workflows. As of early 2026, reports indicate that 85% of developers use AI coding agents, with more than half doing so daily. The industry is moving from vibe coding (quick prompts, minimal oversight) toward more disciplined, structured approaches. Earlier attention centered on model capabilities, but recent experiments and benchmarks suggest that configuration, scaffolding, and context management are now the key differentiators.

This perspective aligns with broader industry shifts emphasizing verification, testing, and cost management over raw model performance, reflecting a maturation in AI integration practices.

“The behavior you experience in AI tools is dominated by scaffolding you can build, own, and improve, not the model itself.”

— Addy Osmani

Unclear Aspects of Model-Harness Dynamics

It remains unclear how universally applicable these findings are across different AI tasks and industries. The specific impact of harness design versus model improvements in real-world, large-scale deployments needs further empirical validation. Additionally, the long-term effects of this paradigm shift on AI model development strategies are still emerging.

Future Directions in AI System Engineering

Organizations are likely to invest more in developing sophisticated harnesses, context management, and verification frameworks. Further research will explore best practices for scalable harness design and the integration of dynamic context loading. Monitoring how this shift influences AI model development and industry standards will be key in the coming months.

Key Questions

Why is the model only 10% of the system behavior?

The whitepaper argues that the surrounding harness (prompts, rules, tools, and context) has a much larger influence on an agent’s output than the model itself, which accounts for roughly 10%.

How does this change AI development strategies?

It shifts focus from chasing better models to designing better harnesses, managing context, and implementing verification to improve performance and reduce costs.

What are the risks of vibe coding versus agentic engineering?

Vibe coding, which relies on quick prompts and minimal oversight, can lead to high token costs, security vulnerabilities, and maintenance issues. Agentic engineering emphasizes structured, verified workflows, which are more cost-effective long-term.

Will this approach work for all AI tasks?

It is still uncertain how universally applicable this paradigm is across different domains. More research and real-world testing are needed to validate its effectiveness broadly.

Source: ThorstenMeyerAI.com
