TL;DR

A recent whitepaper from Google highlights that AI models account for only about 10% of system behavior. The key to effective AI deployment lies in harness design and context engineering, not just the models themselves.

A new Google whitepaper argues that the most significant shift in software engineering is a move away from a focus on AI models and toward harness design and context engineering, with the model itself accounting for only about 10% of system behavior.

The whitepaper, authored by Addy Osmani, Shubham Saboo, and Sokratis Kartakis, states that 85% of professional developers use AI coding agents regularly, that 51% use them daily, and that approximately 41% of all new code is generated by AI. The core insight is that the model accounts for only a small fraction of the system’s effectiveness. The harness (prompts, tools, rules, and observability) accounts for roughly 90% of behavior.

The paper’s concrete examples include experiments in which changing only the harness, such as prompts and middleware, significantly improved agent performance on the same underlying model. In one cited result, an agent moved from outside the Top 30 to the Top 5 on Terminal Bench 2.0 with no change of model. The paper urges teams to treat the harness, rather than the choice of model provider, as their primary surface for optimization.
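
To make the "same model, different harness" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `toy_model` is a hypothetical stand-in that involves no real provider API, and the `Harness` class, its fields, and the `run_tests` tool are invented names, not anything defined in the whitepaper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a model call; no real provider API is implied.

    It asks for the test tool only when the prompt advertises one.
    """
    if "TOOL:run_tests" in prompt:
        return "CALL run_tests"
    return "I think the change is probably fine."


@dataclass
class Harness:
    system_prompt: str
    rules: List[str] = field(default_factory=list)
    tools: Dict[str, Callable[[], str]] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)  # minimal observability

    def build_prompt(self, task: str) -> str:
        # The prompt is assembled from the harness, not from the model.
        tool_lines = [f"TOOL:{name}" for name in self.tools]
        return "\n".join([self.system_prompt, *self.rules, *tool_lines, task])


def run_agent(model: Callable[[str], str], harness: Harness, task: str) -> str:
    reply = model(harness.build_prompt(task))
    harness.log.append(f"model: {reply}")
    if reply.startswith("CALL "):
        name = reply[len("CALL "):]
        tool = harness.tools.get(name)
        result = tool() if tool else f"error: unknown tool {name}"
        harness.log.append(f"tool {name}: {result}")
        return result
    return reply


task = "Fix the off-by-one error in pagination."

# Same model in both runs; only the harness differs.
bare = Harness(system_prompt="You are a coding agent.")
equipped = Harness(
    system_prompt="You are a coding agent.",
    rules=["Never claim success without running the tests."],
    tools={"run_tests": lambda: "tests passed"},
)

bare_result = run_agent(toy_model, bare, task)
equipped_result = run_agent(toy_model, equipped, task)
print(bare_result)
print(equipped_result)
```

With the bare harness the agent can only guess. With a tool and a rule added it returns a verified result, even though `toy_model` never changed.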

The whitepaper also emphasizes the importance of context engineering, which involves managing instructions, knowledge, memory, examples, tools, and guardrails to improve code quality. The authors argue that strategic investments in harness and context are more impactful than chasing the latest models.
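
One way to picture context engineering is as a selection problem: six kinds of material compete for a limited context window. The sketch below is an illustration under stated assumptions, not the paper's method. The `ContextItem` type, the priority convention, the word-count token estimate, and the budget of 22 are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContextItem:
    kind: str      # instructions, knowledge, memory, examples, tools, guardrails
    text: str
    priority: int  # assumed convention: lower number = more important


def estimate_tokens(text: str) -> int:
    # Crude assumption: about one token per whitespace-separated word.
    return len(text.split())


def assemble_context(items: List[ContextItem], budget: int) -> List[ContextItem]:
    """Greedily keep the highest-priority items that still fit the budget."""
    chosen: List[ContextItem] = []
    used = 0
    for item in sorted(items, key=lambda i: i.priority):  # stable sort
        cost = estimate_tokens(item.text)
        if used + cost <= budget:
            chosen.append(item)
            used += cost
    return chosen


items = [
    ContextItem("instructions", "Write Python 3 and add type hints.", 0),
    ContextItem("guardrails", "Never commit secrets.", 0),
    ContextItem("tools", "run_tests: executes the unit test suite.", 1),
    ContextItem("knowledge", "The billing module rounds half up.", 1),
    ContextItem("examples", "Example: def add(a: int, b: int) -> int: return a + b", 2),
    ContextItem("memory", "Last session the user asked for smaller functions.", 3),
]

chosen = assemble_context(items, budget=22)
print([item.kind for item in chosen])
```

With this tight budget the examples and memory items are dropped, while instructions and guardrails always survive. That is the kind of deliberate trade-off a noisy, unmanaged context never makes.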

At a glance
When: published May 2026
The development: Google’s new whitepaper argues that AI system performance depends more on harness configuration and context management than on the AI models used.
The Model Is Only 10%: The New SDLC With Vibe Coding
AI Dispatch · Field Notes
Google · Osmani, Saboo & Kartakis · May 2026

A Google whitepaper argues software’s biggest shift is from writing code to expressing intent. Its sharpest claim: the model you obsess over is the smallest part of the system, and the scaffolding around it does the real work.

A spectrum, not a binary: the differentiator is how outputs get verified

Vibe Coding: casual prompts; “does it seem to work?”; disposable code; high risk
Structured AI-Assisted: detailed prompts plus constraints; manual testing; features in real codebases
Agentic Engineering: formal specs; automated tests, evals, and CI gates; production scale; low risk

Tests verify the deterministic; evals verify the rest. Without both, it’s vibe coding, however clever the prompt.

The idea worth building your strategy around

Agent = Model + Harness
Model: ~10%
Harness: ~90% (prompts, tools, context, hooks, sandboxes, observability). It is your surface area, not the provider’s.

Outside Top 30 to Top 5 on Terminal Bench 2.0 by changing only the harness, with the same model.
“Most agent failures, examined honestly, are configuration failures”: a missing tool, a vague rule, a noisy context.

The economics: it’s a token-cost problem (CapEx vs OpEx)

Vibe Coding: low CapEx, high OpEx. Looks free, hides debt: token burn (fix-it loops), maintenance tax (AI spaghetti), security remediation. Crosses over to 3–10× more per feature.
Agentic Engineering: high CapEx, low OpEx. Pay upfront (specs, evals, context), then ship cheaply. Levers: context engineering for first-pass success, plus intelligent model routing (cheap models for the easy work).

85% of devs use AI coding agents (51% daily)
41% of all new code is AI-generated
~90% of agent behavior is the harness, not the model
+19% longer on some tasks (METR): verification is the cost

The read

The clearest map yet of how serious AI development works — and mostly tool-agnostic. But it’s a Google funnel: the concepts are neutral, the on-ramps point to Gemini, Jules & the ADK. If the harness is 90% and it’s yours, your moat and your costs both live there — so own your scaffolding, route across models, and remember: AI amplifies whatever engineering culture it lands in.

Source: Osmani, Saboo & Kartakis, “The New SDLC With Vibe Coding,” Google (May 2026). Figures are the paper’s own, incl. METR & LangChain. Analysis is the author’s.
thorstenmeyerai.com
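
The summary above names intelligent model routing, meaning cheap models for the easy work, as a cost lever. A minimal sketch of that idea follows. The tier names, prices, task categories, and file-count threshold are all hypothetical placeholders, not real products or quotes, and a production router would likely use richer signals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str                 # hypothetical label, not a real product
    usd_per_1k_tokens: float  # illustrative price, not a real quote


CHEAP = ModelTier("small-model", 0.001)
STRONG = ModelTier("large-model", 0.02)

# Assumed set of task kinds that are safe to send to the cheap tier.
EASY_TASKS = {"rename", "format", "docstring"}


def route(task_kind: str, touched_files: int) -> ModelTier:
    """Send small, mechanical edits to the cheap tier; everything else goes up."""
    if task_kind in EASY_TASKS and touched_files <= 2:
        return CHEAP
    return STRONG


def estimated_cost(tier: ModelTier, tokens: int) -> float:
    return tier.usd_per_1k_tokens * tokens / 1000


for kind, files in [("format", 1), ("refactor", 1), ("rename", 10)]:
    tier = route(kind, files)
    print(kind, files, tier.name, estimated_cost(tier, 2000))
```

The routing rule lives in the harness, so it can be tuned or swapped without touching either model.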

Why Harness Design Outweighs Model Choice in AI Success

This shift in focus matters because it redefines where organizations should invest resources. Instead of constantly upgrading to the newest AI model, companies can achieve better results by improving their harnesses and context management. This approach reduces costs, enhances reliability, and builds durable competitive advantages, especially as AI deployment becomes a core part of software development.

Furthermore, understanding that costs are driven more by configuration and token economy than by the model itself can lead to more disciplined and cost-effective AI strategies. This insight challenges the common perception that the model is the primary driver of AI performance.
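
The CapEx versus OpEx framing comes down to simple break-even arithmetic. The sketch below uses invented figures purely for illustration: an upfront cost of 3000 for the structured approach, and per-feature costs of 400 versus 100, a 4× gap that sits inside the 3–10× range quoted in the summary. None of these numbers come from the whitepaper.

```python
import math
from typing import Optional


def cumulative_cost(upfront: int, per_feature: int, features: int) -> int:
    """Total spend after shipping a given number of features."""
    return upfront + per_feature * features


def break_even_features(
    upfront_a: int, per_feature_a: int, upfront_b: int, per_feature_b: int
) -> Optional[int]:
    """Smallest feature count at which approach B costs no more than approach A.

    Returns None if B never catches up.
    """
    extra_upfront = upfront_b - upfront_a
    saving_per_feature = per_feature_a - per_feature_b
    if extra_upfront <= 0:
        return 0
    if saving_per_feature <= 0:
        return None
    return math.ceil(extra_upfront / saving_per_feature)


# Hypothetical figures: A = low upfront, high running cost; B = the reverse.
n = break_even_features(upfront_a=0, per_feature_a=400,
                        upfront_b=3000, per_feature_b=100)
print(n)
print(cumulative_cost(0, 400, n), cumulative_cost(3000, 100, n))
```

Under these assumed numbers the upfront investment pays for itself by the tenth feature. The real crossover depends entirely on a team's own costs.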

Background on AI System Design and the Shift in Focus

Prior to this whitepaper, many organizations believed that upgrading to larger, more powerful AI models was the key to better performance. However, recent experiments and industry reports suggest that the bottleneck often lies in how models are integrated and managed. The concept of vibe coding — quick prompts with minimal oversight — was prevalent but often inefficient and costly over time.

The paper situates this insight within a broader trend: the move toward agentic engineering, where AI systems are built with formal specifications, verification, and structured context, rather than ad-hoc prompt engineering. This reflects a maturation in AI development practices, emphasizing reliability and cost control.
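
The distinction between tests and evals can be shown in a few lines. This is a sketch under assumptions, not the paper's tooling: `slugify` stands in for deterministic code, `fake_agent` is a hypothetical canned stand-in for a model, and the keyword check and the 0.7 threshold are deliberately simplistic choices for illustration.

```python
from typing import Callable, List, Tuple


def slugify(title: str) -> str:
    return "-".join(title.lower().split())


def test_slugify() -> None:
    # A test: deterministic code, one exact expected output.
    assert slugify("Hello World") == "hello-world"


def contains_all(answer: str, required: List[str]) -> bool:
    return all(term.lower() in answer.lower() for term in required)


def run_eval(
    generate: Callable[[str], str],
    cases: List[Tuple[str, List[str]]],
    threshold: float,
) -> Tuple[float, bool]:
    """An eval: score many outputs, then gate on the pass rate."""
    passed = sum(1 for prompt, required in cases
                 if contains_all(generate(prompt), required))
    rate = passed / len(cases)
    return rate, rate >= threshold


CANNED = {
    "Summarize the refund policy": "Refunds are issued within 30 days.",
    "Explain the rate limit": "The API allows 100 requests per minute.",
    "Describe the retry policy": "Retries use exponential backoff.",
    "State the data retention period": "Data is kept for a while.",
}


def fake_agent(prompt: str) -> str:
    # Hypothetical stand-in for a model call.
    return CANNED[prompt]


CASES = [
    ("Summarize the refund policy", ["refund", "30 days"]),
    ("Explain the rate limit", ["100", "minute"]),
    ("Describe the retry policy", ["exponential", "backoff"]),
    ("State the data retention period", ["90 days"]),
]

test_slugify()
rate, gate_open = run_eval(fake_agent, CASES, threshold=0.7)
print(rate, gate_open)
```

A CI gate in this style fails the build when the pass rate drops below the threshold. That catches regressions in behavior which no single exact-match test could express.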

“The model is only 10% of what determines behavior; the harness is the other 90%.”

— Addy Osmani

Unclear Aspects of Implementation and Industry Adoption

While the whitepaper provides compelling evidence and examples, it is still unclear how widely organizations will adopt this paradigm shift in practice. The precise impact on costs, timelines, and team workflows remains to be seen, and some organizations may face challenges in reorienting their development processes.

Additionally, the long-term effects of focusing on harness and context rather than models are still emerging, and further empirical data is needed to confirm the generalizability of these findings across different domains and scales.

Next Steps for Organizations and Developers

Organizations should evaluate their current AI workflows, emphasizing harness design, context engineering, and verification processes. Investing in tools and practices that improve configuration management and structured context will be critical. Industry groups and standards bodies may also develop guidelines to support this shift.

Further research and case studies are expected to emerge, clarifying best practices and quantifying cost savings. Companies that proactively adapt their AI development strategies to prioritize harness and context are likely to gain a competitive edge in reliability and efficiency.

Key Questions

Why is the model only 10% of AI system behavior?

The whitepaper argues that most of an AI system’s behavior depends on how it is configured and managed through prompts, tools, and rules, rather than on the underlying model itself.

How can organizations improve their AI systems based on this insight?

By focusing on harness design, including better prompts, tools, guardrails, and structured context, organizations can significantly enhance AI reliability and reduce costs.

Does this mean we should stop upgrading models?

The whitepaper suggests that model upgrades are less impactful than optimizing harness and context. Upgrading models can still be beneficial, but it should not be the sole focus.

What are the risks of ignoring harness design?

Ignoring harness design can lead to higher failure rates, increased costs, security vulnerabilities, and less predictable AI behavior, undermining trust and efficiency.

Is this approach applicable to all AI applications?

While most AI workflows can benefit, the emphasis on harness and context is especially relevant for complex, production-level systems where reliability and cost are critical.

Source: ThorstenMeyerAI.com
