TL;DR

A recent whitepaper from Google highlights that AI models account for only about 10% of system behavior. The key to effective AI deployment lies in harness design and context engineering, not just the models themselves.

The whitepaper argues that the most significant shift in software engineering is a move away from focusing on AI models and toward harness design and context engineering. By its estimate, the model itself accounts for only about 10% of system behavior.

The whitepaper, authored by Addy Osmani, Shubham Saboo, and Sokratis Kartakis, states that 85% of professional developers use AI coding agents regularly, 51% use them daily, and approximately 41% of all new code is generated by AI. Its core insight is that the model accounts for only a small fraction of the system’s behavior. The harness (prompts, tools, rules, and observability) accounts for roughly 90%.

Concrete examples include an experiment in which changing only the harness, such as prompts and middleware, moved an agent from outside the top 30 to the top 5 on Terminal Bench 2.0 while using the same underlying model. The paper urges teams to treat the harness, rather than the model provider, as their primary surface area for optimization.
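To make "same model, different harness" concrete, here is a minimal Python sketch. It is not taken from the whitepaper: fake_model, Harness, and calc are hypothetical names, and the "model" is a stub that only answers correctly when the harness hands it a tool result.

```python
import operator
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-in for a model call: it guesses unless the prompt
# already contains a tool result it can repeat back.
def fake_model(prompt: str) -> str:
    if "TOOL_RESULT:" in prompt:
        return prompt.split("TOOL_RESULT:")[1].strip()
    return "I think it is about 40"

def calc(expression: str) -> str:
    """Tiny calculator tool for inputs shaped like '6 * 7'."""
    left, op, right = expression.split()
    ops = {"+": operator.add, "-": operator.sub, "*": operator.mul}
    return str(ops[op](int(left), int(right)))

@dataclass
class Harness:
    system_prompt: str
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)
    log: list[str] = field(default_factory=list)  # minimal observability

    def run(self, model: Callable[[str], str], task: str) -> str:
        prompt = f"{self.system_prompt}\nTASK: {task}"
        # Rule: if a calculator tool is configured, run it before the model.
        if "calc" in self.tools:
            result = self.tools["calc"](task)
            prompt += f"\nTOOL_RESULT: {result}"
            self.log.append(f"calc({task!r}) -> {result}")
        answer = model(prompt)
        self.log.append(f"answer -> {answer}")
        return answer

bare = Harness("Answer the task.")
tooled = Harness("Answer the task using tool results.", tools={"calc": calc})

bare_answer = bare.run(fake_model, "6 * 7")      # the stub guesses
tooled_answer = tooled.run(fake_model, "6 * 7")  # the tool supplies the answer
```

The model function is identical in both runs. Only the configuration around it changes, which is the kind of difference the paper attributes to the harness.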

The whitepaper also emphasizes the importance of context engineering, which involves managing instructions, knowledge, memory, examples, tools, and guardrails to improve code quality. The authors argue that strategic investments in harness and context are more impactful than chasing the latest models.
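One way to picture context engineering is as a packing problem: the six ingredients listed above compete for a limited prompt budget. The sketch below is an illustration under stated assumptions, not the paper's method. ContextPart, build_context, the priority values, and the character budget are all hypothetical, and a real system would count tokens rather than characters.

```python
from dataclasses import dataclass

@dataclass
class ContextPart:
    kind: str      # one of the six categories named in the text
    text: str
    priority: int  # lower number = kept first when the budget is tight

# Fixed layout order so the assembled prompt is predictable.
ORDER = ["instructions", "guardrails", "tools", "knowledge", "examples", "memory"]

def build_context(parts: list[ContextPart], budget_chars: int) -> str:
    """Keep the highest-priority parts that fit, then emit them in ORDER."""
    chosen: list[ContextPart] = []
    used = 0
    for part in sorted(parts, key=lambda p: p.priority):
        if used + len(part.text) <= budget_chars:
            chosen.append(part)
            used += len(part.text)
    chosen.sort(key=lambda p: ORDER.index(p.kind))
    return "\n".join(f"[{p.kind}] {p.text}" for p in chosen)

parts = [
    ContextPart("memory", "old chat transcript " * 50, priority=5),
    ContextPart("instructions", "Fix the failing test.", priority=0),
    ContextPart("guardrails", "Never edit files outside src/.", priority=1),
    ContextPart("tools", "run_tests, read_file", priority=2),
    ContextPart("knowledge", "Project uses pytest.", priority=3),
    ContextPart("examples", "Prior fix renamed a fixture.", priority=4),
]

context = build_context(parts, budget_chars=200)
print(context)
```

Here the long, low-priority memory entry is dropped because it would exceed the budget, which keeps the context small and focused for the model.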

At a glance

Report · When: published March 2026

The development: Google’s new whitepaper argues that AI system performance depends more on harness configuration and context management than on the AI models used.

The Model Is Only 10% — The New SDLC With Vibe Coding

AI Dispatch · Field Notes

Google · Osmani, Saboo & Kartakis · May 2026

The model is only 10%

A Google whitepaper argues software’s biggest shift is from writing code to expressing intent. Its sharpest claim: the model you obsess over is the smallest part of the system — the scaffolding around it does the real work.

A spectrum, not a binary — the differentiator is how outputs get verified

Vibe Coding: casual prompts · “does it seem to work?” · disposable code · high risk

Structured AI-Assisted: detailed prompts + constraints · manual testing · features in real codebases

Agentic Engineering: formal specs · automated tests + evals + CI gates · production scale · low risk

Tests verify the deterministic; evals verify the rest. Without both, it’s vibe coding — however clever the prompt.

The idea worth building your strategy around

Agent = Model + Harness

MODEL: ~10%

HARNESS: ~90% (prompts · tools · context · hooks · sandboxes · observability)

~90% is your surface area, not the provider’s

Outside Top 30 → Top 5 on Terminal Bench 2.0 by changing only the harness — same model.

“Most agent failures, examined honestly, are configuration failures” — a missing tool, a vague rule, a noisy context.

The economics: it’s a token-cost problem (CapEx vs OpEx)

Vibe Coding: Low CapEx · High OpEx. Looks free, hides debt: token burn (fix-it loops), maintenance tax (AI spaghetti), security remediation. Crosses over to 3–10× more per feature.

Agentic Engineering: High CapEx · Low OpEx. Pay upfront (specs, evals, context), then ship cheaply. Levers: context engineering for first-pass success, plus intelligent model routing (cheap models for the easy work).

85% of devs use AI coding agents (51% daily)

41% of all new code is AI-generated

~90% of agent behavior is the harness, not the model

+19% longer on some tasks (METR): verification is the cost

The read

The clearest map yet of how serious AI development works — and mostly tool-agnostic. But it’s a Google funnel: the concepts are neutral, the on-ramps point to Gemini, Jules & the ADK. If the harness is 90% and it’s yours, your moat and your costs both live there — so own your scaffolding, route across models, and remember: AI amplifies whatever engineering culture it lands in.

Source: Osmani, Saboo & Kartakis, “The New SDLC With Vibe Coding,” Google (May 2026). Figures are the paper’s own, incl. METR & LangChain. Analysis is the author’s.

Why Harness Design Outweighs Model Choice in AI Success

This shift in focus matters because it redefines where organizations should invest resources. Instead of constantly upgrading to the newest AI model, companies can achieve better results by improving their harnesses and context management. This approach reduces costs, enhances reliability, and builds durable competitive advantages, especially as AI deployment becomes a core part of software development.

Furthermore, recognizing that costs are driven more by configuration and token consumption than by the model itself can lead to more disciplined and cost-effective AI strategies. This challenges the common perception that the model is the primary driver of AI performance.
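One cost lever the paper names is intelligent model routing: sending easy work to cheap models. The Python sketch below shows the idea under loudly stated assumptions. The tier names, per-token prices, keyword list, and token threshold are all hypothetical placeholders, not figures from the paper or from any provider.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelTier:
    name: str
    usd_per_1k_tokens: float  # hypothetical price, for illustration only

CHEAP = ModelTier("small-model", 0.001)
STRONG = ModelTier("large-model", 0.020)

# Hypothetical heuristic: long tasks or risky keywords go to the strong tier.
HARD_MARKERS = ("refactor", "architecture", "security", "migrate")

def route(task: str, est_tokens: int) -> ModelTier:
    is_hard = est_tokens > 4000 or any(m in task.lower() for m in HARD_MARKERS)
    return STRONG if is_hard else CHEAP

def cost(tier: ModelTier, tokens: int) -> float:
    return tier.usd_per_1k_tokens * tokens / 1000

tasks = [
    ("rename a variable", 500),
    ("write a docstring", 800),
    ("refactor the auth module", 6000),
]

routed_cost = sum(cost(route(task, n), n) for task, n in tasks)
all_strong_cost = sum(cost(STRONG, n) for _, n in tasks)
print(f"routed: ${routed_cost:.4f}  all-strong: ${all_strong_cost:.4f}")
```

In practice the routing rule would itself be part of the harness and would need its own evals, since a misrouted hard task can cost more in fix-it loops than it saves in tokens.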

Background on AI System Design and the Shift in Focus

Prior to this whitepaper, many organizations believed that upgrading to larger, more powerful AI models was the key to better performance. However, recent experiments and industry reports suggest that the bottleneck often lies in how models are integrated and managed. The concept of vibe coding — quick prompts with minimal oversight — was prevalent but often inefficient and costly over time.

The paper situates this insight within a broader trend: the move toward agentic engineering, where AI systems are built with formal specifications, verification, and structured context, rather than ad-hoc prompt engineering. This reflects a maturation in AI development practices, emphasizing reliability and cost control.
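The split between tests and evals that agentic engineering relies on can be sketched as a single CI gate. This is a hedged Python illustration, not the paper's implementation: slugify stands in for agent-written code, and the rubric, sample outputs, and 0.8 threshold are hypothetical.

```python
def slugify(title: str) -> str:
    """Stand-in for a deterministic function an agent might have written."""
    return "-".join(title.lower().split())

def run_tests() -> bool:
    """Tests: exact expected outputs for deterministic behavior."""
    return slugify("Hello World") == "hello-world" and slugify(" A  B ") == "a-b"

def score(summary: str) -> float:
    """Eval rubric for free-form output: fraction of checks passed."""
    checks = [len(summary) <= 80, summary.endswith("."), "TODO" not in summary]
    return sum(checks) / len(checks)

def run_evals(outputs: list[str], threshold: float) -> bool:
    """Evals: average rubric score must clear a threshold, not match exactly."""
    mean = sum(score(o) for o in outputs) / len(outputs)
    return mean >= threshold

def ci_gate(outputs: list[str], threshold: float = 0.8) -> bool:
    """Merge only when both the tests and the evals pass."""
    return run_tests() and run_evals(outputs, threshold)

good_outputs = ["Adds retry logic to the uploader.", "Fixes off-by-one in pager."]
bad_outputs = ["TODO", "Fixes off-by-one in pager."]

print(ci_gate(good_outputs), ci_gate(bad_outputs))
```

Tests use equality because the behavior is deterministic. Evals use a scored threshold because model output varies from run to run.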

“The model is only 10% of what determines behavior; the harness is the other 90%.”
— Addy Osmani

Unclear Aspects of Implementation and Industry Adoption

While the whitepaper provides compelling evidence and examples, it is still unclear how widely organizations will adopt this paradigm shift in practice. The precise impact on costs, timelines, and team workflows remains to be seen, and some organizations may face challenges in reorienting their development processes.

Additionally, the long-term effects of focusing on harness and context rather than models are still emerging, and further empirical data is needed to confirm the generalizability of these findings across different domains and scales.

Next Steps for Organizations and Developers

Organizations should evaluate their current AI workflows, emphasizing harness design, context engineering, and verification processes. Investing in tools and practices that improve configuration management and structured context will be critical. Industry groups and standards bodies may also develop guidelines to support this shift.

Further research and case studies are expected to emerge, clarifying best practices and quantifying cost savings. Companies that proactively adapt their AI development strategies to prioritize harness and context are likely to gain a competitive edge in reliability and efficiency.

Key Questions

Why is the model only 10% of AI system behavior?

The whitepaper argues that most of an AI system’s behavior depends on how it is configured and managed through prompts, tools, and rules, rather than on the underlying model itself. The 10% figure is the paper’s own estimate.

How can organizations improve their AI systems based on this insight?

By focusing on harness design, including better prompts, tools, guardrails, and structured context, organizations can significantly enhance AI reliability and reduce costs.

Does this mean we should stop upgrading models?

The whitepaper suggests that model upgrades are less impactful than optimizing harness and context. Upgrading models can still be beneficial, but it should not be the sole focus.

What are the risks of ignoring harness design?

Ignoring harness design can lead to higher failure rates, increased costs, security vulnerabilities, and less predictable AI behavior, undermining trust and efficiency.

Is this approach applicable to all AI applications?

While most AI workflows can benefit, the emphasis on harness and context is especially relevant for complex, production-level systems where reliability and cost are critical.

Source: ThorstenMeyerAI.com

The Model Is Only 10%: The Real Lesson of the New SDLC

Author

BARRIER MAGZ
