The Model Is Only 10%: The Real Lesson of the New SDLC

TL;DR

A recent Google whitepaper argues that in AI-assisted development the model itself accounts for only about 10% of system behavior. The rest comes from harness design, verification, and context engineering, which changes where organizations should invest.

A new whitepaper from Google, authored by Addy Osmani, Shubham Saboo, and Sokratis Kartakis, states that the AI model constitutes only about 10% of the factors influencing system behavior. This challenges common assumptions that model improvements alone drive AI performance and underscores the importance of harness design, verification, and context engineering. The insight has broad implications for how organizations allocate resources and develop AI strategies.

The whitepaper, titled “The New SDLC With Vibe Coding,” argues that most of an AI system's performance depends on the harness: the prompts, rules, tools, and observability layers surrounding the model. The experiments it cites show that changing the harness alone can sharply improve results with the same underlying model. In one example, a coding agent moved from outside the Top 30 to the Top 5 on Terminal Bench 2.0 with only the harness changed.
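As a rough illustration of the "same model, different harness" point, here is a minimal Python sketch. Everything in it is hypothetical: the Harness class, the "CALL <tool> <arg>" protocol, and the stub model are assumptions made for this example, not the whitepaper's design or any vendor's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for any model call: takes a prompt, returns text.
Model = Callable[[str], str]


@dataclass
class Harness:
    """Everything around the model: rules, context, tools, and a trace log."""
    system_rules: List[str]
    context: List[str]
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    trace: List[str] = field(default_factory=list)  # minimal observability

    def build_prompt(self, task: str) -> str:
        parts = ["RULES:"] + self.system_rules
        parts += ["CONTEXT:"] + self.context
        parts += ["TOOLS: " + ", ".join(sorted(self.tools))]
        parts += ["TASK: " + task]
        return "\n".join(parts)

    def run(self, model: Model, task: str) -> str:
        prompt = self.build_prompt(task)
        self.trace.append(f"prompt_chars={len(prompt)}")
        reply = model(prompt)
        # Toy tool protocol (an assumption): "CALL <tool> <arg>" invokes a tool.
        if reply.startswith("CALL "):
            _, name, arg = reply.split(" ", 2)
            if name not in self.tools:
                self.trace.append(f"missing_tool={name}")
                return f"error: tool '{name}' is not configured"
            self.trace.append(f"tool={name}")
            return self.tools[name](arg)
        return reply


def fake_model(prompt: str) -> str:
    """Deterministic stub so the sketch runs without any provider."""
    return "CALL run_tests src/"


# Same model, two harnesses: one bare, one with rules, context, and a tool.
bare = Harness(system_rules=[], context=[])
equipped = Harness(
    system_rules=["Run the tests before answering."],
    context=["Repo uses pytest."],
    tools={"run_tests": lambda path: f"tests passed in {path}"},
)

print(bare.run(fake_model, "fix the bug"))
print(equipped.run(fake_model, "fix the bug"))
```

The stub model behaves identically in both runs. Only the bare harness fails, and its trace records a missing tool, which is the kind of configuration failure the paper describes.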

Furthermore, the paper introduces the concept of agentic engineering, where AI is integrated into formal specifications, automated testing, and continuous integration processes. This approach contrasts with vibe coding, which relies on minimal structure and verification, often leading to higher operational costs over time. The authors argue that cost efficiency and system robustness depend heavily on designing and owning the harness and context, not just selecting the latest model.
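To make the verification side more concrete, the sketch below shows one way a CI gate could combine exact tests (for deterministic code) with scored evals (for model output that varies from run to run). The function names, the rubric format, and the 0.8 threshold are all assumptions chosen for illustration. The paper does not prescribe this implementation.

```python
from typing import Callable, List


def slugify(title: str) -> str:
    """Deterministic code under test: lowercase words joined by hyphens."""
    return "-".join(title.lower().split())


def deterministic_tests() -> bool:
    """Exact assertions: the same input must always give the same output."""
    return slugify("The New SDLC") == "the-new-sdlc" and slugify("  A  b ") == "a-b"


def eval_pass_rate(generate: Callable[[str], str], cases: List[dict]) -> float:
    """Score free-form output against rubric checks instead of exact strings."""
    passed = 0
    for case in cases:
        output = generate(case["prompt"]).lower()
        if all(term in output for term in case["must_mention"]):
            passed += 1
    return passed / len(cases)


def ci_gate(tests_ok: bool, pass_rate: float, threshold: float = 0.8) -> bool:
    """Allow the merge only if tests pass AND the eval rate clears the bar."""
    return tests_ok and pass_rate >= threshold


def stub_generate(prompt: str) -> str:
    """Stand-in for a model call so the sketch runs offline."""
    return "This change adds a retry with backoff and a unit test."


# Hypothetical eval cases: each lists terms the output must mention.
cases = [
    {"prompt": "Summarize the diff", "must_mention": ["retry", "test"]},
    {"prompt": "Describe the risk", "must_mention": ["rollback"]},
]

rate = eval_pass_rate(stub_generate, cases)
print(rate, ci_gate(deterministic_tests(), rate))
```

In this toy run the deterministic tests pass, but the stub output satisfies only one of the two rubric cases, so the gate blocks the merge. A real eval suite would use richer scoring than keyword checks.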

At a glance
Report
When: published March 2026
The development: Google’s new whitepaper argues that the core of effective AI systems lies in harness and configuration, not just the AI model, marking a significant shift in AI development strategies.
Google · Osmani, Saboo & Kartakis · May 2026

The model is only 10%

A Google whitepaper argues software’s biggest shift is from writing code to expressing intent. Its sharpest claim: the model you obsess over is the smallest part of the system — the scaffolding around it does the real work.

A spectrum, not a binary — the differentiator is how outputs get verified
Vibe Coding: casual prompts · “does it seem to work?” · disposable code · high risk
Structured AI-Assisted: detailed prompts + constraints · manual testing · features in real codebases
Agentic Engineering: formal specs · automated tests + evals + CI gates · production scale · low risk
Tests verify the deterministic; evals verify the rest. Without both, it’s vibe coding — however clever the prompt.
The idea worth building your strategy around
Agent = Model + Harness
Model: ~10%
Harness: ~90% (prompts · tools · context · hooks · sandboxes · observability)
The ~90% is your surface area, not the provider’s.
Outside Top 30 → Top 5 on Terminal Bench 2.0 by changing only the harness — same model.
“Most agent failures, examined honestly, are configuration failures” — a missing tool, a vague rule, a noisy context.
The economics: it’s a token-cost problem (CapEx vs OpEx)
Vibe Coding: low CapEx, high OpEx. Looks free, hides debt: token burn (fix-it loops), maintenance tax (AI spaghetti), security remediation. Crosses over to 3–10× more per feature.
Agentic Engineering: high CapEx, low OpEx. Pay upfront (specs, evals, context), then ship cheaply. Levers: context engineering for first-pass success, plus intelligent model routing (cheap models for the easy work).
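The routing lever can be sketched in a few lines of Python. The tier names, per-token prices, difficulty heuristic, and task list below are invented for illustration. They are not figures from the whitepaper or rates from any provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str                  # hypothetical label, not a real product
    usd_per_1k_tokens: float   # illustrative price, not a quoted rate


CHEAP = ModelTier("small-model", 0.001)
STRONG = ModelTier("large-model", 0.010)


def difficulty(task: dict) -> int:
    """Crude heuristic (an assumption): more files and no tests mean harder work."""
    score = task["files_touched"]
    if not task["has_tests"]:
        score += 3
    return score


def route(task: dict, cutoff: int = 4) -> ModelTier:
    """Send easy tasks to the cheap tier and everything else to the strong tier."""
    return CHEAP if difficulty(task) < cutoff else STRONG


def cost(task: dict, tier: ModelTier) -> float:
    """Token cost of running one task on one tier."""
    return task["tokens"] / 1000 * tier.usd_per_1k_tokens


# Made-up workload: two easy tasks and one hard one.
tasks = [
    {"name": "rename variable", "files_touched": 1, "has_tests": True, "tokens": 2000},
    {"name": "fix typo in docs", "files_touched": 1, "has_tests": True, "tokens": 1000},
    {"name": "refactor auth flow", "files_touched": 6, "has_tests": False, "tokens": 20000},
]

routed = sum(cost(t, route(t)) for t in tasks)
always_strong = sum(cost(t, STRONG) for t in tasks)
print(f"routed=${routed:.3f} always_strong=${always_strong:.3f}")
```

With these made-up numbers the routed total comes out lower than sending everything to the strong tier. The saving is small here because the hard task dominates the token count, so in practice the benefit would depend on the real mix of easy and hard work.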
85% of devs use AI coding agents (51% daily)
41% of all new code is AI-generated
~90% of agent behavior is the harness, not the model
+19% longer on some tasks (METR): verification is the cost
The read

The clearest map yet of how serious AI development works — and mostly tool-agnostic. But it’s a Google funnel: the concepts are neutral, the on-ramps point to Gemini, Jules & the ADK. If the harness is 90% and it’s yours, your moat and your costs both live there — so own your scaffolding, route across models, and remember: AI amplifies whatever engineering culture it lands in.

Source: Osmani, Saboo & Kartakis, “The New SDLC With Vibe Coding,” Google (May 2026). Figures are the paper’s own, incl. METR & LangChain. Analysis is the author’s.

Implications for AI Development and Investment

This shift in understanding impacts how organizations should invest in AI: focusing on harness design, verification, and context management offers more durable advantages than chasing the latest model. It also raises questions about current AI strategies that prioritize model improvements over system architecture. Companies that adapt to this insight could reduce costs, improve reliability, and better control AI behavior in production environments.

Background of the AI System Shift

Until now, many in the industry believed that advances in the models themselves, such as larger neural networks and more training data, were the primary drivers of system performance. However, recent developments, including the widespread adoption of AI coding agents, suggest that the surrounding infrastructure (prompts, rules, tools, and observability) plays a far more significant role. The whitepaper builds on this trend, citing experimental evidence that configuration and harness design matter more than model size or complexity.

This perspective aligns with earlier industry observations that most AI failures originate from configuration errors or missing tools, not model deficiencies. The paper formalizes this understanding and offers a framework for organizations to rethink their AI development priorities.

“The behavior you experience is dominated by scaffolding you can build, own, and improve—your harness, not the model itself.”

— Addy Osmani

Unresolved Questions About Model vs. Harness Impact

While the whitepaper provides strong experimental evidence that harnesses are more influential than models, it remains unclear how universally applicable these findings are across different AI domains and tasks. The precise methods for optimizing harness design at scale, and how these strategies translate to less structured or more complex systems, are still being explored. Additionally, the long-term impact of this shift on AI innovation and model development remains to be seen.

Future Directions for AI System Design

Organizations are expected to reevaluate their AI development priorities, investing more in harness engineering, verification, and context management. Future research will likely focus on formalizing best practices for harness design, developing tools for scalable context engineering, and establishing standards for system robustness. Industry leaders may also experiment with integrating these principles into their AI workflows to reduce costs and improve reliability.

Key Questions

Why does the model only account for 10% of system behavior?

The whitepaper’s experiments show that the surrounding infrastructure—prompts, rules, tools, and observability—has a much larger influence on AI performance than the model itself, often accounting for about 90% of the behavior.

How should companies change their AI strategies based on this insight?

Companies should focus more on designing and owning their harnesses—configuration, verification, and context—rather than solely chasing more advanced models. Investing in system architecture can lead to better performance and lower costs.

What are the risks of focusing too much on harness and configuration?

Over-reliance on configuration without proper verification and testing could lead to system vulnerabilities or unpredictable behaviors. Balancing harness design with rigorous verification is essential for safe, reliable AI deployment.

Will this shift affect the pace of AI innovation?

It may shift attention away from model development, but it encourages more sustainable, cost-effective innovation through better system engineering and configuration practices.

Source: ThorstenMeyerAI.com
