📊 Full opportunity report: Engineering Is Automated. Research Is the Residual. on ThorstenMeyerAI.com — validation score, market gap, and execution plan.
TL;DR
AI systems can now automate most of the engineering work involved in AI development, with key benchmarks at or near saturation. Research is the open question: it is not yet clear how much of scientific discovery can be automated.
Recent analyses suggest that AI systems can now handle the core engineering tasks of AI research and development. Whether AI research itself can be automated is uncertain, and some parts of it may prove less automatable than engineering. That gap may shape how AI advances over the coming months.
Thorsten Meyer’s review of Jack Clark’s recent work highlights that six benchmarks measuring AI’s ability to perform core R&D tasks are approaching or have reached saturation. For instance, CORE-Bench, which assesses research reproduction, improved from 21.5% in September 2024 to 95.5% in December 2025, and the benchmark’s author has declared it “solved.” Similarly, MLE-Bench, which evaluates Kaggle competition performance, advanced from 16.9% in October 2024 to 64.4% in February 2026. These improvements indicate that AI can reliably handle complex, friction-laden engineering tasks such as reproducing research papers and competing in ML competitions at levels comparable to mid-tier human practitioners.
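The arithmetic behind those two trajectories can be checked with a short sketch. The scores and months are the ones quoted above; the day of the month is assumed to be the 1st, and the function and field names are hypothetical, chosen only for illustration.

```python
from datetime import date

# Scores and months as quoted in the text; the day (1st) is an assumption.
BENCHMARKS = {
    "CORE-Bench": ((date(2024, 9, 1), 21.5), (date(2025, 12, 1), 95.5)),
    "MLE-Bench": ((date(2024, 10, 1), 16.9), (date(2026, 2, 1), 64.4)),
}


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def summarize(name: str) -> dict:
    """Gain in percentage points, average monthly gain, and headroom to 100%."""
    (start_date, start_score), (end_date, end_score) = BENCHMARKS[name]
    months = months_between(start_date, end_date)
    gain = end_score - start_score
    return {
        "months": months,
        "gain_pp": round(gain, 1),
        "pp_per_month": round(gain / months, 2),
        "headroom_pp": round(100 - end_score, 1),
    }


if __name__ == "__main__":
    for benchmark in BENCHMARKS:
        print(benchmark, summarize(benchmark))
```

On these figures, CORE-Bench gained 74.0 percentage points over 15 months and MLE-Bench gained 47.5 points over 16 months. The headroom column is a reminder that the two benchmarks are at different distances from the ceiling, so "near saturation" applies more literally to the first than to the second.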
Clark’s analysis suggests that these engineering tasks are nearing full automation, with the remaining challenges being primarily operational or logistical. Research activities such as generating novel hypotheses, designing experiments, and creative problem-solving are less clearly automatable. Clark leaves open whether research is fundamentally a form of scaled engineering or whether it involves distinct cognitive processes that resist automation. The current trajectory implies that engineering may be largely automated within the next 32 months, but research may lag behind, leaving a residual frontier for AI development.
Engineering is automated.
Research is the residual.
Six skill benchmarks. Edison’s framing. The question Clark leaves open is whether research is just engineering at scale.
Jack Clark’s Import AI #455 catalogs six benchmarks measuring AI capability on AI R&D tasks and concludes “AI can today automate vast swatches, perhaps the entirety, of AI engineering.” The residual question is research. The structural read on the residual: it may not be a permanent moat.
Six skills. One trajectory.
Clark catalogs six benchmarks measuring AI capability on AI R&D-relevant tasks. Each individual benchmark could be noise. Six benchmarks moving together is a curve. The pattern is the cascade observed across the broader Clark series — visible here in the specific R&D-skill domain.

Three data points. Mixed signal.
Clark provides three data points on the creative-spark question. Yes-evidence: Erdős-1051, centaur math discovery, sporadic Move-37-style moments. No-evidence: low yield, framing dependence, absence of acceleration. The mixed signal is the honest read.
The data supports two readings. Pessimistic: rare moments suggest creative insight is qualitatively distinct from engineering work. Optimistic: rare moments are an artifact of low-volume exploration, and more shots on goal yield more discoveries. Both readings are consistent with Clark’s “vast swatches, perhaps the entirety” claim. They differ on the residual.
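The optimistic "shots on goal" reading can be written down as a simple probability sketch. It assumes each attempt is independent and has the same fixed chance of producing a discovery, which is precisely the assumption the pessimistic reading disputes. The function names and the example probability are hypothetical and are not taken from Clark.

```python
def p_at_least_one(p_single: float, attempts: int) -> float:
    """Chance of at least one discovery in `attempts` independent tries.

    Assumes a fixed per-attempt success probability `p_single`.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    return 1.0 - (1.0 - p_single) ** attempts


def expected_discoveries(p_single: float, attempts: int) -> float:
    """Expected number of discoveries under the same independence assumption."""
    return p_single * attempts


if __name__ == "__main__":
    # Hypothetical per-attempt yield of 0.1%, purely for illustration.
    for n in (10, 1_000, 100_000):
        print(n, round(p_at_least_one(0.001, n), 4), expected_discoveries(0.001, n))
```

Under this model, volume alone turns rare moments into routine ones. If the per-attempt probability is effectively zero for some class of insight, no amount of volume helps, and that is the pessimistic case.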
Five dimensions Clark gestures at but leaves underdeveloped.
Clark’s section is rigorous on the empirical evidence. Five strategic dimensions matter for the institutional response, which the Clark series synthesis argues is structurally inadequate.
Two readings. Different equilibria.
The structural question Clark leaves open: is research a permanent moat that bounds automated AI R&D, or is it engineering at scale that dissolves with more shots on goal? Both readings are consistent with the current data. They differ by orders of magnitude in consequences.
Productivity multiplier years
Recursive loop operational
Five audiences. Asymmetric cost of being wrong.
The institutional response should not bet on inspiration being a permanent moat. If the distinction holds, capacity built is still useful. If it closes, capacity is necessary. Asymmetric cost-of-being-wrong points toward building now.
The five audiences: industry, academia, policymakers, investors, and everyone else.
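The asymmetry argument in this section can be made concrete as a small minimax-regret sketch. The cost numbers are hypothetical placeholders on an arbitrary scale, not estimates from Clark or from this article. They encode only the ordering the text describes: building capacity is mildly wasteful if the moat holds, while waiting is very costly if it closes.

```python
# Hypothetical costs on an arbitrary scale (illustrative only, not measured).
# Rows are actions, columns are states of the world.
COSTS = {
    "build_now": {"moat_holds": 1, "moat_closes": 0},
    "wait": {"moat_holds": 0, "moat_closes": 10},
}


def max_regret(action: str) -> int:
    """Worst-case regret of an action: its cost minus the best cost per state."""
    regrets = []
    for state in COSTS[action]:
        best_in_state = min(COSTS[other][state] for other in COSTS)
        regrets.append(COSTS[action][state] - best_in_state)
    return max(regrets)


def minimax_regret_choice() -> str:
    """Pick the action whose worst-case regret is smallest."""
    return min(COSTS, key=max_regret)


if __name__ == "__main__":
    for action in COSTS:
        print(action, "max regret:", max_regret(action))
    print("choice:", minimax_regret_choice())
```

With any numbers that preserve this ordering, the rule favors building now. The conclusion depends on the shape of the asymmetry rather than on the specific values.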
Engineering is automated. The residual is the question. The institutional response should not bet on inspiration being a permanent moat.
Implications of Engineering Automation for AI Development
This shift could accelerate AI progress significantly, as automating engineering tasks reduces bottlenecks in model development, testing, and deployment. However, the remaining human role in research and scientific discovery could slow overall innovation if AI cannot fully automate the creative and hypothesis-driven aspects of research. Understanding this divide informs expectations about AI’s future capabilities and the strategic focus for AI research organizations.
Progress in AI R&D Skill Benchmarks and Industry Response
AI capabilities have advanced rapidly across multiple benchmarks relevant to AI research and engineering. CORE-Bench, which measures research reproduction, and MLE-Bench, which evaluates Kaggle competition performance, exemplify the trend: the first has been declared solved, and the second has climbed steeply. These benchmarks sit alongside broader developments such as improved GPU kernel design, automated code conversion, and production-grade model optimization, which suggest that AI is moving from experimental to operational use in engineering tasks. Whether AI can automate the more creative, hypothesis-driven aspects of research remains open and is debated among experts.
“Clark’s conclusion is correct and possibly understated for engineering. The residual research question is real but may be less binding than the framing suggests.”
— Thorsten Meyer
Unclear Extent of AI Automation in Scientific Research
While engineering tasks are nearing full automation, it remains uncertain how much of the research process (hypothesis generation, experimental design, creative problem-solving) can be automated. Clark leaves open whether research is fundamentally a scaled form of engineering or involves distinct cognitive skills that resist automation. How quickly research activities will be automated depends on technological and theoretical advances that are still under way.
Next Milestones in AI R&D Automation Progress
In the coming 32 months, expect continued improvement on benchmarks that measure engineering tasks, potentially to full saturation. Industry and academia will likely focus on operationalizing these capabilities and addressing the remaining bottlenecks. Research automation remains an open question, with ongoing experiments and debate about the nature of scientific discovery and AI’s role in it. Watching these developments will show whether research automation accelerates or research remains a human-led endeavor.
Key Questions
What are the key benchmarks indicating AI automation in engineering?
CORE-Bench for research reproduction and MLE-Bench for Kaggle competitions are the primary benchmarks. Both are nearing or have reached saturation, indicating high levels of automation in these tasks.
Does this mean AI can fully automate scientific research?
Not yet. While engineering tasks are approaching full automation, the automation of creative and hypothesis-driven research remains uncertain and is an active area of investigation.
What are the implications for AI development teams?
Teams may shift focus towards operationalizing automated engineering capabilities and exploring the residual research frontier, potentially accelerating development cycles and reducing costs.
How might this shift affect the pace of AI innovation?
If engineering automation continues to advance rapidly, it could significantly speed up AI model development and deployment. However, breakthroughs in automating research processes are necessary to sustain long-term innovation acceleration.
Source: ThorstenMeyerAI.com