This Week in AI: RSI, Claude & RL Quality

This Week in AI: The Signal Worth Keeping

This week in AI was quieter on the headline-grabbing releases but louder on structural shifts — the kind that matter more to builders than benchmark announcements. Recursive self-improvement moved from theoretical talking point to funded institution. Claude's enterprise grip tightened dramatically. And a sharp practitioner post crystallized something we've watched quietly become a serious infrastructure problem: bad reinforcement learning environments breaking training before a single good gradient lands. Here's what we're paying attention to.

Sakana AI Makes Recursive Self-Improvement a Real Research Agenda

Sakana AI launched a dedicated Recursive Self-Improvement Lab in Tokyo this week, consolidating earlier projects — including The AI Scientist and Darwin Gödel Machine — under a single explicit mandate: build systems that improve themselves without requiring hyperscale compute. The framing matters. For months, RSI has been discussed as either a distant AGI concern or vague corporate strategy theater. Sakana is now treating it as an engineering problem with sample efficiency as the core design constraint.

For builders, this is a signal to watch rather than act on immediately. The practical implication, if the agenda delivers, is that RSI-capable systems will not belong exclusively to labs running hundred-thousand-GPU clusters: the research is explicitly aimed at compute-constrained regimes. That changes the timeline calculus for anyone building AI products that depend on fine-tuning or post-training workflows.

Claude Owns Enterprise AI Tooling — by a Wide Margin

A survey of 173 leaders at more than a hundred high-growth portfolio companies produced a number that surprised even us: Claude was the dominant AI tool across every major business function, cited by 73% of respondents overall. Among engineering teams specifically, Claude Code reached 77% adoption, with Cursor at 50%. No other tool came close across the board.

The equally important finding is what the data says about function-by-function adoption gaps. Engineering is operating in a fundamentally different reality than finance, HR, or customer success teams at the same companies. The playbook that works for dev tooling does not transfer to accounting or people operations — data quality, security, and brand-safety concerns are actively blocking adoption in those functions. Almost half of respondents said they're delivering more output without growing headcount; a quarter have upskilled staff into AI-adjacent roles rather than replaced them. If you're still debating which model to standardize on for a new product, the enterprise market has largely answered that question. The harder work now is building the right infrastructure around it.

If you're figuring out where AI fits in your own product or team workflows, we're happy to talk through it.

Bad RL Environments Are a Bigger Problem Than the Field Admits

This week brought one of the most practically useful technical posts we've seen on reinforcement learning infrastructure: a detailed breakdown of what actually goes wrong when teams ship flawed training harnesses. The core argument: in RL, the model generates its own training data by interacting with the environment. A broken environment (one with race conditions, unreliable resets, inconsistent reward signals, or code that simply crashes) doesn't just add noise. It systematically pushes gradients in the wrong direction. The model learns the flaws in the harness, not the behavior you intended to reward.
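To make the noise-versus-bias distinction concrete, here is a minimal sketch of our own, not code from the post. It assumes a two-action toy problem: action 0 stands for "actually solve the task" and action 1 for "emit empty output". The functions `intended_reward`, `noisy_reward`, `buggy_reward`, and `train` are hypothetical names, and the learner is a plain epsilon-greedy bandit rather than a real policy-gradient setup.

```python
import random


def intended_reward(action: int, rng: random.Random) -> float:
    """Reward we meant to give: solving (action 0) succeeds 70% of the time,
    empty output (action 1) is never correct."""
    if action == 0:
        return 1.0 if rng.random() < 0.7 else 0.0
    return 0.0


def noisy_reward(action: int, rng: random.Random) -> float:
    """Correct reward plus zero-mean Gaussian noise (unbiased corruption)."""
    return intended_reward(action, rng) + rng.gauss(0.0, 0.5)


def buggy_reward(action: int, rng: random.Random) -> float:
    """Assumed harness bug: the checker always passes empty output."""
    if action == 1:
        return 1.0
    return intended_reward(action, rng)


def train(reward_fn, steps: int = 5000, epsilon: float = 0.1, seed: int = 0):
    """Epsilon-greedy learner; returns (value estimates, pull counts) per action."""
    rng = random.Random(seed)
    values = [0.0, 0.0]
    counts = [0, 0]
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(2)  # explore uniformly
        else:
            action = 0 if values[0] >= values[1] else 1  # exploit best estimate
        reward = reward_fn(action, rng)
        counts[action] += 1
        # Incremental running mean of observed rewards for this action.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts


if __name__ == "__main__":
    for name, fn in [("noisy", noisy_reward), ("buggy", buggy_reward)]:
        values, counts = train(fn)
        print(name, "pull counts [solve, empty]:", counts)
```

Under these assumptions the noisy harness slows learning but the learner still ends up preferring the intended action, because the noise averages out. The buggy harness has no such property: the flaw is the highest-reward path, so the learner converges on it.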

Bad RL environments don't add noise — they teach your model the wrong things entirely, and you won't catch it until the training run is already wasted.

We've seen this pattern surface in agentic product work. Teams building post-training pipelines for subagents often underestimate how much the quality of the simulation environment determines the quality of the resulting model. The failure mode is insidious precisely because it's not always visible in early metrics — the model appears to be learning, just learning the wrong things. The fix isn't glamorous: read your trajectories, validate your reward logic, get domain experts in the loop before you scale the training run. For anyone building custom AI-powered applications that involve any kind of reinforcement learning or fine-tuning loop, environment quality deserves the same engineering rigor as your core product code.
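One cheap way to start validating a harness is a replay audit: run the same action sequence from the same seed several times and confirm the transcripts match and rewards stay in the expected range. The sketch below is illustrative only. The `reset(seed)` / `step(action)` interface, the `CounterEnv` and `LeakyResetEnv` classes, and the `[0, 1]` reward range are all assumptions made for the example, not the API of any particular RL library.

```python
from typing import List, Sequence, Tuple

Step = Tuple[int, float, bool]  # (observation, reward, done)


class CounterEnv:
    """Toy environment: the observation is a counter, reward 1.0 on reaching target."""

    def __init__(self, target: int = 3) -> None:
        self.target = target
        self.state = 0

    def reset(self, seed: int) -> int:
        self.state = 0  # a correct reset restores the initial state
        return self.state

    def step(self, action: int) -> Step:
        self.state += action
        done = self.state >= self.target
        return self.state, (1.0 if done else 0.0), done


class LeakyResetEnv(CounterEnv):
    """Same environment with a deliberately broken reset."""

    def reset(self, seed: int) -> int:
        return self.state  # bug: state from the previous episode leaks through


def rollout(env, actions: Sequence[int], seed: int) -> List[Step]:
    """Run one episode and record every (observation, reward, done) triple."""
    transcript: List[Step] = [(env.reset(seed), 0.0, False)]
    for action in actions:
        step = env.step(action)
        transcript.append(step)
        if step[2]:
            break
    return transcript


def audit(env, actions: Sequence[int], seed: int = 0, trials: int = 3,
          reward_min: float = 0.0, reward_max: float = 1.0) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    issues: List[str] = []
    reference = rollout(env, actions, seed)
    for trial in range(1, trials):
        if rollout(env, actions, seed) != reference:
            issues.append(f"trial {trial}: replay diverged under the same seed and actions")
    for index, (_, reward, _) in enumerate(reference):
        if not reward_min <= reward <= reward_max:
            issues.append(f"step {index}: reward {reward} outside expected range")
    return issues


if __name__ == "__main__":
    print("CounterEnv:", audit(CounterEnv(), [1, 1, 1]))
    print("LeakyResetEnv:", audit(LeakyResetEnv(), [1, 1, 1]))
```

In this toy case the leaky reset lets later episodes collect the reward in a single step, which is exactly the kind of shortcut a model would learn to rely on. A passing audit does not prove a harness is sound; it only rules out one class of failure, so it complements reading trajectories by hand rather than replacing it.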

Autodesk's $3.6B MaintainX Acquisition Signals Where Industrial AI Value Accumulates

Autodesk acquiring MaintainX for $3.6 billion this week is the clearest signal yet that the value in industrial AI isn't in standalone model performance — it's in unified data platforms. MaintainX connects maintenance and operational data; Autodesk brings design and construction data. Together, the pitch is a closed loop from design intent to operational reality, with AI workflows running across the full asset lifecycle. The acquisition reflects a broader pattern we're tracking: the teams winning in enterprise AI aren't building better models, they're consolidating the data infrastructure that makes models useful in context. Ramp's simultaneous launch of Stack — framed as an AI operating system for accounting firms — follows the same logic, targeting a structural talent gap by owning the workflow layer, not just the AI layer on top.

The Practitioner Takeaway This Week

If you're building anything with a reinforcement learning or fine-tuning loop in it, audit your training environment before you scale. The RL harness quality problem is real, it's underdiagnosed, and it's expensive to discover late. Separately, if you're making model selection decisions for a new product, the enterprise data is clear enough to stop hedging — Claude is the default, and you should have a deliberate reason to deviate from it. The bigger open question isn't which model to use; it's whether your data infrastructure is good enough to make that model useful in your specific domain. That's where the competitive gap is opening up. Get an estimate on what the right infrastructure looks like for your use case.

This week confirmed that the AI infrastructure layer (training environments, data platforms, enterprise tooling standardization) is where the real build work is happening right now, not in model releases. The teams that get this right in the next twelve months will build compounding advantages that competitors will find hard to match later. Next week, we're watching whether Anthropic's Claude Mythos release clarifies the Opus benchmark regression questions, and whether the Sakana RSI Lab publishes any concrete methodology behind the compute-efficiency claims.

Written by

Aleksandr Kamenev

Founder & CEO

Frequently asked questions

What happened in AI this week that matters for product builders?

Sakana AI launched a dedicated recursive self-improvement lab, enterprise data confirmed Claude as the dominant tool at 73% adoption, and a detailed analysis exposed how flawed RL training environments systematically corrupt model learning. The structural theme is that AI infrastructure quality — environments, data platforms, and tooling standardization — is now the primary competitive differentiator.

Why is RL environment quality suddenly a major concern for AI teams?

In reinforcement learning, the model generates its own training data by interacting with a simulated environment. If that environment has bugs, race conditions, or inconsistent reward logic, the model learns the harness failures rather than the intended behavior. This failure mode is particularly dangerous because it can look like learning in early metrics while the model is actually optimizing for the wrong objective entirely.

Is Claude really the dominant enterprise AI tool in 2026?

Based on a survey of 173 leaders across over a hundred high-growth technology companies, Claude ranked first across every major business function at 73% overall adoption. Engineering teams show the highest uptake, with Claude Code at 77%. The data also makes clear that adoption patterns differ sharply by function: what works for engineering teams does not automatically transfer to finance, HR, or customer success workflows.
