From the blog
Insights on AI agents, MCP servers, and building software for the AI era.

The 88% Problem: Why Most AI Agents Never Make It to Production
Enterprise companies are pouring millions into AI agents — yet 88% of pilots never reach production. A May 2026 Microsoft Research paper reveals the silent reason why: the models themselves are eroding the work they are supposed to do.

How a 7B Agent Beats GPT-4o: The RL Training Method Reshaping Agentic AI
AgentFlow's Flow-GRPO rewrites how agents learn across multi-turn tool use — and a 7B model outperforms GPT-4o because of it.

The Math Behind Large Language Models: A Worked Walk-Through
From raw text to probability distributions: every computation a transformer performs, explained with arithmetic you can verify by hand.

The Thinking Tax: Why AI Models That Reason Cost More — and When That's Worth It
A new generation of AI models spends variable amounts of compute thinking before they answer — more on hard problems, less on easy ones. Understanding this shift changes how you choose models, budget API costs, and architect AI-powered products.

The Spreadsheet Moment: How Claude Code Will Reshape Adjacent Engineering Roles
Agentic coding won't hit senior engineers first. It will hit the roles built around doing what engineers don't have time for.

The Honest Guts of a Language Model: Transformers Explained Without the Fluff
What actually happens when you send a message to an LLM? Tokens, vectors, attention, and next-token prediction — the real mechanical picture, no marketing required.

The Agentic Inflection: How Claude Opus 4.7 Is Reshaping Software Teams
Anthropic's Claude Opus 4.7 scores 87.6% on the industry's hardest software engineering benchmark. Combined with a one-million-token context window and the Model Context Protocol, the model marks an inflection point in how small and mid-sized software teams can structure their work — and their headcount decisions.

The Coordination Illusion: Why More AI Agents Can Mean Worse Results
New research from UC Berkeley, MIT, and ETH Zürich shows that unstructured multi-agent AI systems can amplify errors by up to 17x. Here's what the science actually says about when to use multiple agents, and when not to.

The Thinking Dial: How AI Models Are Learning to Know When to Reason
The latest AI models no longer apply the same depth of reasoning to every problem. A new wave of research and API controls — adaptive thinking, effort parameters, hybrid thinking modes — lets models calibrate cognitive effort to task complexity. Here is what that shift means for your costs, latency, and product reliability.

The Inference Paradox: Why Your AI Bill Keeps Rising as Token Prices Fall
Token prices have fallen 280x in two years, yet enterprise AI budgets have grown 480%. Gartner, Deloitte, and FinOps data explain the paradox and how to escape it.

The 68-Point Gap: Why AI Agents Stall Before They Ship
79% of enterprises have adopted AI agents in some form. Only 11% have them running in production. The 68-point gap between those two numbers is the most consequential story in enterprise technology right now — and the cause is not what most people expect.

Physical AI: The Sim-to-Real Breakthrough Has Arrived
In March 2026, Ai2’s MolmoBot achieved a 79.2% success rate on real robot tasks — trained entirely on simulation data, with zero real-world demonstrations. NVIDIA’s GR00T N2 arrived the same week. A quiet threshold has been crossed.