The Intelligence Grid
Microsoft built the enterprise routing layer. Apple is building the consumer one. The intelligence grid isn't a thesis anymore. It's infrastructure.

THE NUMBER: 3 — the number of competing AI labs whose models Microsoft now orchestrates inside a single product. On Sunday, Satya Nadella introduced Critique — a multi-model deep research system built into Microsoft 365 Copilot. Claude generates a research report. Then ChatGPT fact-checks and improves it. Or vice versa. The company that owns 24% of OpenAI just publicly admitted that no single model is best at everything. That’s not a product update. That’s a confession — and a blueprint.
We wrote yesterday about Apple building the consumer routing layer for intelligence — Siri as a toll booth between 1.52 billion humans and every AI model on earth. We didn’t expect the enterprise version to arrive the next morning.
But here’s Microsoft, the company that bet $13 billion on OpenAI, routing queries to Claude inside its own flagship product. Not hiding it. Announcing it. Nadella stood on stage and said the words out loud: use multiple models together to generate optimal responses.
Ejaaz had the sharp observation: this is exactly what Karpathy described doing over the weekend — using multiple models to stress-test his own arguments. It’s becoming standard practice among the smartest practitioners. Microsoft just made it standard practice for 400 million Office users.
And you know what? Fair enough. OpenAI already did a deal with Amazon Web Services — right in front of their largest equity holder. The gentlemen’s agreements are over. Everyone routes to whoever’s best. That’s not betrayal. That’s procurement.
If Microsoft is intelligent about this — and they usually are about money — they’ll use the OpenAI IPO as an exit ramp. Sell down gradually. Create liquidity for Microsoft shareholders. Transform OpenAI from a strategic bet into a supplier relationship. You don’t need to own the power plant. You need access to the grid.
⚡ Welcome to the Inference Market
Here’s the part nobody is writing about yet.
If routing is the architecture and models are commoditizing — and they are; anyone who uses these tools daily will tell you Claude writes better, Gemini does images better, Grok gives you the unhinged experience if that’s what you’re after — then the logical endpoint isn’t a routing layer. It’s a market.
Not a metaphorical market. A literal one. Real-time, bid-ask, every query.
Think about how electricity works. Over 2,000 power generators bid into the US grid every 15 minutes. The grid operator routes demand to the cheapest supplier that can meet the load. Peaker plants — expensive, powerful, rarely called — sit idle until demand spikes. Base load plants — cheap, always on — handle the 80%. Nobody owns every power plant. Nobody needs to. The grid makes it irrelevant.
The AI inference market is heading for exactly the same structure.
A query comes in. The routing layer evaluates complexity, stakes, and budget. Mythos bids $0.15 per thousand output tokens. Opus 4.6 bids $0.025. A specialist medical model bids $0.04 but with 95th percentile accuracy on orthopedic questions. The router picks based on what the task deserves.
That phrase — what the task deserves — is doing all the work here. And it changes the economics of AI more than any single model release ever will.
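The dispatch logic described above can be sketched as a toy router. Every name, price, and accuracy figure below is an illustrative assumption, not a published benchmark or real API price; the scoring rule (expected value of a correct answer minus price) is one plausible way to formalize "what the task deserves":

```python
from dataclasses import dataclass

@dataclass
class Bid:
    model: str
    price: float     # USD per 1,000 output tokens (hypothetical)
    accuracy: float  # estimated accuracy on THIS task, 0..1 (hypothetical)

def route(bids, min_accuracy, value_of_correct):
    """Pick the bid that maximizes expected value: accuracy payoff minus price.
    Bids below the accuracy floor are disqualified outright."""
    eligible = [b for b in bids if b.accuracy >= min_accuracy]
    if not eligible:
        raise ValueError("no model meets the accuracy floor")
    return max(eligible, key=lambda b: b.accuracy * value_of_correct - b.price)

# The same three bidders, scored per task (all numbers invented)
general_bids = [Bid("mythos", 0.15, 0.97),
                Bid("opus-4.6", 0.025, 0.90),
                Bid("med-specialist", 0.04, 0.70)]
ortho_bids = [Bid("mythos", 0.15, 0.96),
              Bid("opus-4.6", 0.025, 0.85),
              Bid("med-specialist", 0.04, 0.95)]

# Routine query, low stakes: the cheap generalist wins
print(route(general_bids, min_accuracy=0.85, value_of_correct=1.0).model)  # opus-4.6
# Orthopedic question, high stakes: the specialist's accuracy earns its premium
print(route(ortho_bids, min_accuracy=0.90, value_of_correct=5.0).model)    # med-specialist
```

Notice that the same bidders win different auctions once stakes enter the objective: raise the value of being right, and the premium and specialist tiers start beating the base-load model.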
💰 The Veblen Threshold
This brings us to the most interesting story the AI newsletters couldn’t stop covering today: Claude Mythos.
Tomasz Tunguz — one of the sharpest venture minds in the business — published a piece this morning called “Veblen & Jevon Walk Into a Data Center.” His argument: the era of cheaper tokens driving more consumption (Jevons Paradox) may be ending. Mythos is rumored at $15-25 per million input tokens and $75-150 per million output tokens. That’s 5-6x current Opus pricing. If the most powerful AI trades at a luxury premium, we’ve entered Veblen territory — where price signals capability and demand increases with cost.
It’s a clean thesis. But we think it’s slightly wrong.
Jevons isn’t ending. More people will use AI tomorrow than used it today, and the cost of basic inference keeps falling. What’s happening is that the market is stratifying — the same way the auto market stratified in the 1920s when GM realized you could sell a Chevrolet, a Buick, and a Cadillac off the same chassis. The chassis is the transformer architecture. The trim levels are the models. And the dealership is the routing layer that matches you to the right one.
Mythos won’t be a Veblen good for most queries. It will be a peaker plant.
Here’s why. If Opus 4.6 does 80% of the job of Mythos at 20% of the cost — do you really care, except for the queries where the last 20% is worth 500% more money? In most workflows, you don’t. We use Gemini to score articles and Claude to write. Would we pay Mythos 6x the price to write a newsletter? No chance. The marginal improvement doesn’t justify the marginal cost.
But would you pay Mythos 6x if you were evaluating knee surgery options? If you were auditing production code for zero-day vulnerabilities — which the Mythos leak suggests it’s specifically built to find? If you were a law firm running a $200 million M&A deal through due diligence?
Absolutely. Because the stakes justify the tokens.
The Veblen dynamic is real, but it’s narrow. Mythos will be the concierge tier — the model you route to when the cost of being wrong exceeds the cost of being expensive. For everything else, there’s Opus. And Gemini. And Grok. And whatever specialist model handles your specific domain better than any of them.
The uncomfortable question for Anthropic: how big is the concierge market, really? How many queries per day are genuinely high-stakes enough to justify $150 per million output tokens? Because if the answer is “5% of all inference,” Mythos is a phenomenal product with a very small addressable market — and that changes how you model Anthropic’s revenue trajectory, especially as they approach a rumored October IPO.
🧠 The LeCun Bet: What If the Entire LLM Is the Wrong Power Plant?
Now zoom out further. Because the biggest story of the day wasn’t Mythos or Microsoft. It was the man who invented modern AI raising a billion dollars to bet against everything both companies are building.
Yann LeCun — Turing Award winner, creator of the neural networks that made all of this possible, a decade running AI research at Meta — quit and raised $1.03 billion for a company called AMI Labs. Largest seed round in European history. $3.5 billion valuation before generating a single dollar of revenue. Bezos wrote the check. So did Nvidia. Samsung. Toyota. Eric Schmidt. Tim Berners-Lee.
His thesis is blunt: every company spending billions on large language models is wasting their money.
LLMs predict the next word in a sequence. See “the cat sat on the” and predict “mat.” Scale that to trillions of words and you get something that sounds intelligent but doesn’t understand anything. It can’t reason from first principles. It can’t predict what happens when you push a glass off a table. A two-year-old can do that. GPT-5 cannot. That’s why AI hallucinates — it doesn’t have a model of how the world works. It just predicts words.
LeCun’s alternative is JEPA (Joint Embedding Predictive Architecture): models that learn abstract representations of reality. Not language, but physics. An AI that could design a car, run a factory, operate a robot, or evaluate a medical diagnosis without hallucinating.
This matters for the intelligence grid because JEPA models aren’t competing with Mythos for the same queries. They’re competing for the queries LLMs can’t answer well — the physical reasoning, the robotics, the real-world prediction tasks. They’re a different power plant entirely, running on a different fuel.
And remember the Princeton research we cited yesterday: specialist models are already 10,000x more efficient than general-purpose reasoning models at their target tasks. If you’re building the intelligence grid, you don’t just need base load plants (cheap LLMs) and peaker plants (Mythos). You need solar panels — specialist models that handle specific domains at a fraction of the cost and energy. And you might eventually need nuclear — LeCun’s world models that handle the tasks no LLM can touch.
The grid gets bigger. Not smaller.
Two Turing Award winners — LeCun at AMI Labs, Fei-Fei Li at World Labs — raised $2 billion in three weeks betting against the LLM architecture. They could be wrong. The trillion-dollar LLM industry could keep printing. But these aren’t outsiders throwing rocks. LeCun literally built the foundations that ChatGPT runs on. When the architect says the building has a structural problem, you at least check the blueprints.
🔍 The Honest Pushback
We believe the intelligence grid thesis. But here’s where it could break.
Routing adds latency and cost. Every time you route a query to a second model for fact-checking — which is literally what Microsoft Critique does — you’re doubling your inference cost and adding response time. For a research report, that’s fine. For real-time applications — trading, driving, surgery — the routing overhead could be a dealbreaker. The fastest answer might still be one model, not two.
The grid might consolidate, not diversify. History says infrastructure markets start fragmented and end concentrated. The US had 4,000 electric utilities in 1930. It has about 200 meaningful ones today. The AI inference market could follow the same path — a few dominant model providers with the routing layer built in, not a thousand specialist models bidding into an open marketplace. Google is already building models AND routing infrastructure AND the world’s largest ad business to fund it all. That’s vertical integration, not an open grid.
LeCun has been saying this for years. He’s been critical of LLMs since before ChatGPT launched. His previous predictions about LLM limitations haven’t fully panned out — the models keep getting better in ways the skeptics didn’t expect. A billion-dollar bet doesn’t make the thesis right. It makes it expensive to be wrong.
The reverse auction is a coordination nightmare. Real electricity markets took decades to build, required massive regulation, and still produce market failures (see: Texas, 2021). An AI inference market would need standardized APIs, quality benchmarks, latency guarantees, and dispute resolution. Right now, every model has different capabilities, different context windows, different strengths. Bidding them against each other on price assumes they’re interchangeable at the task level. They’re not. Yet.
What This Means For You
The intelligence grid isn’t coming. It’s here. Microsoft just made the enterprise version visible, Apple is building the consumer version, and Perplexity proved the pure-routing model is worth $20 billion. The question is where you sit on the grid.
If you’re locked into a single model provider, you’re the company that buys all its electricity from one plant. That works until the plant goes down, raises prices, or gets outperformed. The multi-model future isn’t about disloyalty — it’s about optionality. Start experimenting with routing. Use Claude for writing. Use Gemini for research. Use a specialist model for your domain. The switching costs are lower than you think, and they’re getting lower every quarter.
Not every query deserves Mythos. When Anthropic releases it, the temptation will be to throw the biggest model at everything. Resist it. Run a cost-benefit analysis on your actual inference workload. We’d bet that 80-90% of your queries get equivalent results from a model that costs one-fifth as much. Route to Mythos for the 10% where accuracy is existential. Save the peaker plant for the peak.
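A minimal version of that audit, assuming a hypothetical premium tier priced at the top of the rumored Mythos output range and a budget tier at roughly current-Opus-class pricing (both figures are placeholders, and "needs_premium" stands in for whatever quality check you run on your own logs):

```python
# Hypothetical per-million-output-token prices; the premium figure echoes the
# top of the rumored Mythos range, the budget figure is an assumption.
PREMIUM = 150.0  # USD per 1M output tokens
BUDGET = 25.0

def audit(workload):
    """workload: list of (output_tokens, needs_premium) pairs.
    Returns (cost if everything runs on the premium model, cost if routed)."""
    all_premium = sum(t for t, _ in workload) * PREMIUM / 1_000_000
    routed = sum(t * (PREMIUM if hard else BUDGET) for t, hard in workload) / 1_000_000
    return all_premium, routed

# Toy workload: 90 routine queries, 10 high-stakes, 2,000 output tokens each
workload = [(2_000, False)] * 90 + [(2_000, True)] * 10
everything, routed = audit(workload)
print(f"all-premium: ${everything:.2f}, routed: ${routed:.2f}")  # $30.00 vs $7.50
```

On this toy workload, where 90% of traffic is routine, routing cuts spend by 75% while the high-stakes 10% still gets the premium model. The numbers are invented; the shape of the result is the point.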
Watch the specialist model space. The next wave of value creation in AI won’t be bigger general-purpose models. It will be smaller, cheaper, domain-specific models that outperform the frontier on narrow tasks. Healthcare. Legal. Financial analysis. Code review. If you’re in a vertical industry, the model that knows your domain better than Mythos — at a fraction of the cost — is probably being trained right now.
If you’re investing, the routing layer is the grid operator — and grid operators always win. In electricity, the generators compete on price. The grid operator takes a fee on every transaction and never loses money. In AI, the model companies are the generators. The routing layer — Apple, Google, Microsoft, Perplexity, and whoever builds the enterprise version — is the grid operator. The margin is in the orchestration. It always is.
Three Questions We Think You Should Be Asking Yourself
How many models are you currently using — and is anyone in your organization making a deliberate choice about which model handles which task, or is it just defaulting to whatever someone signed up for first? Microsoft, with all its resources, just admitted that using one model is suboptimal. If Satya Nadella is routing Claude and ChatGPT against each other inside Office, you should at least be asking whether your company’s blanket ChatGPT subscription is actually the right tool for every job it’s being used for.
If a real-time inference market existed tomorrow — where you could route every query to the best model at the best price — would your current AI spending go up or down? Most companies are overpaying for simple queries and underpaying for complex ones. They’re running Opus on tasks that Haiku could handle, and they’re not using Mythos-class models on the decisions that actually matter. An inference market would expose this inefficiency immediately. You don’t need the market to exist to run the audit.
What would it mean for your industry if the LLM architecture turned out to be a local maximum — good enough for text, but structurally unable to handle the physical-world reasoning that LeCun says matters more? You don’t have to believe LeCun is right. But Bezos, Nvidia, and Eric Schmidt do — enough to write billion-dollar checks. If world models eventually handle the tasks that LLMs hallucinate on — robotics, medical diagnosis, engineering design — the companies that bet everything on text-based AI will need to pivot. The ones that built on a routing layer won’t — they’ll just add a new power plant to the grid.
“The test of a first-rate intelligence is the ability to hold two opposing ideas in mind at the same time and still retain the ability to function.”
— F. Scott Fitzgerald
The test of a first-rate routing layer is the ability to hold two opposing models in production at the same time and still deliver the right answer.
— The lesson of March 31, 2026
— Harry and Anthony
Sources
- Microsoft Critique announcement — Satya Nadella / X
- Ejaaz on Microsoft multi-model routing — X
- Ricardo on Yann LeCun and AMI Labs — X
- Tomasz Tunguz “Veblen & Jevon Walk Into a Data Center” — Theory Ventures
- Claude Mythos leak — Fortune
- Claude Mythos pricing speculation — The Decoder
- Anthropic $19B run rate — Bloomberg
- OpenAI $25B annualized revenue — The Information
- Princeton specialist model efficiency — Princeton University research
- Apple iOS 27 Siri Extensions — Bloomberg / MacRumors
- Perplexity multi-model routing — Aakash Gupta / X
- US electricity grid market structure — US Energy Information Administration
- WSJ Altman-Amodei feud investigation — Wall Street Journal
- Fei-Fei Li World Labs — World Labs
- “Elon Musk Is a Router” — CO/AI
- Gartner $15B agent management forecast — Shelly Palmer