The Intelligence Grid
Microsoft built the enterprise routing layer. Apple is building the consumer one. The intelligence grid isn't a thesis anymore. It's infrastructure.

THE NUMBER: 3 — the number of competing AI labs whose models Microsoft now orchestrates inside a single product. On Sunday, Satya Nadella introduced Critique — a multi-model deep research system built into Microsoft 365 Copilot. Claude generates a research report. Then ChatGPT fact-checks and improves it. Or vice versa. The company that owns 24% of OpenAI just publicly admitted that no single model is best at everything. That’s not a product update. That’s a confession — and a blueprint.
We wrote yesterday about Apple building the consumer routing layer for intelligence — Siri as a toll booth between 1.52 billion humans and every AI model on earth. We didn’t expect the enterprise version to arrive the next morning.
But here’s Microsoft, the company that bet $13 billion on OpenAI, routing queries to Claude inside its own flagship product. Not hiding it. Announcing it. Nadella stood on stage and said the words out loud: use multiple models together to generate optimal responses.
Ejaaz had the sharp observation: this is exactly what Karpathy described doing over the weekend — using multiple models to stress-test his own arguments. It’s becoming standard practice among the smartest practitioners. Microsoft just made it standard practice for 400 million Office users.
And you know what? Fair enough. OpenAI already did a deal with Amazon Web Services — right in front of their largest equity holder. The gentlemen’s agreements are over. Everyone routes to whoever’s best. That’s not betrayal. That’s procurement.
If Microsoft is intelligent about this — and they usually are about money — they’ll use the OpenAI IPO as an exit ramp. Sell down gradually. Create liquidity for Microsoft shareholders. Transform OpenAI from a strategic bet into a supplier relationship. You don’t need to own the power plant. You need access to the grid.
⚡ Welcome to the Inference Market
Here’s the part nobody is writing about yet.
If routing is the architecture and models are commoditizing — and they are; anyone who uses these tools daily will tell you Claude writes better, Gemini does images better, Grok gives you the unhinged experience if that’s what you’re after — then the logical endpoint isn’t a routing layer. It’s a market.
Not a metaphorical market. A literal one. Real-time, bid-ask, every query.
Think about how electricity works. Over 2,000 power generators bid into the US grid every 15 minutes. The grid operator routes demand to the cheapest supplier that can meet the load. Peaker plants — expensive, powerful, rarely called — sit idle until demand spikes. Base load plants — cheap, always on — handle the 80%. Nobody owns every power plant. Nobody needs to. The grid makes it irrelevant.
The AI inference market is heading for exactly the same structure.
A query comes in. The routing layer evaluates complexity, stakes, and budget. Mythos bids $0.15 per thousand output tokens. Opus 4.6 bids $0.025. A specialist medical model bids $0.04 but with 95th percentile accuracy on orthopedic questions. The router picks based on what the task deserves.
That phrase — what the task deserves — is doing all the work here. And it changes the economics of AI more than any single model release ever will.
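The dispatch logic described above can be sketched as a toy router. Every name, price, and accuracy figure below is an illustrative assumption, not a published benchmark or real API price; the scoring rule (expected value of a correct answer minus price) is one plausible way to formalize "what the task deserves":

```python
from dataclasses import dataclass

@dataclass
class Bid:
    model: str
    price: float     # USD per 1,000 output tokens (hypothetical)
    accuracy: float  # estimated accuracy on THIS task, 0..1 (hypothetical)

def route(bids, min_accuracy, value_of_correct):
    """Pick the bid that maximizes expected value: accuracy payoff minus price.
    Bids below the accuracy floor are disqualified outright."""
    eligible = [b for b in bids if b.accuracy >= min_accuracy]
    if not eligible:
        raise ValueError("no model meets the accuracy floor")
    return max(eligible, key=lambda b: b.accuracy * value_of_correct - b.price)

# The same three bidders, scored per task (all numbers invented)
general_bids = [Bid("mythos", 0.15, 0.97),
                Bid("opus-4.6", 0.025, 0.90),
                Bid("med-specialist", 0.04, 0.70)]
ortho_bids = [Bid("mythos", 0.15, 0.96),
              Bid("opus-4.6", 0.025, 0.85),
              Bid("med-specialist", 0.04, 0.95)]

# Routine query, low stakes: the cheap generalist wins
print(route(general_bids, min_accuracy=0.85, value_of_correct=1.0).model)  # opus-4.6
# Orthopedic question, high stakes: the specialist's accuracy earns its premium
print(route(ortho_bids, min_accuracy=0.90, value_of_correct=5.0).model)    # med-specialist
```

Notice that the same bidders win different auctions once stakes enter the objective: raise the value of being right, and the premium and specialist tiers start beating the base-load model.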
💰 The Veblen Threshold
This brings us to the most interesting story the AI newsletters couldn’t stop covering today: Claude Mythos.
Tomasz Tunguz — one of the sharpest venture minds in the business — published a piece this morning called “Veblen & Jevon Walk Into a Data Center.” His argument: the era of cheaper tokens driving more consumption (Jevons Paradox) may be ending. Mythos is rumored at $15-25 per million input tokens and $75-150 per million output tokens. That’s 5-6x current Opus pricing. If the most powerful AI trades at a luxury premium, we’ve entered Veblen territory — where price signals capability and demand increases with cost.
It’s a clean thesis. But we think it’s slightly wrong.
Jevons isn’t ending. More people will use AI tomorrow than used it today, and the cost of basic inference keeps falling. What’s happening is that the market is stratifying — the same way the auto market stratified in the 1920s when GM realized you could sell a Chevrolet, a Buick, and a Cadillac off the same chassis. The chassis is the transformer architecture. The trim levels are the models. And the dealership is the routing layer that matches you to the right one.
Mythos won’t be a Veblen good for most queries. It will be a peaker plant.
Here’s why. If Opus 4.6 does 80% of the job of Mythos at 20% of the cost — do you really care, except for the queries where the last 20% is worth 500% more money? In most workflows, you don’t. We use Gemini to score articles and Claude to write. Would we pay Mythos 6x the price to write a newsletter? No chance. The marginal improvement doesn’t justify the marginal cost.
But would you pay Mythos 6x if you were evaluating knee surgery options? If you were auditing production code for zero-day vulnerabilities — which the Mythos leak suggests it’s specifically built to find? If you were a law firm running a $200 million M&A deal through due diligence?
Absolutely. Because the stakes justify the tokens.
The Veblen dynamic is real, but it’s narrow. Mythos will be the concierge tier — the model you route to when the cost of being wrong exceeds the cost of being expensive. For everything else, there’s Opus. And Gemini. And Grok. And whatever specialist model handles your specific domain better than any of them.
The uncomfortable question for Anthropic: how big is the concierge market, really? How many queries per day are genuinely high-stakes enough to justify $150 per million output tokens? Because if the answer is “5% of all inference,” Mythos is a phenomenal product with a very small addressable market — and that changes how you model Anthropic’s revenue trajectory, especially as they approach a rumored October IPO.
🧠 The LeCun Bet: What If the Entire LLM Is the Wrong Power Plant?
Now zoom out further. Because the biggest story of the day wasn’t Mythos or Microsoft. It was the man who invented modern AI raising a billion dollars to bet against everything both companies are building.
Yann LeCun — Turing Award winner, creator of the neural networks that made all of this possible, a decade running AI research at Meta — quit and raised $1.03 billion for a company called AMI Labs. Largest seed round in European history. $3.5 billion valuation before generating a single dollar of revenue. Bezos wrote the check. So did Nvidia. Samsung. Toyota. Eric Schmidt. Tim Berners-Lee.
His thesis is blunt: every company spending billions on large language models is wasting their money.
LLMs predict the next word in a sequence. See “the cat sat on the” and predict “mat.” Scale that to trillions of words and you get something that sounds intelligent but doesn’t understand anything. It can’t reason from first principles. It can’t predict what happens when you push a glass off a table. A two-year-old can do that. GPT-5 cannot. That’s why AI hallucinates — it doesn’t have a model of how the world works. It just predicts words.
LeCun’s alternative is JEPA (Joint Embedding Predictive Architecture): models that learn abstract representations of reality. Not language, but physics. An AI that could design a car, run a factory, operate a robot, or evaluate a medical diagnosis without hallucinating.
This matters for the intelligence grid because JEPA models aren’t competing with Mythos for the same queries. They’re competing for the queries LLMs can’t answer well — the physical reasoning, the robotics, the real-world prediction tasks. They’re a different power plant entirely, running on a different fuel.
And remember the Princeton research we cited yesterday: specialist models are already 10,000x more efficient than general-purpose reasoning models at their target tasks. If you’re building the intelligence grid, you don’t just need base load plants (cheap LLMs) and peaker plants (Mythos). You need solar panels — specialist models that handle specific domains at a fraction of the cost and energy. And you might eventually need nuclear — LeCun’s world models that handle the tasks no LLM can touch.
The grid gets bigger. Not smaller.
Two Turing Award winners — LeCun at AMI Labs, Fei-Fei Li at World Labs — raised $2 billion in three weeks betting against the LLM architecture. They could be wrong. The trillion-dollar LLM industry could keep printing. But these aren’t outsiders throwing rocks. LeCun literally built the foundations that ChatGPT runs on. When the architect says the building has a structural problem, you at least check the blueprints.
🔍 The Honest Pushback
We believe the intelligence grid thesis. But here’s where it could break.
Routing adds latency and cost. Every time you route a query to a second model for fact-checking — which is literally what Microsoft Critique does — you’re doubling your inference cost and adding response time. For a research report, that’s fine. For real-time applications — trading, driving, surgery — the routing overhead could be a dealbreaker. The fastest answer might still be one model, not two.
The grid might consolidate, not diversify. History says infrastructure markets start fragmented and end concentrated. The US had 4,000 electric utilities in 1930. It has about 200 meaningful ones today. The AI inference market could follow the same path — a few dominant model providers with the routing layer built in, not a thousand specialist models bidding into an open marketplace. Google is already building models AND routing infrastructure AND the world’s largest ad business to fund it all. That’s vertical integration, not an open grid.
LeCun has been saying this for years. He’s been critical of LLMs since before ChatGPT launched. His previous predictions about LLM limitations haven’t fully panned out — the models keep getting better in ways the skeptics didn’t expect. A billion-dollar bet doesn’t make the thesis right. It makes it expensive to be wrong.
The reverse auction is a coordination nightmare. Real electricity markets took decades to build, required massive regulation, and still produce market failures (see: Texas, 2021). An AI inference market would need standardized APIs, quality benchmarks, latency guarantees, and dispute resolution. Right now, every model has different capabilities, different context windows, different strengths. Bidding them against each other on price assumes they’re interchangeable at the task level. They’re not. Yet.
What This Means For You
The intelligence grid isn’t coming. It’s here. Microsoft just made the enterprise version visible, Apple is building the consumer version, and Perplexity proved the pure-routing model is worth $20 billion. The question is where you sit on the grid.
If you’re locked into a single model provider, you’re the company that buys all its electricity from one plant. That works until the plant goes down, raises prices, or gets outperformed. The multi-model future isn’t about disloyalty — it’s about optionality. Start experimenting with routing. Use Claude for writing. Use Gemini for research. Use a specialist model for your domain. The switching costs are lower than you think, and they’re getting lower every quarter.
Not every query deserves Mythos. When Anthropic releases it, the temptation will be to throw the biggest model at everything. Resist it. Run a cost-benefit analysis on your actual inference workload. We’d bet that 80-90% of your queries get equivalent results from a model that costs one-fifth as much. Route to Mythos for the 10% where accuracy is existential. Save the peaker plant for the peak.
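A minimal version of that audit, assuming a hypothetical premium tier priced at the top of the rumored Mythos output range and a budget tier at roughly current-Opus-class pricing (both figures are placeholders, and "needs_premium" stands in for whatever quality check you run on your own logs):

```python
# Hypothetical per-million-output-token prices; the premium figure echoes the
# top of the rumored Mythos range, the budget figure is an assumption.
PREMIUM = 150.0  # USD per 1M output tokens
BUDGET = 25.0

def audit(workload):
    """workload: list of (output_tokens, needs_premium) pairs.
    Returns (cost if everything runs on the premium model, cost if routed)."""
    all_premium = sum(t for t, _ in workload) * PREMIUM / 1_000_000
    routed = sum(t * (PREMIUM if hard else BUDGET) for t, hard in workload) / 1_000_000
    return all_premium, routed

# Toy workload: 90 routine queries, 10 high-stakes, 2,000 output tokens each
workload = [(2_000, False)] * 90 + [(2_000, True)] * 10
everything, routed = audit(workload)
print(f"all-premium: ${everything:.2f}, routed: ${routed:.2f}")  # $30.00 vs $7.50
```

On this toy workload, where 90% of traffic is routine, routing cuts spend by 75% while the high-stakes 10% still gets the premium model. The numbers are invented; the shape of the result is the point.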
Watch the specialist model space. The next wave of value creation in AI won’t be bigger general-purpose models. It will be smaller, cheaper, domain-specific models that outperform the frontier on narrow tasks. Healthcare. Legal. Financial analysis. Code review. If you’re in a vertical industry, the model that knows your domain better than Mythos — at a fraction of the cost — is probably being trained right now.
If you’re investing, the routing layer is the grid operator — and grid operators always win. In electricity, the generators compete on price. The grid operator takes a fee on every transaction and never loses money. In AI, the model companies are the generators. The routing layer — Apple, Google, Microsoft, Perplexity, and whoever builds the enterprise version — is the grid operator. The margin is in the orchestration. It always is.
Three Questions We Think You Should Be Asking Yourself
How many models are you currently using — and is anyone in your organization making a deliberate choice about which model handles which task, or is it just defaulting to whatever someone signed up for first? Microsoft, with all its resources, just admitted that using one model is suboptimal. If Satya Nadella is routing Claude and ChatGPT against each other inside Office, you should at least be asking whether your company’s blanket ChatGPT subscription is actually the right tool for every job it’s being used for.
If a real-time inference market existed tomorrow — where you could route every query to the best model at the best price — would your current AI spending go up or down? Most companies are overpaying for simple queries and underpaying for complex ones. They’re running Opus on tasks that Haiku could handle, and they’re not using Mythos-class models on the decisions that actually matter. An inference market would expose this inefficiency immediately. You don’t need the market to exist to run the audit.
What would it mean for your industry if the LLM architecture turned out to be a local maximum — good enough for text, but structurally unable to handle the physical-world reasoning that LeCun says matters more? You don’t have to believe LeCun is right. But Bezos, Nvidia, and Eric Schmidt do — enough to write billion-dollar checks. If world models eventually handle the tasks that LLMs hallucinate on — robotics, medical diagnosis, engineering design — the companies that bet everything on text-based AI will need to pivot. The ones that built on a routing layer won’t — they’ll just add a new power plant to the grid.
“The test of a first-rate intelligence is the ability to hold two opposing ideas in mind at the same time and still retain the ability to function.”
— F. Scott Fitzgerald
The test of a first-rate routing layer is the ability to hold two opposing models in production at the same time and still deliver the right answer.
— The lesson of March 31, 2026
— Harry and Anthony
Sources
- Microsoft Critique announcement — Satya Nadella / X
- Ejaaz on Microsoft multi-model routing — X
- Ricardo on Yann LeCun and AMI Labs — X
- Tomasz Tunguz “Veblen & Jevon Walk Into a Data Center” — Theory Ventures
- Claude Mythos leak — Fortune
- Claude Mythos pricing speculation — The Decoder
- Anthropic $19B run rate — Bloomberg
- OpenAI $25B annualized revenue — The Information
- Princeton specialist model efficiency — Princeton University research
- Apple iOS 27 Siri Extensions — Bloomberg / MacRumors
- Perplexity multi-model routing — Aakash Gupta / X
- US electricity grid market structure — US Energy Information Administration
- WSJ Altman-Amodei feud investigation — Wall Street Journal
- Fei-Fei Li World Labs — World Labs
- “Elon Musk Is a Router” — CO/AI
- Gartner $15B agent management forecast — Shelly Palmer