Your Company Needs A Harness, Not An Upgraded Chatbot

THE NUMBER: 152% — the quarter-over-quarter growth of agent tokens running through Salesforce’s Agentforce in Q1 (28.6 trillion tokens, 3.8 billion “agentic work units”). That is what real production AI looks like, on an actual income statement, when the buyer’s harness is built and tuned. The same week, under best-case assumptions, the Financial Times put hyperscaler AI ROI at minus nine percent for Microsoft, minus thirty-five at Oracle. Same models. Same vendors. The variable is the harness — and exactly who built it.

Every Formula One car on the grid is built to the same rules. A thousand pages of regulation that gets thicker every season — aero, weight, fuel flow, power unit displacement, the angle of the front wing endplate measured in millimeters. The cars finish within tenths of a second of each other, race after race, by design. What separates a podium from the midfield isn’t the chassis. It’s the driver, the seat fitted to that specific driver’s body, the steering ratio dialed in for that driver’s reflexes, the brake bias tuned to that driver’s particular instincts. The harness is custom. The car is commodity. The teams that win are the ones whose driver, engineer, and machine were built around each other. That used to be a sport you watched on Sunday. This week it became the org chart on Monday.

Two numbers landed in the same news cycle, and they read like they came from different planets.

Salesforce, Q1: $11.1 billion in revenue, with Agentforce processing 28.6 trillion agent tokens, up 152% quarter over quarter, and 3.8 billion discrete “agentic work units,” up 111%. That is not a press release. That is agents doing actual work, at industrial scale, billed and collected, on the income statement.

The Financial Times, same week, picked up on X by an account called Yoshik and run to two hundred and thirty-eight thousand views in a day: AI ROI at the hyperscalers, under best-case assumptions, comes in negative across the board. Microsoft at minus nine. Google at minus fifteen. Meta at minus twenty-eight. Oracle at minus thirty-five. Amazon barely positive. The cleanest piece of bear data on the page this quarter.

Same technology. Same models. Same vendor catalog. Opposite outcomes. The variable is the harness — and what nobody is saying out loud is that there are two harnesses involved in every AI deployment. The one the lab built. And the one the customer didn’t. The labs shipped theirs from day one. The customer is staring at a six-figure invoice wondering where the value went. That asymmetry is the entire story of the spring of 2026. Anthropic’s new model doesn’t change it. Apple’s chatbot won’t change it. The conversation has to move one layer up — to the part of the system you actually own.

The benchmark race is a cover story. The labs are converging because they have to.

The frontier labs are arriving at roughly the same capability for the same reason F1 cars finish within tenths of each other. The rules of the road force it. Capital intensity, talent flow, the gravitational pull of margin in a market with three big buyers, the speed at which everyone copies everyone else’s good idea. Creative destruction punishes the lab that falls behind, so nobody falls behind for long. Anthropic shipped Opus 4.8 yesterday. Dan Shipper at Every said in his vibe check that they should have called it Opus 5. He’s not wrong. The benchmark went up a hundred and thirty-seven points on a test almost nobody outside the office cares about. The same price. New thinking-effort selector. It is the new state of the art.

It is also not the story.

Here is the math nobody wants to put on a slide. The average enterprise user touches maybe ten percent of any frontier model’s actual surface area. The other ninety percent — the rare capabilities, the deep modes, the long-context tricks, the structured tool use that took the lab a year to build — sits there unused, because the people typing into the box don’t know it’s there and the workflow they use it inside doesn’t ask for it. An eight percent capability gain through that ten percent window is, by any honest arithmetic, about 0.8% in real output for the average shop. Call it one percent if you want to be generous. The CFO will not notice. The CIO will not notice. The line manager will not notice. The benchmark moved. The P&L did not.
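
The arithmetic in that paragraph can be written out as a minimal sketch. It assumes, as the paragraph does, that a capability gain only reaches output through the share of the model’s surface a team actually uses, so the two fractions simply multiply. The function name `realized_gain` and the wider usage shares in the loop are hypothetical illustrations, not measured figures.

```python
def realized_gain(capability_gain: float, surface_used: float) -> float:
    """Share of a model's capability gain that reaches real output.

    Assumes the simple multiplicative model from the text: the gain only
    counts inside the fraction of the model's surface the team actually uses.
    """
    if not 0.0 <= surface_used <= 1.0:
        raise ValueError("surface_used must be a fraction between 0 and 1")
    return capability_gain * surface_used


# The article's case: an 8% capability gain seen through a 10% window.
article_case = realized_gain(0.08, 0.10)
print(f"10% of surface used -> {article_case:.1%} real gain")

# Hypothetical wider windows: same model release, better-tuned harness.
for share in (0.25, 0.50):
    print(f"{share:.0%} of surface used -> {realized_gain(0.08, share):.1%} real gain")
```

Under that assumption, widening the window is buyer-side work, and it moves the result more than the release itself does.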

The bullish counter-read landed this afternoon. Aligned News framed Opus 4.8 as Anthropic’s direct answer to the enterprise AI ROI question — sharper judgment to reduce confident mistakes, more honesty about progress to reduce mid-task failures, longer independent work to reduce the restart tax, the same price to dodge the revalidation tax, and a new Thinking effort selector to give you a cost-performance dial at the task level. They are not wrong about the features. They are wrong about which side of the agentic transaction those features live on. Every one of those improvements is shipped on the lab’s side of the harness. The Thinking effort selector is a new knob on the lab’s dashboard. You still have to know which setting to pick, for which task, with which operator behind it, and the buyer-side judgment you need to choose intelligently does not come in the box.

The benchmark race is a media event. It exists to fill copy in the same forty newsletters that will lead with Opus 4.8 today and Apple’s chatbot the week after. It is real. It is also a cover story. The work that moves a number on an income statement happens a layer above the model — at the harness, on the side you control, tuned to your specific shape. What that means for you: stop reading the model releases as if they were strategy. They are inputs to your harness. They are not the harness.

The lab shipped its harness from day one. You didn’t ship yours.

OpenAI, Anthropic, and Google did not ship a chatbot. They shipped a complete agentic stack on their side of every agentic transaction. Identity, authorization, billing, retry logic, tool registries, sandboxing, observability of the run, MCP as connective tissue, payment rails that route a million tiny API calls into one consolidated invoice at the end of the month. Tomasz Tunguz catalogued the seven components on his Theory Ventures blog Tuesday — context and memory, tools and action, orchestration and loop, state and persistence, sandbox and compute, observability and governance, cost and workflow optimization. Every one of them, professionally built, in production, monetized, and pointed at the lab’s gross margin.

So when an agent runs, it does not get tired. It does not go home. It does not stop until the budget does, and the budget is yours, not the lab’s. Every call has a meter on it that only one side of the agentic transaction can read in real time. The reports you keep seeing of forty-thousand-a-month budgets ballooning into four-hundred-thousand-dollar invoices are not a story about a model that broke. They are a story about a meter that only one party was set up to see.
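
A buyer-side meter of the kind that paragraph says is missing might look like the sketch below. It assumes the buyer can estimate the cost of each call before it runs. The class names, the agent name, and the per-call amounts are hypothetical. The forty-thousand-dollar cap echoes the budget figure in the text.

```python
from collections import defaultdict


class BudgetExceeded(RuntimeError):
    """Raised when a charge would push spend past the monthly cap."""


class BudgetMeter:
    """Buyer-side running total of agent spend, checked before each call."""

    def __init__(self, monthly_cap_usd: float) -> None:
        self.monthly_cap_usd = monthly_cap_usd
        self.spent_usd = 0.0
        self.by_agent = defaultdict(float)  # spend per agent name

    def charge(self, agent: str, cost_usd: float) -> None:
        """Record a call's cost, or refuse it if the cap would be exceeded."""
        if self.spent_usd + cost_usd > self.monthly_cap_usd:
            raise BudgetExceeded(
                f"{agent}: ${cost_usd:,.2f} would exceed the "
                f"${self.monthly_cap_usd:,.2f} cap (spent ${self.spent_usd:,.2f})"
            )
        self.spent_usd += cost_usd
        self.by_agent[agent] += cost_usd


meter = BudgetMeter(monthly_cap_usd=40_000)
meter.charge("support-agent", 39_000)      # within budget, recorded
try:
    meter.charge("support-agent", 2_000)   # would cross the cap, refused
except BudgetExceeded as err:
    print("stopped:", err)
```

The design choice that matters is that the check happens before the spend is recorded, so the refusal comes from the buyer’s side and not from the invoice.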

Dan Martell, a SaaS operator who has run more of these companies than most of his Twitter feed, said it out loud this week. Companies cancelling their Claude subscriptions are not actually responding to AI being too expensive. They are responding to having measured the wrong number. Cost per token is the bill. Revenue per head is the business. If your engineer spends two thousand dollars a month on Claude Code and ships what three engineers shipped last year, that is not a cost problem. That is the best deal in the history of corporate America. The mistake is letting the tool run without tying its spend to its output, which is exactly the same mistake every controller already knows how to catch on a payroll line. AI is not a different category of expense. It is an expense that needs the same KPI rigor every other expense gets, and that rigor lives in the buyer’s harness, not the lab’s.
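
Tying spend to output, as that paragraph argues, can be sketched as a per-seat calculation. The two-thousand-dollar spend and the three-engineer output come from the text. The fifteen-thousand-dollar loaded monthly cost, the `SeatReport` name, and its fields are hypothetical placeholders to be replaced with your own figures.

```python
from dataclasses import dataclass


@dataclass
class SeatReport:
    loaded_cost_usd: float   # monthly fully loaded cost of the person (hypothetical figure)
    ai_spend_usd: float      # monthly AI tool spend attributed to that seat
    output_multiple: float   # output relative to one engineer last year

    def cost_per_output_unit(self) -> float:
        """Total monthly cost divided by engineer-equivalents of output."""
        return (self.loaded_cost_usd + self.ai_spend_usd) / self.output_multiple


# Hypothetical $15,000 loaded monthly cost; the $2,000 and 3x come from the text.
before = SeatReport(loaded_cost_usd=15_000, ai_spend_usd=0, output_multiple=1)
after = SeatReport(loaded_cost_usd=15_000, ai_spend_usd=2_000, output_multiple=3)

print(f"before: ${before.cost_per_output_unit():,.0f} per engineer-equivalent")
print(f"after:  ${after.cost_per_output_unit():,.0f} per engineer-equivalent")
```

With these assumed inputs the bill goes up and the cost per unit of output goes down, which is the distinction between cost per token and revenue per head.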

The lab has the throttle. The customer doesn’t have the brake. That is not a moral failure of the labs. They built what they sell. It is the natural physics of an asymmetric harness, and the longer the asymmetry runs, the more the income statement looks like the FT chart.

The buyer side of every AI deployment, in every company under a hundred million dollars in revenue, has been an afterthought. It will not be an afterthought much longer. The signal: the first item on the agenda at every board meeting for the rest of this year is which of Tunguz’s seven components your company actually owns, and which of them you have rented from a vendor whose meter you cannot read.

Salesforce proved the buyer side can be built. Verstappen proves it has to be yours.

Now look at the other column of the same income statement. Salesforce spent four years building Agentforce, Data Cloud, and the Einstein Trust Layer in parallel. They built the customer’s harness as a product, and sold it to the customer prepackaged. So a Salesforce customer turning Agentforce on inherits governance, identity, observability, cost controls, audit trails — Tunguz’s seven, already in the box, with a Salesforce logo on the side. 28.6 trillion tokens move because the harness sits on both sides — theirs and yours. The income statement matches the marketing because the customer can actually read the meter and shape the work it does.

Here is the part the consulting decks will miss. Even Salesforce’s harness is a starter kit. It is not the finished article for any specific company. A harness off the shelf is a harness fitted to nobody, and a harness fitted to nobody finishes nowhere in particular.

Think about Max Verstappen and the Red Bull. Five drivers have sat in the second Red Bull seat alongside him during the current car era — Pierre Gasly, Alex Albon, Sergio Pérez, Liam Lawson, Yuki Tsunoda. Same chassis, same engine, same engineers, same garage as Max. Their combined qualifying record against him: twelve wins, one hundred and thirty-eight losses. Ninety-two percent Verstappen. That car was not the magic. The fit was. Same machine, six different drivers, only one of them could extract anything close to its ceiling. And to make the point harder still: last season the Red Bull was widely understood to be technically inferior to the McLaren and the Mercedes, and Verstappen still drove it to the final race of the year with the title mathematically alive. The chassis was behind. The fit overrode it.

Now look at where the championship sits today. The standings read Mercedes, Mercedes, Ferrari, Ferrari, and the Red Bull is fighting for crumbs. Kimi Antonelli is not a better driver than Verstappen. Nobody on the grid is. And yet there he is, sitting on a podium. What changed isn’t the talent and it isn’t the fit between Max and his old harness, which is still the best pairing on the grid. What changed is the chassis underneath both of them. New regulations dropped this season. A brand new car. Mercedes and Ferrari got the jump building to the new rules; Red Bull and McLaren didn’t. The current racing style — slower corners, more energy recovery, less of the late-braking aggression Max built his career on — doesn’t fit Max either. The team that read the new rule book fastest is now winning races nobody thought they would.

That is the buyer-side harness, exactly, with the extra lesson the Verstappen story refuses to let you skip. Capability is now a commodity, the same way an F1 chassis is. A pre-built harness, even Salesforce’s, even whatever Microsoft and Amazon ship at Ignite and re:Invent over the next six months, is a chassis. It is not the finished article. The work that actually moves a P&L is the tuning. The data architecture mapped to your business model, not somebody else’s. The agent ownership written against your org chart, not the generic template. The budget and approval rules calibrated to your risk tolerance. The workflows shaped around the specific operators in your specific chairs. You can’t buy any of that. You have to build it, and you have to build it for you. And here is the part the Verstappen story will not let you off the hook on: when the chassis itself changes underneath you, the team that gets the jump on the new car wins. Standing pat with last year’s tune is how a four-time world champion ends up mid-grid. The action item: the question isn’t whether your company has a harness. It’s whether the harness it has was built for you, and whether you have a plan to re-tune it the next time the chassis underneath you changes.

Three voices, three chairs. We are filling the one that was missing.

The most thoughtful people on the cycle are arriving at the same conclusion this week from three different chairs.

Tunguz, two days running on his Theory Ventures blog. Tuesday: Software After AI, a piece that opens with the line that became this week’s vocabulary and argues that the end of the software era is the beginning of the harness era. He walked through the seven components and how each one separates a demo from a production agent. Wednesday: Security in the Age of AI Agents with Jonathan Jaffe. A two-part essay on the engineer’s blueprint for the buyer-side harness, component by component, with the parts of the stack that need controls before any of it goes into production. Tunguz is technically a venture capitalist; the framing is engineering. He is writing the build spec. His chair: this is what the engineer is supposed to do.

Nate Jones at Substack, the same morning. A piece on agent product analytics that opens with PocketOS, a software company whose Cursor agent deleted the production database and the volume-level backups in nine seconds in late April 2026, while the product dashboard stayed green. Founder Jer Crane watched it happen. The session metrics on the screen said everything was fine. The session metrics were measuring a session. The thing that mattered happened inside the agent’s run, and the run was invisible. Nate is a hardcore engineer writing the post-mortem. He is telling product teams what to instrument when sessions are no longer the unit of work. His chair: this is what the engineer sees after the failure.

a16z, the same morning, on the firm’s Substack. Narrative Violation: In B2B customer support, AI is a copilot, not a replacement, with first-party data from Pylon. The piece argues that the function-level reality of agentic customer support is that the AI assists the human, the human resolves the customer, and the workflow tuned around that pairing is where the value sits. a16z is a venture firm run by managers and engineers; the framing is the function. They are writing the operator’s playbook for one department, with data behind it. Their chair: this is what the function lead is supposed to do.

Three serious chairs — the engineer’s build spec, the engineer’s post-mortem, the function lead’s data — all naming the same animal in the same forty-eight hours. The signal: our chair is the one that has been missing. The CEO and the directors, the board pack that reads in sixty minutes, the company-management view that decides which of Tunguz’s seven components your company is going to fund, which functions are going to copilot the way a16z described, and which dashboards are going to be replaced with something that can see what PocketOS lost in nine seconds. Same conclusion as the engineers and the VCs, reached from the seat at the head of the table.

The harness is load-bearing. Each beam supports the next.

This is the part the consulting-deck reading of the week tries to make easy. The temptation is to treat the seven components, or the eight questions, or the four-step framework, as a checklist. “We have five out of eight. We’re fine.” That is not how a harness works. It is how a checklist works. The harness is load-bearing structure, and the beams support each other.

Data without governance leaks. Governance without observability is theater. Observability without a workforce architecture that owns the agent is a dashboard nobody reads. A workforce architecture without a budget is fiction the controller signs off on. A budget without cost optimization is a problem the CFO defers to next quarter. Cost optimization without a harness tuned to the actual operators is the F1 car set up for nobody — fast on the spec sheet, finishes nowhere on Sunday.

Remove one beam and the structure does not stand at seven-eighths. The other beams reshape around the gap, either over-supporting and cracking, or under-supporting and collapsing into the basement. The PocketOS database, the hyperscalers’ negative ROI, the CIO’s six-figure surprise invoice — every one of those is what happens when a single beam is missing and the rest of the structure tries to compensate for it without knowing it has to. The eight questions below are not eight independent items. They are a single integrated stress test of one structure. The board that can pass the test is the board that built the structure as a structure, not as a series of point solutions bought one quarter at a time.

Eight Questions Every Leadership Team Should Be Able to Answer Right Now

These are the questions the board should be able to answer cleanly, on a Friday afternoon, without a six-week diligence sprint to find the answer. They are the structured version of the audit we run at Outsider Labs. They appear here for the same reason they appear on the site: if you can answer all eight cleanly, you do not need us. If you can’t, we need to talk.

01. Data. Can an agent actually do useful work against your data today, or does it choke on the silos, the schemas, and the tribal knowledge that lives only in your team’s heads?

02. Governance. For every agent you have deployed: where does it run, what can it know, who is it acting for, what can it change, what can it spend, how do you know what it did, and how do you stop it?

03. Ownership. Who is the named human owner — accountable for what it does, what it spends, what it breaks — for every agent in your company? And which roles on your org chart are most exposed in the next eighteen months?

04. Customers. If AI compresses your customers’ industries in the next twenty-four months, what does that do to your revenue book — and which three accounts should you be de-risking right now?

05. Pricing. When your customers’ willingness-to-pay drops forty to seventy percent because their own AI extracts what you used to deliver, which of your product lines survives, and at what price?

06. Sellability. If you wanted to transact in twenty-four to thirty-six months, where on the AI-readiness premium-to-discount spectrum would you trade — and what are the three sharpest moves to migrate up before exit?

07. Margin. Function by function and counterparty by counterparty: where is AI handing you throughput, and which of the vendors you pay every quarter is no longer worth the check?

08. Workforce. Of the next twelve months of hires: which to make, which to pause, and what new role are you ignoring that nobody else is creating yet?
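
Questions 02 and 03 lend themselves to a concrete record per agent. The sketch below is one hypothetical way to hold the answers and to apply the all-or-nothing test described earlier, where a single blank fails the structure. Every name in it (`AgentRecord`, the field names, the sample agent) is an assumption made for illustration, and it does not attempt questions 01 or 04 through 08.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class AgentRecord:
    name: str
    owner: Optional[str] = None            # named human owner (question 03)
    runs_in: Optional[str] = None          # where does it run
    data_scope: Optional[str] = None       # what can it know
    acts_for: Optional[str] = None         # who is it acting for
    write_scope: Optional[str] = None      # what can it change
    spend_cap_usd: Optional[float] = None  # what can it spend
    audit_log: Optional[str] = None        # how do you know what it did
    kill_switch: Optional[str] = None      # how do you stop it


def unanswered(agent: AgentRecord) -> list[str]:
    """Names of the fields that are still blank for this agent."""
    return [f.name for f in fields(agent) if getattr(agent, f.name) is None]


def passes(agents: list[AgentRecord]) -> bool:
    """All-or-nothing: one blank field on one agent fails the whole test."""
    return all(not unanswered(a) for a in agents)


billing_bot = AgentRecord(
    name="billing-bot", owner="J. Doe", runs_in="vendor cloud",
    data_scope="invoices", acts_for="finance", write_scope="draft credits",
    spend_cap_usd=500.0, audit_log="run log",
)  # kill_switch left blank on purpose

print(unanswered(billing_bot))
print(passes([billing_bot]))
```

Using `all()` instead of a score is deliberate: a registry that is eight-ninths complete still reports a failure, which matches the structure-not-checklist argument.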

What This Means For You

Capability is a commodity now. The lab side of the harness is finished. The side that matters — yours — has barely been started, and it cannot be bought, only built.

Stop reading the model releases as if they were strategy. Opus 4.8 is real. Apple’s June chatbot will be real. They are inputs to your harness. None of them is the harness, and none of them will change a P&L number on its own.

Treat the buyer-side harness as a structure, not a checklist. The eight questions above are not a list to grade against. They are a single integrated stress test. The board that passes is the board that built the structure as a structure — not the one that bought seven good point solutions and let the eighth go.

Find the operator. Then tune the harness to them. The audit and the column from yesterday come together here. The rarest person in your company is the operator who knows what to ask the machine for. The most important thing they do after that is tune the system around themselves. Salesforce gave you the chassis. Verstappen reminded you the seat has to fit.

The engine has more horsepower now. The driver is the same person you were last week. The chassis is the same one everyone else got. Build the cockpit. Tune it to your business and the operators inside it.

The best driver/harness combination wins.

“The end of the software era is the beginning of the harness era.”
— Tomasz Tunguz, Software After AI (May 27, 2026)

— Harry and Anthony
