The LLM Market’s Next Phase

Animation by Grok

Three deals over the past fortnight tell you most of what you need to know about where the LLM market is heading.

On 6 May, Anthropic signed an agreement to use the entire compute capacity of xAI’s Colossus 1 data centre in Memphis: more than 300 megawatts and over 220,000 Nvidia GPUs. In February, Elon Musk had publicly accused Anthropic of hating Western civilisation. Two weeks before the Anthropic deal, SpaceX secured an option to acquire Cursor, the AI coding tool reportedly used across 70% of the Fortune 1000, for up to US$60bn. Microsoft had examined the same Cursor acquisition and declined. And on 27 April, Microsoft and OpenAI restructured their partnership, ending Azure exclusivity and freeing OpenAI to ship across any cloud.

The contest is no longer about who has the best model. It is about who controls compute, distribution and economics.

Across the leading model players, the gap is closing. Context windows now run into the millions of tokens, multimodality across text, images and video is no longer a novelty, and reasoning capability is improving along broadly similar lines at every lab. What once marked the frontier is now table stakes.

As those differences compress, the fight is shifting away from raw capability and toward the economics of delivering intelligence at scale. Increasingly, it looks like cost, infrastructure and distribution will decide outcomes. The result is a market that has ceased to be a race between models and has started behaving more like a utility system.

It’s also a shift that’s visible across both the technical and financial dimensions of the market.

From models to utilities

On paper, the leading systems remain differentiated. Models from OpenAI, Google DeepMind and Anthropic each emphasise a distinct strength: general-purpose reasoning, native multimodality and long-context reliability, respectively. But none of those capabilities is unique any longer. They are table stakes.

As the frontier converges, technical differentiation is shifting away from what models can do and toward how they do it, particularly at inference time. Increasingly, models allocate additional compute dynamically to complex tasks, trading speed for accuracy. This approach, visible in systems such as xAI’s Grok, reflects a broader move toward adaptive reasoning rather than simply larger training runs.
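
To make the trade concrete, here is a minimal sketch of the idea in Python. The difficulty heuristic and the token budgets are invented for illustration – this is not any lab’s actual routing logic – but the underlying mechanism is the same: spend more tokens thinking when the task appears to warrant it.

```python
# A toy version of inference-time compute allocation: harder prompts get a
# larger "reasoning budget" (more tokens spent deliberating before answering),
# trading latency and cost for accuracy. The heuristic and the budget tiers
# are illustrative assumptions; real systems use learned difficulty estimators.

def estimate_difficulty(prompt: str) -> float:
    """Crude proxy: longer prompts with analytical keywords score as harder."""
    keywords = ("prove", "debug", "derive", "optimise", "plan")
    hits = sum(word in prompt.lower() for word in keywords)
    return min(1.0, len(prompt) / 2000 + 0.2 * hits)

def reasoning_budget(prompt: str) -> int:
    """Map estimated difficulty to a maximum number of reasoning tokens."""
    difficulty = estimate_difficulty(prompt)
    if difficulty < 0.3:
        return 0         # answer directly: minimum latency and cost
    if difficulty < 0.7:
        return 2_000     # brief chain of thought
    return 16_000        # extended deliberation for the hardest tasks

print(reasoning_budget("What is the capital of France?"))                        # 0
print(reasoning_budget("Derive a proof, then plan and debug the solver." * 50))  # 16000
```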

At the same time, efficiency is emerging as the dominant axis of competition.

New entrants, particularly DeepSeek, have demonstrated that high-performance models can be built and operated at significantly lower cost using techniques such as sparse attention and mixture-of-experts architectures. These approaches activate only a subset of model parameters for each task, reducing compute requirements without proportionally reducing performance. Alibaba’s Qwen models follow a similar path, combining large context windows with efficiency-focused optimisation.
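
The mechanism is easy to sketch. Below is a toy mixture-of-experts layer in Python – the shapes, the number of experts and the top-k value are illustrative, not DeepSeek’s or Qwen’s actual configuration. A gating network scores every expert for each token, but only the top-k experts actually run, so compute scales with k rather than with the total parameter count.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, k = 64, 8, 2

# Each expert is a full weight matrix; the gate scores experts per token.
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
gate = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route one token activation x (d_model,) through only the top-k experts."""
    scores = x @ gate                        # (n_experts,) gating scores
    top = np.argsort(scores)[-k:]            # indices of the k best-scoring experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                 # softmax over the chosen k only
    # Only k of the n_experts matrices are touched: compute scales with k, not n.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d_model)
print(moe_layer(token).shape)   # (64,) – same output width at roughly k/n of the FLOPs
```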

The implication is straightforward: if comparable performance can be delivered at lower cost, pricing power shifts rapidly.

Performance is converging. Economics is diverging.

The compute bottleneck

That divergence is reshaping the competitive landscape.

Training and deploying frontier models now requires tens of billions of dollars in capital, placing enormous pressure on providers to monetise effectively. Revenue growth across the sector has been rapid, driven by enterprise adoption and consumer subscriptions. But those gains are offset by equally large expenditures on compute, infrastructure and ongoing model development.

The result is a market defined by a structural tension: growth is strong, but margins remain uncertain.

The Anthropic deal makes the bottleneck concrete. Anthropic now has compute relationships with Amazon (5GW), Google and Broadcom (5GW), Microsoft Azure (US$30bn), Nvidia, Fluidstack, and now SpaceX. None of the hyperscaler agreements deliver capacity until late 2026 or 2027. Anthropic needed compute now, and SpaceX had a 220,000-GPU facility in Tennessee that xAI’s own training operation no longer needed at full capacity.

Two things follow.

First, even xAI, with one of the most aggressive supercomputer builds in history, has surplus capacity it must monetise. Grok has not closed the usage gap with Claude or ChatGPT, and xAI reportedly lost US$6.4bn in 2025. The model layer is becoming a utility, and xAI is the clearest illustration. It is pivoting into cloud provision because the model alone is no longer enough.

Second, frontier labs will subordinate every other consideration, including political and personal hostility, to secure compute. Musk had attacked Anthropic publicly multiple times. None of that mattered when the deal made commercial sense for both sides.

Some players are better positioned than others to manage that tension. Integrated platforms such as Alphabet and Microsoft capture value across multiple layers of the stack, from infrastructure to applications. By embedding AI into existing ecosystems such as search, advertising, cloud computing and productivity software, they turn model capability into a distribution advantage. That matters because distribution is becoming the decisive layer.

The distribution layer war

As models converge, the question shifts from which system is best to which system is used.

Embedding AI into workflows, whether through enterprise software, developer tools or consumer platforms, creates a structural advantage that standalone model providers struggle to match. That explains why LLM providers have moved quickly to integrate content management capabilities into their models – effectively vaporising an entire subcategory of software that briefly flared into life in 2025, according to Scott Brinker, one of the world’s top marketing technology analysts.

Access to users, data and use cases becomes more valuable than marginal improvements in benchmark performance. Strategic partnerships reinforce this dynamic. Model developers are aligning with cloud providers, chip manufacturers and enterprise software firms to secure both compute capacity and market access. These alliances shape not only revenue flows but also technical direction, as systems are optimised for specific infrastructure and deployment environments.

Among the examples:

  • Microsoft was one of the first of the big tech companies to go down this road. It has invested billions in OpenAI, establishing itself as the company’s primary cloud partner. In return, OpenAI committed to running on Azure and embedding its models across Microsoft products, including Copilot. Even after the partnership was restructured in April 2026, giving OpenAI the freedom to serve customers across other clouds, the arrangement still links technical development and distribution: OpenAI’s models are optimised for Azure infrastructure, while Microsoft’s enterprise software ecosystem serves as a central route to market.

  • Anthropic has signed agreements with Google and Broadcom to secure multi-gigawatt TPU capacity, anchoring its future model development to Google’s chip architecture and networking stack. The arrangement means Claude models are trained and scaled within the constraints and advantages of that infrastructure, aligning their technical evolution with the systems that power them.

  • Last year Databricks signed separate multi-year agreements to integrate Anthropic’s Claude and Google’s Gemini models into its data platform. The move highlights how enterprise software vendors are emerging as aggregation layers, shaping which models organisations use and how those systems are configured and optimised for real-world data workflows.

  • Cursor is the most striking recent example. The AI coding tool reportedly generates around US$2bn in annualised revenue with roughly 50 employees. In April, SpaceX struck a deal with Cursor to develop “coding and knowledge work AI” and secured an option to acquire the company for US$60bn later this year. Microsoft reportedly examined a Cursor acquisition and walked away. The price tells you something: at the application layer, the user interface to AI is more valuable than the model behind it.

A consistent pattern emerges across these deals: access to compute shapes how models are designed, whether around TPUs, GPUs or custom chips; distribution partnerships determine adoption through cloud platforms and enterprise software; and capital flows lock in long-term alignment between model builders and infrastructure providers. These are not loose commercial arrangements but structural alliances that influence how systems are built, where they run and who ultimately captures value.

The end of the benchmark era

For much of the past decade, progress in AI was measured by benchmark scores, abstract metrics of performance on standardised tasks. That framework is losing relevance.

In a market where most leading models can perform a wide range of tasks competently, incremental improvements in capability are less important than reliability, latency and cost per inference. Enterprises are not buying models for their theoretical performance. They are buying systems that deliver predictable results within operational constraints.

The competitive question is no longer “which model is best?” It is “which model is useful enough, cheap enough and embedded enough to matter?”

What this means

The LLM market is entering a new phase. Models are becoming infrastructure, intelligence is being pitched as utility, and value is shifting toward the systems that deliver that utility efficiently and at scale.

This has several immediate implications.

First, pricing pressure will intensify. As lower-cost models demonstrate competitive performance, expectations for cost per inference will fall, compressing margins across the sector. We are also moving – softly, softly – towards the moment when tokens are priced rationally rather than subsidised by investors. Firms that moved rapidly into deploying agentic systems are starting to calculate what that means for their operating budgets. Humans may be an irritation, but at least their cost is predictable.
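
A back-of-envelope calculation shows why that matters. Every figure below is an assumption for illustration rather than a quoted price, but the structure of the arithmetic is the point: agent loops multiply tokens per task, and tokens multiply cost.

```python
# Hypothetical arithmetic for an agentic deployment's token bill. All numbers
# are assumptions, not quoted prices: the takeaway is how per-token costs
# compound once agents run many steps per task.

price_per_m_input = 3.00        # US$ per million input tokens (assumed)
price_per_m_output = 15.00      # US$ per million output tokens (assumed)

tasks_per_day = 10_000          # assumed workload
steps_per_task = 12             # agent loop: plan, call tools, reflect, retry
input_tokens_per_step = 6_000   # context re-sent on each step (assumed)
output_tokens_per_step = 800    # assumed

daily_input = tasks_per_day * steps_per_task * input_tokens_per_step
daily_output = tasks_per_day * steps_per_task * output_tokens_per_step
daily_cost = (daily_input / 1e6) * price_per_m_input \
           + (daily_output / 1e6) * price_per_m_output

# ~US$3,600/day, ~US$1.3m/year on these assumptions
print(f"~US${daily_cost:,.0f}/day, ~US${daily_cost * 365 / 1e6:,.1f}m/year")
```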

Second, distribution, as it has so often in the tech world, looks likely to determine the winners. George Colony, the founder of global tech analysis firm Forrester, once famously observed that transformation waves are often ushered in by new user interfaces, and in the AI world that interface is the prompt. Providers that control the interface to users – through enterprise software, cloud platforms or consumer applications – will have a structural advantage over those that rely on standalone APIs.

Third, short of some extraordinary leap (a bit like the DeepSeek wake-up call last year), technical differentiation will persist, but in narrower domains. Specialisation, reliability and integration will matter more than general-purpose superiority.

Finally, the barriers to entry at the frontier will continue to rise. The capital required to train and operate cutting-edge models limits participation to a small number of well-funded players, even as lower-cost approaches expand access to capable systems. To give a sense of the scale of investment: in February, Fortune reported a Moody’s Ratings estimate that the top five U.S. hyperscalers have accumulated US$662bn in future data centre lease commitments that have not yet commenced and do not appear as current liabilities, leaving them off the balance sheet. Total undiscounted future lease commitments stood at US$969bn, roughly 113% of the five companies’ combined adjusted debt (a ratio that implies combined adjusted debt of around US$860bn).

The emerging pattern

The shift to an intelligence utility model has immediate capital-allocation implications.

At the base of the stack, infrastructure – all those data centres, all that power and specialised hardware – will continue to absorb vast capital, but it will likely behave increasingly like a utility: durable, constrained, and offering returns that are more predictable and lower-risk, but also lower.

Higher up, model capability is becoming less defensible. As performance converges and lower-cost entrants reset expectations, pricing power is under pressure. Owning the “best model” might no longer be a reliable path to outperformance.

Value is moving up the stack.

The most attractive opportunities could emerge in areas like inference efficiency, orchestration, proprietary data and distribution. Efficiency is becoming critical as token consumption surges. Orchestration layers, which coordinate how models are used, represent a potential control point and are currently exercising the Valley’s best minds. Proprietary data flywheels reinforce performance over time. But the most durable economics sit in distribution – applications that embed AI into workflows, creating switching costs and recurring revenue.
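
What an orchestration layer does is simple to sketch, even if building one well is not. The toy router below – the model names, costs and confidence scores are invented placeholders – captures the core economics: send each request to the cheapest model likely to handle it, and escalate only when it falls short.

```python
# A toy orchestration layer as a control point: route each request to the
# cheapest model expected to handle it, escalating on low confidence.
# Model names, costs and the confidence check are invented placeholders.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Model:
    name: str
    cost_per_call: float                       # assumed relative cost
    run: Callable[[str], tuple[str, float]]    # returns (answer, confidence)

def cheap_run(prompt):    return (f"[small-model answer to: {prompt}]", 0.6)
def frontier_run(prompt): return (f"[frontier answer to: {prompt}]", 0.95)

# Cheapest-first ladder of available models.
LADDER = [
    Model("small-efficient-model", 0.01, cheap_run),
    Model("frontier-model", 1.00, frontier_run),
]

def orchestrate(prompt: str, min_confidence: float = 0.8) -> str:
    """Try models cheapest-first; escalate until one is confident enough."""
    for model in LADDER:
        answer, confidence = model.run(prompt)
        if confidence >= min_confidence:
            return answer
    return answer  # fall back to the best available attempt

print(orchestrate("Summarise this contract clause."))
```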

The winners will not be the most capable models, but the systems that deliver intelligence that is predictably cheap, reliable and embedded.

