The Big Data Center Squeeze: Getting More Compute From Existing Infrastructure
The land grab is on. Everyone is trying to build data centers right now, from hyperscalers to neoclouds (like Nscale, which just raised a massive $2 billion round), sovereign wealth funds, and private equity firms. Capex commitments for AI infrastructure are in the hundreds of billions. Capital is pouring in from every direction, yet the most valuable "new capacity" over the next several years might not come from pouring concrete at all.
A series of constraints is making it increasingly hard for the ongoing buildout of physical infrastructure to keep up with AI demand. Even if you manage to navigate the hardware shortage and get the GPUs and memory you need, you could very well be stymied by insufficient energy supply. Regulatory challenges are also delaying new data center builds. Together, these constraints mean that just building more - whether in Texas or even in space, whether NVIDIA or MatX chips - is becoming harder and harder.
Rather than waiting for new fabs, new substations, and new campuses, it’ll be increasingly important to get more useful compute out of the infrastructure that already exists. We see opportunity here for new software that improves (a) utilization and throughput (tokens per GPU) and (b) facility efficiency and power delivery (tokens per watt, GPUs per megawatt).
Inference demand and data center capital expenditures are surging
Starting with what’s changed on the demand side: Inference workloads — the compute required to actually run AI models and not just train them — have gone from roughly a third of all AI compute in 2023 to half in 2025, and are projected to reach two-thirds in 2026.
This matters because inference demand is structurally different from training. Training is periodic and centralized: you do a big run, then you’re done (until the next one, of course). Inference is continuous, grows with every new user and application, and increasingly needs to be geographically distributed to reduce latency. Every model deployment creates sustained demand that compounds with adoption.
The implication for infrastructure is significant. Training workloads can be concentrated in massive, remote clusters. Inference however needs to be everywhere, running all the time, and scaling with user adoption curves that are still in early innings. This creates a different type of pressure on data centers, one that’s more distributed and more continuous.
On the capex side, hyperscaler spend is massive and growing. Amazon has committed to $200B in 2026 capex, Google around $180B, Meta $135B, and Microsoft $120B. In all cases, these numbers represent nearly 100% of operating cash flow, which means internal cash generation can’t fund the full buildout. Amazon will see negative free cash flow in 2026, and Alphabet’s and Meta’s FCF will both drop ~90%. So, to find the funds, they’ve turned to borrowing. Hyperscalers raised $120B in new debt in 2025 ($90B in the final months of the year alone), 4x+ the average annual issuance over the last five years. Alphabet even issued a 100-year bond.
These are companies that, until very recently, had more cash than they knew what to do with. The extent of the leverage they’re putting behind funding data centers tells you something about the sheer scale of these projects – and the gap between their ambition and available capacity.
Hardware is a bottleneck for surging demand
On the supply side, you can see the hardware industry investing in more manufacturing capacity – but still not keeping pace. TSMC's own capex has followed a V-shaped trajectory: $36 billion in 2022, down to ~$30 billion across 2023 and 2024, then back up to $41 billion in 2025 and $55 billion for 2026.
But a new fab takes 2–3 years to build in Taiwan and about 4 years overseas before it produces a single wafer. That means the $55B TSMC spends this year won't translate to meaningful new capacity until about 2028.
There is also a binding constraint in assembling raw chips. Fundamentally, AI chips consist of multiple components wired together using a process called CoWoS (chip-on-wafer-on-substrate). NVIDIA has locked up over 70% of TSMC's CoWoS-L capacity through 2026. The other limiting factor is HBM (high-bandwidth memory), memory chips that are stacked on top of each other to move data to the GPU much faster than traditional memory can. There are three HBM suppliers (SK Hynix, Samsung, and Micron), and all have confirmed their entire 2026 output is sold.
Even if TSMC cranks out more fabs, the finished GPUs can’t ship until packaging and memory also catch up.
We can’t produce enough energy, either
US data centers consumed roughly 180 TWh in 2024 (about 4.4% of total US electricity) and are on track to hit 430 TWh (~2.5x) by 2030. That would exceed the electricity used by all energy-intensive US manufacturing combined.
What really puts it into perspective for us: a single gigawatt-scale data center campus consumes as much electricity as a city of roughly 800,000 people. That’s just about the population of San Francisco…
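As a sanity check on that comparison, here's a quick back-of-envelope in Python. The ~11,000 kWh annual per-capita electricity figure is our assumption (roughly US all-sector consumption), not a number from this post:

```python
# Back-of-envelope: how many people's electricity use does a 1 GW
# data center campus match? The per-capita figure below is an
# assumption (~US all-sector consumption), not a sourced number.
CAMPUS_GW = 1.0
HOURS_PER_YEAR = 8760
PER_CAPITA_KWH = 11_000  # assumed annual kWh per person, all sectors

annual_twh = CAMPUS_GW * HOURS_PER_YEAR / 1000          # GW * hours -> GWh -> TWh
implied_population = annual_twh * 1e9 / PER_CAPITA_KWH  # TWh -> kWh, then per person

print(f"{annual_twh:.2f} TWh/year")           # 8.76 TWh/year
print(f"~{implied_population:,.0f} people")   # ~796,364 people
```

Running continuously at 1 GW works out to roughly 8.8 TWh a year, which at that per-capita assumption lands right around 800,000 people.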
Regulatory bottlenecks are stalling data center build-outs
$100B+ in data center projects have been blocked or delayed across the US. More than 300 state-level data center bills were filed in the first six weeks of 2026.
Some examples:
- In Virginia, the largest data center market, Loudoun County just reclassified new builds from by-right approval to requiring public hearings. The latest denial came in Henrico County, where the zoning board unanimously rejected data center company Centra Logistics’s plan to build on 200 acres.
- In New Jersey, New Brunswick's city council unanimously canceled a planned AI data center in February and replaced it with a park, after organized opposition from environmental justice groups.
- The Wisconsin State Assembly passed AB 840 in January 2026, which bars utilities from subsidizing data center infrastructure and mandates closed-loop cooling and water-use reporting.
- Minnesota is imposing $2–5 million annual fees on data center facilities based on their peak electricity demand, while also mandating they use 65% carbon-free energy and instituting prevailing wage requirements.
These constraints are pushing interest toward alternative locations. Saudi Arabia's PIF-backed Humain is planning to build out 6.6 GW of data center capacity by 2034. Space-based data centers could even be cost-effective: Starcloud launched a satellite carrying an NVIDIA H100 in November after attracting significant venture funding.
These are early signals, but they underscore how binding all of these constraints have become.
Two strategies to extract more tokens from existing infrastructure
So if new data centers are hard to build, slow to power, and increasingly unwelcome, what do you do? You optimize. You get more useful work out of every GPU and every megawatt that's already deployed.
There are two categories that we think will have the most impact:
- Optimize inference: At a high level, a number of neoclouds and inference providers offer far better developer experiences and stronger performance than the hyperscalers. Across domains, there are numerous ways to extract more performance at this layer of the stack. There’s more to say here than fits in this post, but we’re tracking the space closely.
- Data center management: The goal here is to get more chips per megawatt online through better cooling, power, and facility optimization.
The thoughts that follow focus on the latter. Data center management software addresses three interrelated problems: cooling optimization (the largest energy sink), grid/power management (the binding constraint on new capacity), and facility monitoring for next-gen GPU infrastructure (the greenfield replacement cycle). We’ve categorized them as such, but they’re all really different slices of broader data center management, and companies don’t often fit squarely in just one category.
- Cooling: Cooling consumes ~40% of a data center’s non-IT energy. Legacy building management systems (BMS) cannot dynamically optimize cooling in real time. They assume static environments with predictable, lower-density loads. The new approach here is AI-driven control systems that layer on top of existing BMS infrastructure, ingest sensor data, and continuously adjust chillers, CDUs, and air handlers in response to real-time conditions.
Google’s DeepMind pioneered this approach internally, achieving significant energy savings through reinforcement learning. But Google never really commercialized the tech for external use, leaving the door open for players like Phaidra, Etalytics, and FLUIX AI.
- Grid flexibility & power management: Getting connected to the grid is the single longest lead-time item in building a data center, and can take 4–8 years. Utilities see data centers as grid liabilities and inflexible baseloads. If a facility requests 500 MW, the grid provisions for 500 MW at all times. This rigidity creates large mismatches: data center compute demand is actually highly variable, but the infrastructure is set up as if it’s always running at peak.
The new category here is software that makes data centers “grid-friendly” by dynamically shifting AI workloads in response to grid conditions. That means ramping down during stress on the grid and ramping up during periods of surplus, which then turns data centers from grid liabilities into more flexible assets. This can directly accelerate permitting and interconnection approval. Some interesting players here include Emerald AI, Hammerhead AI, Mercury, Encentive, Lucend, and Pebble.
- Modern DCIM & facility monitoring: Many new data center operators don’t have software to monitor their facilities at all. Legacy DCIM systems (from Schneider, Eaton, and Vertiv) were built for an era of lower density and slower change. New entrants include Aravolta, Entangl, and Hammerhead.
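To make the cooling-optimization idea above concrete, here's a minimal sketch of a control loop layered on top of BMS telemetry. Every name here (the sensor stub, the setpoints, the 0.25 gain) is hypothetical; real products like Phaidra use far more sophisticated learned models than this proportional nudge:

```python
# Minimal sketch of an AI-assisted cooling control loop on top of a BMS.
# All names and numbers are illustrative assumptions, not a real API.
TARGET_INLET_C = 27.0   # ASHRAE-style rack inlet target
DEADBAND_C = 0.5        # ignore small deviations to avoid hunting

def read_rack_inlet_temp() -> float:
    """Stub: a real system would poll BMS/CDU telemetry here."""
    return 27.8

def control_step(current_setpoint: float, inlet_temp: float) -> float:
    """Proportional nudge: relax cooling when there is thermal headroom,
    tighten it when racks run hot. The 0.25 gain is illustrative."""
    error = inlet_temp - TARGET_INLET_C
    if abs(error) <= DEADBAND_C:
        return current_setpoint             # inside deadband: do nothing
    return current_setpoint - 0.25 * error  # hot racks -> lower chiller setpoint

setpoint = 18.0  # starting chilled-water setpoint, degrees C
for _ in range(3):
    setpoint = control_step(setpoint, read_rack_inlet_temp())
print(f"{setpoint:.1f}")  # 17.4
```

The payoff of this shape of system is the opposite direction: when racks run cooler than target, the controller raises the chiller setpoint, and every degree of relaxation saves real compressor energy.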
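And for the grid-flexibility bullet above, a toy sketch of curtailment logic: pause the largest deferrable (training/batch) jobs until the facility's draw fits under the grid's current headroom, while never touching latency-sensitive inference. The job model and megawatt figures are made up for illustration:

```python
# Toy sketch of "grid-friendly" workload shifting: shed deferrable
# training jobs during grid stress, keep inference serving online.
# The Job model and all numbers are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    power_mw: float
    deferrable: bool  # training/batch = True, latency-sensitive serving = False

def curtailment_plan(jobs, grid_headroom_mw):
    """Pick jobs to pause so facility draw fits under the grid's
    available headroom. Inference is never paused here."""
    draw = sum(j.power_mw for j in jobs)
    paused = []
    for job in sorted(jobs, key=lambda j: -j.power_mw):  # biggest first
        if draw <= grid_headroom_mw:
            break
        if job.deferrable:
            paused.append(job.name)
            draw -= job.power_mw
    return paused, draw

jobs = [
    Job("training-run-a", 120.0, True),
    Job("training-run-b", 60.0, True),
    Job("inference-fleet", 200.0, False),
]
paused, draw = curtailment_plan(jobs, grid_headroom_mw=300.0)
# pauses training-run-a, bringing draw from 380 MW down to 260 MW
```

In this toy scenario the planner pauses the 120 MW training run and leaves the inference fleet untouched, which is exactly the kind of demonstrable flexibility that can turn a facility from a grid liability into an asset during interconnection negotiations.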
Beyond software: Robotics and better hardware
Data center robotics for maintenance, monitoring, and physical infrastructure management is emerging as a real category. Data centers are increasingly built in remote locations where skilled technicians are scarce. Physical operations remain manual and labor-intensive. As facilities scale to gigawatt campuses with thousands of racks, the gap between what needs to get done and the available workforce widens.
Some players of note: Watney Robotics focuses on data center setup and deployment, while Boost targets ongoing maintenance and monitoring. Bedrock Robotics is tackling data center construction itself by turning excavators and bulldozers into autonomous machines. Taken together, robotics can compress timelines across the entire data center lifecycle, from breaking ground to racking servers to keeping them running. While this is admittedly quite far down the line, the ability to run these facilities without human operators would remove constraints like temperature and lighting, letting operators optimize infrastructure purely for hardware efficiency.
So far we’ve discussed the many physical constraints holding back AI infrastructure. Those headwinds are very real and aren’t going away anytime soon, but there is some really compelling tech emerging in the hardware layer itself.
Some examples include Nexthop, which is building custom network switches for hyperscalers, tailored to each operator’s GPU, storage, and rack configurations to optimize performance and efficiency. Claros, a power-management platform, combines hardware and software that delivers power more directly to processors and cuts out unnecessary conversions so more watts end up doing useful compute. Finally, Orbital Materials built a physics AI model for simulating advanced materials, which they’ve applied to build better cooling hardware in partnership with NVIDIA and AWS.
Let’s chat
We’re really excited about companies tackling data center management as DC providers try to squeeze more tokens out of existing infrastructure. If you’re building in this space, we’d love to hear from you. Email us at siddharth@scalevp.com and aurelia@scalevp.com.
News from the Scale portfolio and firm


