Infrastructure as a Constraint: Power, Cooling, and the Physical Limits of AI Scaling
AI Infrastructure

Brandon Scott · November 30, 2025 · 16 min read

Introduction

For a decade, "scaling AI" mostly meant "buy more GPUs." You could squint at a scaling law, convince yourself that another order of magnitude of compute would buy you another chunk of capability, and assume the cloud would take care of the details. That era is ending. The constraint is no longer "Can you afford the chips?" It is "Can you power them, cool them, and physically put them somewhere on Earth without melting a substation or starting a regulatory war?" If you build or bet on large-scale AI and you're not thinking in megawatts, megawatt-hours, water usage, and substation lead times, you're living in a story that stopped matching reality two or three generations of hardware ago.

The physics bill is due. The question now is not "How big can a model get?" It is "How much physical infrastructure are you willing and able to build to sustain the appetite of these models, and how does that cap what is economically rational?" Let's walk through what that actually means.

From model sizes to megawatts

The way AI people talk about scale is still mostly abstract:

Parameters.
Tokens.
FLOPs.
Context length.

Operators of data centers talk about something else:

Kilowatts per rack.
Megawatts per hall.
Power Usage Effectiveness (PUE).
Water-per-megawatt.

The delta between those two languages is where most of the current fantasy lives. A modern accelerator server easily pulls on the order of a few kilowatts. Pack enough of them into a rack and you're looking at tens of kilowatts for that rack alone. Multiply by rows and halls and "just one more training cluster" quietly looks like "please add tens of megawatts of continuous load to this region." Power companies, municipalities, and grid operators notice that.

AI training workloads aren't web servers. They are intense, sustained, high-duty-cycle loads. When you light up a large training run for weeks, the grid sees something closer to an industrial plant than a SaaS product. The scale numbers people throw around casually—hundreds of thousands of accelerators, multiple concurrent large runs—imply energy draw that has to be:

Generated somewhere.
Moved over actual transmission lines.
Transformed down through actual substations.

Every one of those steps has build times and bottlenecks that ignore hype cycles completely.
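
To make the translation concrete, here is a minimal back-of-envelope sketch in Python. Every figure in it (per-server draw, rack density, hall size, PUE) is an illustrative assumption, not a vendor spec:

```python
# Back-of-envelope: translate a cluster spec into grid-facing load.
# All figures are illustrative assumptions, not vendor numbers.

KW_PER_SERVER = 5.0    # assumed draw of one accelerator server, kW
SERVERS_PER_RACK = 8   # assumed density: 40 kW per rack
RACKS = 500            # a mid-sized training hall
PUE = 1.3              # assumed facility overhead (cooling, power conversion)

it_load_mw = KW_PER_SERVER * SERVERS_PER_RACK * RACKS / 1000
grid_load_mw = it_load_mw * PUE

print(f"IT load:   {it_load_mw:.1f} MW")    # 20.0 MW of compute
print(f"Grid load: {grid_load_mw:.1f} MW")  # 26.0 MW the utility has to deliver
```

Twenty-six megawatts for one hall is already an industrial interconnection request, not a purchase order.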

Power as the first-class bottleneck

We're used to thinking of "capacity" in terms of GPUs available in a region. The real capacity is megawatts. You cannot spin that up with an API call. You need:

A suitable site with room for substation equipment and cooling plant.
High-voltage transmission lines within reach.
A utility that is willing and able to deliver the loads you're talking about.
Permits, environmental impact assessments, and community buy-in.

Grid upgrades are measured in years, not quarters. Transformers and high-voltage gear are not impulse buys; they have global supply chains and backlogs. So when someone says "we're going to 10x compute again," you should translate that into at least three questions:

How many more megawatts is that, assuming realistic efficiency gains?
Where will that power come from, physically and contractually?
Who else is competing for those same electrons in that region?
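
The first of those questions is just arithmetic once you commit to assumptions. A sketch, reusing the hypothetical hall above and an assumed efficiency gain:

```python
# "10x compute" in megawatts, under an assumed efficiency improvement.
current_mw = 26.0         # grid load of the hall sketched earlier
compute_multiplier = 10   # the stated ambition
perf_per_watt_gain = 2.5  # assumed hardware + software gain over the same period

new_mw = current_mw * compute_multiplier / perf_per_watt_gain
print(f"New grid load: {new_mw:.0f} MW")  # 104 MW: new-substation territory
```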

Renewables complicate this further. If you want to maintain a low-carbon story, you're not just "buying power." You're signing long-term PPAs, backing new generation, or playing games with certificates that regulators and the public are getting better at seeing through. At some point, the story "AI will save the world" doesn't play well next to "AI needed a small city's worth of power, so we turned the gas plant back on." Energy is fungible on paper. On the ground, somebody's load flexes or somebody's emissions go up.

Cooling and the end of air as the default

Even if you can get power to your data center, you still have to get heat out. Rack densities for accelerator-heavy clusters have blown past the comfortable range of traditional air cooling. Air can only carry so much heat per cubic meter and per degree of temperature rise. At some density, you reach absurdities:

Massive rooms filled with roaring air handlers.
Hot aisles that are dangerously hot.
Noise levels that are a workplace-safety problem.
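
The limit is plain heat transfer: the heat a stream of air can carry is its mass flow times the specific heat of air times the allowable temperature rise. A minimal sketch for one dense rack, with assumed figures:

```python
# Airflow needed to remove one rack's heat: Q = m_dot * c_p * dT.
AIR_DENSITY = 1.2   # kg/m^3, roughly sea level and room temperature
CP_AIR = 1005.0     # specific heat of air, J/(kg*K)

rack_heat_w = 40_000  # assumed 40 kW rack
delta_t_k = 15.0      # assumed allowable air temperature rise, K

mass_flow = rack_heat_w / (CP_AIR * delta_t_k)  # kg/s of air
volume_flow = mass_flow / AIR_DENSITY           # m^3/s

print(f"{volume_flow:.1f} m^3/s per rack")  # ~2.2 m^3/s (~4,700 CFM) for ONE rack
```

Multiply that by a hall full of racks and the room is a wind tunnel.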

Liquid cooling is not a luxury anymore. It is a physical necessity for serious AI clusters. There are several flavors:

Cold plate or direct-to-chip.
Rear-door heat exchangers.
Full immersion.

Each one is an infrastructure commitment. You're designing plumbing, not just racking servers. You're committing to:

Coolant chemistry and maintenance regimes.
Risk profiles around leaks and failures.
Facilities staff who are part data center ops, part industrial plant engineers.

On top of that, if your cooling system is ultimately rejecting heat to the atmosphere through evaporative towers, there is water usage to account for. In some regions, that is politically toxic. The unit economics of "one more training cluster" now include line items for chillers, towers, pumps, pipes, and water rights. Ignore that and you will be at the mercy of whichever part of the system breaks first.
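
The water line item is also estimable, at least roughly. The sketch below uses an assumed water-usage figure; real values vary widely with climate and cooling design:

```python
# Rough annual water draw for evaporative heat rejection.
grid_load_mw = 26.0    # the hypothetical hall from earlier
hours_per_year = 8760
wue_l_per_kwh = 1.8    # assumed liters evaporated per kWh; site-dependent

annual_kwh = grid_load_mw * 1000 * hours_per_year
annual_liters = annual_kwh * wue_l_per_kwh
print(f"~{annual_liters / 1e6:.0f} million liters/year")  # ~410 million liters
```

Hundreds of millions of liters a year is the kind of number that shows up at town hall meetings.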

Physical footprint and siting politics

Training clusters don't exist in empty space. They sit in or near communities that have their own priorities. Local authorities and residents care about:

Noise from cooling plant and generators.
Truck traffic during construction.
Visual impact and land use.
Competition for water resources.
Fear, rational or not, of "mysterious AI boxes" sucking up power while housing is expensive and infrastructure is strained.

If you want to drop hundreds of megawatts of data center capacity into a region that hasn't historically hosted that kind of load, you are negotiating with:

Planning boards.
Environmental groups.
Labor and construction constraints.
Sometimes national-level regulators.

This is why the biggest players cluster capacity in a few regions that already made those trade-offs years ago: they're de facto special economic zones for compute. But concentrating that much critical infrastructure creates its own risk:

Geopolitical.
Physical security.
Single points of failure.

You cannot move petawatt-hours of work per year without somebody eventually asking who decided that this valley or that industrial park should become a core piece of global AI infrastructure.

Training versus inference: different shapes of pain

Training and inference stress infrastructure differently.

Training:

Large, spiky jobs.
High duty cycle for weeks or months.
Often schedulable, but under pressure from research timelines.

Inference:

More steady-state, tied directly to user traffic.
Latency sensitive.
Harder to batch or move in time.

From a power and cooling perspective, the obvious strategy is:

Smooth training loads into off-peak hours where possible.
Place inference closer to users and edge sites, but with tighter per-rack constraints.

In practice, the line blurs. Some "training" is continual fine-tuning on fresh data. Some "inference" is long-running agentic workflows that look like small training bursts.

If you're responsible for the physical plant, you need:

Accurate models of load over time.
Policies about what can be preempted and what cannot.
Feedback loops between schedulers and facility systems.

The naive attitude—"researchers get what they want when they want it, product must never see a 500"—is how you end up overbuilding massively or disappointing everyone once the power contract runs into reality.
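
A minimal sketch of what such a policy could look like, assuming hypothetical job metadata (the fields and the 48-hour threshold are made up for illustration):

```python
# Sketch: preemption policy plus load shedding for demand response.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    kind: str                     # "training" or "inference"
    megawatts: float              # facility-level draw attributed to this job
    deadline_hours: float | None  # None means best-effort

def can_preempt(job: Job) -> bool:
    """Serving traffic is never preempted; training is preemptible
    unless it is up against a near-term deadline."""
    if job.kind == "inference":
        return False
    return job.deadline_hours is None or job.deadline_hours > 48

def shed_load(jobs: list[Job], mw_to_free: float) -> list[Job]:
    """Pick preemptible jobs, largest first, until enough megawatts are freed."""
    shed, freed = [], 0.0
    for job in sorted(jobs, key=lambda j: j.megawatts, reverse=True):
        if freed >= mw_to_free:
            break
        if can_preempt(job):
            shed.append(job)
            freed += job.megawatts
    return shed
```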

Infrastructure constraints feed back into model design

Once infrastructure becomes a constraint instead of a background assumption, it starts to shape model and system design. Quietly at first, then explicitly. A few pressure points.

Model size versus utilization

Training a 10x larger model is not just "10x more FLOPs." It is:

10x more accelerator-hours.
More spilled heat per unit time.
More calendar time where a chunk of your physical plant is committed to one job.

That forces harder questions:

Is the marginal capability gain worth tying up this much of our power and cooling budget?
Would we be better off training a smaller, more specialized model and deploying more of them?

The fact that we're even seeing serious work on distillation, mixtures of experts, quantization, and retrieval-augmented architectures is partly algorithmic curiosity and partly energy pragmatism.
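
To see why, put assumed round numbers on a single large run:

```python
# Calendar time and energy for one large training run. Round-number assumptions.
train_flops = 1e25        # assumed total training compute budget
accelerators = 10_000
peak_flops_each = 1e15    # assumed ~1 PFLOP/s per chip at low precision
mfu = 0.4                 # assumed model FLOPs utilization
kw_per_accelerator = 1.0  # assumed all-in draw incl. host share and overhead

seconds = train_flops / (accelerators * peak_flops_each * mfu)
days = seconds / 86_400
mwh = accelerators * kw_per_accelerator / 1000 * seconds / 3600

print(f"{days:.0f} days, {mwh:,.0f} MWh")  # ~29 days, ~6,900 MWh for one job
```

At fixed cluster size, a 10x compute budget turns those 29 days into most of a year on the same physical plant.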

Inference efficiency

When AI features become core to products, you multiply per-token cost by billions or trillions of tokens per day. Small improvements in:

Tokens per joule.
Model throughput on a given accelerator.
Cache hit rates and reuse of computations.

translate directly to:

Lower megawatt demand.
Less cooling overhead.
Less pressure to build yet another hall.

That changes priorities. A clever new architectural trick that saves 15% energy at scale is more valuable than a small bump on some benchmark that costs 30% more energy.
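
The leverage is easy to put numbers on, assuming a hypothetical fleet-level energy cost per token:

```python
# Fleet-level value of a 15% efficiency win. Assumed round numbers.
tokens_per_day = 1e12   # a product serving a trillion tokens per day
joules_per_token = 0.5  # assumed all-in energy per generated token
improvement = 0.15      # the 15% energy saving from above

baseline_mw = tokens_per_day * joules_per_token / 86_400 / 1e6
print(f"Baseline: {baseline_mw:.1f} MW")                # ~5.8 MW of continuous draw
print(f"Saved:    {baseline_mw * improvement:.2f} MW")  # ~0.87 MW you never build
```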

Scheduling and orchestration

Physical constraints also push you toward smarter orchestration:

Allocating particular jobs to particular halls or regions based on power and cooling headroom.
Delaying non-urgent workloads into time windows where energy is cheaper and capacity is underused.
Partnering with grid operators for demand response, turning AI clusters up or down based on system conditions.

That is a very different mentality from "the GPU fleet is an infinite, flat pool." You're back in the mindset of industrial plants and grid-interactive loads, not just cloud instances.
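
Concretely, placement starts to look less like bin-packing GPUs and more like this sketch, where the facility telemetry fields are hypothetical:

```python
# Sketch: place a job where power AND cooling headroom both allow it.
from dataclasses import dataclass

@dataclass
class Hall:
    name: str
    power_headroom_mw: float    # contracted capacity minus current draw
    cooling_headroom_mw: float  # heat-rejection capacity minus current load

def usable(hall: Hall) -> float:
    # The binding constraint is whichever of power or cooling runs out first.
    return min(hall.power_headroom_mw, hall.cooling_headroom_mw)

def place(job_mw: float, halls: list[Hall]) -> Hall | None:
    feasible = [h for h in halls if usable(h) >= job_mw]
    if not feasible:
        return None  # queue the job, or wait for an off-peak window
    return max(feasible, key=usable)
```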

National and regional limits

Zoom out from one operator to a whole country. AI clusters are now visible at the scale of national energy planning. They compete with:

Electrification of transport.
Electrification of heating.
Decarbonization-driven shifts from fossil fuels to electricity across industries.

Every gigawatt-hour you allocate to AI is one you do not allocate to something else, unless you simultaneously build new generation and transmission. Countries don't like being caught flat-footed by large, opaque loads. You're already seeing:

Moratoria or slowdowns on new data centers in constrained regions.
Requirements to co-site generation and load.
Increased scrutiny of water use and environmental impact.

If you assume "society will always choose more AI over everything else," you're betting that regulators will treat your use of power as more important than:

Keeping the grid stable.
Meeting climate targets.
Keeping domestic industry and housing powered affordably.

That's not a safe assumption everywhere. You're going to see:

Regions that lean into "AI as a strategic industry" and bend infrastructure planning toward it.
Regions that treat data center expansion as a risk and put hard brakes on it.
Multinationals arbitraging between them, to the extent politics allows.

The physics and the politics are now entangled.

Physical limits are not just about maximums

"Physical limit" doesn't only mean "we hit the wall, nothing more is possible." More often it means:

The marginal cost of more scale rises sharply.
The approval friction for new infrastructure spikes.
The number of actors who can play at a certain scale shrinks.

Energy and cooling constraints push us toward a world where:

A few players can afford to keep building frontier-scale clusters.
Many more players rely on:

  • Smaller, efficient open models.
  • Regional or on-prem clusters tailored to their workloads.
  • On-device and edge inference to take pressure off central plants.

That isn't a bad outcome. It just means the dream of infinite, frictionless scaling from "a few big labs and a credit card" collides with geology, grid topology, and human institutions.

On the hardware side, you'll see:

Aggressive work on better accelerators per watt.
Movement toward packaging and cooling that make higher densities manageable.
Pushback from the physical environment when attempts to cheat those constraints show up as outages, hot spots, or angry neighbors.

The cultural lag inside AI organizations

The last piece is psychological. Most AI orgs still think in:

Model generations.
Benchmarks.
Paper deadlines.

Very few think natively in:

Substation build schedules.
Transformer procurement cycles.
Cooling plant upgrades.
Grid interconnection queues.

Those responsibilities are often siloed in "facilities" or "infra," with limited strategic voice compared to research and product. That asymmetry is not sustainable. If your training roadmap assumes doubling compute every 12–18 months, and your infra team quietly knows your next real step-function increase in capacity is three years out pending grid upgrades, you are telling yourself stories. Real strategic planning now has to integrate:

Compute roadmaps.
Infrastructure expansion plans.
Energy sourcing strategy.
Regulatory and community constraints.

The teams that can get researchers, infra people, and energy planners in the same room—and let the physical and financial constraints push back on model fantasies—will build sustainable advantage. The ones that treat infrastructure as a detail to be "handled by the cloud" will keep hitting invisible walls and patching over them expensively.

The point

The first wave of the AI boom was shaped by a simple observation: scale works. Bigger models, more data, more compute, better results.

The next wave is going to be shaped by a less convenient observation: the world that those models live in is finite.

Power is not infinite.
Cooling is not free.
Communities, regulators, and grids have their own priorities.

Infra is not a backdrop. It is the new front line. Treat it as a constraint to design against, and you still have plenty of room to innovate: in algorithms, architectures, scheduling, and product design.

Ignore it, and your scaling law is going to be less about loss curves and more about how many megawatts your last permit actually said you were allowed to draw.


Keywords

Infrastructure, Data Centers, Power, Cooling, Scaling, Energy, Grid
