Navigating the Future: Balancing Challenges and Opportunities in AI-Driven Data Center Demand

Byline: Ilissa Miller

DatacenterDynamics held its Connect>New York event in New York City last week. It was a well-attended event with a number of different tracks and opportunities to gain insights, learn, and challenge the status quo in the data center industry.


One of the questions pondered during a roundtable discussion at the event was:  What is the biggest challenge and opportunity that could arise in the future due to data center demand driven by AI?

The panelists had great insights, and so too did AI!

We also put the question to Perplexity AI, an answer engine similar to ChatGPT that uses advanced algorithms to handle large amounts of data, understand context, deliver coherent answers, and apply the latest machine learning techniques. Its response was as follows:

“The biggest challenge that could arise in the future due to data center demand driven by AI is the shortage of wholesale data center space. The AI boom is creating a resource-constrained world, leading to stupendous demand for GPUs, data centers, and AI expertise. Data center developers are effectively running out of wholesale data center space in many major markets, with hyperscale users occupying most of the capacity under development. This scarcity of data center space poses a significant challenge for enterprises looking to run AI workloads.

On the other hand, the opportunity that could arise from this increased demand is the growth and innovation in digital infrastructure to meet these challenges. The digital infrastructure sector has matured, has strong capital support, and a history of innovation to address the challenges posed by the surge in demand for data centers driven by AI. Companies are focusing on expanding the use of alternative energy, smart grids, hybrid grids, and innovative data center designs to deliver reliable solutions while lessening negative impacts on the environment. This presents an opportunity for advancements in technology and sustainability at the intersection of high-density cooling and power infrastructure to support the evolving needs of AI-driven data centers.”

Not to be outdone, its popular counterpart, ChatGPT, offered the following insights:

“One of the biggest challenges posed by the increasing demand for data centers driven by AI is the strain on energy resources and environmental sustainability. AI requires massive amounts of computational power, which in turn demands substantial energy consumption. Balancing this energy usage with the need for sustainability will be crucial in the future.

On the flip side, this demand also presents a significant opportunity for innovation in energy-efficient computing technologies. As the need for data centers grows, so does the incentive to develop more efficient hardware and cooling systems. This could spur advancements in renewable energy integration, novel cooling techniques, and optimized architectures, leading to more sustainable and environmentally friendly data center operations. Additionally, the growing demand for AI-driven services could drive economic growth and job creation in the tech sector.”


Deploying Inference versus Learning AI Nodes

As humans discussed the question, a few key points came up.

Inference and learning (training) AI workloads have different requirements. AI training is the process that teaches a model to make accurate predictions; AI inference applies that trained model to new data to produce predictions, classifications, or decisions. Both have different requirements for ongoing operations. According to the conversation, inference AI favors edge deployment, and as more and more companies adopt AI tools and run inference in production, urban ‘edge’ data centers will be on the rise.

Inference AI requires real-time processing – in other words, minimal latency. When inference runs centrally, large volumes of raw data from IoT devices, sensors, and other edge devices must be transmitted to a central data center, which can strain network bandwidth and incur significant variable costs. Finally, privacy, security, and compliance requirements for certain applications, especially those handling sensitive data, may necessitate that data be processed locally.
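To make the bandwidth point concrete, here is a rough back-of-the-envelope sketch in Python comparing the monthly data volume of backhauling raw sensor streams to a central data center against filtering them at an edge site first. The device count, per-device data rate, reduction factor, and transfer pricing are all illustrative assumptions, not figures from the panel.

```python
# Back-of-the-envelope estimate: backhauling raw sensor data to a central
# data center vs. filtering it at the edge before transmission.
# All figures below are illustrative assumptions.

DEVICES = 1_000                  # e.g., cameras or industrial sensors
MBPS_PER_DEVICE = 4              # raw stream per device, in megabits per second
EDGE_REDUCTION = 0.95            # share of traffic an edge inference tier filters out
EGRESS_COST_PER_GB = 0.05        # assumed wide-area transfer cost, USD per GB

SECONDS_PER_MONTH = 30 * 24 * 3600

def monthly_gb(aggregate_mbps: float) -> float:
    """Convert an aggregate megabit-per-second stream into gigabytes per month."""
    return aggregate_mbps * SECONDS_PER_MONTH / 8 / 1_000  # Mbit -> MB -> GB

raw_mbps = DEVICES * MBPS_PER_DEVICE
central_gb = monthly_gb(raw_mbps)                # everything sent to the core
edge_gb = central_gb * (1 - EDGE_REDUCTION)      # only results/exceptions sent on

print(f"Centralized inference: {central_gb:,.0f} GB/month, "
      f"~${central_gb * EGRESS_COST_PER_GB:,.0f} in transfer costs")
print(f"Edge inference:        {edge_gb:,.0f} GB/month, "
      f"~${edge_gb * EGRESS_COST_PER_GB:,.0f} in transfer costs")
```

With these placeholder inputs, processing at the edge cuts wide-area traffic by an order of magnitude, which is the economic driver behind urban edge deployments for inference.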

Learning AI, on the other hand, can run nearly anywhere. Training can happen offline or be distributed across various computing resources, including cloud servers, on-premises data centers, and even personal devices. It does require data availability, meaning the model must be able to access the large volumes of data relevant to the task at hand, and it requires substantial compute resources, primarily power and memory. The physical location doesn’t matter as long as the compute power and memory can support the training workload.
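To illustrate that contrast, the sketch below models the two placement decisions in the simplest possible terms: the training path filters candidate sites on capacity and energy cost, while the inference path filters on latency. The site names, capacities, prices, and latency figures are entirely hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    gpu_capacity_mw: float       # power available for accelerators
    cost_per_kwh: float          # local energy price, USD
    latency_to_users_ms: float   # round-trip latency to the served population

SITES = [
    Site("urban-edge-colo", gpu_capacity_mw=2, cost_per_kwh=0.14, latency_to_users_ms=5),
    Site("regional-campus", gpu_capacity_mw=40, cost_per_kwh=0.09, latency_to_users_ms=25),
    Site("remote-hyperscale", gpu_capacity_mw=150, cost_per_kwh=0.05, latency_to_users_ms=80),
]

def pick_training_site(sites, needed_mw):
    """Training is latency-insensitive: pick the cheapest site with enough capacity."""
    eligible = [s for s in sites if s.gpu_capacity_mw >= needed_mw]
    return min(eligible, key=lambda s: s.cost_per_kwh)

def pick_inference_site(sites, latency_budget_ms):
    """Inference is latency-bound: pick the closest site that meets the budget."""
    eligible = [s for s in sites if s.latency_to_users_ms <= latency_budget_ms]
    return min(eligible, key=lambda s: s.latency_to_users_ms)

print("Train at:", pick_training_site(SITES, needed_mw=30).name)             # remote-hyperscale
print("Serve from:", pick_inference_site(SITES, latency_budget_ms=10).name)  # urban-edge-colo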

As for the question posed, the opportunities and demand driven by AI are vast and, at best, unpredictable. Key takeaways include:

  • The development of new business models to differentiate between large-scale deployments and general enterprise requirements. For example, Equinix formed a joint venture called xScale to cater to hyperscale customers that buy large amounts of space and power but don’t deliver the margins investors expect; the joint venture was structured to provide a more consistent and reliable return.
  • Retrofitting existing data centers is expensive; as a result, new builds purpose-designed for AI’s massive compute requirements are the more cost-effective way to roll out large-scale AI models.
  • Data center providers will need to figure out a solution for the white space in their facilities. Many today average a balance of roughly 60% infrastructure and 40% white space; for AI-specific builds, the balance would be closer to 75% infrastructure and 25% white space (a rough comparison follows this list).
  • Rack-level retrofits are also expensive, but are doable. There are flexible ways to increase density and improve performance rack-by-rack.
  • Hybrid designs that mix liquid cooling with traditional air cooling and/or direct-to-chip cooling options should be considered, and customers will have to decide what works for them. Large data center operators are starting to see customers trial various solutions side by side, with liquid cooling on the immediate horizon.
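Regarding the infrastructure-to-white-space balance noted above, here is a quick comparison of what those two ratios mean for sellable floor area, using an assumed 100,000 sq ft facility purely as an example:

```python
# What the two facility balances mean for sellable white space.
# The building footprint is an assumed example figure.

TOTAL_SQFT = 100_000  # hypothetical facility footprint

balances = {
    "Traditional build (60% infrastructure / 40% white space)": 0.40,
    "AI-specific build (75% infrastructure / 25% white space)": 0.25,
}

for label, white_share in balances.items():
    print(f"{label}: {TOTAL_SQFT * white_share:,.0f} sq ft of white space")
```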

In a nutshell, the opportunity for AI deployments lies in the split between inference AI and learning AI. Edge data center deployments and other micro-market providers have a number of emerging opportunities as more enterprise businesses adopt, deploy, and engage with AI.

The challenge? Costs. Liquid cooling combined with direct-to-chip cooling, air-cooled systems, and more are required to keep the compute infrastructure operating efficiently. This requires a tremendous amount of upfront capital to design and deploy, and the operational variables are hard to model when assessing overall lifetime value versus cost.
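As one way to frame that modeling problem, the sketch below compares the lifetime cost of an assumed air-cooled deployment against a liquid-cooled one with higher upfront capital but a lower PUE. Every input (capex, IT load, PUE, energy price, lifespan) is a placeholder assumption rather than data from the event.

```python
# Simplified lifetime-cost comparison of two cooling approaches.
# Every input is a placeholder assumption for illustration only.

def lifetime_cost(capex_usd, it_load_mw, pue, cost_per_mwh, years):
    """Upfront capital plus total energy spend over the facility's life."""
    facility_mwh_per_year = it_load_mw * pue * 8_760
    return capex_usd + facility_mwh_per_year * cost_per_mwh * years

IT_LOAD_MW = 10
ENERGY_COST_PER_MWH = 70
YEARS = 10

air = lifetime_cost(capex_usd=80_000_000, it_load_mw=IT_LOAD_MW,
                    pue=1.5, cost_per_mwh=ENERGY_COST_PER_MWH, years=YEARS)
liquid = lifetime_cost(capex_usd=100_000_000, it_load_mw=IT_LOAD_MW,
                       pue=1.2, cost_per_mwh=ENERGY_COST_PER_MWH, years=YEARS)

print(f"Air-cooled lifetime cost:    ${air / 1e6:,.0f}M")
print(f"Liquid-cooled lifetime cost: ${liquid / 1e6:,.0f}M")
```

With these particular placeholders, the two options land within a few percent of each other, which underscores the point: small shifts in energy price, PUE, or capital cost flip the answer, and that is what makes lifetime value so hard to model.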

We’re looking forward to continuing the conversation at the upcoming data center conferences and at DatacenterDynamics’ inaugural event, Yotta, set for October 7-9, 2024 at the MGM Grand in Las Vegas. We hope to see you there!