IBM Brings Gaudi 3 Enterprise AI to the Cloud

IBM is the Secret Sauce for Gaudi 3

In a surprise development, Intel announced that IBM plans to deploy Intel Gaudi 3 accelerators as a service on IBM Cloud to help enterprises scale AI. IBM is a huge, business-oriented company, so a partnership like this goes a long way.

The Gaudi 3-powered IBM Cloud offering, aimed at enterprise AI customers, is expected to become available in early 2025. The goal is to make scaling enterprise AI more cost-effective and to drive innovation underpinned by security and resiliency. The collaboration will also bring Gaudi 3 support to IBM's watsonx AI and data platform. IBM Cloud is the first cloud service provider (CSP) to adopt Gaudi 3, and the offering will be available for both hybrid and on-premises environments.

A large number of customers insist on on-premises environments because they want to stay in control of their data.

“Unlocking the full potential of AI requires an open and collaborative ecosystem that provides customers with choice and accessible solutions,” said Justin Hotard, executive vice president and general manager, Intel Data Center and AI. “By integrating Gaudi 3 AI Accelerators and Xeon CPUs with IBM Cloud, we are creating new AI capabilities and meeting the demand for affordable, secure, and innovative AI computing solutions.”

IBM is a Massive Partner

The deal is important for both IBM and Intel, as it gives each company a stronger footprint in the AI market. IBM is not necessarily trying to build a ChatGPT replacement or a general-purpose chatbot; it usually tailors its offerings to enterprise clients.

Generative AI has the potential to accelerate transformation, and enterprise customers are looking to increase worker productivity while getting fast responses, solid performance, and good cost and energy efficiency for their enterprise workloads.

Through this collaboration, Intel and IBM aim to lower the total cost of ownership of adopting and scaling AI while enhancing performance. Gaudi 3, integrated with 5th Gen Xeon processors, supports enterprise AI workloads in the cloud and in data centers, giving customers visibility into and control over their software stack and simplifying workload and application management.

IBM Cloud and Gaudi 3 aim to help customers more cost-effectively scale enterprise AI workloads, while prioritizing performance, security, and resiliency.

For generative AI inferencing workloads, IBM plans to enable support for Gaudi 3 within its watsonx AI and data platform. That gives watsonx clients additional AI infrastructure for scaling their workloads across hybrid cloud environments and helps optimize model inferencing price/performance.
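What this could look like from the client side is sketched below, assuming IBM's ibm-watsonx-ai Python SDK; the endpoint URL, model ID, and project ID are placeholders, and which accelerator (Gaudi 3 or otherwise) backs the deployment is abstracted away from the caller.

```python
# Minimal sketch (assumptions: ibm-watsonx-ai SDK installed, valid IBM Cloud API key
# and watsonx project; the region URL and model ID below are illustrative placeholders).
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",   # example region endpoint
    api_key="YOUR_IBM_CLOUD_API_KEY",
)

model = ModelInference(
    model_id="ibm/granite-13b-instruct-v2",    # placeholder model ID
    credentials=credentials,
    project_id="YOUR_PROJECT_ID",
    params={"max_new_tokens": 200},
)

# The hardware serving the request (Gaudi 3 or otherwise) is transparent to the caller.
print(model.generate_text(prompt="Summarize our Q3 support-ticket trends in three bullets."))
```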

“IBM is committed to helping our clients drive AI and hybrid cloud innovation by offering solutions to meet their business needs. Our dedication to security and resiliency with IBM Cloud has helped fuel IBM’s hybrid cloud and AI strategy for our enterprise clients,” said Alan Peacock, GM of IBM Cloud. “Leveraging Intel’s Gaudi 3 accelerators on IBM Cloud will provide our clients access to a flexible enterprise AI solution that aims to optimize cost performance. We are unlocking potential new AI business opportunities, designed for clients to more cost-effectively test, innovate, and deploy AI inferencing solutions.”

Gaudi 3 is expected to cost around $15,000, roughly half of what Nvidia charges for its H100 offerings. The timing of the announcement is also well played: Nvidia has run into manufacturing issues with its Blackwell generation, and the need to produce a new mask for the AI accelerator has reportedly set the company back by a quarter or two.

IBM and Intel will focus on integrating Gaudi 3 into IBM Cloud Virtual Servers for VPC, enabling x86-based enterprises to run applications faster and more securely than before, with additional scalability and flexibility.

Gaudi 3, built for both training and inference, brings 8 matrix math engines, 64 fifth-generation tensor processor cores, 128 GB of HBM2e memory with 3.7 TB/s of bandwidth, and 96 MB of SRAM with 12.8 TB/s of bandwidth. It is a large, flexible chip with networking on board, providing 24x200GbE ports and support for the PCIe 5.0 standard. The 128 GB of HBM capacity is a big benefit for large language model (LLM) efficiency and cost efficiency.
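To put that 128 GB figure in context, here is a rough back-of-the-envelope sketch in Python. It counts model weights only, which is a simplifying assumption: KV cache, activations, and framework overhead all take additional memory.

```python
# Rough sketch: which LLM sizes fit in 128 GB of on-card memory, weights only.
HBM_CAPACITY_GB = 128

def weight_footprint_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate memory for model weights alone (no KV cache or activations)."""
    return params_billion * 1e9 * bytes_per_param / 1e9

for params_b in (13, 70, 180):
    for precision, bytes_pp in (("FP16/BF16", 2), ("FP8/INT8", 1)):
        gb = weight_footprint_gb(params_b, bytes_pp)
        verdict = "fits on one card" if gb <= HBM_CAPACITY_GB else "needs more than one card"
        print(f"{params_b}B params at {precision}: ~{gb:.0f} GB of weights -> {verdict}")
```

By this rough measure, a 70-billion-parameter model quantized to 8-bit fits comfortably on a single card, while the same model at 16-bit precision would already need to be split across accelerators.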

We should hear more in the coming months, but early 2025 remains the expected date for deployment to customers.