AI Acceleration Without Vendor Lock-In

The Intel Gaudi 3 PCIe card (HL-338) packs 128GB of HBM2e memory, 3.7 TB/s bandwidth, and 24 on-chip 200GbE RoCE v2 networking ports into a standard PCIe Gen5 dual-slot card. That last point is what makes it different from every other accelerator in this lineup: the networking is built into the chip, not bolted on as separate NICs.

Because the networking is on the chip, you can scale Gaudi 3 clusters over the Ethernet switches you already own instead of investing in proprietary interconnects. Gaudi 3 supports PyTorch natively, integrates with Hugging Face and vLLM, and handles LLM inference, fine-tuning, and training workloads with automated FP8 quantization.

Gaudi 3 PCIe vs. H100 NVL PCIe

The buyer's real question: open Ethernet scaling with more memory, or the established CUDA ecosystem?

Spec              Gaudi 3 PCIe        H100 NVL PCIe
Memory            128GB HBM2e         94GB HBM3
Bandwidth         3.7 TB/s            3.9 TB/s
BF16              1,835 TFLOPS        1,671 TFLOPS*
FP8               1,835 TFLOPS        3,341 TFLOPS*
On-Chip NICs      24× 200GbE          None
TDP               600W                350-400W
Software          PyTorch (native)    CUDA

* NVIDIA specs with sparsity.
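
Because the NVIDIA figures in this comparison are quoted with sparsity, a like-for-like reading needs one extra step. The sketch below is a rough normalization, not a benchmark. It assumes that a sparsity figure is twice the dense peak (the usual 2:4 structured-sparsity convention) and that the Gaudi 3 figures are dense. Both are assumptions layered on top of the table, and all names are illustrative.

```python
# Peak TFLOPS as listed in the comparison on this page.
GAUDI3_TFLOPS = {"BF16": 1835.0, "FP8": 1835.0}            # assumed to be dense figures
H100_NVL_SPARSE_TFLOPS = {"BF16": 1671.0, "FP8": 3341.0}   # quoted with sparsity

# Assumption: a "with sparsity" peak is 2x the dense peak.
SPARSITY_FACTOR = 2.0


def dense_equivalent(sparse_tflops, factor=SPARSITY_FACTOR):
    """Convert sparsity-quoted peaks into estimated dense peaks."""
    return {dtype: value / factor for dtype, value in sparse_tflops.items()}


def ratio_vs_other(gaudi, other_dense):
    """Gaudi 3 peak divided by the other card's estimated dense peak."""
    return {dtype: round(gaudi[dtype] / other_dense[dtype], 2) for dtype in gaudi}


h100_dense = dense_equivalent(H100_NVL_SPARSE_TFLOPS)
ratios = ratio_vs_other(GAUDI3_TFLOPS, h100_dense)

print(h100_dense)  # estimated dense peaks for the H100 NVL
print(ratios)      # Gaudi 3 peak relative to those estimates
```

Peak TFLOPS are theoretical ceilings; measured throughput on a given model depends on the software stack, batch size, and memory behavior.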
  • 128GB HBM2e memory
  • 24 on-chip 200GbE ports
  • 1.8 PFLOPS FP8 / BF16
  • 600W, PCIe Gen5 dual-slot

Dell PowerEdge Server for Gaudi 3 PCIe

Dell is the lead OEM and first to market with an integrated Gaudi 3 PCIe server configuration.

Dell PowerEdge XE7740

  • 4U server, up to 8× Intel Gaudi 3 PCIe accelerators
  • Optional two groups of four bridged accelerators (RoCE v2)
  • 1:1 accelerator-to-NIC ratio via 8 full-height PCIe slots + OCP module
  • Air-cooled, fits ~10kW racks without cooling upgrades
  • Optimized for Llama, DeepSeek, Phi, Qwen, Falcon, and more
  • Dell Smart Cooling, OpenManage Enterprise, APEX AIOps
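
For capacity planning, the per-card figures on this page can be multiplied out for a fully populated XE7740. This is a rough sketch: the card count, memory, TDP, and the ~10kW rack figure come from this page, but the calculation covers accelerators only. Host CPUs, fans, and NICs are left out because their power draw is not stated here.

```python
CARDS_PER_SERVER = 8        # "up to 8x Intel Gaudi 3 PCIe accelerators"
MEMORY_GB_PER_CARD = 128    # HBM2e per card
TDP_W_PER_CARD = 600        # "up to 600W" per card
RACK_BUDGET_W = 10_000      # "~10kW racks"


def server_totals(cards=CARDS_PER_SERVER):
    """Aggregate HBM capacity and accelerator-only power for one server."""
    return {
        "hbm_gb": cards * MEMORY_GB_PER_CARD,
        "accelerator_w": cards * TDP_W_PER_CARD,
    }


totals = server_totals()
# Power left in the rack budget after the accelerators alone (upper-bound TDP).
remaining_w = RACK_BUDGET_W - totals["accelerator_w"]

print(totals)
print(remaining_w)
```

The remaining budget is what the rest of the server, and anything else in the rack, has to fit into.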

Where Gaudi 3 PCIe Fits

LLM Inference & Fine-Tuning

With 128GB of HBM2e per card, larger models fit on a single accelerator, avoiding the overhead of splitting them across devices. The 24 integrated 200GbE ports remove the need for separate NICs, reducing cost and latency in multi-node inference clusters. Native vLLM and Hugging Face support with automated FP8 quantization makes deployment straightforward for popular models including Llama, DeepSeek, and Falcon.
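
A quick way to sanity-check whether a model fits on one card is to estimate the memory taken by its weights. The sketch below is a first-pass estimate under stated assumptions: it counts weights only (no KV cache, activations, or runtime overhead), uses decimal gigabytes, and the 70-billion-parameter size is a hypothetical example rather than a specific model from this page.

```python
HBM_GB_PER_CARD = 128  # from the spec on this page

# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"FP8": 1, "BF16": 2, "FP32": 4}


def weights_gb(params_billion, dtype):
    """Approximate weight footprint in GB (1e9 params x bytes / 1e9 bytes per GB)."""
    return params_billion * BYTES_PER_PARAM[dtype]


def fits_on_one_card(params_billion, dtype, hbm_gb=HBM_GB_PER_CARD):
    """True if the weights alone fit in one card's HBM."""
    return weights_gb(params_billion, dtype) <= hbm_gb


# Hypothetical 70B-parameter model at two precisions.
for dtype in ("BF16", "FP8"):
    print(dtype, weights_gb(70, dtype), fits_on_one_card(70, dtype))
```

In practice some headroom is needed beyond the weights, so a model that only just fits by this estimate may still need a second card.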

Ethernet-Native Scaling

Every other accelerator in this lineup requires separate NICs for inter-node communication. Gaudi 3 integrates 24× 200GbE RoCE v2 ports directly on the chip, delivering 4.8 Tb/s of networking bandwidth per card. This means you can build multi-node training and inference clusters using the standard Ethernet switches you already own, without investing in specialized interconnect hardware such as NVLink or InfiniBand.
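
The 4.8 Tb/s figure follows directly from the port count and port speed. A minimal sketch of the arithmetic, which also converts to gigabytes per second for comparison with memory-bandwidth numbers; it uses raw line rate only and ignores Ethernet and RoCE protocol overhead.

```python
PORTS_PER_CARD = 24
GBIT_PER_PORT = 200  # 200GbE


def aggregate_tbit_per_s(ports=PORTS_PER_CARD, gbit_per_port=GBIT_PER_PORT):
    """Total raw line rate in terabits per second."""
    return ports * gbit_per_port / 1000


def aggregate_gbyte_per_s(ports=PORTS_PER_CARD, gbit_per_port=GBIT_PER_PORT):
    """Same total expressed in gigabytes per second (8 bits per byte)."""
    return ports * gbit_per_port / 8


print(aggregate_tbit_per_s())   # terabits per second
print(aggregate_gbyte_per_s())  # gigabytes per second
```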

Open Software & No Lock-In

Gaudi 3 integrates natively with PyTorch, so your team works with the framework it already knows. Hugging Face model hub support and automated FP8 quantization simplify deployment. Unlike proprietary ecosystems, Intel's software stack is open, and the hardware scales over standard Ethernet. For organizations building AI infrastructure that they want to own and control, Gaudi 3 removes the lock-in concern.
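
In practice, "native PyTorch" usually means application code only needs to choose a different device. The sketch below shows one way to make that choice portable. It assumes the Intel Gaudi PyTorch bridge is installed as the `habana_frameworks` package and exposes the `hpu` device name; `pick_device` is a hypothetical helper, not part of any Intel API, and the snippet falls back to CPU when the bridge is absent.

```python
import importlib.util


def pick_device():
    """Return "hpu" if the Gaudi PyTorch bridge appears to be installed, else "cpu"."""
    # Assumption: the bridge ships as the top-level package `habana_frameworks`.
    if importlib.util.find_spec("habana_frameworks") is not None:
        return "hpu"
    return "cpu"


device = pick_device()
print(device)

# With PyTorch installed, the rest stays ordinary PyTorch, for example:
#   import torch
#   if device == "hpu":
#       import habana_frameworks.torch.core  # assumed to register the "hpu" device
#   x = torch.ones(2, 2, device=device)
```

Keeping the device choice in one place means the same script can run on a development machine without Gaudi hardware and on the server unchanged.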

Intel Gaudi 3 PCIe (HL-338)

Specification       Gaudi 3 PCIe
Architecture        Intel Gaudi 3 (5nm)
Compute Engines     8 MME + 64 TPC
Memory              128GB HBM2e
Memory Bandwidth    3.7 TB/s
On-Die SRAM         96MB (12.8 TB/s)
FP8                 1,835 TFLOPS
BF16                1,835 TFLOPS
Data Types          FP8, BF16, FP16, TF32, FP32
Networking          24× 200GbE RoCE v2 on-chip (4.8 Tb/s)
Host Interface      PCIe Gen5 x16
TDP                 Up to 600W
Form Factor         Dual-slot, FHFL PCIe
Thermal             Passive
Software            Intel Gaudi Software, PyTorch, vLLM, Hugging Face

Ready to Evaluate Intel Gaudi 3?

Open ecosystem, standard Ethernet, no vendor lock-in. ServerMonkey can configure the Dell PowerEdge XE7740 with Gaudi 3 PCIe.

Request a Quote
