192GB of Memory Changes What's Possible

The AMD Instinct MI300X packs 192GB of HBM3 memory and 5.3 TB/s of bandwidth into a single accelerator, enough to hold the weights of a 70-billion-parameter LLM (about 140GB at FP16) without splitting the model across multiple GPUs. Eight MI300X GPUs in a Dell PowerEdge XE9680 deliver 1.5TB of combined memory and roughly 21 petaflops of FP16 performance with sparsity.

Built on the CDNA 3 architecture, the MI300X runs on AMD's open-source ROCm software stack with native PyTorch, TensorFlow, and JAX support. For organizations that want large per-GPU memory capacity and an open toolchain, the MI300X is worth evaluating.

MI300X vs. H100 SXM: The Real Comparison

Your choice depends on whether your workloads are memory-bound (MI300X wins) or compute-bound with sparsity (H100 wins).

| Specification | MI300X | H100 SXM |
|---|---|---|
| Architecture | AMD CDNA 3 | NVIDIA Hopper |
| Memory | 192GB HBM3 | 80GB HBM3 |
| Bandwidth | 5.3 TB/s | 3.35 TB/s |
| FP16 | 1,307 TFLOPS | 1,979 TFLOPS* |
| FP8 | 2,615 TFLOPS | 3,958 TFLOPS* |
| TDP | 750W | 700W |
| Software | ROCm (open) | CUDA |

* NVIDIA figures are shown with sparsity; the MI300X figures in this table are dense.

  • 192GB HBM3 per GPU
  • 5.3 TB/s memory bandwidth
  • 1.5TB combined HBM3 (8-GPU)
  • 304 compute units
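
The memory-bound versus compute-bound distinction in this comparison can be made concrete with a roofline-style estimate. The Python sketch below uses the FP16 and bandwidth figures from the comparison table. It assumes the H100's dense FP16 rate is half of its with-sparsity figure, and the function and dictionary names are illustrative only, not part of any vendor tool.

```python
# Peak FP16 throughput (TFLOPS) and memory bandwidth (TB/s) from the comparison table.
# Assumption: the H100 dense figure is taken as half of its with-sparsity number.
SPECS = {
    "MI300X": {"fp16_dense_tflops": 1307.0, "bandwidth_tbs": 5.3},
    "H100 SXM": {"fp16_dense_tflops": 1979.0 / 2, "bandwidth_tbs": 3.35},
}


def ridge_point(fp16_dense_tflops: float, bandwidth_tbs: float) -> float:
    """Arithmetic intensity (FLOPs per byte) above which peak compute is the limit."""
    return fp16_dense_tflops / bandwidth_tbs


def attainable_tflops(intensity: float, fp16_dense_tflops: float, bandwidth_tbs: float) -> float:
    """Roofline bound: the lower of peak compute and bandwidth * intensity."""
    return min(fp16_dense_tflops, bandwidth_tbs * intensity)


if __name__ == "__main__":
    for name, spec in SPECS.items():
        print(f"{name}: ridge point {ridge_point(**spec):.1f} FLOPs/byte, "
              f"bound at 1 FLOP/byte {attainable_tflops(1.0, **spec):.2f} TFLOPS")
```

On these theoretical peaks, the ridge point is roughly 247 FLOPs per byte for the MI300X and roughly 295 for the H100. A kernel well below those intensities is bandwidth-limited on either part, which is where the MI300X's 5.3 TB/s gives the higher ceiling. Measured results will differ from these peak-based bounds.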

Dell PowerEdge Server for the MI300X

The MI300X uses an OAM (Open Accelerator Module) form factor, deployed as an 8-GPU platform on an AMD Universal Base Board.

Dell PowerEdge XE9680

  • 8× AMD Instinct MI300X accelerators via AMD UBB 2.0
  • 1.5TB combined HBM3 memory across all 8 GPUs
  • AMD Infinity Fabric interconnect (128 GB/s per link, 7 links per GPU)
  • About 21 petaflops FP16 and 42 petaflops FP8 (both with sparsity)
  • PCIe Gen5 host interface, AMD ROCm software stack
  • Dell OpenManage Enterprise, APEX AIOps, integrated cyber recovery
The XE9680 also supports NVIDIA H100 and H200 GPUs. ServerMonkey can configure and quote either platform in the same chassis.
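
The node-level figures in the list above follow from the per-GPU specifications by simple multiplication. A minimal Python sketch of that arithmetic, using the per-GPU values from the specification table on this page (the constant and function names are illustrative):

```python
# Per-GPU figures taken from the MI300X specification table on this page.
GPUS_PER_NODE = 8
HBM_PER_GPU_GB = 192
FP16_SPARSE_TFLOPS = 2614.9
FP8_SPARSE_TFLOPS = 5229.8
FABRIC_LINKS_PER_GPU = 7
FABRIC_LINK_GBS = 128


def node_totals() -> dict:
    """Aggregate the per-GPU figures across one 8-GPU XE9680 node."""
    return {
        "hbm_gb": GPUS_PER_NODE * HBM_PER_GPU_GB,
        "fp16_sparse_pflops": GPUS_PER_NODE * FP16_SPARSE_TFLOPS / 1000,
        "fp8_sparse_pflops": GPUS_PER_NODE * FP8_SPARSE_TFLOPS / 1000,
        # Total Infinity Fabric bandwidth per GPU across all of its links.
        "fabric_gbs_per_gpu": FABRIC_LINKS_PER_GPU * FABRIC_LINK_GBS,
    }


if __name__ == "__main__":
    for key, value in node_totals().items():
        print(f"{key}: {value}")
```

This gives 1,536GB of HBM3 (the 1.5TB quoted), about 20.9 petaflops FP16 and 41.8 petaflops FP8 with sparsity, and 896 GB/s of Infinity Fabric bandwidth per GPU. These are theoretical peaks, not measured throughput.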

Where the MI300X Excels

Large Language Model Training & Inference

With 192GB of memory per GPU, you can load a 70B-parameter model on a single MI300X, or run multiple concurrent instances across all eight GPUs without splitting models across nodes. For LLM inference at scale, this can translate to higher throughput per node and lower latency, because a model instance that fits on one GPU needs fewer GPUs and avoids inter-GPU communication. Dell has demonstrated Llama 2 70B running on a single MI300X, and fine-tuning the same model across eight MI300X GPUs on a single XE9680 node.
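
Whether a model fits on one accelerator comes down to parameter count times bytes per parameter, plus headroom for the KV cache and activations. The Python sketch below illustrates that estimate. The `reserve_fraction` headroom value is a hypothetical placeholder rather than a measured requirement, and capacities are treated as decimal gigabytes.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1, "int8": 1}


def weight_footprint_gb(params_billion: float, dtype: str) -> float:
    """Weights-only footprint in GB: 1e9 params * bytes each / 1e9 bytes per GB."""
    return params_billion * BYTES_PER_PARAM[dtype]


def fits_on_one_gpu(params_billion: float, dtype: str,
                    hbm_gb: float = 192.0, reserve_fraction: float = 0.2) -> bool:
    """True if the weights fit after reserving a share of memory for everything else.

    reserve_fraction is an assumed headroom for KV cache and activations;
    real requirements depend on batch size and sequence length.
    """
    usable_gb = hbm_gb * (1.0 - reserve_fraction)
    return weight_footprint_gb(params_billion, dtype) <= usable_gb


if __name__ == "__main__":
    print(weight_footprint_gb(70, "fp16"))          # weights of a 70B model at FP16
    print(fits_on_one_gpu(70, "fp16", hbm_gb=192))  # single 192GB accelerator
    print(fits_on_one_gpu(70, "fp16", hbm_gb=80))   # single 80GB accelerator
```

Under these assumptions, a 70B model at FP16 needs about 140GB for weights alone, which fits in 192GB with room to spare but not in 80GB.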

High-Performance Computing

The MI300X delivers 163.4 TFLOPS of FP64 matrix performance and 81.7 TFLOPS of FP64 vector performance (double precision), making it one of the fastest GPUs available for scientific computing. Climate modeling, molecular dynamics, computational fluid dynamics, and genomics workloads benefit from both the raw compute throughput and the 192GB memory capacity, which lets larger problem sets fit entirely in GPU memory without frequent host-device data transfers.

Open Ecosystem & ROCm

AMD's ROCm software stack is open-source, with upstream support in PyTorch, TensorFlow, JAX, and ONNX Runtime. AMD's HIPIFY tools translate CUDA source code to HIP, which helps port CUDA applications to run on MI300X hardware. For organizations that prefer open-source toolchains, or that want to avoid single-vendor lock-in, the MI300X with ROCm provides a fully supported alternative path. Dell provides validated designs and deployment guides specifically for the XE9680 + MI300X + ROCm stack.
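
As an illustration of what upstream PyTorch support looks like in practice, ROCm builds of PyTorch expose AMD GPUs through the same `torch.cuda` interface that CUDA builds use, and set `torch.version.hip` to a version string. The Python sketch below checks which backend an installation is using. The `describe_backend` helper is a hypothetical name for this example, and it returns a placeholder if PyTorch is not installed.

```python
import importlib.util


def describe_backend() -> str:
    """Report which accelerator backend the local PyTorch install would use."""
    if importlib.util.find_spec("torch") is None:
        return "pytorch-not-installed"

    import torch

    # On ROCm builds, AMD GPUs are reported through the torch.cuda API.
    if not torch.cuda.is_available():
        return "cpu"
    # ROCm builds set torch.version.hip to a version string; CUDA builds leave it None.
    if getattr(torch.version, "hip", None):
        return "rocm"
    return "cuda"


if __name__ == "__main__":
    print(describe_backend())
```

Because the device string stays `"cuda"` on ROCm builds, much existing PyTorch code that selects `torch.device("cuda")` can run unchanged, though custom CUDA kernels and extensions would still need porting.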

AMD Instinct MI300X Accelerator

| Specification | MI300X |
|---|---|
| GPU Architecture | AMD CDNA 3 |
| Process | 5nm / 6nm FinFET (chiplet) |
| Compute Units | 304 |
| GPU Memory | 192GB HBM3 |
| Memory Bandwidth | 5.3 TB/s |
| Infinity Cache | 256 MB |
| FP64 | 81.7 TFLOPS |
| FP64 Matrix | 163.4 TFLOPS |
| FP32 | 163.4 TFLOPS |
| TF32 | 653.7 / 1,307.4 TFLOPS* |
| FP16 | 1,307.4 / 2,614.9 TFLOPS* |
| BF16 | 1,307.4 / 2,614.9 TFLOPS* |
| FP8 | 2,614.9 / 5,229.8 TFLOPS* |
| INT8 | 2,614.9 / 5,229.8 TOPS* |
| Interconnect | AMD Infinity Fabric (7 links, 128 GB/s each) |
| Host Interface | PCIe Gen5 x16 |
| TDP | 750W |
| Form Factor | OAM (Open Accelerator Module) |
| Software | AMD ROCm (open-source) |

* Dense / Sparse (with structural sparsity).
 

Exploring AMD for Your AI Infrastructure?

ServerMonkey can configure the Dell PowerEdge XE9680 with AMD Instinct MI300X or NVIDIA GPUs. Let us help you evaluate both.
