AMD Instinct MI300X: 192GB HBM3 AI GPU

AMD CDNA 3 Architecture

Overview

192GB of Memory Changes What's Possible

The AMD Instinct MI300X packs 192GB of HBM3 memory and 5.3 TB/s of bandwidth into a single accelerator, enough to hold a 70-billion-parameter LLM at 16-bit precision without splitting the model across multiple GPUs. Eight MI300X GPUs in a Dell PowerEdge XE9680 deliver 1.5TB of combined memory and about 20.9 petaflops of peak FP16 performance with sparsity (roughly 10.5 petaflops dense).

Built on the CDNA 3 architecture, it uses AMD's open-source ROCm software stack with native PyTorch, TensorFlow, and JAX support. For organizations that want industry-leading memory capacity and an open toolchain, the MI300X is worth evaluating.

MI300X vs. H100 SXM: The Real Comparison

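Your choice depends on whether your workloads are memory-bound, where the MI300X's larger and faster memory wins, or compute-bound, where the outcome depends on precision, sparsity, and how well your software stack is tuned for each platform.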

Specification	MI300X	H100 SXM
Architecture	AMD CDNA 3	NVIDIA Hopper
Memory	192GB HBM3	80GB HBM3
Bandwidth	5.3 TB/s	3.35 TB/s
FP16	1,307 TFLOPS	1,979 TFLOPS*
FP8	2,615 TFLOPS	3,958 TFLOPS*
TDP	750W	700W
Software	ROCm (open)	CUDA

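* NVIDIA figures are shown with sparsity; the MI300X figures in this table are dense. With structural sparsity, the MI300X is rated at 2,614.9 TFLOPS FP16 and 5,229.8 TFLOPS FP8.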
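Because the table mixes dense and sparse peaks, it can help to put both GPUs on a dense footing before comparing them. The sketch below is a rough roofline-style estimate, not a benchmark. It assumes the H100 figure includes 2:4 structured sparsity, so its dense peak is taken as half the listed value, and it computes the "ridge point": the FLOPs per byte a kernel must perform before it stops being limited by memory bandwidth. The names `SPECS`, `dense_fp16_tflops`, and `ridge_point` are illustrative only.

```python
# Peak figures copied from the comparison table.
# Assumption: a figure marked sparse=True includes 2:4 structured sparsity,
# so its dense peak is estimated as half the listed value.
SPECS = {
    "MI300X": {"fp16_tflops": 1307.0, "sparse": False, "bandwidth_tbs": 5.3},
    "H100 SXM": {"fp16_tflops": 1979.0, "sparse": True, "bandwidth_tbs": 3.35},
}


def dense_fp16_tflops(spec):
    """Return the dense FP16 peak, halving any figure quoted with sparsity."""
    if spec["sparse"]:
        return spec["fp16_tflops"] / 2
    return spec["fp16_tflops"]


def ridge_point(spec):
    """FLOPs per byte above which a kernel is compute-bound rather than memory-bound.

    TFLOPS divided by TB/s gives FLOP/byte directly.
    """
    return dense_fp16_tflops(spec) / spec["bandwidth_tbs"]


for name, spec in SPECS.items():
    print(
        f"{name}: {dense_fp16_tflops(spec):.1f} dense FP16 TFLOPS, "
        f"ridge point {ridge_point(spec):.0f} FLOP/byte"
    )
```

Kernels whose arithmetic intensity falls below the ridge point, such as token-by-token LLM decoding, are limited mainly by memory bandwidth, which is where the MI300X's 5.3 TB/s matters most. These are peak theoretical numbers; measured results depend on the software stack and workload.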

192GB HBM3 per GPU
5.3 TB/s memory bandwidth
1.5TB combined HBM3 (8-GPU)
304 compute units

Compatible Servers

Dell PowerEdge Server for the MI300X

The MI300X uses an OAM (Open Accelerator Module) form factor, deployed as an 8-GPU platform on an AMD Universal Base Board.

Dell PowerEdge XE9680

8× AMD Instinct MI300X accelerators via AMD UBB 2.0
1.5TB combined HBM3 memory across all 8 GPUs
AMD Infinity Fabric interconnect (128 GB/s per link, 7 links per GPU)
About 20.9 petaflops FP16 and 41.8 petaflops FP8 (peak, with sparsity)
PCIe Gen5 host interface, AMD ROCm software stack
Dell OpenManage Enterprise, APEX AIOps, integrated cyber recovery
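The node-level figures in this list follow from multiplying the per-GPU specifications by eight. As a quick check, the sketch below repeats that arithmetic using the values from the Specifications table. It assumes ideal linear scaling of peak throughput, which real workloads will not reach; the helper name `node_total` is illustrative.

```python
GPUS_PER_NODE = 8
HBM_GB_PER_GPU = 192

# Per-GPU peak throughput in TFLOPS, from the Specifications table.
FP16_TFLOPS = {"dense": 1307.4, "sparse": 2614.9}
FP8_TFLOPS = {"dense": 2614.9, "sparse": 5229.8}


def node_total(per_gpu, gpus=GPUS_PER_NODE):
    """Sum a per-GPU figure across the node, assuming ideal linear scaling."""
    return per_gpu * gpus


node_hbm_gb = node_total(HBM_GB_PER_GPU)  # quoted as 1.5TB
fp16_pflops = {mode: node_total(t) / 1000 for mode, t in FP16_TFLOPS.items()}
fp8_pflops = {mode: node_total(t) / 1000 for mode, t in FP8_TFLOPS.items()}

print(f"Combined HBM3: {node_hbm_gb} GB")
for mode in ("dense", "sparse"):
    print(
        f"{mode}: FP16 {fp16_pflops[mode]:.1f} PFLOPS, "
        f"FP8 {fp8_pflops[mode]:.1f} PFLOPS"
    )
```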

The XE9680 also supports NVIDIA H100 and H200 GPUs. ServerMonkey can configure and quote either platform in the same chassis.

Use Cases

Where the MI300X Excels

Large Language Model Training & Inference

The MI300X's 192GB of memory per GPU means you can load a 70B-parameter model at 16-bit precision (about 140GB of weights) on a single accelerator, or run multiple concurrent instances across all eight GPUs without splitting models across nodes. For LLM inference at scale, fewer GPUs per model instance means less inter-GPU communication, which can translate into higher throughput and lower latency. Dell has demonstrated Llama 2 70B running on a single MI300X, and fine-tuning of the same model across eight MI300X GPUs on a single XE9680 node.
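The single-GPU claim comes down to simple arithmetic on parameter count and bytes per parameter. A minimal sketch follows; the `overhead_fraction` allowance for KV cache and activations is a hypothetical placeholder rather than a measured value, and the function names are illustrative.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}
HBM_GB = 192  # MI300X memory per GPU


def weight_footprint_gb(params_billion, dtype):
    """Weights only: (billions of parameters) x (bytes per parameter) = GB."""
    return params_billion * BYTES_PER_PARAM[dtype]


def fits_on_one_gpu(params_billion, dtype, overhead_fraction=0.2):
    """Check whether weights plus an assumed overhead fit in one GPU's memory.

    overhead_fraction is a hypothetical allowance for KV cache and
    activations; the real overhead depends on batch size and context length.
    """
    needed_gb = weight_footprint_gb(params_billion, dtype) * (1 + overhead_fraction)
    return needed_gb <= HBM_GB


for dtype in ("fp32", "fp16", "fp8"):
    print(
        f"70B @ {dtype}: {weight_footprint_gb(70, dtype)} GB of weights, "
        f"fits on one MI300X: {fits_on_one_gpu(70, dtype)}"
    )
```

At FP16 the weights alone take 140GB, which exceeds an 80GB GPU but leaves headroom on a 192GB one. Training and fine-tuning need considerably more memory for gradients and optimizer state, which is why fine-tuning the same model is spread across eight GPUs.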

High-Performance Computing

The MI300X delivers 163.4 TFLOPS of peak FP64 matrix performance and 81.7 TFLOPS of FP64 vector performance (double precision), making it one of the fastest GPUs available for scientific computing. Climate modeling, molecular dynamics, computational fluid dynamics, and genomics workloads benefit from both the raw compute throughput and the 192GB memory capacity, which lets larger problem sets fit entirely in GPU memory without frequent host-device data transfers.
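To see why keeping the working set in GPU memory matters, compare how long it takes to sweep a full 192GB data set from HBM3 with how long it takes to move the same data over the host link. This is a back-of-envelope sketch: the HBM figure is the peak from the specifications, while the PCIe Gen5 x16 figure of roughly 64 GB/s per direction is an assumed theoretical value, not one quoted on this page.

```python
HBM_CAPACITY_GB = 192.0
HBM_BANDWIDTH_GBS = 5300.0  # 5.3 TB/s peak, from the specifications

# Assumption: approximate theoretical PCIe Gen5 x16 bandwidth per direction.
PCIE_GEN5_X16_GBS = 64.0


def sweep_seconds(size_gb, bandwidth_gbs):
    """Time to read size_gb once at the given peak bandwidth."""
    return size_gb / bandwidth_gbs


hbm_s = sweep_seconds(HBM_CAPACITY_GB, HBM_BANDWIDTH_GBS)
pcie_s = sweep_seconds(HBM_CAPACITY_GB, PCIE_GEN5_X16_GBS)

print(f"One pass over 192GB from HBM3:  {hbm_s * 1000:.1f} ms")
print(f"Same data over PCIe Gen5 x16:   {pcie_s:.1f} s")
print(f"Ratio: about {pcie_s / hbm_s:.0f}x")
```

Under these assumptions, a solver that has to re-stage its data from host memory on every iteration would spend far longer on transfers than on reading the data once it is resident, so fitting the whole problem in HBM3 removes that bottleneck.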

Open Ecosystem & ROCm

AMD's ROCm software stack is open-source, with upstream support in PyTorch, TensorFlow, JAX, and ONNX Runtime. AMD's HIPIFY tools help convert CUDA source code to HIP so that existing applications can run on MI300X hardware. For organizations that prefer open-source toolchains, or that want to avoid single-vendor lock-in, the MI300X with ROCm provides a supported alternative path. Dell provides validated designs and deployment guides specifically for the XE9680 + MI300X + ROCm stack.
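In practice, ROCm builds of PyTorch expose AMD GPUs through the same `torch.cuda` namespace used on NVIDIA hardware, so most existing PyTorch code runs unchanged. The sketch below shows one way to confirm which backend a PyTorch install was built for. It assumes PyTorch may or may not be installed and returns a small dictionary either way; the function name `describe_accelerator` is illustrative.

```python
def describe_accelerator():
    """Report whether PyTorch is installed and which GPU backend it targets."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    info = {
        "torch_installed": True,
        # torch.version.hip is a version string on ROCm builds, None otherwise.
        "hip_version": getattr(torch.version, "hip", None),
        # On ROCm builds, torch.cuda.* calls are routed to AMD GPUs.
        "gpu_available": torch.cuda.is_available(),
    }
    if info["gpu_available"]:
        props = torch.cuda.get_device_properties(0)
        info["device_name"] = props.name
        info["total_memory_gb"] = props.total_memory / 1e9
    return info


info = describe_accelerator()
print(info)
```

On a ROCm build, `hip_version` is populated and the reported device memory should be close to the GPU's rated capacity; on a CUDA or CPU-only build, `hip_version` is `None`.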

Specifications

AMD Instinct MI300X Accelerator

Specification	MI300X
GPU Architecture	AMD CDNA 3
Process	5nm / 6nm FinFET (chiplet)
Compute Units	304
GPU Memory	192GB HBM3
Memory Bandwidth	5.3 TB/s
Infinity Cache	256 MB
FP64	81.7 TFLOPS
FP64 Matrix	163.4 TFLOPS
FP32	163.4 TFLOPS
TF32	653.7 / 1,307.4 TFLOPS*
FP16	1,307.4 / 2,614.9 TFLOPS*
BF16	1,307.4 / 2,614.9 TFLOPS*
FP8	2,614.9 / 5,229.8 TFLOPS*
INT8	2,614.9 / 5,229.8 TOPS*
Interconnect	AMD Infinity Fabric (7 links, 128 GB/s each)
Host Interface	PCIe Gen5 x16
TDP	750W
Form Factor	OAM (Open Accelerator Module)
Software	AMD ROCm (open-source)

* Dense / Sparse (with structural sparsity).

Exploring AMD for Your AI Infrastructure?

ServerMonkey can configure the Dell PowerEdge XE9680 with AMD Instinct MI300X or NVIDIA GPUs. Let us help you evaluate both.
