Overview
192GB of Memory Changes What's Possible
The AMD Instinct MI300X packs 192GB of HBM3 memory and 5.3 TB/s of bandwidth into a single accelerator, enough to hold a 70-billion-parameter LLM in FP16 without splitting the model across multiple GPUs. Eight MI300X GPUs in a Dell PowerEdge XE9680 deliver 1.5TB of combined memory and about 21 petaflops of peak FP16 performance (with sparsity).
Built on the CDNA 3 architecture, the MI300X runs on AMD's open-source ROCm software stack, which has native PyTorch, TensorFlow, and JAX support. For organizations that want 192GB of memory per GPU and an open toolchain, the MI300X is worth evaluating.
MI300X vs. H100 SXM: The Real Comparison
Your choice depends on whether your workloads are memory-bound (MI300X wins) or compute-bound with sparsity (H100 wins).
| | MI300X | H100 SXM |
|---|---|---|
| Architecture | AMD CDNA 3 | NVIDIA Hopper |
| Memory | 192GB HBM3 | 80GB HBM3 |
| Bandwidth | 5.3 TB/s | 3.35 TB/s |
| FP16 | 1,307 TFLOPS | 1,979 TFLOPS* |
| FP8 | 2,615 TFLOPS | 3,958 TFLOPS* |
| TDP | 750W | 700W |
| Software | ROCm (open) | CUDA |

Values marked * include structured sparsity; the MI300X figures in the same rows are dense.
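To make the memory-bound versus compute-bound distinction concrete, here is a minimal roofline-style sketch in Python. It uses only the FP16 and bandwidth figures from the table above; the function names and the three example arithmetic intensities are hypothetical, and real workloads will land well below these peak numbers. Treat it as a rough illustration, not a benchmark.

```python
def attainable_tflops(intensity_flop_per_byte, peak_tflops, bandwidth_tb_s):
    """Simple roofline model: throughput is capped by compute or by memory traffic."""
    # (FLOP per byte) * (TB per second) = TFLOP per second
    memory_bound_tflops = intensity_flop_per_byte * bandwidth_tb_s
    return min(peak_tflops, memory_bound_tflops)


# FP16 peak and memory bandwidth exactly as listed in the comparison table.
# Note: the H100 value is the starred (with-sparsity) figure, the MI300X value is dense.
GPUS = {
    "MI300X": {"peak_tflops": 1307.0, "bandwidth_tb_s": 5.3},
    "H100 SXM": {"peak_tflops": 1979.0, "bandwidth_tb_s": 3.35},
}


def compare(intensity_flop_per_byte):
    """Return {gpu_name: attainable TFLOPS} at the given arithmetic intensity."""
    return {
        name: attainable_tflops(intensity_flop_per_byte, **spec)
        for name, spec in GPUS.items()
    }


if __name__ == "__main__":
    # Hypothetical workloads: FLOPs performed per byte moved from HBM.
    for intensity in (1, 100, 1000):
        print(intensity, compare(intensity))
```

At low arithmetic intensity the result tracks bandwidth, so the MI300X comes out ahead; at high intensity it is capped by the peak figure, where the table's starred H100 number is larger.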
- 192GB HBM3 per GPU
- 5.3 TB/s memory bandwidth
- 1.5TB combined HBM3 (8-GPU)
- 304 compute units
Compatible Servers
Dell PowerEdge Server for the MI300X
The MI300X uses an OAM (Open Accelerator Module) form factor, deployed as an 8-GPU platform on an AMD Universal Base Board.
Dell PowerEdge XE9680
- 8× AMD Instinct MI300X accelerators via AMD UBB 2.0
- 1.5TB combined HBM3 memory across all 8 GPUs
- AMD Infinity Fabric interconnect (128 GB/s per link, 7 links per GPU)
- About 21 petaflops FP16 and 42 petaflops FP8 (peak, with sparsity)
- PCIe Gen5 host interface, AMD ROCm software stack
- Dell OpenManage Enterprise, APEX AIOps, integrated cyber recovery
The XE9680 also supports NVIDIA H100 and H200 GPUs. ServerMonkey can configure and quote either platform in the same chassis.
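The node-level figures for the 8-GPU MI300X configuration follow from multiplying per-GPU specifications by eight. The short Python sketch below reproduces that arithmetic using the per-GPU values from the specification table on this page; the constant and function names are made up for illustration, and decimal units (1 TB = 1,000 GB) are assumed.

```python
GPUS_PER_NODE = 8
HBM_PER_GPU_GB = 192           # HBM3 per MI300X
FP16_SPARSE_TFLOPS = 2614.9    # per-GPU peak FP16, with sparsity
FP8_SPARSE_TFLOPS = 5229.8     # per-GPU peak FP8, with sparsity
LINKS_PER_GPU = 7              # Infinity Fabric links per GPU
LINK_GB_S = 128                # bandwidth per link


def node_totals():
    """Aggregate per-GPU specs into node-level figures (assumes decimal units)."""
    return {
        "hbm_tb": GPUS_PER_NODE * HBM_PER_GPU_GB / 1000,
        "fp16_pflops": GPUS_PER_NODE * FP16_SPARSE_TFLOPS / 1000,
        "fp8_pflops": GPUS_PER_NODE * FP8_SPARSE_TFLOPS / 1000,
        "fabric_gb_s_per_gpu": LINKS_PER_GPU * LINK_GB_S,
    }


if __name__ == "__main__":
    for name, value in node_totals().items():
        print(f"{name}: {value:.1f}")
```

The totals come out at roughly 1.5 TB of HBM3, 21 petaflops FP16, and 42 petaflops FP8, with each GPU's seven links adding up to 896 GB/s of Infinity Fabric bandwidth.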
Use Cases
Where the MI300X Excels
Large Language Model Training & Inference
With 192GB of memory per GPU, the MI300X can hold a 70B-parameter model in FP16 (about 140GB of weights) on a single accelerator, or run multiple concurrent instances across all eight GPUs without splitting models across nodes. For LLM inference at scale, needing fewer GPUs per model instance means less inter-GPU communication, which can translate to higher throughput and lower latency. Dell has demonstrated Llama 2 70B running on a single MI300X, and fine-tuning of the same model across eight MI300X GPUs in a single XE9680 node.
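A quick way to sanity-check the single-GPU claim is to estimate the weight footprint yourself. The sketch below is a rough Python estimate under stated assumptions: 2 bytes per parameter for FP16, decimal gigabytes, and weights only (KV cache, activations, and framework overhead are ignored and must fit in the remaining headroom). The helper names are hypothetical.

```python
# Storage size per parameter for common formats.
BYTES_PER_PARAM = {"fp16": 2, "bf16": 2, "fp8": 1}


def weights_gb(params_billion, dtype="fp16"):
    """Weight footprint in decimal GB (billions of params * bytes per param)."""
    return params_billion * BYTES_PER_PARAM[dtype]


def headroom_gb(params_billion, gpu_memory_gb, dtype="fp16"):
    """Memory left after loading weights; negative means the weights do not fit."""
    return gpu_memory_gb - weights_gb(params_billion, dtype)


def fits_on_one_gpu(params_billion, gpu_memory_gb, dtype="fp16"):
    """True if the weights alone fit in a single GPU's memory."""
    return headroom_gb(params_billion, gpu_memory_gb, dtype) >= 0


if __name__ == "__main__":
    for name, memory_gb in (("MI300X", 192), ("80GB GPU", 80)):
        print(name, fits_on_one_gpu(70, memory_gb), headroom_gb(70, memory_gb))
```

For a 70B model in FP16 this leaves about 52GB of headroom on a 192GB MI300X, while an 80GB GPU would need the model split across at least two devices.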
High-Performance Computing
The MI300X delivers 163.4 TFLOPS of FP64 matrix performance (81.7 TFLOPS FP64 vector), making it one of the fastest GPUs available for double-precision scientific computing. Climate modeling, molecular dynamics, computational fluid dynamics, and genomics workloads benefit from both the raw compute throughput and the 192GB memory capacity, which lets larger problem sets fit entirely in GPU memory without frequent host-device data transfers.
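To get a feel for what "fits entirely in GPU memory" means for a simulation, the Python sketch below estimates the largest cubic FP64 grid that a given memory budget can hold. The function name, the cubic-grid setup, and the number of fields per cell are all hypothetical, and the estimate assumes decimal gigabytes and no allowance for solver workspace or halo buffers.

```python
def max_cubic_grid(memory_gb, fields=1, bytes_per_value=8):
    """Largest n such that an n*n*n grid with `fields` FP64 values per cell fits.

    Assumes 1 GB = 1e9 bytes and that the grid may use all of the memory.
    """
    cells = int(memory_gb * 10**9) // (fields * bytes_per_value)
    n = int(cells ** (1 / 3))
    # Correct any floating-point error in the cube root with integer checks.
    while (n + 1) ** 3 <= cells:
        n += 1
    while n ** 3 > cells:
        n -= 1
    return n


if __name__ == "__main__":
    # Hypothetical solver storing 5 double-precision fields per cell.
    print(max_cubic_grid(192, fields=5))
    print(max_cubic_grid(80, fields=5))
```

Because capacity scales with the cube of the grid edge, 2.4 times the memory buys roughly a one-third longer edge in each dimension.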
Open Ecosystem & ROCm
AMD's ROCm software stack is open-source, with upstream support in PyTorch, TensorFlow, JAX, and ONNX Runtime. AMD's HIPIFY tools translate CUDA source code to HIP so that existing CUDA applications can be built for MI300X hardware. For organizations that prefer open-source toolchains, or that want to avoid single-vendor lock-in, the MI300X with ROCm provides a fully supported alternative path. Dell provides validated designs and deployment guides specifically for the XE9680 + MI300X + ROCm stack.
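In practice, ROCm builds of PyTorch expose AMD GPUs through the familiar `torch.cuda` API, so much existing code runs unchanged. The minimal Python sketch below reports which backend a given PyTorch install is using; the function name `describe_backend` is hypothetical, and the check relies on `torch.version.hip` being set on ROCm builds. It is a quick diagnostic, not a substitute for Dell's or AMD's validation guides.

```python
def describe_backend():
    """Return 'rocm', 'cuda', 'cpu', or 'pytorch-not-installed'."""
    try:
        import torch
    except ImportError:
        return "pytorch-not-installed"

    if not torch.cuda.is_available():
        return "cpu"

    # ROCm builds of PyTorch reuse the torch.cuda namespace and set
    # torch.version.hip to a version string; on CUDA builds it is None.
    return "rocm" if getattr(torch.version, "hip", None) else "cuda"


if __name__ == "__main__":
    print(describe_backend())
```

On a correctly configured MI300X system with a ROCm build of PyTorch, this should report `rocm`, and tensors are moved to the GPU with the usual `device="cuda"` argument.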
Specifications
AMD Instinct MI300X Accelerator
| Specification | MI300X |
|---|---|
| GPU Architecture | AMD CDNA 3 |
| Process | 5nm / 6nm FinFET (chiplet) |
| Compute Units | 304 |
| GPU Memory | 192GB HBM3 |
| Memory Bandwidth | 5.3 TB/s |
| Infinity Cache | 256 MB |
| FP64 | 81.7 TFLOPS |
| FP64 Matrix | 163.4 TFLOPS |
| FP32 | 163.4 TFLOPS |
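| TF32 | 653.7 TFLOPS dense / 1,307.4 TFLOPS with sparsity |
| FP16 | 1,307.4 TFLOPS dense / 2,614.9 TFLOPS with sparsity |
| BF16 | 1,307.4 TFLOPS dense / 2,614.9 TFLOPS with sparsity |
| FP8 | 2,614.9 TFLOPS dense / 5,229.8 TFLOPS with sparsity |
| INT8 | 2,614.9 TOPS dense / 5,229.8 TOPS with sparsity |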
| Interconnect | AMD Infinity Fabric (7 links, 128 GB/s each) |
| Host Interface | PCIe Gen5 x16 |
| TDP | 750W |
| Form Factor | OAM (Open Accelerator Module) |
| Software | AMD ROCm (open-source) |
Exploring AMD for Your AI Infrastructure?
ServerMonkey can configure the Dell PowerEdge XE9680 with AMD Instinct MI300X or NVIDIA GPUs. Let us help you evaluate both.