AMD Instinct MI350P: Ultra-fast AI Accelerator As A PCI Express Card

AMD’s Instinct MI350P for regular PCIe 5.0 slots is said to be primarily suited to agentic AI, i.e. AI agents that assist their users autonomously and take over tasks for them. In addition to extremely high AI computing power and high memory throughput, the GPU has a few other features up its sleeve, including hardware acceleration for current video codecs up to AV1 and partitioning into up to four virtual GPUs.

Halved sister

The MI350P shares its GPU with the Instinct MI350X/MI355X accelerators in the Open Accelerator Module (OAM) form factor, but only 128 compute units (CUs) are active in the MI350P, compared with 256 CUs in the OAM models. AMD also halves the fast HBM3e stacked memory from 288 to 144 GB. AMD does not state this in writing, but the image of the card makes it obvious: the MI350P uses only one I/O die (IOD) with four compute dies (XCDs), so the GPU package is halved compared with its larger siblings.

The Instinct MI350P is intended to complement the OAM server boards and, for example, give existing rack servers a boost in AI performance.

(Image: AMD)

The power consumption also drops significantly: the nominal TDP of 600 watts matches that of an Nvidia RTX Pro 6000 Blackwell or H200 NVL, with which the card is obviously intended to compete. For power delivery, AMD uses the controversial 12V-2×6 ATX connector. Alternatively, the card can be set to a 450-watt mode.

To serve multiple users at the same time, there are three partitioning modes: SPX, DPX and CPX. SPX corresponds to full operation as a single GPU. With DPX, two users share the resources (CUs, RAM, video and JPEG engines, L2 cache and DMA engines) equally, and with CPX there are four users. In CPX mode, two partitions each compete for one video engine and one block of ten JPEG engines. They should still have enough headroom, though, because the entire chip can handle 99 AV1 streams (1080p30, 4:2:0) and 4425 JPEG images per second at 1080p.
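As a rough illustration of the equal split described above, the following sketch divides the card's CUs and HBM3e capacity by the number of partitions in each mode. The totals are the figures quoted in this article; the function and class names are hypothetical and this is not AMD's partitioning API. The sharing of video and JPEG engines in CPX mode is not modelled.

```python
from dataclasses import dataclass

# Whole-card totals as quoted in the article.
TOTAL_CUS = 128
TOTAL_HBM_GB = 144

# Number of partitions per mode, as described in the article.
PARTITIONS = {"SPX": 1, "DPX": 2, "CPX": 4}


@dataclass(frozen=True)
class PartitionShare:
    mode: str
    partitions: int
    cus: int
    hbm_gb: float


def share_per_partition(mode: str) -> PartitionShare:
    """Return the per-partition share, assuming a strictly equal split."""
    n = PARTITIONS[mode]
    return PartitionShare(mode, n, TOTAL_CUS // n, TOTAL_HBM_GB / n)


for mode in PARTITIONS:
    print(share_per_partition(mode))
```

Under this assumption, each of the four CPX partitions would get 32 CUs and 36 GB of memory.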

High computing power

AMD did not share specific performance estimates in advance, but the theoretical computing power, derived from the number of execution units and the clock frequency, is 2300 teraflops at FP8 precision (for dense matrices; with sparsity, the value roughly doubles). MXFP4 doubles this rate to 4600 Tflops and, unlike on Nvidia hardware, so does MXFP6. This puts the theoretical computing power at a little less than half that of an MI355X. On paper, Nvidia’s H200 NVL manages around 1670 Tflops with dense matrices (3340 Tflops with sparsity).
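The paper figures above can be put side by side with a few lines of arithmetic. This is only a sketch using the numbers quoted in this article; the sparsity factor of exactly 2.0 is an approximation of "roughly doubles", and the names are made up for illustration.

```python
# Dense peak figures in TFLOPS, as quoted in the article.
MI350P_DENSE_TFLOPS = {"FP8": 2300, "MXFP6": 4600, "MXFP4": 4600}
H200_NVL_FP8_DENSE_TFLOPS = 1670

# Assumption: "roughly doubles" is approximated as exactly 2x.
SPARSITY_FACTOR = 2.0


def with_sparsity(dense_tflops: float) -> float:
    """Approximate sparse peak from the dense peak."""
    return dense_tflops * SPARSITY_FACTOR


def fp8_ratio_vs_h200_nvl() -> float:
    """Dense FP8 paper figure of the MI350P relative to the H200 NVL."""
    return MI350P_DENSE_TFLOPS["FP8"] / H200_NVL_FP8_DENSE_TFLOPS


for fmt, dense in MI350P_DENSE_TFLOPS.items():
    print(f"{fmt}: {dense} dense, ~{with_sparsity(dense):.0f} with sparsity")
print(f"FP8 dense vs. H200 NVL: {fp8_ratio_vs_h200_nvl():.2f}x")
```

On these paper values, the MI350P's dense FP8 rate is a little under 1.4 times that of the H200 NVL; this says nothing about real-world performance.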

AMD also provides an estimate of the throughput actually achieved, which takes into account memory transfers and limits imposed by power consumption. According to this, the Instinct MI350P reaches between 60 and 70 percent of its maximum throughput rates. The outlier is MXFP6 at 40 percent of the theoretical throughput, so the value only increases by a good third compared with (MX)FP8 instead of doubling.
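To see where the "good third" comes from, the percentages can be applied to the theoretical peaks quoted in this article (2300 Tflops for FP8, 4600 Tflops for MXFP6). This is a back-of-the-envelope sketch: applying the general 60 to 70 percent range to FP8 specifically is an assumption, and the helper name is hypothetical.

```python
# Theoretical dense peaks in TFLOPS, as quoted in the article.
FP8_PEAK = 2300
MXFP6_PEAK = 4600

# Assumption: the article's general 60-70 percent range applies to FP8.
FP8_EFFICIENCY_PCT = (60, 70)
MXFP6_EFFICIENCY_PCT = 40


def achieved(peak_tflops: int, efficiency_pct: int) -> float:
    """Estimated achieved throughput for a given efficiency in percent."""
    return peak_tflops * efficiency_pct / 100


fp8_low, fp8_high = (achieved(FP8_PEAK, p) for p in FP8_EFFICIENCY_PCT)
mxfp6 = achieved(MXFP6_PEAK, MXFP6_EFFICIENCY_PCT)

print(f"FP8 achieved:   {fp8_low:.0f} to {fp8_high:.0f} TFLOPS")
print(f"MXFP6 achieved: {mxfp6:.0f} TFLOPS")
print(f"MXFP6 gain over FP8: {mxfp6 / fp8_high:.2f}x to {mxfp6 / fp8_low:.2f}x")
```

With FP8 at the lower end of the range (60 percent), MXFP6 comes out about a third faster; at 70 percent, the gain would be smaller.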

The theoretical and practical computing power of the Instinct MI350P differ significantly. The reasons include the available electrical power and the necessary memory and bus transfers.

(Image: AMD)

(csp)
