The AMD Ryzen 9 9950X and Ryzen 9 9900X Review: Flagship Zen 5 Soars - and Stalls
by Gavin Bonshor on August 14, 2024 9:00 AM EST
Posted in: CPUs, AMD, Desktop, Zen 5, AM5, Ryzen 9000, Ryzen 9 9950X, Ryzen 9 9900X
Core-to-Core Latency: Zen 5 Gets Weird
As the core count of modern CPUs grows, the time to access one core from another is no longer constant. Even before the advent of heterogeneous SoC designs, processors built on large rings or meshes could have different latencies to access the nearest core compared to the furthest core. This holds true especially in multi-socket server environments.
But modern CPUs, even desktop and consumer CPUs, can have variable access latency to get to another core. For example, in the first generation Threadripper CPUs, we had four chips on the package, each with 8 threads, and each with a different core-to-core latency depending on whether the access was on-die or off-die. This gets more complex with products like Lakefield, which has two different communication buses depending on which core is talking to which.
If you are a regular reader of AnandTech’s CPU reviews, you will recognize our Core-to-Core latency test. It’s a great way to show exactly how groups of cores are laid out on the silicon. This is a custom in-house test, and we know there are competing tests out there, but we feel ours most accurately reflects how quickly an access between two cores can happen.
Looking at the above latency matrix of the Ryzen 9 9950X, we observe that the lowest latencies naturally occur between adjacent cores on the same CCX. Core pairs such as 0-1, 1-2, and 2-3 consistently show latencies in the 18.6 to 20.5 nanosecond range. This is indicative of the fast L3 cache shared within the CCX, which ensures rapid communication between cores on the same complex.
Compared to the Ryzen 9 7950X, we are seeing a slight increase in latencies within a single CCX. The SMT "advantage", where two logical cores sharing a single physical core have a lower latency, appears to be gone. Instead, latencies are consistently around 20ns from any logical core to any other logical core within a single CCX. That average is slightly up from 18ns on the 7950X, though it's not clear what the chief contributing factor is.
More significant, and more worrying, are the inter-CCD latencies. That is, the latency to go from a core on one CCD to a core on the other CCD. AMD's multi-CCD Ryzen designs have always taken a penalty here, as communicating between different CCDs means taking a long trek through AMD's Infinity Fabric to the IOD and back out to the other CCD. But the inter-CCD latencies are much higher here than we were expecting.
For reference, on the Ryzen 9 7950X, going to another CCD takes around 76ns. But on the Ryzen 9 9950X, we're seeing an average latency of 180ns, over twice the cost of the previous generation of Ryzen. Making this all the more confusing, Granite Ridge (desktop Ryzen 9000) reuses the same IOD and Infinity Fabric configuration as Raphael (Ryzen 7000); all AMD has done is swap out the Zen 4 CCDs for Zen 5 CCDs. So by all expectations, we should not be seeing significantly higher inter-CCD latency here.
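To make the intra-CCD versus inter-CCD comparison concrete, here is a minimal Python sketch of how a core-to-core latency matrix can be reduced to those two averages. This is not our in-house test: the four-core matrix, the `ccd_of` mapping, and the `summarize` helper are all hypothetical, and the values are round numbers chosen to resemble the figures quoted in this section rather than measured data.

```python
from statistics import mean

# Hypothetical 4-core example: cores 0-1 sit on CCD 0, cores 2-3 on CCD 1.
# Values are illustrative round numbers in nanoseconds, not measurements.
ccd_of = [0, 0, 1, 1]
latency_ns = [
    [0.0, 20.0, 180.0, 180.0],
    [20.0, 0.0, 180.0, 180.0],
    [180.0, 180.0, 0.0, 20.0],
    [180.0, 180.0, 20.0, 0.0],
]


def summarize(matrix, ccd_of):
    """Return (mean same-CCD latency, mean cross-CCD latency) in ns."""
    same, cross = [], []
    for src, row in enumerate(matrix):
        for dst, value in enumerate(row):
            if src == dst:
                continue  # skip the diagonal: a core does not ping itself
            if ccd_of[src] == ccd_of[dst]:
                same.append(value)
            else:
                cross.append(value)
    return mean(same), mean(cross)


intra_ns, inter_ns = summarize(latency_ns, ccd_of)
print(f"intra-CCD: {intra_ns:.1f} ns, inter-CCD: {inter_ns:.1f} ns")
```

On a real 9950X matrix the same grouping would need the actual mapping of logical cores to CCDs, which this sketch simply assumes.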
Our current working theory is that this is a side-effect of AMD's core parking changes for Ryzen 9000. That is, cores are being aggressively put to sleep, and as a result it's taking an extra 100ns to wake them up. If that is correct, then our core-to-core latency test is just about the worst case scenario for that strategy, as it's sending data between cores in short bursts, rather than running a sustained workload that keeps the cores alive over the long haul.
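The arithmetic behind that theory is simple enough to check directly. The sketch below uses only the two approximate inter-CCD averages quoted above (76ns and 180ns); the variable names are illustrative, and attributing the whole gap to core wake-up is our working theory, not an established fact.

```python
# Approximate inter-CCD averages quoted in the text, in nanoseconds.
zen4_inter_ccd_ns = 76.0   # Ryzen 9 7950X
zen5_inter_ccd_ns = 180.0  # Ryzen 9 9950X

# How many times more expensive the cross-CCD hop is on the newer chip.
ratio = zen5_inter_ccd_ns / zen4_inter_ccd_ns

# Generation-over-generation gap; under the core parking theory,
# this would be roughly the cost of waking a sleeping core.
extra_ns = zen5_inter_ccd_ns - zen4_inter_ccd_ns

print(f"ratio: {ratio:.2f}x, extra latency: {extra_ns:.0f} ns")
```

The gap works out to about 104ns, which is where the "extra 100ns" figure comes from.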
At this point, we're running some additional tests on the 9950X without AMD's PPM provisioning driver installed, to see if that's having an impact. Otherwise, these high latencies, if accurate for all workloads, would represent a significant problem for multi-threaded workloads that straddle the Infinity Fabric.
123 Comments
Oxford Guy - Friday, August 16, 2024
Intel sells CPUs with small cores and large cores. That's a kludge, too. Get used to these kludges because they're the new reality.
Makaveli - Wednesday, August 14, 2024
If you are on Zen 4 you can skip this your next upgrade is Zen 6 and a new motherboard.
For those of us on Zen 3 or lower Zen 5 is a good move.
ondma - Thursday, August 15, 2024
Even if upgrading from Zen 3 or older, right now it is hard to recommend Zen 5 over Zen 4, at least until the price comes down on Zen 5. Zen 5 offers negligible performance gains at a higher price. I suppose you could argue Zen 5 is more "future proof" if AVX 512 suddenly becomes mainstream, but it has been around a long time and is still a niche instruction set.
GeoffreyA - Thursday, August 15, 2024
AVX-512 is used in different video encoders and decoders: x265, SVT-AV1, and dav1d at the least, possibly x264 and FFmpeg, and I am sure elsewhere too. So it is available in quite common software used today.
Oxford Guy - Friday, August 16, 2024
Didn't this site have at least one custom benchmark designed specifically to showcase AVX-512 performance in CPU reviews?
I don't know if there were more than one. I do recall, though, that the AVX-512 performance of Intel CPUs at the time was considered important enough to showcase, and not merely in an article specific to AVX-512 performance.
GeoffreyA - Saturday, August 17, 2024
I think the 3DPMv2 benchmark. All it did was show how much faster Intel was than the competition. It took a lot of criticism and was called unrealistic.
TheinsanegamerN - Monday, August 19, 2024
And how many people are encoding video even on a monthly basis?
For decode, well, I can decode 4k video just fine on my zen 3 rig, so its not necessary. AVX512 is a nice to have, not a necessity.
GeoffreyA - Monday, August 19, 2024
Certainly. AVX2 tends to be the baseline nowadays. I'm just pointing out that AVX-512 *is* used in common software.
GeoffreyA - Monday, August 19, 2024
The weakness in AMD's strategy is that this is becoming like 3DNow! All Intel has to do is cut out AVX-512, and it will, possibly, die off. Programmers won't want to write code not used on both sides.
Oxford Guy - Thursday, August 22, 2024
Intel has been like Lucy with the football with AVX-512 and I think the buck is stopping.
AMD has become a force to reckon with, especially given how weak Intel is, in terms of competing. I don't think Intel has the luxury to play Lucy with AVX-512 now.