The AMD Ryzen 9 9950X and Ryzen 9 9900X Review: Flagship Zen 5 Soars - and Stalls
by Gavin Bonshor on August 14, 2024 9:00 AM EST
Posted in: CPUs, AMD, Desktop, Zen 5, AM5, Ryzen 9000, Ryzen 9 9950X, Ryzen 9 9900X
Core-to-Core Latency: Zen 5 Gets Weird
As the core counts of modern CPUs grow, we are reaching a point where the time to access one core from another is no longer constant. Even before the advent of heterogeneous SoC designs, processors built on large rings or meshes could have different latencies to access the nearest core compared to the furthest core. This rings true especially in multi-socket server environments.
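To illustrate why distance matters on these topologies, here is a minimal sketch, assuming an idealized bidirectional ring and a row-major 2D mesh in which every hop costs the same. The function names and the 8-stop ring size are hypothetical, and real latency depends on far more than hop count (clock domains, arbitration, and traffic, to name a few).

```python
def ring_hops(src, dst, n_stops):
    """Shortest hop count between two stops on a bidirectional ring."""
    d = abs(src - dst) % n_stops
    return min(d, n_stops - d)  # go whichever way around is shorter


def mesh_hops(src, dst, width):
    """Manhattan hop count between two tiles on a row-major 2D mesh."""
    sx, sy = src % width, src // width
    dx, dy = dst % width, dst // width
    return abs(sx - dx) + abs(sy - dy)


N_STOPS = 8  # hypothetical ring size, chosen only for illustration

# Hop counts from stop 0 to every stop on the ring.
hops_from_zero = [ring_hops(0, dst, N_STOPS) for dst in range(N_STOPS)]
print(hops_from_zero)
```

On the ring, a neighbouring stop is one hop away while the stop on the opposite side is `N_STOPS // 2` hops away, which is the kind of nearest-versus-furthest spread described above.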
But modern CPUs, even desktop and consumer CPUs, can have variable access latency to get to another core. For example, in the first generation Threadripper CPUs, we had four chips on the package, each with 8 threads, and each with a different core-to-core latency depending on whether the access was on-die or off-die. This gets more complex with products like Lakefield, which has two different communication buses depending on which core is talking to which.
If you are a regular reader of AnandTech’s CPU reviews, you will recognize our Core-to-Core latency test. It’s a great way to show exactly how groups of cores are laid out on the silicon. This is a custom in-house test, and we know there are competing tests out there, but we feel ours most accurately reflects how quickly an access between two cores can happen.
Looking at the above latency matrix of the Ryzen 9 9950X, we observe that the lowest latencies naturally occur between adjacent cores on the same CCX. The core pairs such as 0-1, 1-2, and 2-3 consistently show latencies in the 18.6 to 20.5 nanoseconds range. This is indicative of the fast L3 cache shared within the CCX, which ensures rapid communication between the inner cores on the same complex.
Compared to the Ryzen 9 7950X, we are seeing a slight increase in latencies within a single CCX. The SMT "advantage", where two logical cores sharing a single physical core have a lower latency, appears to be gone. Instead, latencies are consistently around 20ns from any logical core to any other logical core within a single CCX. That average is slightly up from 18ns on the 7950X, though it's not clear what the chief contributing factor is.
More significant, and worryingly so, are the inter-CCD latencies: that is, the latency to go from a core on one CCD to a core on the other CCD. AMD's multi-CCD Ryzen designs have always taken a penalty here, as communicating between different CCDs means taking a long trek through AMD's Infinity Fabric to the IOD and back out to the other CCD. But the inter-CCD latencies are much higher here than we were expecting.
For reference, on the Ryzen 9 7950X, going to another CCD is around 76ns. But in Ryzen 9 9950X, we're seeing an average latency of 180ns, over twice the cost of the previous generation of Ryzen. Making this all the more confusing, Granite Ridge (desktop Ryzen 9000) reuses the same IOD and Infinity Fabric configuration as Raphael (Ryzen 7000) – all AMD has done is swap out the Zen 4 CCDs for Zen 5 CCDs. So by all expectations, we should not be seeing significantly higher inter-CCD latency here.
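As a rough illustration of how a latency matrix reduces to the two averages discussed here, the sketch below groups core pairs by CCD and averages each group. The 4-core matrix is a hypothetical toy, filled with placeholder values that echo the roughly 20ns and 180ns figures quoted above rather than measured data; `summarize_latency` and the core-to-CCD mapping are likewise names made up for this example.

```python
from statistics import mean


def summarize_latency(matrix, ccd_of):
    """Return (mean intra-CCD latency, mean inter-CCD latency) in ns."""
    intra, inter = [], []
    n = len(matrix)
    for src in range(n):
        for dst in range(n):
            if src == dst:
                continue  # skip the diagonal: a core paired with itself
            bucket = intra if ccd_of[src] == ccd_of[dst] else inter
            bucket.append(matrix[src][dst])
    return mean(intra), mean(inter)


# Hypothetical toy matrix: 4 cores, 2 per CCD. Placeholder values only,
# NOT measured data.
TOY_MATRIX = [
    [0, 20, 180, 180],
    [20, 0, 180, 180],
    [180, 180, 0, 20],
    [180, 180, 20, 0],
]
TOY_CCD_OF = [0, 0, 1, 1]  # hypothetical mapping: core index -> CCD index

intra_ns, inter_ns = summarize_latency(TOY_MATRIX, TOY_CCD_OF)

ZEN4_INTER_CCD_NS = 76  # 7950X inter-CCD figure quoted in the text
ratio = inter_ns / ZEN4_INTER_CCD_NS

print(f"intra-CCD: {intra_ns:.1f} ns, inter-CCD: {inter_ns:.1f} ns")
print(f"inter-CCD vs. 7950X: {ratio:.2f}x")
```

With the quoted figures, the ratio works out to roughly 2.4x, which is where "over twice the cost" comes from.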
Our current working theory is that this is a side-effect of AMD's core parking changes for Ryzen 9000: cores are being aggressively put to sleep and, as a result, it's taking an extra 100ns to wake them up. If that is correct, then our core-to-core latency test is just about the worst-case scenario for that strategy, as it's sending data between cores in short bursts, rather than running a sustained workload that keeps the cores alive over the long haul.
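To see why a bursty test would be the worst case under this theory, consider a toy model, offered purely as an illustration of the theory and not as a description of AMD's actual power management. It assumes a core falls asleep after a fixed idle timeout (a hypothetical value below) and that waking it adds the difference between the 180ns and 76ns figures quoted earlier.

```python
BASE_NS = 76        # inter-CCD latency quoted for the Ryzen 9 7950X
OBSERVED_NS = 180   # average inter-CCD latency quoted for the 9950X
WAKE_PENALTY_NS = OBSERVED_NS - BASE_NS  # 104 ns, i.e. roughly "an extra 100ns"


def mean_latency(gaps_ns, idle_timeout_ns):
    """Average per-message latency under a toy sleep/wake model.

    gaps_ns: idle time on the target core before each message arrives.
    idle_timeout_ns: hypothetical idle time after which the core sleeps.
    """
    total = 0
    for gap in gaps_ns:
        asleep = gap >= idle_timeout_ns
        total += BASE_NS + (WAKE_PENALTY_NS if asleep else 0)
    return total / len(gaps_ns)


TIMEOUT_NS = 1_000  # hypothetical timeout, not a documented value

bursty = [5_000] * 10             # every message follows a long idle gap
sustained = [5_000] + [100] * 9   # only the first message finds the core asleep

bursty_ns = mean_latency(bursty, TIMEOUT_NS)
sustained_ns = mean_latency(sustained, TIMEOUT_NS)
print(bursty_ns, sustained_ns)
```

In this model the bursty pattern pays the wake-up penalty on every message, while the sustained pattern pays it once and then settles near the base latency.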
At this point, we're running some additional tests on the 9950X without AMD's PPM provisioning driver installed, to see if that's having an impact. Otherwise, these high latencies, if accurate for all workloads, would represent a significant problem for multi-threaded workloads that straddle the Infinity Fabric.
123 Comments
kwohlt - Wednesday, August 14, 2024 - link
Not magically. After Intel 7, Intel released Intel 4 last year for laptop only. Then after that, they released Intel 3, which is only being used in Xeon (SRF already launched, GNR soon). Then after that is 20A, which will be a token ARL SKU, with most of the ARL/LNL volume being on TSMC N3B. N3B is several nodes past Intel 7. 20A (and 18A next year) is several nodes past Intel 7.
"Intel" didn't skip several nodes. They just haven't released any desktop parts on the 2 nodes they've released since Intel 7
Khanan - Wednesday, August 14, 2024 - link
So now I’m finished reading the whole thing. Typical new launch CPU issues, how many times did I read about problems with new CPUs at launch? It happens more often than it doesn’t, these things will be ironed out, it’s not a big problem.
The only thing worrying me here is the strict Anandtech enforcement of going with extremely slow RAM, which, as you can see nearly everywhere, just chokes those 16 cores. It’s evident that it needs way faster RAM to properly function; here the author or Anandtech missed the chance to also (not only) test it with proper RAM at 6400 at least, or better yet 7000-8000 speeds. Dual channel isn’t a lot for 16 cores, 16 cores used to have quad channel, if you just have dual channel you should use proper RAM and not the absolute slowest possible, as was used here (the minimum spec, more or less, and now don’t come and tell me you could’ve used even lower 4800s instead). So for me this is kind of the 9950X in a worst-case scenario, in a lot of the tests, maybe not all of them. Otherwise good review.
ikjadoon - Wednesday, August 14, 2024 - link
Core parking doesn't always get "ironed out", does it? Some users still have difficulties with the Zen4 X3D CPUs, nearly 2 years later, sadly. It seems to require too much software intervention. That core parking depending on Windows-level drivers is especially problematic to me: who knows what bugs the next Windows update will bring?
I'd much rather have a reliable 100%-working 9700X 8C than a 95%-working 9950X 16C, but that's me and my low priority for nT workloads.
//
Re: DRAM. Using the *highest* DRAM speed AMD has specified is a time-honored and well-defended practice. If all Zen5 AMD CPU IMCs can reliably hit 6000 Mbps on the DRAM, then AMD should allow that. Why is AMD holding back? Is AMD willingly destroying its Zen5 performance?! No, my friend.
AMD specifically refused / failed to bin Zen5 CPUs for 6400 due to IMC & fabric issues (see below). This is AMD's choice. With the ongoing Intel 13th/14th gen debacle, I think it's the wrong move to ask reviewers to go above & beyond the CPU manufacturer's spec.
//
Hardware Unboxed shared AMD's reviewer's guide for Zen5. Again, AMD has recommended 6000 Mbps as the DRAM sweet spot, noting IMC and Fabric clocks. See here:
https://youtu.be/IeBruhhigPI?t=1456
6400 or higher can even cause lower performance for some kits and some users. Thus, 6000 Mbps is AMD's official highest-recommended EXPO / OC speed and 5600 Mbps is the guaranteed speed.
We should benchmark CPUs at guaranteed speeds, not "usually it works" speeds.
tommo1982 - Wednesday, August 14, 2024 - link
"... We should benchmark CPUs at guaranteed speeds, not "usually it works" speeds."
Agreed. My Ryzen 5 Pro 3350G can make Win10 throw a blue screen if I set RAM to 3200MHz. It's random, and each time I need to configure RAM speed again, because BIOS returns to defaults.
I want benchmarks done with what the manufacturer recommends. I want to know what I can expect from the BOX, not a promise. I don't see the reason to bend to requests of a minority of users, when the majority doesn't know what overclocking is.
Scabies - Wednesday, August 14, 2024 - link
Page two, paragraphs one and two.
ondma - Thursday, August 15, 2024 - link
I don't even consider AnandTech gaming reviews anymore. They are trash. That being said, AMD itself has said DDR5 6000 is the optimal ram speed for Zen 5. Techspot used that in their tests and the results were no better, about a 1% improvement in gaming for the 13 game average.
Oxford Guy - Friday, August 16, 2024 - link
Anandtech used to be a site that pumped CPU-killing levels of voltage into CPUs for overclocking and considered the overclock stable if it didn't crash benchmarks.
It changed to be a site that won't use XMP and similar because it's warranty-voiding overclocking.
I was very critical of that but I will say that I am tired of AMD and Intel having their cake and eating it. If AMD is going to tell reviewers the 'sweet spot' is 6000 and the CPUs aren't given a warranty/validation for 6000, then AMD should be told where the plank is to walk from.
erotomania - Tuesday, August 20, 2024 - link
AT writes stories for tech enthusiasts to read, but refuses to test a system equipped like a tech enthusiast would set it up. That's been frustrating for at least a decade. Very un-Anand like. I see their argument, but this isn't consumer retorts.
Chaser - Wednesday, August 14, 2024 - link
I think I'm done with this "core parking" nonsense. With the X3D CPUs the CCD parking issue is controlled by the MB's BIOS, the AMD driver, and the Microsoft Game Bar. Too many moving parts. And, if you change your AMD X3D CPU to a single CCD or a non X3D CPU you have to reinstall Windows completely to prevent performance degradations.
This is primitive nonsense. Intel may run hotter, and be a little slower in gaming, but you get all the cores up to their full TDP regardless of the workload without the V-Cache CCD chaos.
lmcd - Thursday, August 15, 2024 - link
I think the X3D single CCD part is a great product, and an "extreme" edition with 2 X3D CCDs should be made available, but the 1 CCD with 1 without design isn't good.