The AMD Ryzen 9 9950X and Ryzen 9 9900X Review: Flagship Zen 5 Soars - and Stalls
by Gavin Bonshor on August 14, 2024 9:00 AM EST - Posted in
- CPUs
- AMD
- Desktop
- Zen 5
- AM5
- Ryzen 9000
- Ryzen 9 9950X
- Ryzen 9 9900X
Core-to-Core Latency: Zen 5 Gets Weird
As core counts in modern CPUs grow, the time it takes one core to access another is no longer constant. Even before the advent of heterogeneous SoC designs, processors built on large rings or meshes could have different latencies for reaching the nearest core versus the furthest core. This is especially true in multi-socket server environments.
But modern CPUs, even desktop and consumer parts, can have variable latency when accessing another core. For example, in the first generation Threadripper CPUs, we had four chips on the package, each with 8 threads, and each with a different core-to-core latency depending on whether the access was on-die or off-die. This gets more complex with products like Lakefield, which has two different communication buses depending on which core is talking to which.
If you are a regular reader of AnandTech’s CPU reviews, you will recognize our Core-to-Core latency test. It’s a great way to show exactly how groups of cores are laid out on the silicon. This is a custom in-house test, and we know there are competing tests out there, but we feel ours most accurately reflects how quickly an access between two cores can happen.
Looking at the above latency matrix of the Ryzen 9 9950X, we observe that the lowest latencies naturally occur between adjacent cores on the same CCX. The core pairs such as 0-1, 1-2, and 2-3 consistently show latencies in the 18.6 to 20.5 nanoseconds range. This is indicative of the fast L3 cache shared within the CCX, which ensures rapid communication between the inner cores on the same complex.
Compared to the Ryzen 9 7950X, we are seeing a slight increase in latencies within a single CCX. The SMT "advantage", where two logical cores sharing a single physical core have a lower latency, appears to be gone. Instead, latencies are consistently around 20ns from any logical core to any other logical core within a single CCX. That average is slightly up from 18ns on the 7950X, though it's not clear what the chief contributing factor is.
More significant, and more worrying, are the inter-CCD latencies: the latency to go from a core on one CCD to a core on the other CCD. AMD's multi-CCD Ryzen designs have always taken a penalty here, as communicating between different CCDs means taking a long trek through AMD's Infinity Fabric to the IOD and back out to the other CCD. But the inter-CCD latencies are much higher here than we were expecting.
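To make the three cases in the matrix concrete (two threads on one physical core, two cores on the same CCD, and two cores on different CCDs), here is a minimal Python sketch that sorts logical-CPU pairs into those buckets. It assumes the 9950X's 16 cores are split evenly across two CCDs and, as a labeled assumption that is not taken from our test, that SMT siblings are numbered adjacently (logical CPUs 0 and 1 share core 0). The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

CORES_PER_CCD = 8      # 16 cores split evenly across two CCDs
THREADS_PER_CORE = 2   # SMT enabled: 32 logical CPUs in total


def locate(logical_cpu):
    """Return (ccd, core) for a logical CPU index.

    Assumption: SMT siblings are enumerated adjacently, so logical
    CPUs 2n and 2n+1 belong to physical core n.
    """
    core = logical_cpu // THREADS_PER_CORE
    ccd = core // CORES_PER_CCD
    return ccd, core


def classify_pair(a, b):
    """Classify a pair of distinct logical CPUs by where they sit."""
    if a == b:
        raise ValueError("a pair needs two distinct logical CPUs")
    ccd_a, core_a = locate(a)
    ccd_b, core_b = locate(b)
    if core_a == core_b:
        return "smt-sibling"
    if ccd_a == ccd_b:
        return "same-ccd"
    return "cross-ccd"


def count_pairs(n_logical=32):
    """Count how many unordered pairs fall into each bucket."""
    return Counter(
        classify_pair(a, b) for a, b in combinations(range(n_logical), 2)
    )


if __name__ == "__main__":
    print(classify_pair(0, 1), classify_pair(0, 2), classify_pair(0, 16))
    print(dict(count_pairs()))
```

Under these assumptions, more than half of all unordered thread pairs on a 32-thread part cross the CCD boundary, which is why the inter-CCD figure dominates the look of the matrix.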
For reference, on the Ryzen 9 7950X, going to another CCD takes around 76ns. But on the Ryzen 9 9950X, we're seeing an average latency of 180ns, over twice the cost of the previous generation of Ryzen. Making this all the more confusing, Granite Ridge (desktop Ryzen 9000) reuses the same IOD and Infinity Fabric configuration as Raphael (Ryzen 7000); all AMD has done is swap out the Zen 4 CCDs for Zen 5 CCDs. So by all expectations, we should not be seeing significantly higher inter-CCD latency here.
Our current working theory is that this is a side-effect of AMD's core parking changes for Ryzen 9000: cores are being aggressively put to sleep and, as a result, it's taking an extra 100ns to wake them up. If that is correct, then our core-to-core latency test is just about the worst case scenario for that strategy, as it's sending data between cores in short bursts, rather than running a sustained workload that keeps the cores alive over the long-haul.
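For readers who want the generational change spelled out, the short Python sketch below works through the rounded averages quoted in this section (roughly 18ns and 20ns within a CCX, 76ns and 180ns between CCDs). The dictionary and function names are hypothetical, and the inputs are approximate averages rather than raw measurements, so treat the outputs as ballpark figures.

```python
# Approximate average latencies quoted in this section, in nanoseconds.
INTRA_CCX_NS = {"7950X": 18.0, "9950X": 20.0}
INTER_CCD_NS = {"7950X": 76.0, "9950X": 180.0}


def generational_change(old_ns, new_ns):
    """Return (absolute increase in ns, new/old ratio)."""
    return new_ns - old_ns, new_ns / old_ns


def cross_ccd_penalty(chip):
    """How many times slower a cross-CCD hop is than an intra-CCX hop."""
    return INTER_CCD_NS[chip] / INTRA_CCX_NS[chip]


if __name__ == "__main__":
    delta, ratio = generational_change(
        INTER_CCD_NS["7950X"], INTER_CCD_NS["9950X"]
    )
    print(f"inter-CCD: +{delta:.0f} ns, {ratio:.2f}x")
    for chip in ("7950X", "9950X"):
        print(f"{chip}: cross-CCD is {cross_ccd_penalty(chip):.1f}x intra-CCX")
```

On these numbers the inter-CCD hop grows by about 104ns, or roughly 2.4x, while the intra-CCX figure rises by only about 2ns. The size of that gap is in line with an added penalty of around 100ns on the cross-CCD path.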
At this point, we're running some additional tests on the 9950X without AMD's PPM provisioning driver installed, to see if that's having an impact. Otherwise, these high latencies, if accurate for all workloads, would represent a significant problem for multi-threaded workloads that straddle the Infinity Fabric.
123 Comments
Bruzzone - Wednesday, August 21, 2024
Note: I have reworked this after missing a tier of distribution.
Why $650 for the 9950X? The answer is channel markup, plus offsetting the losses from discounting Raphael R7K and Vermeer R5K to clear channel inventories.
9950X = $216 from TSMC and TSMC makes approximately x3 over cost.
AMD 9950X high volume price is $325 suspect 3 M unit procurement estimated on AMD top 10 customers divided into suspect full run production. This high volume price supports OEM resale to secondary distributors. The mid volume price is $487 that would include volume retail. Low volume tray of 10, $1K < 10% and where MSRP = $649. So, a mid-volume buyer can make + 33%
9950X = $216, $325, $487, $649
9900X = $166, $250, $375, $499
9700X = $120, $180, $269, $359
9600X = $93 (about $28 over TSMC cost), $140, $209, $279
10 OEMs on a 3 million unit procurement $223 to $289 with a sliding discount on 9600X down to AMD cost for meeting their full run contract sales objective.
This appears to be how AMD product’s total revenue potential is divided between the primary stakeholders:
TSMC takes 28.25%
AMD takes 49% < costs where R&D is variable and earns net after tax.
Top 10 customers as PC suppliers and CPU master distributors take up to 22.6%
mb
ballsystemlord - Thursday, August 22, 2024
Thanks!
Bruzzone - Friday, August 23, 2024
You're welcome, anytime, your inquiry caused me to consider the data more thoroughly. mb
Silver5urfer - Tuesday, August 20, 2024
Now that the dust has settled.
I really appreciate AT reviews. And as always again, AT did a great job over the stupid HWU and GN's subpar reviews. Esp the videos on YT. Anandtech shows both the IPC gains in SPEC score but also translation failure on Windows, specifically mentioned how PPM ruins the CPUs. Most of the YT content is focusing on the here's the review guidelines and here's the game tests and here's some special workloads which they use a set of rendering techniques in Blender etc.
But Phoronix and AT have consistent written articles, AT here clearly showing the CCD interdie latency, that's the culprit here. AMD's poor choice of re-using the Zen 4 IODie is causing this alongside lack of IMC improvement. Many circulate BS such as AMD is using Mesh vs Ring, it's nonsense. The thing is AMD's CCD design since Zen 3 is using a hybrid of mesh and ring bus. Check Ian's article on that. So that's not the issue unless Zen 5 exclusively changed something big, which I doubt since this is just a revision of Zen 4.
RKL also had IPC gain but showed regression due to 14nm++ backport and 2C4T deficit and IMC regression. IPC SPEC does not always translate, AMD's major screw up is relying on a Software scheduler on top of the rehashed Zen 4 design causing this massive confusion.
Next is Power. This is never mentioned, the thing is AMD with Zen 5 went super conservative for some stupid reason and ruined the CPU boost and base clocks on all SKUs. Even the top bin dual CCD 9950X got that regression in base clocks. Basically AMD axed the TDP of the Zen 5 CPUs to match the Zen 4 lineup and killed the performance on these. I have always wondered why AM5 boards have tons of VRM but no CPU to utilize them, any X670E board VRM is capable of delivering over 350-400W of power to CPU but the AM5 PPT / Socket max is 250W top, maybe 270W if you push to extreme, so they are hard capped unlike Intel. AMD should have increased the AM5's socket power band to 300W to let them boost and unlock more performance. The 9700X is a massive L due to this huge power cut from 105W to 65W, they also nerfed 12C 9900X from 175W to 120W.
Also perhaps AMD wanted Zen 6 to shine brighter just like Intel Alder Lake, Intel killed LGA1200 with garbage RKL release and EOLed it to make the ADL look massive. Not that nefarious but in some extent AMD seems to take a page out of Intel. Esp given the fact on Intel's ARL lacking Hyperthreading cores and loss of Clockspeed from Raptor Lake (Disaster) 6.2GHz. And that gives AM5 to have Zen 6 with huge boost over Zen 4, 100% sure that Zen 6 will get new IODie and newer TSMC big node jump atop maybe a new chipset.
All in all AMD's Zen 5 is a real disappointment, since AM4 triumph. AMD always delivered lot of performance all the way from Zen -> Zen + improving CCD, Clockspeed, IMC -> Zen 2 lot of changes to the CCD and IODie -> Zen 3 a totally new design and radical departure of NUMA system plus higher boost clocks -> Zen 4 decoupled a lot of baggage on IF links from Uclock / Mclock / IFclock to just 2 links as Uclock and Mclock and massive Clockspeed boost and super stable platform unlike Zen 3 / AM4's USB I/O issues, there also the GloFo's IODie caused a havoc, here it's the unstable amalgamation of the older IODie with newer Zen 5 core. Turin will shatter performance because it won't be sandbagged by this weakpoint.
Shame since AMD lost a glorious chance to completely ruin Intel (They deserve at this point, killed Optane, ruined CPU desktop arena with Big little junk, absolutely insane California policies adoption, total disaster in 10nm delivery, hamfisting LGA1200 socket, CPU bending on LGA1700, Raptor lake failure... unending list, such as Gloo / NSO spyware / Unit 8200 sponsorship from Pat Gelsinger), AMD would have destroyed Intel but they chose not to.
That said if ARL performs better than Zen 5, that shame AMD will face would be totally deserving for ruining Zen5.
GeoffreyA - Tuesday, August 20, 2024
I also think the reusing of Zen 4's I/O die is causing the latency issues. Something is suboptimal somewhere along the line.
As for the lowering of power, they certainly seem to have ample headroom to raise it, gaining performance. Perhaps they thought that Zen 4 was using too much and wanted to curtail it, to contrast favourably with Intel. Lowering power while raising IPC always pays off later. Perhaps it's got to do with the I/O die. Maybe they're laying the foundation for further widening of the core and reduction in frequency. Indeed, they made the decoder a two-by-four cluster but it is not doing much in Zen 5, diminished by the effect of the micro-op cache. Or perhaps they've made a chain of poor decisions.
AnitaPeterson - Thursday, August 22, 2024
What I'd like to see now is a direct comparison to AM4.
Specifically, to the AM4 SKUs that were most recently launched - from the 5700x 3D to the 5900XT.
Because AM4 is mentioned in the conclusion, but there's no direct testing to pit the two generations against each other.
jcc5169 - Thursday, August 22, 2024
Are you going to publish AMD's response?
jcc5169 - Friday, August 23, 2024
Of course not. I wonder if Intel compensated the writer or this site.
AnitaPeterson - Friday, August 23, 2024
Which AMD response would that be? Genuinely curious.
GeoffreyA - Friday, August 23, 2024
https://community.amd.com/t5/gaming/ryzen-9000-ser...