Core-to-Core Latency: Zen 5 Gets Weird

As core counts in modern CPUs grow, we are reaching a point where the time to access one core from another is no longer constant. Even before the advent of heterogeneous SoC designs, processors built on large rings or meshes could have different latencies to the nearest core compared to the furthest core. This rings true especially in multi-socket server environments.

But modern CPUs, even desktop and consumer CPUs, can have variable access latency to get to another core. For example, in the first generation Threadripper CPUs, we had four chips on the package, each with 8 threads, and each with a different core-to-core latency depending on whether the target core was on-die or off-die. This gets more complex with products like Lakefield, which has two different communication buses depending on which core is talking to which.

If you are a regular reader of AnandTech’s CPU reviews, you will recognize our Core-to-Core latency test. It’s a great way to show exactly how groups of cores are laid out on the silicon. This is a custom in-house test, and we know there are competing tests out there, but we feel ours most accurately reflects how quickly an access between two cores can happen.

Looking at the above latency matrix of the Ryzen 9 9950X, we observe that the lowest latencies naturally occur between adjacent cores on the same CCX. Core pairs such as 0-1, 1-2, and 2-3 consistently show latencies in the 18.6 to 20.5 nanosecond range. This reflects the fast L3 cache shared within the CCX, which ensures rapid communication between cores on the same complex.

Compared to the Ryzen 9 7950X, we are seeing a slight increase in latencies within a single CCX. The SMT "advantage", where two logical cores sharing a single physical core have a lower latency, appears to be gone. Instead, latencies are consistently around 20ns from any logical core to any other logical core within a single CCX. That average is slightly up from 18ns on the 7950X, though it's not clear what the chief contributing factor is.
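To make the grouping concrete, here is a minimal sketch of how a latency matrix like this one could be summarized by core relationship. It is not our in-house test, only an illustration. It assumes a logical CPU numbering in which SMT siblings are logical CPUs 2k and 2k+1 and logical CPUs 0-15 sit on one CCD with 16-31 on the other; the names `classify` and `summarize` are hypothetical, and the matrix is a placeholder filled with the rounded averages quoted in this article, not measured data.

```python
from statistics import mean

THREADS_PER_CORE = 2   # assumption: SMT siblings are logical CPUs 2k and 2k+1
THREADS_PER_CCD = 16   # assumption: logical CPUs 0-15 on one CCD, 16-31 on the other


def classify(a: int, b: int) -> str:
    """Label the relationship between two distinct logical CPUs."""
    if a // THREADS_PER_CORE == b // THREADS_PER_CORE:
        return "smt_sibling"
    if a // THREADS_PER_CCD == b // THREADS_PER_CCD:
        return "same_ccd"
    return "cross_ccd"


def summarize(matrix):
    """Average the latency of every distinct CPU pair, grouped by relationship."""
    groups = {}
    n = len(matrix)
    for a in range(n):
        for b in range(a + 1, n):
            groups.setdefault(classify(a, b), []).append(matrix[a][b])
    return {kind: mean(values) for kind, values in groups.items()}


# Placeholder values (ns): rounded averages quoted in the text, NOT measurements.
PLACEHOLDER_NS = {"smt_sibling": 20.0, "same_ccd": 20.0, "cross_ccd": 180.0}

matrix = [
    [0.0 if a == b else PLACEHOLDER_NS[classify(a, b)] for b in range(32)]
    for a in range(32)
]
summary = summarize(matrix)

for kind, avg in summary.items():
    print(f"{kind}: {avg:.1f} ns")
```

With real measurements in place of the placeholder matrix, a vanished SMT "advantage" would show up as the `smt_sibling` and `same_ccd` averages being roughly equal.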

More significant, and more worrying, are the inter-CCD latencies: that is, the latency to go from a core on one CCD to a core on the other CCD. AMD's multi-CCD Ryzen designs have always taken a penalty here, as communicating between different CCDs means taking a long trek through AMD's Infinity Fabric to the IOD and back out to the other CCD. But the inter-CCD latencies are much higher here than we were expecting.

For reference, on the Ryzen 9 7950X, going to another CCD takes around 76ns. But on the Ryzen 9 9950X, we're seeing an average latency of 180ns, over twice the cost of the previous generation of Ryzen. Making this all the more confusing, Granite Ridge (desktop Ryzen 9000) reuses the same IOD and Infinity Fabric configuration as Raphael (Ryzen 7000): all AMD has done is swap out the Zen 4 CCDs for Zen 5 CCDs. So by all expectations, we should not be seeing significantly higher inter-CCD latency here.

Our current working theory is that this is a side-effect of AMD's core parking changes for Ryzen 9000: cores are being aggressively put to sleep, and as a result, it's taking an extra 100ns to wake them up. If that is correct, then our core-to-core latency test is just about the worst case scenario for that strategy, as it's sending data between cores in short bursts, rather than running a sustained workload that keeps the cores alive over the long haul.
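As a quick sanity check on the figures quoted above, the arithmetic can be sketched as follows. The constant names are ours for illustration, and the inputs are the approximate averages from the text rather than exact measurements.

```python
ZEN4_INTER_CCD_NS = 76.0    # Ryzen 9 7950X, approximate figure quoted in the text
ZEN5_INTER_CCD_NS = 180.0   # Ryzen 9 9950X, average figure quoted in the text

# How many times slower the cross-CCD hop is, and the absolute added cost.
ratio = ZEN5_INTER_CCD_NS / ZEN4_INTER_CCD_NS
extra_ns = ZEN5_INTER_CCD_NS - ZEN4_INTER_CCD_NS

print(f"ratio: {ratio:.2f}x, extra: {extra_ns:.0f} ns")
# prints "ratio: 2.37x, extra: 104 ns"
```

That works out to roughly 2.4 times the previous generation's latency, with the difference of about 104ns matching the "extra 100ns" in the wake-up theory.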

At this point, we're running some additional tests on the 9950X without AMD's PPM provisioning driver installed, to see if that's having an impact. Otherwise, these high latencies, if accurate for all workloads, would represent a significant problem for multi-threaded workloads that straddle the Infinity Fabric.

123 Comments

  • coburn_c - Wednesday, August 14, 2024 - link

    The clocks make this a hard pass with X3D coming.
  • boozed - Thursday, August 15, 2024 - link

    Very interesting. I think I probably already know the answer but was comment on the issues you found sought from AMD prior to publication, and if so was there a response?

    Looks like I too will be waiting for the next single-CCD X3D.
  • trivik12 - Thursday, August 15, 2024 - link

    Only huge boost is with AVX-512 loads which is useless for real life applications on client side. Otherwise Zen 5 has been the most meh upgrade from AMD in a while. It's not shocking as there is only so much one can do to boost IPC and performance.

    Let us wait for Turin review and comp with Granite Rapids. That should be more interesting.

    On client side Apple M series is the boss. Their IPC is so much better and higher performance despite lower clockspeeds. I hope X Elite 2 on N3P can produce something competitive. I am pessimistic on x86 chips coming close to Apple.
  • Oxford Guy - Saturday, August 17, 2024 - link

    'Only huge boost is with AVX-512 loads which is useless for real life applications on client side.'

    Not really, although it doesn't help that Intel wasn't consistent with the deployment of that feature.
  • ondma - Thursday, August 15, 2024 - link

    Wow, the AMD apologists are out in full force. Reminds me of the Bulldozer days: "It's a great chip, just wait till the software catches up." Remember, though, these are consumer chips, not enterprise/server, so for the market it will be sold to, there is basically no improvement over the previous generation. Very disappointing release from AMD, and I don't expect much improvement performance wise for Arrow Lake either, except hopefully they solved the stability issues and lower power consumption. Even worse, it looks like another 2 years before a new desktop lineup from either manufacturer (Zen 6 or Nova Lake). So there is a good chance by 2026/27 we could have gone 4 years without significant performance increases for consumer desktop. Yikes!!
  • evanh - Friday, August 16, 2024 - link

    The IPC gains are really there if you filter out the overreactions of the reviewers themselves. The real question is how come that isn't translating to gains across the board like it normally would. Games in particular normally love a boost to IPC. There's a mystery to be solved here.
  • Oxford Guy - Friday, August 16, 2024 - link

    'Reminds me of the Bulldozer days'

    Ridiculous hyperbole. These CPUs are highly competitive, unlike Bulldozer.
  • GeoffreyA - Sunday, August 18, 2024 - link

    There's a narrative going on that Zen 5 is a failure.
  • Oxford Guy - Thursday, August 22, 2024 - link

    It's not a failure. It may not be terribly impressive but it's not a failure. Bulldozer was a failure.
  • GeoffreyA - Friday, August 23, 2024 - link

    Yes. Once the scheduler issues are fixed, or those relating to the admin account, whatever they are causing, branch prediction or otherwise, Zen 5 will be shown in a truer light. It is an excellent series bogged down by problems on the OS side.
