Arm Announces Neoverse V1, N2 Platforms & CPUs, CMN-700 Mesh: More Performance, More Cores, More Flexibility
by Andrei Frumusanu on April 27, 2021 9:00 AM EST- Posted in
- CPUs
- Arm
- Servers
- Infrastructure
- Neoverse N1
- Neoverse V1
- Neoverse N2
- CMN-700
The Neoverse V1 Microarchitecture: Platform Enhancements
Aside from the core-side microarchitectural aspects of the V1, the new design also features some new system-facing novelties that promise to help vendors integrate the CPU IP better in larger scale implementations.
MPAM, or Max Power Mitigation Mechanism is a new fine-grained (to around 100 clock cycles) power management mechanism that promises to help smooth out the power behaviour of the core, and allow vendors’ implementations of the chip’s power delivery mechanisms to be so to say, be built to lesser requirements.
As we’ve seen in our review of the Ampere Altra, instead of fluctuating frequency at maximum TDP like how most x86 CPUs behave right now, the chip rather prefers to stay most of the time at maximum frequency, with the actual power consumption many times landing in at quite below the TDP (maximum allowed power consumption). A mechanism such as MPAM would allow, if possible, for the system’s average frequency to be higher by throttling the power limited cores to a finer degree. The mechanism to which this can be achieved can also include microarchitectural features such as dispatch throttling where the core slows down the dispatched instructions, smoothing out high power requirements in workloads having high execution periods, particularly important now with the new wider 2x256b SVE pipelines for example.
MPAM is a different mechanism helping interactions in larger system implementations. The Memory partitioning and monitoring feature is supposed to help with quality of service and reducing side-effects of noisy neighbours in deployments where multiple workloads, such as multiple VMs or processes, operate on the same system. This naturally requires software-hardware cooperation and implementation, but should be something that is particularly helpful in cloud environments.
CBusy or Completer Busy is also a new system-side mechanism where the CPU cores interact with the mesh interconnect on a feedback-based basis, where the CPUs can vary their memory prefetcher aggressiveness depending on the overall mesh and system memory load. This ties in with the previously mentioned dynamic prefetcher behaviour where one can have the best of both worlds – better prefetching for more performance per core when the bandwidth is available, and very conservative prefetching when the system is under high load and there’s no room for wasted speculative bandwidth and data transfers.
95 Comments
View All Comments
mode_13h - Tuesday, April 27, 2021 - link
> sample in the second half of 2022Uh, that means new machines won't be using them until at least the end of next year. And if we want more cores than an ultraportable, it's still no good.
Raqia - Wednesday, April 28, 2021 - link
I wouldn't put it past them to do a desktop or server sized SoC eventually if they have a great in house core design that isn't a commoditized IP block that anyone can license from ARM. It would give them an advantage at the higher tiers of performance that they will want piece of for sure.They also seem to be devoted to providing an open ARM computing platform in working with Linux developers and Windows when compared with Apple. That they added a hypervisor to the 888 should give you some indication to their future compute ambitions...
mode_13h - Wednesday, April 28, 2021 - link
> I wouldn't put it past them to do a desktop or server sized SoCThe already tried this, but their investors killed it. Lookup "Centriq". Building out a whole server infrastructure & ecosystem takes a lot of investment, and now they'd have established competitors with a multi-year lead.
Raqia - Wednesday, April 28, 2021 - link
I wasn't talking about servers (at least not right away), more consumer oriented and workstation scale compute. Amon did say that the designs they had in mind with Nuvia were "scalable" and that they were going to be addressing multiple markets.mode_13h - Wednesday, April 28, 2021 - link
I hope you're right. If anyone can compete with Apple right now, it's probably Nuvia/Qualcomm.name99 - Thursday, April 29, 2021 - link
You need three things to create a higher performance core than Apple- designers (check)
- an implementation team (hmm. maybe? this means *enough* good people and superb simulation/design tools)
- management willing to pay the costs [design costs, and willing to accept a substantially larger core] (hmmmmmmmm? will they chicken out and assume no-one is willing to pay for such a core, they way they always have for watch, phone, then centriq?)
And Apple won't stand still...
mode_13h - Tuesday, April 27, 2021 - link
> so far except the HPE's A64FXGigabyte makes Altra motherboards and servers that I'm sure you can buy for less than a HPE A64FX-based machine.
And, if you're counting A64FX as a "consumer machine", you ought to include Avantek's Altra-based workstations that I mentioned below.
mode_13h - Tuesday, April 27, 2021 - link
> if these CPUs outperform the EPYC Milan technically AWS should replace all of them right ?No, because a lot of people are still stuck on x86. Also, Amazon could be fab-limited, like just about everyone else. The sun might be setting on x86, but it's still a long time until dark.
Rudde - Tuesday, April 27, 2021 - link
An Avantek Ampere workstation might be available in a stand-alone system. Andrei expects Ampere to include N2 in their next gen systems instead of V1. Apple might also launch something in that segment in the coming years.mode_13h - Tuesday, April 27, 2021 - link
A UK-based company called Avantek makes Ampere-based workstations. Their eMAG-based version was reviewed on this site, a couple years ago, and they now have one with Altra. So, I'd say better than average chances we might see one with a V1-based CPU by maybe the end of the year or so.