Hot Chips 2020 Live Blog: Microsoft Xbox Series X System Architecture (6:00pm PT)
by Dr. Ian Cutress on August 17, 2020 9:00 PM EST- Posted in
- CPUs
- Microsoft
- GPUs
- Xbox
- Live Blog
- Xbox Series X
- Hot Chips 32
09:04PM EDT - Final talk of the day is Xbox Series X System Architecture!
09:05PM EDT - Azure Silicon Architecture Team
09:06PM EDT - 3.8 GHz Zen2 Server cores
09:06PM EDT - DXR, VRS, Machine Learning Acceleration
09:07PM EDT - 14 Gbps GDDR6, 320-bit = 560 GB/s
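The 560 GB/s figure follows from the per-pin data rate times the bus width, divided by eight bits per byte. A minimal sketch of that arithmetic (the function name is illustrative, not something from the talk):

```python
def peak_bandwidth_gb_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak DRAM bandwidth in GB/s from per-pin data rate (Gbps) and bus width (bits)."""
    return data_rate_gbps * bus_width_bits / 8  # 8 bits per byte

# 14 Gbps GDDR6 on a 320-bit bus, as quoted on the slide
print(peak_bandwidth_gb_s(14, 320))
```

This is a theoretical peak; sustained bandwidth in real workloads will be lower.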
09:07PM EDT - Hardware accelerators in blue
09:07PM EDT - 120 Hz support, VRR, Xbox Velocity Architecture for MSP Crypto/Decomp on NVMe SSD
09:07PM EDT - Acoustic acceleration
09:07PM EDT - HSP/Pluton RoT - security
09:08PM EDT - 360.4mm2 TSMC N7 enhanced
09:08PM EDT - 15.3B transistors
09:08PM EDT - 2 four core CPU clusters
09:08PM EDT - 10 GDDR6 controllers
09:09PM EDT - GPU 12 TFLOPs
09:10PM EDT - AVX256 gives 972 GFLOP over CPU
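The 972 GFLOP number lines up with the two four-core clusters at 3.8 GHz if each core retires 32 FP32 operations per cycle. That per-cycle figure is an assumption here (two 256-bit FMA pipes, 8 lanes each, 2 FLOPs per lane), not something stated in the talk. A rough sketch:

```python
def peak_cpu_gflops(cores: int, clock_ghz: float, flops_per_cycle: int = 32) -> float:
    """Theoretical peak FP32 GFLOPs for the CPU.

    flops_per_cycle=32 is an assumption: two 256-bit FMA pipes per core,
    8 FP32 lanes each, 2 FLOPs (multiply + add) per lane per cycle.
    """
    return cores * clock_ghz * flops_per_cycle

# 2 clusters x 4 cores at 3.8 GHz, as listed in the talk
print(f"{peak_cpu_gflops(8, 3.8):.1f} GFLOPs")
```

Under those assumptions the result lands at roughly 972.8, consistent with the 972 on the slide.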
09:10PM EDT - 16 GB of GDDR6 total
09:11PM EDT - >Says Zen2 server class, but L3 cache is mobile class?
09:11PM EDT - Display processing is kept off the shader engines
09:11PM EDT - IO hub supports PCIe 4.0 x8
09:11PM EDT - Operates on linear light values, not gamma light values
09:12PM EDT - ALLM - Auto Low Latency Mode
09:13PM EDT - Increased die cost on this APU over previous generation
09:13PM EDT - Significantly more expensive!
09:13PM EDT - Trade off
09:13PM EDT - MS created Audio engines - 3 engines, CFPU2, MOVAD, LOGAN
09:13PM EDT - CFPU2 for audio convolution, FFT, reverb
09:14PM EDT - such as Project Acoustics to model 3D audio sources
09:14PM EDT - MOVAD - hyper real-time hardware audio decoder
09:14PM EDT - >300x channels decode at once
09:14PM EDT - best trade off codec, so made in hardware
09:15PM EDT - >100dB signal noise ratio
09:15PM EDT - HW realtime real-time matched to decode based on sampling
09:15PM EDT - Logan is offering also better offload in traditional modes
09:15PM EDT - HSP/Pluton: Root of trust, crypto, SHACK (crypto keys)
09:15PM EDT - MSP supports 5 GB/s high-bw crypto on the SSD
09:16PM EDT - DRAM to SSD balance needed for refill
09:16PM EDT - Load times are always increasing unless SW-to-DRAM bw increases, hence NVMe SSDs
09:17PM EDT - Sampler Feedback System
09:17PM EDT - New metadata for texture portions to pre-load texture caches
09:17PM EDT - Direct Storage
09:17PM EDT - Manages data locations ahead of developer
09:18PM EDT - Distinct savings for the most detailed texture maps
09:18PM EDT - Lossless MS XVA 2:1 compression
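To get a feel for the DRAM-to-SSD balance mentioned above, here is a back-of-the-envelope sketch of how long it takes to refill the 16 GB of GDDR6 with 2:1 compressed data. The raw SSD rate of 2.4 GB/s is an assumption taken from outside this talk, and the function name is purely illustrative:

```python
def refill_seconds(dram_gb: float, raw_gb_s: float, compression_ratio: float = 2.0) -> float:
    """Seconds to stream dram_gb of decompressed data off the SSD."""
    effective_gb_s = raw_gb_s * compression_ratio  # 2:1 lossless ratio from the talk
    return dram_gb / effective_gb_s

ASSUMED_RAW_SSD_GB_S = 2.4  # assumption: not quoted in this talk
print(f"{refill_seconds(16, ASSUMED_RAW_SSD_GB_S):.1f} s to refill 16 GB")
```

Real load times also depend on access patterns and CPU-side work, so treat this as a lower bound under the stated assumptions.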
09:19PM EDT - Need big GPU - get the tech out of the way
09:19PM EDT - Need raw ops/second increase within PPA and cost
09:19PM EDT - DX12 feature level 12.2 supported in HW
09:19PM EDT - 26 active dual CUs (52 CUs)
09:20PM EDT - single geometry supports primitives
09:20PM EDT - Directly snoop CPU caches
09:20PM EDT - Dual stream multi-core command processor
09:20PM EDT - Double rate 16-bit math
09:20PM EDT - single cycle issue rate to reduce stalls
09:21PM EDT - CUs have 25% better perf/clock compared to last gen
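The 52 active CUs can be tied back to the headline GPU throughput figure with a quick calculation. Two inputs below are assumptions rather than numbers from the talk: 64 FP32 lanes per CU, and a GPU clock of 1.825 GHz.

```python
def peak_gpu_tflops(cus: int, clock_ghz: float, lanes_per_cu: int = 64) -> float:
    """Theoretical peak FP32 TFLOPs for the GPU.

    lanes_per_cu=64 is an assumption about the CU layout.
    """
    flops_per_lane_per_clock = 2  # one fused multiply-add counts as 2 FLOPs
    return cus * lanes_per_cu * flops_per_lane_per_clock * clock_ghz / 1000

ASSUMED_GPU_CLOCK_GHZ = 1.825  # assumption: not stated in this talk
print(f"{peak_gpu_tflops(52, ASSUMED_GPU_CLOCK_GHZ):.2f} TFLOPs")
```

With those inputs the peak comes out a little over 12 TFLOPs. Double rate 16-bit math would double the figure for FP16 work.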
09:21PM EDT - GPU Evolution: FLOPS have outpaced mem space and BW
09:21PM EDT - Screen pixels have increased in the middle
09:22PM EDT - How to fill pixels better without blowing power budget
09:22PM EDT - VRS
09:22PM EDT - supports up to 2x2
09:22PM EDT - 10-30% perf gain for tiny area cost
09:23PM EDT - Full edge detail
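A rough way to think about VRS: at a 2x2 rate, one pixel shader invocation covers a block of four pixels. The sketch below estimates how much shading work remains when some share of the screen is shaded coarsely. The 40% coarse share is a made-up input for illustration, not a number from the talk.

```python
def shading_work_fraction(coarse_fraction: float, block_w: int = 2, block_h: int = 2) -> float:
    """Pixel shader invocations relative to shading every pixel at full rate.

    coarse_fraction: share of screen pixels shaded at the coarse rate (0..1).
    block_w, block_h: coarse block size; 2x2 is the maximum quoted in the talk.
    """
    if not 0.0 <= coarse_fraction <= 1.0:
        raise ValueError("coarse_fraction must be between 0 and 1")
    per_pixel_cost = 1.0 / (block_w * block_h)  # one invocation shared by the block
    return (1.0 - coarse_fraction) + coarse_fraction * per_pixel_cost

# Hypothetical frame: 40% of pixels shaded at 2x2
print(f"{shading_work_fraction(0.4):.2f} of full-rate shader invocations")
```

Fewer invocations do not translate one-to-one into frame time, since pixel shading is only part of the frame, which fits with the more modest 10-30% gain quoted above.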
09:23PM EDT - SFS
09:24PM EDT - Previously very slow to enable
09:24PM EDT - Two new HW structures for tile-by-tile management for in-DRAM textures
09:25PM EDT - clamps LOD
09:26PM EDT - Tilemaps should stay on die for best latency
09:27PM EDT - SFS: 60% IO/Mem savings for small die area cost
09:27PM EDT - DX Ray Tracing Accel
09:27PM EDT - Not a complete replacement - RT can be applied selectively based on traditional models
09:28PM EDT - Custom ray-triangle units
09:28PM EDT - ML inference
09:29PM EDT - Two virtualized command streams - two VMs
09:29PM EDT - Main title OS vs system OS
09:29PM EDT - 32b HDR rendering, blending display
09:29PM EDT - Optimized games. Unable to show at the event
09:30PM EDT - Q&A Time
09:31PM EDT - Q: TDP? A: Not commenting. There's so many things that are involved in the TDP, and tradeoffs. We're not really able to describe it without describing it in a technical environment
09:32PM EDT - Q: Can you stream into the GPU cache? A: Lots of programmable cache modes. Streaming modes, bypass modes, coherence modes.
09:33PM EDT - Q: Coherency CPU and GPU? A: GPU can snoop CPU, reverse requires software
09:35PM EDT - Q: Are you happy with DX12 as a low-level hardware API? A: DX12 is very versatile - we have some Xbox specific enhancements that power developers can use. But we try to have consistency between Xbox and PC. Divergence isn't that good. But we work with developers when designing these chips so that their needs are met. Not heard many complaints so far (as a silicon person!). We have a SMASH driver model. The games on the binaries implement the hardware laid out data that the GPU eats directly - it's not a HAL layer abstraction. MS also re-writes the driver and smashes it together, we replace that and the firmware in the GPU. It's significantly more efficient than the PC.
09:35PM EDT - Q: Is there a link between CPU and GPU clocks? A: Hardware is independent.
09:36PM EDT - Q: Is the CPU 3.8 GHz clock a continual or turbo? A: Continual.
09:36PM EDT - Continual to minimize variance
09:37PM EDT - Q: TSMC 7nm enhanced, is it N7P, N7+, or something else? A: It's not base 7nm, it's progressed over time. Lots of work between AMD and TSMC to hit our targets and what we needed
09:38PM EDT - Q: Says Zen 2 is server class, but you use L3 mobile class? A: Yeah our caches are different, but I won't say any more, that's more AMD.
09:39PM EDT - Q: With 20 channels GDDR6, is that really cheaper than 2 stacks HBM? A: We're not religious about which DRAM tech to use. We needed the GPU to have a ton of bandwidth. Lots of channels allows for low latency requests to be serviced. HBM did have an MLC model thought about, but people voted with their feet and JEDEC decided not to go with it.
09:40PM EDT - Q: GDDR6 on sides, not bottom? A: bottom is power, how board interfaces with the chip. GPU has high EDC and currents, and you need clean copper to deliver that. With that much current you need to leave that space unless you use super expensive packaging. We did it the cost efficient way
09:41PM EDT - Q: Why do you need so much math for audio processing? A: 3D positional audio and spatial audio in real world spaces: if you have 300-400 audio sounds positioned in 3D and want to start doing other effects on all samples, it gets very heavy on compute. Imagine 20 people fighting in a cave and reflections with all sorts of noises
09:43PM EDT - That's a wrap and we're done for today. Come back tomorrow at 8:30am PT to talk about FPGAs. It's 2:44am here in the UK, time to go to bed.
58 Comments
Spunjji - Wednesday, August 19, 2020 - link
What a bizarrely aggressive response. Ian didn't frame anything as a "gotcha", let alone exhibit hostility. Calm down and slow your roll.
close - Thursday, August 20, 2020 - link
Ian's comment: "I think his answer [...] was a massive cop-out and someone disingenuous"
Ian's answer when Intel was showing a *clearly* overclocked CPU cooled by a hidden 1HP chiller under the table was to "applaud" and ask no questions. Later on when the gig was up he published a weak retraction/admonishment saying that "this was not communicated as well as it should have been on stage" or "we did tell Intel that we had hoped that the presenter would have spent more time on stage talking about the system in play [...] Our commentary was taken on board by the Intel team we spoke to".
I'd also say that Ian's comment now sounds particularly aggressive, especially given his past reactions to what was clear and deliberate lie/misdirection from Intel. Judge the difference in the aggressiveness of the response yourself. It's the sort of bias that should be highlighted with every opportunity.
https://www.anandtech.com/show/12932/intel-confirm...
https://www.anandtech.com/show/12893/intels-28core...
Meteor2 - Thursday, August 20, 2020 - link
Chill out
close - Friday, August 21, 2020 - link
I'm always chill when I'm 100% right. ;)
dotjaz - Sunday, August 23, 2020 - link
What would you call evading such a simple answer? I'd call it a massive cop-out. As Ian said, they didn't even have to give an exact answer, just a range like " package TDP around 100-120W" would suffice. Yet they chose not to disclose at all.
close - Sunday, August 30, 2020 - link
@dotjiz: "What would you call evading such a simple answer?"
Ian could have said "this was not communicated as well as it should have been on stage". He obviously found it an appropriate answer when it came to Intel's orchestrated deception.
Calling this " a massive cop-out" and more importantly "disingenuous" when it comes to AMD avoiding a straight answer really shines a spotlight on the journalist's bias. And their journalistic integrity.
Spunjji - Friday, August 21, 2020 - link
I'm going to repeat my recommendation. There's no reasonable way to read aggression into what he said - "massive cop-out" is a fair summary of the answer he got and tbh it's fairly neutral language - he didn't say the guy was an ass or a liar, just that his answer dodged the question. Comparing it to the chiller demo is an extended reach.
Alexvrb - Sunday, August 23, 2020 - link
In the comment section here, I'm not sure I'd call his statement exactly neutral. He did imply a degree of dishonesty. Here's the part you avoided quoting: "someone disingenuous" I assume he meant somewhat, for the record.
With that being said, this isn't as big of a deal as Close is saying IMO. The way I see it, he doesn't hate AMD or MS... he's just not afraid of them. Most journalists are far more worried about pissing off Intel, so they treat them with kid gloves unless *everyone* is going after them on a particular issue. Safety in numbers and all that.
close - Sunday, August 30, 2020 - link
I'm sure that Ian doesn't hate AMD and I am not suggesting he is a shill or on Intel's payroll. But he *is* biased and can't seem to work around it or make an effort to seem more balanced in his texts. The hallmark of a good journalist is keeping their tone equally neutral whether they're talking for or against the things they personally like more or less.
And he's obviously treating Intel with gloves in a way I haven't seen any other popular outlet do (like ArsTechnica or GamersNexus). I could give you examples in the dozens of Ian treating Intel with kid gloves from not burning Intel to the stake for making a fool of him (and other journalists) for not spotting obvious signs that other people spotted even if they weren't in the room, to singing some massive praise in reviews to some CPUs that barely brought 5% over the previous gen so the comparison to ones 3-4 generations old was hammered again and again.
close - Sunday, August 30, 2020 - link
@Spunkjji: "he didn't say the guy was an ass or a liar"
We must have different definitions of calling someone's statements "disingenuous". It literally means insincere. And I understand a random on the comment section may not have a solid grasp of... words. But a journalist has no such excuse. Again, he called an orchestrated deception as "not communicated as well as it should have been" and someone not wanting to give a detailed answer after a slide deck as "a cop-out" and "disingenuous".
So then I can just say Ian's way of expressing his unhappiness towards AMD compared to Intel is *extremely* disingenuous. I guess you don't develop integrity no matter how many titles you slap in front of your name.