A Physical Register File

Just as AMD announced for its Bobcat and Bulldozer architectures, Intel moves to a physical register file in Sandy Bridge. In Core 2 and Nehalem, every micro-op carried a copy of every operand that it needed. This meant the out-of-order execution hardware (scheduler, reorder buffer and associated queues) had to be much larger, as it needed to accommodate the micro-ops as well as their associated data. Back in the Core Duo days that was 80 bits of data. When Intel implemented SSE, the burden grew to 128 bits. With AVX, however, we now have potentially 256-bit operands associated with each instruction, and the scheduling/reordering hardware would have had to grow too much to support the AVX execution hardware Intel wanted to enable.

A physical register file stores micro-op operands in the register file; as the micro-op travels down the OoO engine it carries only pointers to its operands and not the data itself. This significantly reduces the power consumed by the out-of-order execution hardware (moving large amounts of data around a chip eats tons of power) and also reduces die area further down the pipe. The die savings are translated into a larger out-of-order window.
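To put rough numbers on the difference, here is a minimal sketch of the operand storage each in-flight micro-op needs under the two schemes. The two-source micro-op and the 128-entry register file are illustrative assumptions, not Intel's actual figures; only the 80-, 128- and 256-bit operand widths come from the text above.

```python
from math import ceil, log2

def entry_operand_bits(operand_width_bits, num_sources, prf_entries=None):
    """Operand storage per in-flight micro-op, in bits.

    prf_entries=None models the older design, where each micro-op carries
    copies of its operands. Otherwise each source is a pointer (an index)
    into a physical register file holding prf_entries registers.
    """
    if prf_entries is None:
        return num_sources * operand_width_bits
    return num_sources * ceil(log2(prf_entries))

NUM_SOURCES = 2    # assumption: two source operands per micro-op
PRF_ENTRIES = 128  # assumption: illustrative register file size, not Intel's

for width in (80, 128, 256):  # operand widths named in the article
    carried = entry_operand_bits(width, NUM_SOURCES)
    pointed = entry_operand_bits(width, NUM_SOURCES, PRF_ENTRIES)
    print(f"{width}-bit operands: {carried} bits carried vs {pointed} bits of pointers")
```

The point of the sketch is that the pointer cost depends only on the size of the register file, so it stays flat as operands widen, while the carried-data cost scales directly with operand width.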

The die area savings are key as they enable one of Sandy Bridge’s major innovations: AVX performance.

The AVX instructions support 256-bit operands, which as you can guess can eat up quite a bit of die area. The move to a physical register file enabled Intel to increase OoO buffers to properly feed a higher-throughput floating point engine. Intel clearly believes in AVX, as it extended all of its SIMD units to 256 bits wide. The extension is done at minimal die expense. Nehalem has three execution ports and three stacks of execution units.

Sandy Bridge allows 256-bit AVX instructions to borrow 128 bits of the integer SIMD datapath. This minimizes the impact of AVX on the execution die area while enabling twice the FP throughput: you get two 256-bit AVX operations per clock (plus one 256-bit AVX load).
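The "twice the FP throughput" figure follows from simple lane counting. The sketch below assumes single-precision (32-bit) elements and assumes the 128-bit SSE baseline also issues two FP operations per clock; the two 256-bit operations per clock come from the text above.

```python
def fp_lanes_per_clock(vector_bits, ops_per_clock, element_bits=32):
    """Number of FP elements processed per clock at peak."""
    lanes_per_op = vector_bits // element_bits
    return lanes_per_op * ops_per_clock

# Assumption: the 128-bit baseline issues two FP ops per clock as well.
sse_peak = fp_lanes_per_clock(128, ops_per_clock=2)
avx_peak = fp_lanes_per_clock(256, ops_per_clock=2)

print(f"128-bit SSE: {sse_peak} single-precision results per clock")
print(f"256-bit AVX: {avx_peak} single-precision results per clock")
print(f"ratio: {avx_peak / sse_peak:.1f}x")
```

With double-precision (64-bit) elements the absolute numbers halve, but the ratio between the two vector widths stays the same.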

Granted, you can’t mix 256-bit AVX and 128-bit integer SSE ops; however, remember that SNB now has larger buffers to help extract more ILP.

The upper 128 bits of the execution hardware and paths are power gated. Standard 128-bit SSE operations will not incur an additional power penalty as a result of Intel’s 256-bit expansion.

AMD sees AVX support in a different light than Intel. Bulldozer features two 128-bit SSE paths that can be combined for 256-bit AVX operations. Compared to an 8-core Bulldozer, a 4-core Sandy Bridge has twice the 256-bit AVX throughput. Whether or not this is an issue going forward really depends on how widely AVX is used in applications.
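The 2:1 figure can be reproduced with back-of-the-envelope arithmetic. This sketch assumes that an 8-core Bulldozer is built from four two-core modules, each sharing one FP unit whose two 128-bit paths combine into a single 256-bit operation per clock; the Sandy Bridge figure of two 256-bit operations per core per clock comes from earlier in the article.

```python
def avx256_ops_per_clock(fp_units, ops_per_unit):
    """Peak 256-bit AVX operations per clock for the whole chip."""
    return fp_units * ops_per_unit

# Sandy Bridge: 4 cores, two 256-bit AVX ops per clock each.
sandy_bridge = avx256_ops_per_clock(fp_units=4, ops_per_unit=2)

# Bulldozer (assumption): 8 cores = 4 modules, one shared FP unit per module,
# two 128-bit paths fused into one 256-bit op per clock.
bulldozer = avx256_ops_per_clock(fp_units=4, ops_per_unit=1)

print(f"4-core Sandy Bridge: {sandy_bridge} ops/clock")
print(f"8-core Bulldozer:    {bulldozer} ops/clock")
print(f"ratio: {sandy_bridge / bulldozer:.1f}x")
```

This is a per-clock comparison only; it says nothing about clock speeds or about 128-bit code, where Bulldozer's two paths can work independently.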

The improvements to Sandy Bridge’s FP performance increase the demands on the load/store units. In Nehalem/Westmere you had three load/store ports: load, store address and store data.

In SNB, the load and store address ports are now symmetric, so each port can service a load or a store address. This doubles the load bandwidth, which is important as Intel doubled the peak floating point performance in Sandy Bridge.
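A quick sketch of the load bandwidth arithmetic, assuming each load-capable port moves 128 bits (16 bytes) per clock. That port width is an assumption made for illustration; the port counts come from the text above.

```python
PORT_BYTES = 16  # assumption: each load-capable port moves 128 bits per clock

def load_bytes_per_clock(load_capable_ports, bytes_per_port=PORT_BYTES):
    """Peak bytes loaded per clock across all load-capable ports."""
    return load_capable_ports * bytes_per_port

nehalem = load_bytes_per_clock(1)       # one dedicated load port
sandy_bridge = load_bytes_per_clock(2)  # two symmetric load/store-address ports

print(f"Nehalem:      {nehalem} bytes/clock")
print(f"Sandy Bridge: {sandy_bridge} bytes/clock ({sandy_bridge * 8} bits)")
```

Under that assumption the two ports together supply 256 bits per clock, which lines up with the one 256-bit AVX load per clock mentioned earlier.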

There are some integer execution improvements in Sandy Bridge, although they are more limited. Add with carry (ADC) instruction throughput is doubled, while large multiplies (64-bit x 64-bit) see a ~25% speedup.
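For a sense of where these two instructions matter, here is a minimal sketch of multi-word arithmetic, the kind of code (big-number math, for example) that leans on ADC and wide multiplies. It models the operations in Python rather than showing real machine code, and the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def add_limbs(a, b):
    """Add two equal-length little-endian lists of 64-bit words.

    Each loop step mirrors one ADC: word + word + carry-in, producing a
    64-bit result and a carry-out for the next step.
    """
    result, carry = [], 0
    for x, y in zip(a, b):
        total = x + y + carry
        result.append(total & MASK64)
        carry = total >> 64
    return result, carry

def mul64(x, y):
    """64 x 64-bit multiply returning the (low, high) 64-bit halves."""
    product = x * y
    return product & MASK64, product >> 64

# The carry ripples from the low word into the high word.
print(add_limbs([MASK64, 0], [1, 0]))
print(mul64(MASK64, 2))
```

Because each step in the addition depends on the carry from the previous one, the chain is serial, which is why ADC throughput shows up directly in this kind of workload.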


62 Comments


  • iwodo - Tuesday, September 14, 2010 - link

    Many questions are still not answered; maybe Anand could find out for us.

    1. Were the GPU performance we saw from 6 EU or 12 EU?
    2. Where is FMA ( Fused Multiply Add ) ? Will we see it in Ivy Bridge?
    3. Can All software developers access the Decoding Engine? We could see many codec being optimized for playback on Intel Hardware Decoder, whether it is fully supported codec or partially supported codec.
    4. Hardware Encoder? It is Full Hardware encoder? Free to use for Software Dev?
    5. OpenCL not possible?
    6. How many % die size is given to Graphics?
    7. Gfx Drivers, will Intel commit more resources on drivers update? Or Will they open sources it?

    Apart from Sandy Bridge, Looking forward for reports on USB 3.0 situations, LightPeak, Gen 3 SSD.
  • trivik12 - Tuesday, September 14, 2010 - link

    1) I believe it was 12EU part.
    2) FMA will be introduced with Haswell(next tock). So we have to wait until early 2013 for that.
  • Foo999 - Tuesday, September 14, 2010 - link

    > 2. Where is FMA ( Fused Multiply Add ) ? Will we see it in Ivy Bridge?

    You can check out the full current (and Ivy Bridge) AVX instructions in the AVX reference manual available from software.intel.com/en-us/avx/
  • spart - Tuesday, September 14, 2010 - link

    1. 6 EU. The 12 EU part is only for laptops and high-end models.
  • gvaley - Tuesday, September 14, 2010 - link

    So, was it playable, I mean Starcraft II?
  • therealnickdanger - Tuesday, September 14, 2010 - link

    Yeah, the caption said "310M vs Sandy Bridge" so I assume you could see the settings and frames per second. Details, man, details!!

    :)
  • Anand Lal Shimpi - Tuesday, September 14, 2010 - link

    Yes, it was playable at medium quality settings. They only had the single player campaign running however.

    Take care,
    Anand
  • Carleh - Tuesday, September 14, 2010 - link

    With BCLK locked, where does that leave the motherboard manufacturers?
    I mean, what are they left to offer to enthusiasts, if the BCLK is locked? How are they going to differentiate an enthusiast-class motherboard from a mainstream one?
  • ssj4Gogeta - Tuesday, September 14, 2010 - link

    Will they be locking the socket 2011 parts as well?
  • Zoomer - Sunday, September 19, 2010 - link

    Sell more Bulldozer boards. I was all set to get a nice Sandy Bridge and overclock it to hell, but now I think I'll get a Bulldozer instead.

    Sure there's the K, but it costs more. That kinda defeats the point, unless the aim is to get a high clk for epeen.
