Google's Tensor inside of Pixel 6, Pixel 6 Pro: A Look into Performance & Efficiencyby Andrei Frumusanu on November 2, 2021 8:00 AM EST
- Posted in
- Pixel 6
- Pixel 6 Pro
- Google Tensor
It’s been about two weeks since Google officially announced their newest flagship devices in the form of the Pixel 6, and Pixel 6 Pro. The two new Pixel phones are inarguably Google’s largest shift ever since the Pixel series was introduced, showcasing major changes in essentially every aspect of the devices, sharing very little in common with their predecessors besides the Pixel name. Featuring brand new displays, camera systems, body designs, and internal hardware at seemingly extremely competitive pricing, the phones seem to be off to an extremely good start and competitive positioning Google hasn’t had in a long time.
One of the biggest changes, and most interesting to our readers, is the fact that the Pixel 6 and Pixel 6 Pro come powered on by Google’s own “Tensor” SoC. And it’s here where there’s quite a bit of confusion as to what exactly the Tensor is. Google explains that the Tensor is Google’s start in a journey towards the quest of enabling new kinds of workloads, which in the company’s words, were simply not possible or achievable with “standard” merchant silicon solutions. Taking advantage of Google research’s years of machine learning experience, it’s a chip that’s heavily focused towards ML as its primary differentiating feature, and what is said to allow the Pixel 6 phones to have many of the new unique feature exclusive to them.
Today, we’re giving the Tensor SoC a closer look. This includes trying to document what exactly it’s composed of, showcasing the differences or similarities between other SoCs in the market, and better understanding what kind of IPs Google has integrated into the chip to make it unique and warrant calling it a Google SoC.
The Chip Provenance
Officially, per Google’s own materials, the Tensor is a Google SoC fully designed by the company. And while the overall truth of this will vary based on your definition of “design”, the chip follows a seemingly close cooperation between Google and Samsung LSI, in the process blurring the lines between a traditional custom design and semi-custom design-for-hire chips such AMD’s console APUs.
Starting off at the very highest level, we have the actual name of the SoC. “Google Tensor” is quite abstract in that, for the time being, the chip doesn’t have any particular model number attached to it in terms of official marketing. So whether the next-gen will be marketed “Tensor 2” or something else will remain to be seen. Internally, Google calls the chip the “GS101”, and while I’m not entirely sure here what GS stands for, it’s likely Google SoC or Google Silicon. For quite some time now we’ve also heard the “Whitechapel” being reported, although I’ve seen no evidence that this was a reference to the actual chip but in the very early stages.
On the silicon side, the chip has another model number, with the SoC’s fused chip identification following Samsung’s Exynos naming scheme. Here we find the chip has an ID of “0x09845000”, which corresponds to what would be S5E9845 (Edit: It's actually S5P9845). The latest Samsung LSI SoC, for reference, is the Exynos 2100, which is identified as the S5E9840.
Of course, why would the Google SoC follow an Exynos internal naming scheme? That’s where we can begin to see some of the provenance of the design. It’s been widely reported for some time that a few years back, Samsung opened up itself to semi-custom silicon design offers. A piece from August 2020 from ETNews seems to correctly describe Samsung’s business plan and how it pertains to the Google chip (as well as describing a Cisco design win):
“Samsung Electronics is set to manufacture semiconductor chips for Cisco Systems, which is the top network equipment maker in the world, and Google and it is responsible for the entire semiconductor manufacturing process from design to production.
Samsung Electronics is currently working on a development phase that involves chip design.
Samsung Electronics also obtained an order from Google regarding manufacturing of more than one chip. It is heard that Google requested a semiconductor that will go into a sensor that can measure body movements rather than for processors that go into current IT devices and an unprecedented application processor (AP).
Samsung Electronics is carrying out a different approach as it looks to actively utilize its technologies in chip design. Its strategy is to provide “customized” technologies and features that its customer needs even from a design stage and secure consignment production as well.
What’s important here is the latter description of the process – where rather than simply acting as a pure-play contract manufacturer, Samsung is acting as a fully engaged party in the design of the silicon. This could very much be compared to an ASIC design service, with the exception being that Samsung is also a merchant SoC vendor as well as a manufacturer for the silicon, something that’s quite unique in the industry, and thus something of a special situation.
Having the chip in our hands now, as well as having the open-source insight into the characteristics of it, we can start breaking down what exactly the Google Tensor is:
|Google Tensor and Samsung Exynos 2100: Similar But Different|
@ 2.80GHz 2x1024KB pL2
@ 2.25GHz 2x256KB pL2
@ 1.80GHz 4x128KB pL2
@ 2.91GHz 1x512KB pL2
@ 2.81GHz 3x512KB pL2
@ 2.20GHz 4x64KB pL2
|GPU||Mali G78 MP20 @
848 MHz (shaders)
996 MHz (tiler / L2)
|Mali G78 MP14 @
|4x 16-bit CH
@ 3200MHz LPDDR5 / 51.2GB/s
8MB System Cache
|ISP||Hybrid Exynos + Google ISP||Full Exynos ISP Blocks
|Media||Samsung Multi-Function Codec
8K30 & 4K120 encode &
H.265/HEVC, H.264, VP9
4K60 AV1 Decode
|Modem||Exynos Modem 5123
(LTE Category 24/18)
(5G NR Sub-6)
(5G NR mmWave)
|Exynos Modem 5123
(LTE Category 24/18)
(5G NR Sub-6)
(5G NR mmWave)
Same Blood Type
In the very fundamentals of what an SoC is, the Google Tensor closely follows Samsung’s Exynos SoC series. Beyond the usual high-level blocks that people tend to talk about in an SoC, such as CPUs, GPUs, NPUs, and other main characteristics, there’s the foundational blocks of a chip: these are the fabric blocks and IP, the clock management architecture, power management architecture, and the design methodology of the implementing those pieces into actual silicon. While on paper, a Samsung Exynos, a MediaTek Dimensity or a HiSilicon Kirin, or even a Qualcomm Snapdragon (on the CPU side) might have similar designs in terms of specifications – with the same high-level IP such as Cortex CPU or Mali GPUs from Arm – the chips will still end up behaving and performing differently because of the underlying SoC architecture is very different.
In the case of the Tensor, this “chassis” builds upon the IP Samsung uses on their Exynos SoCs, utilizing the same clock management and power management architecture. Going further up in the IP hierarchy we find additional similarities among high-level IP blocks, such as memory controllers, fabric IP, PHY IP for all kinds of externally facing interfaces, and even the larger IP functional blocks such as ISP or media decoders/encoders. The fun thing is that these things are now publicly scrutinizeable, and can be compared 1:1 to other Exynos SoCs in terms of their structures.
This leads us to Google’s claim of the Tensor being their own design – which is true to an extent, but how true that is can vary based on your definition of “design” and how in-depth you want to go with that. Although the Tensor/GS101 builds upon Exynos foundational blocks and IPs – and likely was even integrated and taped-out by Samsung – the definition of the SoC is in Google’s control, as it is their end-product. While things are very similar to an Exynos 2100 when it comes to Tensor’s foundation and lowest level blocks, when it comes to the fabric and internal interconnects Google’s design is built differently. This means that the spiderweb of how the various IP blocks interact with each other is different from Samsung’s own SoC.
A practical example of this is how the CPU cores are integrated into the SoC. While on the Exynos 2100 the CPU cluster seemingly lies very clearly in a smaller, more defined Samsung Coherent Interconnect, the Tensor SoC integrates the CPU clusters in a larger CCI that appears to either be a very different configuration of the interconnect setup, or is a different IP altogether. Meanwhile there are still some similarities, such as having one predominant memory traffic bus connected to the memory controllers and one other lower-traffic “internal” bus for other IPs, which is how Exynos SoCs tend to separate things. It should be possible to reverse-engineer and map out the SoC in more detail, however that’s a time-consuming matter out of the scope of this piece.
The CPU Setup - 2x X1 + 2x A76 + 4x A55
While we could go on and on talking about SoC architecture, let’s curtail that for now and jump into the more visible and practical differences of the Tenor SoC, starting off with the CPU cluster.
Google’s CPU setup is quite unusual from other SoCs in that it features a 2+2+4 configuration. While this isn’t truly exceptional – Samsung had this very same setup for the Exynos 9820 and Exynos 990 – the X1+A76+A55 configuration on the Tensor is currently unique in the market. Most other vendors and implementations out there have shifted over to a 1+3+4 big+mid+little CPU configurations.
On the Cortex-X1 side, Google’s use of a pair of cores means that, in theory, the performance of the chip with two heavy threads should be higher than any other Android SoC which only have a single big large performance core. The frequencies of the X1 pair come in at 2.8GHz, slightly lower than the 2.86GHz of the Snapdragon 888 and 2.91GHz of the Exynos 2100 X1 cores. Google equipped the cores with the full 1MB of L2 cache, similar to the S888 and double that of the E2100 configuration.
As for the middle cores, Google has employed Cortex-A76 cores, which has been a hot topic for discussion. At first glance, it’s seemingly a bit irrational considering both the Cortex-A77 and A78 offer higher performance and higher energy efficiency. The cores are clocked at 2.25GHz and come with 256KB of L2. We haven’t received a clear explanation from Google as to why they used the A76, but I do think it’s likely that at the time of design of the chip, Samsung didn’t have newer IP ready for integration. The chip has been brewing for some time and while it does feature X1 cores, maybe it was too late in the process to also shift over to newer middle cores. I do not think there was a purposeful choice of using A76 cores instead of A78, since as we’ll see in our performance benchmarks that the older design underperforms.
On the little cores, there are 4x A55 cores at 1.8GHz. In contrast to Samsung’s own Exynos chips, Google has decided to equip the cores with 128KB of L2 caches rather than just 64KB, so they’re more in line with the Snapdragon 888 configuration. One odder choice from Google is that the L3 cache of the cluster is on the same clock plane as the A55 cores, which has latency and power implications. It’s also at odds with the dedicated L3 clock plane we see on the Exynos 2100.
Another Fat Mali GPU: G78MP20 At High Clocks
Earlier rumors about the SoC indicated that it would come with a Mali-G78 generation GPU, however we didn’t know the exact core count or clocks of the design. Google has since confirmed the MP20 configuration, which is the second-largest Mali GPU configuration, behind only the Kirin 9000 and its massive 24-core unit. I had initially theorized that Google was likely running the GPU at low frequencies to be able to optimize for energy efficiency, only to end up rather shocked to see that they’re still running the GPU at a peak clockspeed of 848MHz for the shader cores, and 996MHz for the tiler and L2. The Google Tensor, if I’m not mistaken, seems to be the first confirmed G78 implementation actually taking advantage of Arm’s split clock plane design of the G78, which allows the shared GPU fabric to run at a higher frequency than the actual shader cores – and hence why it has two frequencies.
The actual frequencies are extremely high. The Exynos 2100’s G78MP14 already ran at 854MHz, and it was a chip which we deemed to have very high peak power figures; but here Google is adding 42% more cores and is not backing down on frequency. So that’s very eye-brow raising and concerning in terms of peak GPU power, concerns which we’ll see materialize in the latter GPU evaluation section.
LPDDR5, 8MB SLC Cache
The memory controllers on the Google Tensor appear to be the same as on the Exynos 2100, supporting LPDDR5 in a 4x 16-bit channel configuration for a total peak theoretical bandwidth of 51.2GB/s.
Google also integrated 8MB of system cache, and for me it isn’t exactly clear if this is the same IP Samsung uses on the Exynos 2100. Seemingly they’re both 8MB, but I’m leaning towards saying that it’s a different IP, or at the very least a different version of the IP, as there are some real differences in the way it’s architected and how it behaves.
Google here makes very extensive usage of the SLC for improving the performance of the SoC blocks, including their own custom blocks. The SLC allows itself to be partitioned and to dedicate SRAM regions to particular IP blocks on the SoC, giving them exclusive access to all or parts of the cache in varying use-case situations.
A Custom Hybrid ISP Pipeline
Usually when people or companies talk about SoC ISPs, these are always depicted as being a single monolithic IP block. In reality what we call an “ISP” is a combination of different specialized IP blocks, each handling different tasks in what we call the imaging pipeline. The Google Tensor here is interesting in that it takes bits and pieces of what Samsung uses on their Exynos chips, and also integrates custom Google-developed blocks into the pipeline – something Google actually talked about in their presentation of the SoC.
The imaging system uses IP blocks that correspond to an Exynos imaging pipeline, such as pixel phase detection processing units, contrast autofocus processing units, image scalers, distortion correction processing blocks and view-dependent occlusion texture function processing blocks. What’s lacking here is that some other processing blocks are missing, which I imagine are related to more post-processing computation blocks that Samsung uses.
The Google developed IP blocks in the ISP chain seem to be their own 3AA IP (Auto-Exposure, Auto-White Balance, Auto-Focus), as well as a custom pair of temporal noise-reduction IP blocks that are able to align and merge images. These are likely the custom blocks that Google was talking about when saying that they’ve developed blocks which help accelerate the kind of image processing that they employ as part of the Pixel lineup’s computational photography, and inarguably represent very important parts of the image processing pipeline.
Google's edgeTPU - What Makes the Tensor a Tensor
By now, it’s been quite clear that the big central talking point of the Google Tensor has been its TPU – or its Tensor Processing Unit. The TPU is, as its name implies, a custom Google developed-IP block that the company has been working on for a few years now. Until now, Google just called it the TPU inside the Tensor SoC, but at the driver level the company calls the block their “edgeTPU”. This is quite interesting as signals that the block is related to the ASIC “Edge TPU” that Google had announced back in 2018. The discrete chip had been advertised at 4 TOPs of processing power in 2 Watts of power, and while Google doesn’t advertise any performance metrics on the TPU inside the Tensor, there are entries showcasing the block goes up to 5W of power. So if the two are indeed related, then given the significant process node advantages and overall much newer IP, the performance figures of the Tensor TPU (sic) should be extremely significant.
The block is very much the pride of Google’s silicon team, telling us that it’s using the latest architecture for ML processing that’s been optimized for the way Google’s R&D teams run machine learning within the company, and promises to allow for opening up the kind of new and unique use-cases that were the main goal for making a custom SoC in the first place. We’ll go into the product-side use-cases in a more Pixel focused review later on, but the performance metrics of the TPU do appear to be impressive.
The TPU block also seems to come with some sort of block that Google calls “GSA”. This is just speculation on my part here based on the drivers, but this seems to be some sort of control block that is in charge of operating the TPU firmware, and I think contains a quad-core Cortex-A32 CPU setup.
Media Encoders, Other Stuff
On the media encoder side, the Tensor SoC uses both Samsung’s own Multi-Function Codec IP block (which is identical to what’s used on the Exynos series) as well as what appears to be a Google IP block that is dedicated to AV1 decoding. Now this is a bit weird, as Samsung does advertise the Exynos 2100 as having AV1 decode abilities, and that functionality does seem to be there in the kernel drivers. However on the Galaxy S21 series this functionality was never implemented on the Android framework level. I have no good explanation here as to why – maybe the IP isn’t working correctly with AV1.
The Google IP block, which the company calls “BigOcean”, is a dedicated AV1 decoder, and this does actually expose AV1 decoding ability to the Android framework. The very weird thing here is that all it does is AV1 – every other encoding and decoding of other formats is left over to the Samsung MFC. It’s an interesting situation and I’m left to wonder where things evolve in the next-gen SoC.
Other differences for the Tensor SoC are for example the audio subsystem. Samsung’s SoC low-power audio decoding subsystem is thrown out in favor of Google’s own block design, I didn’t dwell too much into it but generally both blocks have the same task of allowing low-power audio playback without needing to wake up large parts of the SoC. I think this block (or the GSA) is also responsible as the always-on context-hub for sensor data aggregation, with the Tensor here using Google’s IP and way of doing things versus the Exynos variant of the same block.
Google also employs a fixed function hardware memory compressor in the form of a block called Emerald Hill, which provides LZ77 compression acceleration for memory pages, and can in turn be used to accelerate ZRAM offloading in swap. I’m not sure if the Pixels are currently running this out of the box, but should be able to be confirmed by seeing “lz77eh” in /sys/block/zram0/comp_algorithm , if somebody is able to read that out. As an anecdote, as far back as 5 years ago Samsung integrated similar hardware compression IP blocks into their SoCs for the very same task, but for some reason those were never enabled for shipping devices. Maybe the energy efficiency didn’t pan out as they thought it would.
External Exynos Modem - First non-Qualcomm mmWave Phones?
Since it’s a phone SoC, naturally the Tensor needs some sort of cellular connectivity. This is another area where Google is relying on Samsung, using the company’s Exynos Modem 5123. But, unlike the Exynos 2100 and its integrated modem, the Tensor uses a discrete external variant. As to why it’s discrete, it’s likely that with the massive GPU, larger CPU setup (two X1’s with full 1MB L2’s), and unknown size of the TPU, that the Tensor chip is quite large even in relation to the Exynos 2100.
Another theory on my side is that Google would somehow still be tied to Qualcomm for US networks – either for CDMA or mmWave 5G connectivity. Surprisingly, it seems this isn’t the case, as the Pixel 6 series ships with the Exynos modem across the globe. That makes the Pixel 6 family particularly interesting, as it seems that this is the first non-Qualcomm mmWave implementation out there. For reference, Samsung had talked about their mmWave RFICs and antenna modules back in 2019, saying there were plans for 2020 devices. Whether that meant designs starting in 2020 (which the Pixel 6 series would be) or commercial availability wasn’t clear at the time, but it seems that these are the first commercial phones with the solution. I don’t expect to have mmWave coverage here for myself for another few years, but third-party reports showcase the phone reaching up to 3200Mbps while other field-tests showing around half of the practical speeds of Qualcomm devices. I hope more people in the next weeks and months will have the opportunity to dive deeper into the modem’s performance characteristics.
Semi-Custom Seems Apt
Overall, the Google Tensor ends up being almost exactly what we expected the chip to be, from the earliest reports of a collaboration between Google and Samsung. Is it a Google chip? Yes, they designed it in the sense that they defined it, while also creating quite a few Google-unique blocks that are integral to the chip's differentiation. Is it a Samsung Exynos chip? Also yes, from a more foundational SoC architecture level, the Tensor has a great deal in common with Samsung’s Exynos designs. In several areas of the Tensor there are architectural and behavioral elements that are unique to Samsung designs, and aren’t found anywhere else. To that end, calling the Google Tensor a semi-custom design seems perfectly apt for what it is. That being, said, let’s see how the Tensor behaves – and where it lands in terms of performance and efficiency.
Post Your CommentPlease log in or sign up to comment.
View All Comments
Alistair - Wednesday, November 3, 2021 - linkIt's the opposite, the iPhone is massively ahead in performance, but every high end phone takes the same high end photos... you got the same photos but a lot less performance...
aclos3 - Saturday, November 6, 2021 - linkI took some time to really test the camera and you are simply wrong. I have been photographing with it heavily for the last couple of days and the camera is incredible. Call it a gimmick or whatever, but the way they do their photo stacking puts this phone in a league of its own. If your main use case for a phone is benchmarking, I guess this is not your device.
Lavkesh - Thursday, November 11, 2021 - linkEveryone and their grand mother do image stacking. iPhone is almost as good if not better even with a smaller sensor when compared to the latest Pixel. How's that for "in a league of its own"?
Amandtec - Wednesday, November 3, 2021 - linkI don't doubt the veracity of your comment but I find the hostile undertone somewhat curious.
damianrobertjones - Wednesday, November 3, 2021 - linkBut... but... they said that it's amazing!! Who do I believe? /s
Zoolook - Saturday, November 6, 2021 - linkAs long as they use Samsung process they will be hopelessly behind Apples Socs in efficiency unfortunately, would be interesting to see SD back on TSMC process for a direct comparison with Apple silicon.
Tigran - Tuesday, November 2, 2021 - linkPerformance looks very disappointing. Google promised 4.7x GPU performance improvement vs Pixel 5.
singular9 - Tuesday, November 2, 2021 - linkI was enjoying how the speculation about the GS101 were claiming its "not far behind" the SD888. I was never expecting google to make another high end device, let alone one that undercuts most of the competition, as its just not what trends would say.
I am not impressed. As someone who was rather hopeful that google would take control and bring us android users a true apple chip equivalent some day, this is definitely not the case with google silicon.
Considering how cookie cutter this design is, and how google made some major amateur decisions, I do not see google breaking away from the typical android SOC mold next generation.
Looking back at how long it took apple to design a near 100% solo design for the iPhone (A8X was the first A chip to use a complete inhouse GPU and etc design, other than ARM cores), that is a whopping 4 and a half years. Suppose this first google "designed" chip is following the same trend, an initial "brand name" break away yet still using a lot of help from other designs, and then slowly fixing one part at a time till its all fixed, while also improving what is already good, I could see google getting there by the Pixel X (10?). But as it stands, unless google dedicates a lot of time to actually altering Arm's own designs and simply having samsung make it, I don't see Tensor every surpassing qualcomm (unless samsung has some big breakthrough in their own CPU/GPU IP which may or may not come with AMD's help).
As the chip stands today, its "passable", but not impressive. Considering Google can get android to run really well on a SD765G, this isn't at all surprising. The TPU seems like a nice touch, since honestly, focusing on voice is more important than on "raw" cpu performance or something. I have always been frustrated with speech to text not being "perfect" and constantly having to correct it manually and "working around" its limitations. As for my own experience with the 6 Pro, its bloody good.
Now to specifics.
The X1 chips do get hot, as does the 5G modem. I switched the device to LTE for now. I do get 5G at home and pretty much most places I go, and it is fast, its not something I need right now. I even had a call drop over 5G because I walked around a buildings corner. Not fun.
The A76 excuse I have heard floating around, is that it takes up less physical die space, by A LOT. And apparently, there was simply no room for an A77 or A78 because the TPU and GPU took up so much room. I don't understand this compromise, when the GPU performance is this mediocre. Why not simply use the same GPU size as the S21 (Ex2100) and give the A78's more room? Don't know, but an odd choice for sure.
The A55 efficiency issues are noticeable. Try playing spotify over bluetooth for an hour, and watch the battery drain. I get consistently great standby time, and very good battery life when heavily using my device, but its these background screen off tasks that really chug the battery more than expected.
Over all though I haven't noticed any serious issues with my unit. The finger print scanner works as intended, and is better than my 8T. The camera does just as well if not better than the previous pixels. And over all...no complaints. But I wonder how much of this UX comes from google literally brute forcing their way with 2 X1 cores and a overkill GPU, and how much of it is them actually trying.
As for recommendations to google for Tensor V2, they need to not compromise efficiency for performance. This phone isn't designed to game, cut the GPU down, or heck, partner with AMD (who is working with samsung) to bring competitive graphics to mobile to compete with Adreno from QComm. 2 X1 cores, if necessary, can stay, but at that point, might as well just have 4 of them and get rid of all the other cores entirely and simply build a very good kernel to modulate the frequency. Or make it a 2+6 design with A57 cores. As someone who codes kernels for pixels and nexus devices for a long time, trying to optimize the software to really get efficiency out of the big.LITTLE system is near impossible, and in my opinion, worthless unless your entire scheduler is "screen on/off" based, which is literally BS. I doubt google has any idea how to build a good CPU governor nor scheduler to truly make this X+X+X system even work properly, since I have yet see qcomm or samsung do it "well" to call commendable.
The rest of the phone is fine. YoY improvements are always welcome, but I think the pixel 6/pro just really show how current mobile chips are so far behind apple that you might as well give up. YoY improvements have imo halted, and honestly no one seems to be having the thought that maybe we should cut power consumption in half WITHOUT increasing performance. I mean...the phones are fast enough.
Who knows. We will see next year.
PS: I also am curious what google will do with the Pixel 6A (if they make one at all). Will it use a cut down GS101 or will it get the whole chip? It would seem overkill to shove this into a 399$ phone. Wonder what cut downs will be made, or if there will be improvements as well.
sharath.naik - Tuesday, November 2, 2021 - linkGood thoughts, there is one big issue you missed. Pixel camera sensors 50mp/48mp being binned to 12mp yet Google labeled them as 50mp/48mp. Every shot outside the native 1x,4x is just a crop of the 12mp image including pottaitk3mp crop) and 10x(2.5mp crop}.
teldar - Thursday, November 4, 2021 - linkYou are absolutely a clueless troll and should go back to your cave. Your stupidity is unwanted.