The two cards, due for release on December 13, will be the first products released using the RDNA 3 architecture. According to AMD, the new flagship 7900 XTX will offer up to 70% more 4K performance than the previous flagship, the 6950 XT. This performance boost comes thanks to several architectural improvements to RDNA that cumulatively deliver 54% higher performance per watt than RDNA 2, as well as higher clock speeds enabled by TSMC’s 5nm (and 6nm) processes and higher overall power consumption. The full-fat RX 7900 XTX will hit the streets at $999, while the second-tier RX 7900 XT will cost $899.

Specs Comparison: AMD Radeon RX 7000 Series
(RX 7900 XTX | RX 7900 XT | RX 6950 XT | RX 6900 XT)

Stream Processors: 12288 (96 CUs) | 10752 (84 CUs) | 5120 (80 CUs) | 5120 (80 CUs)
ROPs: ? | ? | 128 | 128
Game Clock: 2.3 GHz | 2.0 GHz | 2100 MHz | 2015 MHz
Boost Clock: ~2.5 GHz | ? | 2310 MHz | 2250 MHz
Throughput (FP32): 56.5 TFLOPS | 43 TFLOPS | 21.5 TFLOPS | 20.6 TFLOPS
Memory: 20 Gbps GDDR6 | 20 Gbps GDDR6 | 18 Gbps GDDR6 | 16 Gbps GDDR6
Memory Bus Width: 384-bit | 320-bit | 256-bit | 256-bit
VRAM: 24GB | 20GB | 16GB | 16GB
Infinity Cache: 96MB | 80MB | 128MB | 128MB
Total Board Power: 355W | 300W | 335W | 300W
Manufacturing Process: GCD: TSMC 5nm / MCD: TSMC 6nm | GCD: TSMC 5nm / MCD: TSMC 6nm | TSMC 7nm | TSMC 7nm
Transistor Count: 58B | 58B – (1 MCD) | 26.8B | 26.8B
Architecture: RDNA 3 | RDNA 3 | RDNA 2 | RDNA 2
GPU: Navi 3x | Navi 3x | Navi 21 | Navi 21
Release Date: 12/13/2022 | 12/13/2022 | 05/10/2022 | 12/08/2020
Release Price: $999 | $899 | $1099 | $999

AMD’s long-awaited update to its GPU architecture comes as the company has been firing on all cylinders in recent years. On the CPU side, the Zen 3 and Zen 4 architectures in particular have proven to be very efficient, and meanwhile AMD has managed to bounce back from its graphics slump with its RDNA family of GPU architectures.
RDNA 2, the basis of the Radeon RX 6000 series, exceeded expectations and proved to be a very strong competitor, and now AMD looks set to exceed expectations once again, with RDNA 3’s 54% performance-per-watt improvement coming in ahead of AMD’s initial promise of 50%.
AMD Goes Chiplets for GPUs
While today’s reveal from AMD was a more subdued event than the Ryzen 7000 reveal a few months ago, AMD has still given us quite a few details about the RDNA 3 architecture and the cards – more than we have time to cover here – so let’s start at the top, with the construction of the first RDNA 3 GPU. The Navi 3x GPU (AMD is not confirming the specific GPU name at this time) breaks new ground for AMD not only in terms of performance, but also in terms of its construction: for the first time among the big 3 GPU manufacturers, AMD is using chiplets to build a GPU.
Chiplets are in some ways the holy grail of GPU manufacturing, because they give GPU designers options to break complex monolithic GPU designs into many smaller parts – allowing new options for scaling, as well as mixing and matching the process node used in the build. That said, it has remained a holy grail because moving the sheer amount of data that needs to be transferred between different parts of a GPU (on the order of terabytes per second) is very hard to do – and very necessary to do if you want a multi-chip GPU to be able to present itself as a single device.
For the large Navi 3x chip, AMD has assembled two types of chiplets, essentially separating the memory functions of a classic GPU into their own chiplets. The core functions of the GPU are housed in what AMD calls the Graphics Compute Die (GCD), which contains all the ALU/compute hardware and graphics hardware, as well as auxiliary blocks such as the display and media engines. Because the GCD houses the performance-critical aspects of the overall GPU, it is built on TSMC’s 5nm process. This gives AMD the best density, power consumption and clock speeds for these components, albeit at an obviously higher manufacturing cost. The GCD die size is 300 mm2.
Meanwhile, the new Memory Cache Die (MCD) houses AMD’s Infinity Cache (L3 cache), as well as a 64-bit (technically 2×32-bit) GDDR6 memory controller. The MCD is the scalable aspect of the chiplet design, as Big Navi 3x GPU SKUs can be configured by pairing the GCD with more or fewer MCDs. A full configuration is 6 active MCDs, which is what we see in the 7900 XTX; the 7900 XT, meanwhile, will have 5 active MCDs, with a 6th faulty/spacer MCD present for salvage purposes and physical package stability. A single MCD has a die size of 37 mm2 and is built on TSMC’s 6nm process. This is an example of the process node flexibility chiplets afford AMD, placing the less critical GDDR6 memory controllers and Infinity Cache on a cheaper node. GDDR6 controllers are a classic example of technology that doesn’t scale well with smaller process geometries (like most I/O), so it’s easy to see why AMD would want to avoid building them at 5nm for minimal benefit.
In its full 6 MCD configuration (7900 XTX), Big Navi 3x offers a 384-bit GDDR6 memory bus, along with 96MB of L3 cache. Meanwhile, the 5 MCD configuration (7900 XT) offers a 320-bit GDDR6 memory bus and 80MB of L3 cache. For the purposes of today’s announcement, AMD hasn’t delved into how it managed to get a chiplet-based GPU to work, but it did confirm a few important details. First and foremost, to deliver the bandwidth needed to move the memory subsystem off-die, AMD is using its Elevated Fanout Bridge (EFB) packaging technology, which AMD first used for its Instinct MI200 series (CDNA 2) accelerators. On those parts, EFB was used to connect the monolithic GPU dies together, as well as to the HBM2e memory. In RDNA 3, it is used to connect the MCDs to the GCD.
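The MCD scaling described above lends itself to a quick arithmetic sketch. Here is a minimal Python illustration, assuming (per the figures in the text) 64 bits of GDDR6 bus and 16MB of Infinity Cache per MCD; the function name is my own, purely for illustration:

```python
# Hypothetical sketch of how Navi 3x memory specs scale with active MCDs,
# derived from the article's figures: 96MB / 6 MCDs = 16MB of L3 per MCD.
MCD_BUS_WIDTH_BITS = 64  # 2 x 32-bit GDDR6 controllers per MCD
MCD_L3_MB = 16           # Infinity Cache slice per MCD

def memory_config(active_mcds: int) -> tuple[int, int]:
    """Return (total bus width in bits, total Infinity Cache in MB)."""
    return active_mcds * MCD_BUS_WIDTH_BITS, active_mcds * MCD_L3_MB

print(memory_config(6))  # 7900 XTX: (384, 96)
print(memory_config(5))  # 7900 XT:  (320, 80)
```

The same linear scaling would cover hypothetical smaller salvage SKUs with 4 or fewer active MCDs, should AMD ship any.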
Specifically, the Elevated Fanout Bridge is a silicon bridge-based packaging technology rather than a cheaper organic substrate, which makes it more complex. The fact that AMD is able to push 5.3 TB/s of die-to-die bandwidth through it underlines its usefulness, but it also means that AMD is undoubtedly paying quite a bit more to package Big Navi 3x than it did for Navi 21 (or Ryzen 7000).

Internally, AMD calls this memory-graphics link the Infinity Link, which, as the name suggests, is responsible for (transparently) routing AMD’s Infinity Fabric between the dies. As mentioned earlier, the cumulative bandwidth between the MCDs and the GCD is 5.3 TB/s. It’s unclear whether the limiting factor is the bandwidth of the Infinity Link itself, or whether the combined Infinity Cache + GDDR6 memory controllers can’t move enough data to fully saturate the link. Regardless, it means there’s effectively around 900GB/s of bandwidth between a single MCD and the GCD – more than the entire off-die memory bandwidth of the last-generation Radeon RX 6950 XT (and 2.7 times more than the Navi 21 on-die bandwidth).

While we’re on the subject of AMD’s L3 Infinity Cache, it’s worth noting that it’s actually slightly smaller on Big Navi 3x than on Navi 21, with a maximum capacity of 96MB versus 128MB on the latter. According to AMD, further improvements to data reuse in the Infinity Cache compensate for this drop in capacity. At this point it is unclear whether the change is a function of software algorithms or of more fundamental hardware changes.

Finally, while AMD lists die sizes for the GCD and MCDs, it does not list individual transistor counts. So while we know that a full 6 MCD Big Navi 3x configuration comprises 58 billion transistors (2.16 times more than Navi 21), we don’t know how much of that is GCD versus MCD.
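As a sanity check on those bandwidth numbers, splitting the 5.3 TB/s aggregate evenly across six Infinity Links (an even split is my assumption) works out to the roughly 900GB/s per-link figure quoted in the text:

```python
# Back-of-the-envelope check of the per-link bandwidth, assuming an even
# split of the aggregate Infinity Link bandwidth across all 6 MCDs.
total_tb_s = 5.3                       # aggregate die-to-die bandwidth, TB/s
mcds = 6
per_link_gb_s = total_tb_s * 1000 / mcds
print(round(per_link_gb_s))            # ~883 GB/s per MCD-to-GCD link

# For comparison, the RX 6950 XT's off-die GDDR6 bandwidth:
# 18 Gbps/pin x 256-bit bus / 8 bits-per-byte = 576 GB/s
r6950_gb_s = 18 * 256 / 8
print(per_link_gb_s > r6950_gb_s)      # a single link exceeds it
```

This is only arithmetic on the article's published figures, not a claim about how AMD actually partitions link bandwidth between MCDs.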
AMD RDNA 3 Compute & Graphics Architecture: Bringing Back ILP & Improving RT
Going down a level, let’s take a look at the actual graphics and compute architecture that underpins RDNA 3 and Big Navi 3x.
While it still clearly shares many of the core design elements of AMD’s overall RDNA architecture, RDNA 3 is in some ways a much bigger change in architectural design than RDNA 2 was. Whereas RDNA 2’s compute core remained essentially unchanged from RDNA (1), RDNA 3 makes some big changes. The one with the biggest impact is how AMD organizes its ALUs: in short, AMD has doubled the number of ALUs (stream processors), going from 64 ALUs per CU to 128. AMD achieves this not by doubling the number of Dual Compute Units, but by giving the Dual Compute Units the ability to dual-issue instructions. In short, each SIMD can now execute up to two instructions per cycle.
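To illustrate the dual-issue constraint, here is a deliberately simplified toy model (not AMD’s actual scheduler logic): two instructions from the same wavefront can pair up in a cycle only if the second one does not depend on the first one’s result.

```python
# Toy model of a dual-issue check. Each instruction is represented as
# (destination_register, [source_registers]); this is purely illustrative.
def can_dual_issue(first, second):
    """Co-issue is possible only if the second instruction neither reads
    nor overwrites the first instruction's destination register."""
    dest1, _ = first
    dest2, srcs2 = second
    return dest1 not in srcs2 and dest1 != dest2

# Two independent FMAs: eligible to issue together in one cycle.
print(can_dual_issue(("v0", ["v1", "v2"]), ("v3", ["v4", "v5"])))  # True
# A dependent chain: the second op consumes v0, so it must wait.
print(can_dual_issue(("v0", ["v1", "v2"]), ("v3", ["v0", "v5"])))  # False
```

When no independent second instruction can be found, the extra ALUs sit idle – which is exactly why RDNA 3’s peak throughput depends on extractable ILP.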
But, as with all dual-issue configurations, there is a trade-off: the SIMDs can only issue a second instruction when AMD’s hardware and software can extract a second, independent instruction from the current wavefront. This means that RDNA 3 now explicitly relies on extracting Instruction Level Parallelism (ILP) from wavefronts in order to achieve maximum utilization. If the…