(If you're interested, please give Part 2 a read, although a lot of that is actually outdated so it's more for the curious if anything.)
After a long hiatus, it feels like time to continue this series thinking through what the 10th-gen systems might bring.
Before that though, I'll briefly share some thoughts on mid-gen refreshes, which I posted on B3D Sunday. I did post some other mid-gen refresh stuff here (at least for PS5 Pro) a while ago, but in light of new information and discussions I've revised pretty much all of it. This isn't so much a hard detailing of possible mid-gen specs as it is a look at what products such refreshes could comprise, expanding the term to include peripherals. I've also taken into consideration the business strategies and trajectories Sony and Microsoft seem to be heading toward:
>PS5 Slim: 5nm, ~140 watt system TDP (30% power savings on 5nm, better perf-per-watt GDDR6 chips, possibly a smaller array of 3x 4-channel NAND modules at 384 GB capacity each, chip-packaging changes and chiplet setup, etc.). RDNA 4-based (16-month intervals between RDNA gens would mean Jan. 2022 for RDNA 3, July 2023 for RDNA 4), 1 TB SSD storage, same SSD I/O throughput (with possibly slightly better compression due to API maturity and algorithms), same amount of GDDR-based memory and bandwidth (so, sticking with GDDR6), $299 (Digital only). November 2023 release.
>PS5 Enhanced: 5nm, ~150 watt system TDP (factoring in disc drive), RDNA 4-based, 6x 384 GB NAND modules (~2 TB SSD), same GDDR6 memory capacity but faster chips (16 Gbps vs. 14 Gbps) for 512 GB/s bandwidth, improved SSD I/O bandwidth (~8 GB/s raw, up to 34 GB/s at the maximum 4.25:1 compression ratio), slightly better GPU performance (up to 11.81 TF due to 5nm; this would probably increase total system TDP to about 155 watts), Zen 2-based CPU, disc drive, $399. November 2023 release.
>PS5G (Fold): 5nm, ~25 watt - 35 watt system TDP, RDNA 4-based (18 CU chiplet block), 8 GB GDDR6 (8x 1 GB 14 Gbps chips downclocked to 10 Gbps, 3D-stacked PoP (Package-On-Package), 320 GB/s bandwidth), 256 GB SSD storage (2x 2-channel 128 GB NAND modules), 916.6 MB/s SSD I/O bandwidth (compressed bandwidth up to 3.895 GB/s), Zen 2-based CPU, 7" OLED screen, streaming-oriented for PS5 and PS4 Pro titles (native play of PS4 games), $299 (Digital only). November 2023 release.
>PSVR2: Wireless connectivity with PS5 systems, backwards-compatible with PS4 (may require a wired connection), on-board processing hardware for task offloading from the base PS5, Zen 2-based CPU, 4 GB GDDR6 as 4x 1 GB modules in 3D-stacked PoP setup (14 Gbps chips downclocked to 10 Gbps, 160 GB/s bandwidth), 128 GB onboard SSD storage (1x 2-channel 128 GB NAND module, 458.3 MB/s raw bandwidth, up to 1.9479 GB/s compressed bandwidth), AMOLED displays, $399. November 2022 release.
>SERIES S Lite: 5nm, RDNA 3-based (possibly with some RDNA 4 features mixed in), possibly some CDNA 2-based features mixed in, 10 GB GDDR6, 280 GB/s bandwidth (224 GB/s for GPU, 56 GB/s for CPU/audio), 1 TB SSD, same raw SSD I/O bandwidth (2.4 GB/s) but increased compression (3.5:1 ratio, up to 8.4 GB/s maximum compressed bandwidth), $199 (Digital only). November 2022 release.
>SERIES X-2: 5nm EUV, RDNA 4-based, some CDNA 2-based features mixed in, 20 GB GDDR6 (10x 2 GB chips), 16 Gbps modules (640 GB/s bandwidth), improved SSD I/O bandwidth (~8 GB/s raw, 3.5:1 compression ratio, up to 28 GB/s maximum compressed bandwidth), lower system TDP (~160 watts - 170 watts), 2 TB SSD storage, Zen 2-based CPU, disc drive, improved GPU performance (~14 TF), $449. November 2023 release.
>SERIES.AIR (Xcloud streaming box, think Apple TV-esque): 5nm, RDNA 3-based, 8 GB GDDR6 (4x 2 GB chips), 14 Gbps modules downclocked to 10 Gbps (160 GB/s bandwidth), 256 GB SSD, same raw SSD I/O as base Series S and Series X (2.4 GB/s) but improved compression (up to 8.4 GB/s maximum compressed bandwidth), $99 (Digital only). November 2021 release.
>SERIES.VIEW (Wireless display module screen that can be added to Series S Lite and Series.Air (and to a lesser extent Series X-2) for a makeshift portable device, or used as an AR extension of VR): Zen 2-based CPU (4-core variant, lower clocks), 2 GB GDDR6 as 2x 1 GB modules (14 Gbps chips downclocked to 8 Gbps, 64 GB/s bandwidth), 8" OLED display, USB-C port (included Male/Male USB-C double-point module can be used to wire Series.View to Series S Lite), $199. Spring/early Summer 2022 release. Also compatible with PC.
>SERIES.VIRTUA (VR helmet developed in tandem with Samsung, for Series system devices as well as PC): Based on the Samsung HMD Odyssey+ headset but with some pared-down specs for more mid-range performance capabilities. $399. Spring/Summer 2022 release.
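A quick way to sanity-check the compressed-I/O figures in these spec lists: effective throughput is just raw SSD bandwidth multiplied by the assumed compression ratio. A minimal Python sketch using the numbers above (the SKU names and figures are this post's guesses, not confirmed hardware):

```python
# Effective (decompressed) SSD throughput = raw bandwidth * compression ratio.
# All figures below are the speculative numbers from this post, not real specs.

def effective_io(raw_gb_s: float, ratio: float) -> float:
    """Effective throughput in GB/s after decompression."""
    return raw_gb_s * ratio

specs = {
    "PS5 Enhanced":  (8.0, 4.25),  # ~8 GB/s raw, 4.25:1 -> 34 GB/s
    "Series S Lite": (2.4, 3.5),   # 2.4 GB/s raw, 3.5:1 -> 8.4 GB/s
    "Series X-2":    (8.0, 3.5),   # ~8 GB/s raw, 3.5:1 -> 28 GB/s
}

for name, (raw, ratio) in specs.items():
    print(f"{name}: {effective_io(raw, ratio):.1f} GB/s effective")
```

Of course, real-world ratios depend on the data being streamed; these are best-case figures.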
So that's what I'm thinking Sony and Microsoft will do insofar as mid-gen refreshes and major peripheral upgrades go, up to early 2024. From that point on it's really up in the air; it's probably easiest to see the two of them doing bundles with various mixes of these refreshes and peripherals. For example, Sony could probably do a package bundle in late 2021 and early 2022 with the PS5 (base) and PSVR to drive out remaining stock of the first-generation PSVR and the original PS5 models, making way for the PSVR refresh in 2022 (PSVR2) and the PS5 Slim & Enhanced refreshes in 2023.
Meanwhile, I think Microsoft will try SKU bundles like Series.Air & Series.View around late 2023 and into 2024, or even later SKU bundles like Series X-2 & Series.Virtua in late 2024 into early 2025. I think that's what Sony & Microsoft will do going into the tail-end of 9th gen and leading into 10th-gen...
That basically sums up my mid-gen refresh speculation; from here on I'll focus on 10th-gen hardware, starting, like I said above, with the PlayStation 6 and breaking it down into parts. The first part is pretty much entirely about the GPU, and I try to give some explanation for certain decisions below. I've settled on these guesses after rewriting possible specifications over a dozen times, changing MANY things along the way.
These are, after all, just my own guesses/speculation, but I tried to be as realistic and technically grounded as possible with respect to market realities, trends, and technological developments (plus likely business strategies). So let's just jump right in there...
Dunno, these are just some designs I was able to find. Anyone got links to some better PS6 render concepts?
>YEAR: 2026 or 2027.
>2026 likely, but 2027 more likely. Would say 45/55 split between the two.
>Gives PS5 hardware and software more time to "bake" an ecosystem market without contending with PS6 messaging/marketing
>Allows for cheaper procurement of wafer production and memory (volatile, NAND) vs. an earlier launch
>Gives 1P studios more time to polish games intended for launch of PS6
>Sony wants to shorten 1P dev times not to bring out hardware faster (returning console gen length to 5 years), but to release more 1P titles in a given (by modern notion) standard console cycle (6-7 years). Allows them to drive more profits in a 6-7 year period, which helps offset R&D/production costs of 10th-gen hardware provided R&D/production costs stay roughly similar to what they were for 9th-gen (PS5), or only 25% - 30% increase at most.
>Only way for them to get the performance they need at a reasonable power budget
>Will complement contemporary RDNA architecture designs/advancements very well
>Can have wafer costs managed through scaled offsetting of budget in other areas (die size, memory, etc.)
>ARCHITECTURE: RDNA 7-based
>Assuming 15-month intervals between RDNA refreshes, RDNA 7 would be completed by February 2027. RDNA 8 would be completed and released by May 2028. A PS6 in either 2026 or 2027 could be predominantly RDNA 7-based, with some bits maybe from RDNA 8 (or influencing RDNA 8) if the release of PS6 is 2027 rather than 2026.
>SHADER ARRAYS: 2
>SHADER ENGINES (PER SA): 2
>72 CUs would double PS5, but would also at least double the silicon budget, AND would be on 3nm EUVL(+), which would be more expensive than 7nm in its own right. The only way to offset that would be to either gimp some other area (storage, memory, CPU, etc.) or go with 5nm EUVL, which curbs some of the performance capability due to having less room in the power consumption budget.
>CUs will only get bigger with more silicon packed into them. PS5 CUs are 62% larger than PS4 CUs for example, despite being on a smaller node, aka more features are built into the individual CUs relatively speaking (such as RT cores). Any features that scale better with integration in the CU will be able to bump up the CU size compared to PS5, even if the overall CU count remains the same or only slightly larger.
>PS6 CUs could be between 50% - 60% larger than PS5 CUs
>Chiplet design can allow for more active CUs without need to disable out of yield concerns
>Would allow for similar GPU programming approaches in line with PS5
>Theoretically easier to saturate with work
>SHADER CORES (PER CU): 128
>SHADER CORES (TOTAL): 5,120
>Going with a smaller GPU (40 CUs) would require something else to be increased in order to provide suitable performance gains. Doubling the amount of Shader Cores per CU is one of the ways to do this, though 128 could be closer to a default for later RDNA designs by this point.
>ROPs: 128 (4x 32-unit RBs)
>Doubling of ROPs on the GPU in order to complement the increase in per-CU shader cores
>TMUs (per CU): 8
>Assuming a 16:1 ratio between SCs and TMUs per CU is maintained, doubling the SCs from 64 to 128 would also 2x the TMUs from 4 to 8
>TMUs (TOTAL): 320
>MAXIMUM WORKLOAD THREADS: 40,960 (32 SIMD32 waves * 32 threads * 40 CUs)
>MAXIMUM GPU CLOCK: 3362.236 MHz
>PRIMITIVES (TRIANGLES) PER CLOCK (IN/OUT): Up to 8 PPC IN, up to 6 PPC OUT (current RDNA supports up to 4 PPC OUT)
>PRIMITIVES (TRIANGLES) PER SECOND (IN/OUT): Up to 26.8978 billion PPS IN, up to 20.17335 billion PPS OUT
>GIGAPIXELS PER SECOND: 430.366208 G/pixels per second
>INSTRUCTIONS PER CLOCK: 2 IPC
>INSTRUCTIONS PER SECOND: 6.724472 billion IPS
>RAY INTERSECTIONS PER SECOND: 1,075.915 G/rays per second (1.075915 T/rays per second) (3362.236 MHz * 40 CUs * 8 TMUs)
* RT intersection calculations might be off; figured RT calculations leverage the TMUs in each CU but wasn't sure if that's 100% the case.
>THEORETICAL FLOATING POINT OPERATIONS PER SECOND: 34.4 TF (40 CUs * 128 SCs * 2 IPC * 3362.236 MHz)
>L0$: 256 KB (per CU), 10.24 MB (total)
>L1$: 1 MB (per Dual CU), 20 MB (total)
>L2$: 24 MB
>L3$: 192 MB (Infinity Cache)
*SRAM bit-cell area is about 0.027 µm² per bit on 7nm, meaning 128 MB would be ~166 mm^2 on 7nm/7nm DUV. An 87% density improvement on 3nm EUV would reduce this to about 22 mm^2. A further 1.5x SRAM cell density on the node could bring this to 192 MB.
>TOTAL: 246.24 MB
>TDP: 160 watts
>Die Area: ~100 mm^2 - 120 mm^2 (factoring in larger CUs, additional integrated silicon, larger caches, revamped frontends and backends, etc.)
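The derived numbers above all fall out of a handful of base parameters, so here's a small Python sketch reproducing them (inputs are this post's speculative PS6 figures; the IPC of 2 assumes FMA counted as two ops, and the ray-rate formula assumes one intersection per TMU per clock, which as noted above may not be exactly how RDNA handles it):

```python
# Speculative PS6 base parameters from this post (not real hardware).
CLOCK_MHZ = 3362.236
CUS = 40
SC_PER_CU = 128
IPC = 2            # FMA counted as 2 ops per clock
ROPS = 128
TMU_PER_CU = 8

clock_hz = CLOCK_MHZ * 1e6

# Derived figures, matching the spec list above.
tflops    = CUS * SC_PER_CU * IPC * clock_hz / 1e12   # ~34.4 TF
gpix      = ROPS * clock_hz / 1e9                     # ~430.4 Gpix/s
grays     = CUS * TMU_PER_CU * clock_hz / 1e9         # ~1,075.9 Grays/s
prims_in  = 8 * clock_hz / 1e9                        # ~26.9 Gprims/s in
prims_out = 6 * clock_hz / 1e9                        # ~20.2 Gprims/s out

print(f"{tflops:.2f} TF, {gpix:.1f} Gpix/s, {grays:.1f} Grays/s")
print(f"{prims_in:.2f} Gprims/s in, {prims_out:.2f} Gprims/s out")
```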
>There is an opportunity with future AMD hardware to find a way for relatively wide GPUs to dynamically scale saturation workloads down to a smaller cluster of CUs while proportionately increasing the clock frequency of those active components, with the inactive components/CUs idling at a dramatically lower clock (sub-100 MHz) until they are needed for more work.
>This assumes that AMD can continue to scale GPU clock frequencies higher (4 GHz - 5 GHz) with future RDNA designs, provided they can make that work with silicon designs on smaller process nodes. Since any given cluster of the GPU would need to be able to clock this high, the entire GPU design must be able to clock in this range, potentially across the entire chip, in order to make this feasible.
>Power delivery designs may also have to be reworked; chiplet approach will help a lot here.
>This approach would be more suitable for products that need to squeeze out and scale performance for various workloads, support variable frequency (this is, essentially, variable frequency within portions of the GPU itself), and have to stay within a fixed power budget...such as a games console. Therefore it might be less necessary (though potentially beneficial) for PC GPUs, as it gives a different means of scaling clocks with workloads while offering more granular control of the GPU's power consumption.
>AMD's implementation would be based on Shader Array counts, so the loads would be adjusted per Shader Array. On chiplet-based designs, each chiplet would theoretically be its own Shader Array, so this is essentially a way of scaling power delivery between the multiple chiplets dynamically.
>This could be used in tandem with already-established power budget sharing between the CPU and GPU seen in designs like PS5; in this case it would be beneficial in allowing the GPU to maintain implementation of this particular feature for games that may have lighter volume workloads, but intense iteration workloads that could stress a given peak frequency. However, this should be minimal and its fuller use would be more in the traditional fashion when talking about full GPU volume workloads.
>Another benefit of State Mode is that when targeting power delivery to a smaller cluster of the GPU hardware and increasing the clock, clock-bound processes (pixel fillrate, instructions per second, primitives per second) see large gains, generally inverse of the decrease in active CU count. However, some other things such as L0$ and L1$ amounts will reduce, even if actual bandwidths have better-than-linear scaling respective of the total active silicon.
[PS6 - STATE MODE IMPLEMENTATION]
>SHADER ARRAYS: 1
>SHADER ENGINES (PER SA): 2
>SHADER CORES (PER CU): 128
>SHADER CORES (TOTAL): 2,560
>Future RDNA chiplet designs will probably keep the back-end to its own block. However, for design reasons ROP allocation would likely scale to per chiplet cluster evenly, so each chiplet (or if essentially a chiplet, SE) would have its own assigned group of ROPs. This equals 2x 64 ROPs for PS6.
>TMUs (PER CU): 8
>TMUs (TOTAL): 160
>MAXIMUM WORKLOAD THREADS: 20,480
>MAXIMUM GPU CLOCK: 4113.449 MHz (shaved off some clock from earlier calcs to account for non-linear clock scaling with power scaling)
>PRIMITIVES (TRIANGLES) PER CLOCK (IN/OUT): Up to 8 PPC (IN), up to 6 PPC (OUT)
>PRIMITIVES PER SECOND (IN/OUT): Up to 32.9 billion PPS (IN), up to 24.675 billion PPS (OUT)
>GIGAPIXELS PER SECOND: Up to 263.26 G/pixels per second (4113.449 MHz * 64 ROPs)
>INSTRUCTIONS PER CLOCK: 2
>INSTRUCTIONS PER SECOND: 8.226898 billion IPS
>RAY INTERSECTIONS PER SECOND: 658.151 G/rays per second (4113.449 MHz * 20 CUs * 8 TMUs)
>THEORETICAL FLOATING POINT OPERATIONS PER SECOND: 21.06 TF
>L0$: 256 KB (per CU), 5.12 MB (total)
>L1$: 1 MB (per Dual CU), 10 MB (total)
>L2$: 24 MB
**Unified cache shared with both chiplets
>L3$: 192 MB
**Unified cache shared with both chiplets
>>TOTAL: 231.12 MB
(for some dumb reason I can't outdent this section. Oh well)
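The State Mode trade-off above can be sketched the same way: halving the active CUs while raising the clock shrinks the width-bound metrics (TF, ray rate), while clock-bound rates only track frequency. "State Mode" is this post's speculative idea and `gpu_metrics` is just an illustrative helper, using the figures from both spec lists:

```python
# Sketch of the speculative "State Mode" trade-off: fewer active CUs at a
# higher clock. Width-bound metrics scale with CU count * clock; per-Shader
# Array pixel rate scales with clock alone. All numbers are this post's guesses.

def gpu_metrics(cus, clock_mhz, shader_arrays, sc_per_cu=128, ipc=2,
                rops_per_sa=64, tmu_per_cu=8):
    hz = clock_mhz * 1e6
    return {
        "tflops":  cus * sc_per_cu * ipc * hz / 1e12,
        "gpix_s":  shader_arrays * rops_per_sa * hz / 1e9,
        "grays_s": cus * tmu_per_cu * hz / 1e9,
    }

full  = gpu_metrics(cus=40, clock_mhz=3362.236, shader_arrays=2)
state = gpu_metrics(cus=20, clock_mhz=4113.449, shader_arrays=1)

print(full)   # ~34.4 TF, ~430.4 Gpix/s, ~1075.9 Grays/s
print(state)  # ~21.1 TF, ~263.3 Gpix/s, ~658.2 Grays/s
```

Note the State Mode figures are a bit better than half the full-GPU numbers, since the ~22% clock bump partially offsets the halved CU count.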
That should be everything for a hypothetical PS6 GPU; small things like codec support, display output support etc. wouldn't really be that hard or crazy to take a crack at, and I'm not particularly interested in that. However, I AM interested in getting to the CPU, audio, memory, storage etc. and also to see what some of you have in terms of ideas for a PS6 GPU design, hypothetically speaking.
Sound off below if you'd like and no, it's never too early to start thinking about next-gen. You think Mark Cerny and Jason Ronald aren't already brainstorming what the next round of hardware could bring? I bet you they are ...