
Samsung has managed to make GDDR6 memory faster than GDDR6X

ethomaz

Banned
That's great and means GDDRx will reach even higher speeds.

On the flip side, nobody talks about HBM in the consumer space anymore.
 
That's awesome, good job Samsung! I think an XSX Pro and PS5 Pro will need close to 1 TB/s of bandwidth for their RAM. RAM bandwidth is also important for content creation.
:messenger_clapping:
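Rough napkin math on what it would take to hit ~1 TB/s with these new 24 Gbps chips (the bus widths below are just hypothetical picks for illustration, nothing confirmed for any mid-gen console):

```python
# GDDR bandwidth in GB/s = (bus width in bits / 8) * per-pin data rate in Gbps
def gddr_bandwidth_gbs(bus_width_bits: int, gbps_per_pin: float) -> float:
    return bus_width_bits / 8 * gbps_per_pin

# Hypothetical bus widths paired with Samsung's 24 Gbps GDDR6
for bus in (256, 320, 384):
    print(f"{bus}-bit @ 24 Gbps -> {gddr_bandwidth_gbs(bus, 24):.0f} GB/s")
# 256-bit -> 768 GB/s, 320-bit -> 960 GB/s, 384-bit -> 1152 GB/s
```

So a 320-bit bus with these chips already gets within spitting distance of 1 TB/s.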
 

ACESHIGH

Banned
I wonder if we'll get to a point where we have a single pool of fast memory on PC for both system and graphics, like in consoles.
 

Loxus

Member
That's great and means GDDRx will reach even higher speeds.

On the flip side, nobody talks about HBM in the consumer space anymore.
I don't know why either.
HBM3 sounds good with RDNA 3, because chiplet designs already have an interposer.

HBM3: Big Impact On Chip Design
Chipmakers have made it clear that HBM3 makes sense when there is an interposer in the system, such as a chiplet-based design that already was using the silicon interposer for that reason.

Not to mention 32GB per stack.
Rambus Outs HBM3 Details: 1.075 TBps of Bandwidth, 16 Channels, 16-Hi Stacks
The increased number of memory channels supports more memory die, thus supporting up to 16-Hi stacks (supports up to 32 Gb per channel) that deliver up to 32GB of total capacity, with 64GB of capacity possible in the future.

Either way, with 3D stacking becoming the future, HBM is going to become the standard sooner rather than later.
Maybe with the PS6/XBSX2.
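Quick sanity check on those Rambus numbers (taking the 8.4 Gbps per-pin rate Rambus has been advertising and the standard 1024-bit HBM stack interface as inputs; just a sketch of the arithmetic behind what's quoted above):

```python
# HBM3 per-stack bandwidth = bus width (bits) / 8 * per-pin rate (Gbps)
PIN_RATE_GBPS = 8.4      # Rambus-advertised per-pin rate (assumption for this sketch)
BUS_WIDTH_BITS = 1024    # bits per HBM stack
print(f"{BUS_WIDTH_BITS / 8 * PIN_RATE_GBPS / 1000:.3f} TB/s per stack")  # ~1.075 TB/s

# Capacity: 16 channels per stack, so GB per stack = 16 * Gb-per-channel / 8
for gb_per_channel in (16, 32):
    print(f"{gb_per_channel} Gb/channel -> {16 * gb_per_channel // 8} GB per stack")
# 16 Gb/channel -> 32 GB today, 32 Gb/channel -> 64 GB in the future
```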
 

ethomaz

Banned
I don't know why either.
HBM3 sounds good with RDNA 3, because chiplet designs already have an interposer.

HBM3: Big Impact On Chip Design
Chipmakers have made it clear that HBM3 makes sense when there is an interposer in the system, such as a chiplet-based design that already was using the silicon interposer for that reason.

Not to mention 32GB per stack.
Rambus Outs HBM3 Details: 1.075 TBps of Bandwidth, 16 Channels, 16-Hi Stacks
The increased number of memory channels supports more memory die, thus supporting up to 16-Hi stacks (supports up to 32 Gb per channel) that deliver up to 32GB of total capacity, with 64GB of capacity possible in the future.

Either way, with 3D stacking becoming the future, HBM is going to become the standard sooner rather than later.
Maybe with the PS6/XBSX2.
Maybe costs? I don't know either.
But AMD's Infinity Cache basically kills the need for higher bandwidth than what GDDR6 delivers.
 
Maybe costs? I don't know either.
But AMD's Infinity Cache basically kills the need for higher bandwidth than what GDDR6 delivers.

Not for future GPUs multiple times as fast as the current top end.

There's only so much SRAM you can fit on a die and as we go lower than 5nm costs will balloon further, meaning even less incentive to have large on-die caches.

IC is effectively a stop-gap solution, necessitated because memory bandwidth hasn't scaled nearly as well as transistor count on the processor cores. Once HBM becomes low enough cost in relative terms, IC will be quickly discarded for the TB/s worth of bandwidth and multiple tens of GB memory capacity that HBM can offer.
 

ethomaz

Banned
Not for future GPUs multiple times as fast as the current top end.

There's only so much SRAM you can fit on a die and as we go lower than 5nm costs will balloon further, meaning even less incentive to have large on-die caches.

IC is effectively a stop-gap solution, necessitated because memory bandwidth hasn't scaled nearly as well as transistor count on the processor cores. Once HBM becomes low enough cost in relative terms, IC will be quickly discarded for the TB/s worth of bandwidth and multiple tens of GB memory capacity that HBM can offer.
I don't think IC is a stop-gap solution… NVIDIA and AMD have been trying to put memory on-die for so long that I believe in the future all the memory will be stacked in the GPU package, like IC but with more layers.
 
I don't think IC is a stop-gap solution… NVIDIA and AMD have been trying to put memory on-die for so long that I believe in the future all the memory will be stacked in the GPU package, like IC but with more layers.
The cost of stacking 64GB of DDR6 on top of a GPU die is prohibitive. This is also why HBM died in the consumer space.

What Apple is doing with M1 is closer to what we'll see going forward in terms of integrating memory closer to SoC or CPU and GPU dies.
 
Damn this shit is moving very fast. Didn't think 24 Gbps would happen until GDDR7, and that is probably for 2023 - 2024.

Dunno if RDNA3 will bother with it though, actually. AMD is trending toward lower main memory bandwidth offset by a lot of Infinity Cache on-die, which for their design works out better, as you have more data closer to the chip in a (relatively large, for SRAM) pool of memory with higher bandwidth and lower latency than GDDR6/6X etc.

Or maybe they will use it, but in a way that gets good bandwidth on VRAM while simplifying the memory controller setup; i.e. instead of needing a 256-bit interface with 8x 16 Gbps GDDR6 chips for 512 GB/s, you can do a 192-bit interface with 6x 24 Gbps chips and get 576 GB/s, or just clock the chips a bit lower to get 512 GB/s.

You lose some capacity (12 GB vs. 16 GB), but have a simpler memory controller setup and a smaller card. It could work for a few different designs in the mid-end maybe.
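Quick sketch of that trade-off (the 24 Gbps rate for the 192-bit option comes from the headline part; 2 GB per chip is my assumption):

```python
# GDDR6 chips are 32 bits wide each; bandwidth in GB/s = width/8 * Gbps per pin
def gddr6_config(bus_width_bits: int, gbps_per_pin: float, gb_per_chip: int = 2):
    chips = bus_width_bits // 32
    bandwidth_gbs = bus_width_bits / 8 * gbps_per_pin
    capacity_gb = chips * gb_per_chip
    return chips, bandwidth_gbs, capacity_gb

for width, rate in ((256, 16), (192, 24), (192, 21.3)):
    chips, bw, cap = gddr6_config(width, rate)
    print(f"{width}-bit @ {rate} Gbps: {chips} chips, {bw:.0f} GB/s, {cap} GB")
# 256-bit @ 16 Gbps:   8 chips, 512 GB/s, 16 GB
# 192-bit @ 24 Gbps:   6 chips, 576 GB/s, 12 GB
# 192-bit @ 21.3 Gbps: 6 chips, ~511 GB/s, 12 GB  (the "clock a bit lower" option)
```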

Not for future GPUs multiple times as fast as the current top end.

There's only so much SRAM you can fit on a die and as we go lower than 5nm costs will balloon further, meaning even less incentive to have large on-die caches.

IC is effectively a stop-gap solution, necessitated because memory bandwidth hasn't scaled nearly as well as transistor count on the processor cores. Once HBM becomes low enough cost in relative terms, IC will be quickly discarded for the TB/s worth of bandwidth and multiple tens of GB memory capacity that HBM can offer.

Possibly. But if IC is already doing so well for RDNA 2 in rasterization workloads, keeping up with (if not beating) Ampere there, why get rid of it? They'll never fully discard IC IMO, and some of the patents and documentation for V-Cache (at least from the sources I've read talking about them) make it sound like they will be implementing that on the GPU side as well, so they clearly see it as part of their long-term solution.

HBM prices will have to come down a bit more before we really see it back in commercial, consumer GPU designs, and even then it depends on which HBM we're talking about. HBM3 IIRC already has some samples testing at 6 Gbps or 7 Gbps, but you can bet those chips are going to go for a premium, too much of a premium for even high-end consumer GPUs for the next 3-4 years I'd say. Plus, while HBM has lower latency and higher bandwidth than GDDR, its latency still can't beat SRAM cache; even supposing IC is the slowest cache on an AMD GPU, latency will still be better vs. HBM, and bandwidth will be better when costs are taken into account (you'd need a good few HBM3 8-Hi stacks to match the bandwidth of a couple hundred MB of IC by the RDNA 4 generation).
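To put a very rough number on that last point: per-stack HBM3 bandwidth at ~6.4 Gbps per pin over a 1024-bit interface works out to ~819 GB/s, and if you take ~2 TB/s as a ballpark for a 128 MB Infinity Cache's peak bandwidth (my assumption for illustration, not an AMD-quoted figure), the sketch below shows it takes about three stacks just to match it:

```python
import math

# Per-stack HBM3 bandwidth, assuming ~6.4 Gbps per pin over a 1024-bit interface
hbm3_stack_gbs = 1024 / 8 * 6.4          # ~819 GB/s per stack

# Ballpark peak bandwidth for ~128 MB of Infinity Cache (illustrative assumption)
ic_peak_gbs = 2000

stacks = math.ceil(ic_peak_gbs / hbm3_stack_gbs)
print(f"~{hbm3_stack_gbs:.0f} GB/s per stack -> {stacks} HBM3 stacks to match ~{ic_peak_gbs} GB/s of IC")
# => roughly 3 stacks just to equal the cache's peak bandwidth, before even talking latency
```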

IMHO, IC isn't going anywhere; memory trends have been moving things closer on-chip anyway, so you'd arguably end up with a worse design by ridding it of IC and replacing both it and GDDR with just HBM3 or whatever. But if you balance out the IC capacity with a reasonable amount of HBM3 and cut out GDDR altogether, you get a very capable design that isn't compromising out of a self-inflicted wound (or compromising at all, aside from potentially IC & VRAM capacity, to stay within a certain budget).
 
I don't think IC is a stop-gap solution… NVIDIA and AMD have been trying to put memory on-die for so long that I believe in the future all the memory will be stacked in the GPU package, like IC but with more layers.

IC in its current application totally is. There's really no other option for AMD. If HBM was cheap enough, or GDDR7 was ready, you probably wouldn't have seen IC at all.

That's not to say IC won't have any utility when HBM does eventually become economical. The move to chiplet-based GPUs almost certainly necessitates the use of an IC-like solution for inter-chiplet communication.

Possibly. But if IC is already doing so well for RDNA 2 in rasterization workloads, keeping up with (if not beating) Ampere there, why get rid of it? They'll never fully discard IC IMO, and some of the patents and documentation for V-Cache (at least from the sources I've read talking about them) make it sound like they will be implementing that on the GPU side as well, so they clearly see it as part of their long-term solution.

HBM prices will have to come down a bit more before we really see it back in commercial, consumer GPU designs, and even then it depends on which HBM we're talking about. HBM3 IIRC already has some samples testing at 6 Gbps or 7 Gbps, but you can bet those chips are going to go for a premium, too much of a premium for even high-end consumer GPUs for the next 3-4 years I'd say. Plus, while HBM has lower latency and higher bandwidth than GDDR, its latency still can't beat SRAM cache; even supposing IC is the slowest cache on an AMD GPU, latency will still be better vs. HBM, and bandwidth will be better when costs are taken into account (you'd need a good few HBM3 8-Hi stacks to match the bandwidth of a couple hundred MB of IC by the RDNA 4 generation).

IMHO, IC isn't going anywhere; memory trends have been moving things closer on-chip anyway, so you'd arguably end up with a worse design by ridding it of IC and replacing both it and GDDR with just HBM3 or whatever. But if you balance out the IC capacity with a reasonable amount of HBM3 and cut out GDDR altogether, you get a very capable design that isn't compromising out of a self-inflicted wound (or compromising at all, aside from potentially IC & VRAM capacity, to stay within a certain budget).

You misunderstand. I'm not arguing that IC will disappear entirely. It just won't be the primary solution focused on meeting the insane bandwidth requirements of next-gen GPU chips when HBM is ripe and ready for prime time.

As I mentioned above, IC or at least a derivative of the technology is necessary for the move from monolithic GPU dice to chiplet-based GPUs, to facilitate the inter-chiplet comms. Still, it'll be scaled down. On-die SRAM isn't free, and the die area footprint costs to include a sufficient amount to actually be useful on bleeding edge sub-5nm nodes could far exceed the cost of a reasonable HBM stack + packaging. A small, shared, last-level, fabric-integrated cache for inter-chiplet comms plus HBM is the likely solution.

The effectiveness of putting only a piddly few MBs of IC on the die for off-chip comms drops off a cliff as GPUs get bigger and their rendering workloads balloon. GPUs are, as you know, designed to be latency tolerant, so the additional latency cost of going off-die to HBM is not going to kill performance provided the off-die memory can provide sufficient bandwidth (and HBM certainly can). And the move to a chiplet-based design will mean your GPU chiplets have to pay that off-die latency cost for any synchronisation comms anyway, so it's just an inherent inefficiency you'll have to live with.
 

Drew1440

Member
IC in its current application totally is. There's really no other option for AMD. If HBM was cheap enough, or GDDR7 was ready, you probably wouldn't have seen IC at all.

That's not to say IC won't have any utility when HBM does eventually become economical. The move to chiplet-based GPUs almost certainly necessitates the use of an IC-like solution for inter-chiplet communication.



You misunderstand. I'm not arguing that IC will disappear entirely. It just won't be the primary solution focused on meeting the insane bandwidth requirements of next-gen GPU chips when HBM is ripe and ready for prime time.

As I mentioned above, IC or at least a derivative of the technology is necessary for the move from monolithic GPU dice to chiplet-based GPUs, to facilitate the inter-chiplet comms. Still, it'll be scaled down. On-die SRAM isn't free, and the die area footprint costs to include a sufficient amount to actually be useful on bleeding edge sub-5nm nodes could far exceed the cost of a reasonable HBM stack + packaging. A small, shared, last-level, fabric-integrated cache for inter-chiplet comms plus HBM is the likely solution.

The effectiveness of putting only a piddly few MBs of IC on the die for off-chip comms drops off a cliff as GPUs get bigger and their rendering workloads balloon. GPUs are, as you know, designed to be latency tolerant, so the additional latency cost of going off-die to HBM is not going to kill performance provided the off-die memory can provide sufficient bandwidth (and HBM certainly can). And the move to a chiplet-based design will mean your GPU chiplets have to pay that off-die latency cost for any synchronisation comms anyway, so it's just an inherent inefficiency you'll have to live with.
Interesting. Could we see GPUs with multiple memory pools using both HBM2 & GDDR6, similar to what the Xbox 360 had with its eDRAM?
 

Bo_Hazem

Banned
Tech seems to be evolving rapidly over the last 1-2 years. It's all good to wait for PCIe 5.0 and take the biggest possible upgrade.
 
You misunderstand. I'm not arguing that IC will disappear entirely. It just won't be the primary solution focused on meeting the insane bandwidth requirements of next-gen GPU chips when HBM is ripe and ready for prime time.

As I mentioned above, IC or at least a derivative of the technology is necessary for the move from monolithic GPU dice to chiplet-based GPUs, to facilitate the inter-chiplet comms. Still, it'll be scaled down. On-die SRAM isn't free, and the die area footprint costs to include a sufficient amount to actually be useful on bleeding edge sub-5nm nodes could far exceed the cost of a reasonable HBM stack + packaging. A small, shared, last-level, fabric-integrated cache for inter-chiplet comms plus HBM is the likely solution.

The effectiveness of putting only a piddly few MBs of IC on the die for off-chip comms drops off a cliff as GPUs get bigger and their rendering workloads balloon. GPUs are, as you know, designed to be latency tolerant, so the additional latency cost of going off-die to HBM is not going to kill performance provided the off-die memory can provide sufficient bandwidth (and HBM certainly can). And the move to a chiplet-based design will mean your GPU chiplets have to pay that off-die latency cost for any synchronisation comms anyway, so it's just an inherent inefficiency you'll have to live with.

Okay, I see where you're coming from on this now. It briefly sounded like IC would be removed altogether, but if you're proposing there's still some present (in moderation), balanced out with a good deal of HBM, then hopefully by the 2024-2025 period we'll start to see HBM3 not only available but at good enough prices and volumes to supplement GDDR on at least the higher-end consumer GPUs; for AMD-based ones we could be looking at 128 MB - 192 MB, or upwards of 256 MB, of IC (either as pure IC, or IC & V-Cache, unless V-Cache is just another phrasing for IC).

I don't think HBM3 prices will be low enough for the low- and mid-range cards to get it though, but at least those would still have something like IC, albeit in lower capacity, to offset using GDDR6X or GDDR7 in lieu of HBM3.
 
I mean... I'm only 32, but I never really noticed any difference in RAM/memory aside from its quantity.
That one time in my whole life I had to upgrade from 128 MB to 512 MB so Battle for Middle-earth would not stutter.
Memory bandwidth matters, especially for graphics performance.
That's why AMD went through tremendous expense to create HBM, and Nvidia went through so much expense to make GDDR6X.

Just because you haven't noticed or understood what's happening doesn't mean this entire thread deserves a flippant response.

The cost of stacking 64GB of DDR6 on top of a GPU die is prohibitive. This is also why HBM died in the consumer space.
HBM is not the same as stacked DRAM, which is not the same as DDRx RAM.

The reason stacked memory or cache (which in AMD's case is actually SRAM) will take off is the move to MCM GPUs in the future. Note this has already begun with the Instinct MI250X.
The interposer complexity will be mandatory to have MCM GPUs in the first place, so adding additional SRAM tiles to act as a last-level cache for graphics might increase costs a bit, but not by orders of magnitude, as it's already going to be expensive by default.

What Apple is doing with M1 is closer to what we'll see going forward in terms of integrating memory closer to SoC or CPU and GPU dies.

Apple is doing absolutely nothing special at all. The only thing that is unique to them is the fact that they've got a 512-bit LPDDR5 (8-channel) memory bus on their massive SoC.
It's no different from any other dual-channel LPDDR laptop RAM configuration, beyond the sheer width of the memory interface. No unique technology whatsoever.
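Same width-times-rate arithmetic for that bus, assuming LPDDR5-6400 (which is what the M1 Max shipped with):

```python
# Bandwidth in GB/s = bus width (bits) / 8 * transfer rate (MT/s) / 1000
def lpddr_bandwidth_gbs(bus_width_bits: int, mt_per_s: int) -> float:
    return bus_width_bits / 8 * mt_per_s / 1000

print(lpddr_bandwidth_gbs(512, 6400))   # 512-bit bus (M1 Max style) -> 409.6 GB/s
print(lpddr_bandwidth_gbs(128, 6400))   # typical dual-channel laptop LPDDR5 -> 102.4 GB/s
```

Same technology, just four times the width.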
 
Interesting. Could we see GPUs with multiple memory pools using both HBM2 & GDDR6, similar to what the Xbox 360 had with its eDRAM?

Not HBM combined with GDDR at least. That would be entirely redundant. HBM will never be cheap in absolute terms, and GDDR isn't cheap either, so combining them both on the same GPU when they both serve the exact same function is unnecessary.

I think you'll see combinations of last-level caches, e.g. Infinity Cache, together with HBM/GDDR6+. And then eventually you'll see fast, low-latency, non-volatile memory solutions emerging, like ReRAM and 3D XPoint phase-change memory, that will fit between memory and the SSD.
 

V4skunk

Banned
Years back, I remember HBM and charts saying how revolutionary it is.

Haven't heard anything about it in a long time.
I'm sure I read recently that Samsung has made developments with HBM memory, and the article was speculating on how it could be viable for high-end mobile phones in a few years.
 