
PS3 Cell made Sony's first party what they are today.

PaintTinJr

Member
Stop misrepresenting what I say.
My point was that the SPE is a precursor to the modern CU. Sony even intended to have just the Cell render graphics, before realizing the issues with yields.
Had Sony managed to get decent yields on the Cell, those SPEs would have been very similar in concept to the CUs we have today in a console SoC.
Your point that Cell led to Alder Lake is complete nonsense. Cell was a dead end that no one has copied since.
And just because Alder Lake has one similarity in a very broad technical sense, it does not mean it's related to Cell.
And this seems to distil our different views, because even now - as has been the case with needing a special CU for the Tempest engine in the PS5, or special co-processors in the IO complex - the SPUs can subsume all those algorithms and more with custom software solutions. It was the ideal type of processor to send on a long space mission, where what you needed from a CU/ASIC of the time wasn't right any more 5 years after launch, and you needed DSP/GPU-level performance with CPU versatility, at the expense of leaning on the human programming ingenuity gained over those flight years. So being a precursor to something less versatile and less general purpose isn't how I would describe the SPUs, even if I see what you are driving at.
The page you posted has only one mention of hUMA, and it refers to AMD's Heterogeneous Unified Memory Access. You continue to use the wrong term for Heterogeneous Computing.
I'm not arguing that I can locate the resources from nearly 20 years ago to prove I'm right (the switch from http to https makes old info sparse), and clearly you don't want to accept that the Cell BE's EIB and FlexIO were designed for hUMA operation, so we reach an impasse. I could try and point you to it being a ring bus with a token access mechanism, and to why such an uncommon communication topology for the processor fits the unified memory access paradigm like a glove, but it seems that unless I can use a time machine to fetch the pages I've read/seen and show you, you aren't interested.

I had hoped that enlightening you on issues like the Cell BE's (Roadrunner) power efficiency, the way SPUs run autonomously after being kicked off, and SPUs - even split across different Cell BE processors, as in the Sony ZEGO, Fixstars boards, etc - being able to access unified XDR with equal priority via the EIB ring bus would have earned me a bit of goodwill from you, enough to trust that what I was recounting from the time was correct... but sadly not, it seems.
 

winjer

Member
And this seems to distil our different views, because even now - as has been the case with needing a special CU for the Tempest engine in the PS5, or special co-processors in the IO complex - the SPUs can subsume all those algorithms and more with custom software solutions. It was the ideal type of processor to send on a long space mission, where what you needed from a CU/ASIC of the time wasn't right any more 5 years after launch, and you needed DSP/GPU-level performance with CPU versatility, at the expense of leaning on the human programming ingenuity gained over those flight years. So being a precursor to something less versatile and less general purpose isn't how I would describe the SPUs, even if I see what you are driving at.

The Cell arch made a tradeoff. By removing things like the branch predictor from its SPEs and by using a simple in-order design, it saved a lot of die space that could be spent on more computational power.
But this put the onus of optimization squarely on the developers. Yes, there were a few examples of impressive use of the Cell CPU. But most devs had no time, nor budget, nor the technical know-how.
The Cell is great at parallel workloads with few dependencies. But because it lacks OoO execution and only the PPE has branch prediction, it became very difficult to keep the execution pipeline full, especially when running code with heavy dependencies and branching, as games are.
SMT probably helped to assign work to unused execution slots at any given point. And a good compiler also helps. But these things only go so far in games.
I remember when Cell dominated the Folding@Home rankings, because that code is highly parallel with few dependencies.
But in games it never pulled that far ahead of the competition.
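
To make that "onus on the developers" point concrete: a lot of SPU optimisation came down to rewriting branchy code as branch-free selects, because there is no hardware predictor to hide a mispredict on an in-order core. A rough sketch of the pattern in plain scalar C - on the real SPU this maps onto the Cell SDK's spu_cmpgt/spu_sel vector intrinsics and works on 4 floats at a time, so treat the details as illustrative:

/* Branchy version: every iteration risks a pipeline bubble on an
   in-order core with no branch predictor. */
void clamp_branchy(float *v, int n, float lo, float hi)
{
    for (int i = 0; i < n; i++) {
        if (v[i] < lo)      v[i] = lo;
        else if (v[i] > hi) v[i] = hi;
    }
}

/* Branch-free version: compute both outcomes and pick one with a select.
   Compilers typically turn the ternary into a conditional move; the SPU
   version would use spu_sel on a whole vector per iteration. */
void clamp_branchless(float *v, int n, float lo, float hi)
{
    for (int i = 0; i < n; i++) {
        float x = v[i];
        x = (x < lo) ? lo : x;
        x = (x > hi) ? hi : x;
        v[i] = x;
    }
}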

I'm not arguing that I can locate the resources from nearly 20 years ago to prove I'm right (the switch from http to https makes old info sparse), and clearly you don't want to accept that the Cell BE's EIB and FlexIO were designed for hUMA operation, so we reach an impasse. I could try and point you to it being a ring bus with a token access mechanism, and to why such an uncommon communication topology for the processor fits the unified memory access paradigm like a glove, but it seems that unless I can use a time machine to fetch the pages I've read/seen and show you, you aren't interested.

I had hoped that enlightening you on issues like the Cell BE's (Roadrunner) power efficiency, the way SPUs run autonomously after being kicked off, and SPUs - even split across different Cell BE processors, as in the Sony ZEGO, Fixstars boards, etc - being able to access unified XDR with equal priority via the EIB ring bus would have earned me a bit of goodwill from you, enough to trust that what I was recounting from the time was correct... but sadly not, it seems.

The issue I'm pointing at is that hUMA is not the same as Heterogeneous Computing.
At best, it's a subset of that definition, focused on memory interfaces.
 
Keep in mind also that AAA games (and games in general) were made using 512MB of RAM. To make it worse:

It's not unified memory: 256 MB main RAM and 256 MB VRAM. And to make it even worse:

The OS footprint when the console first released hogged, I believe, close to 100 MB of RAM; a few years later the OS footprint was slashed by about 74 MB.

Developers are always complaining they don't have enough RAM. Waahhh, crybabies. I dare you to make a Gran Turismo game run at 60fps on less than 512 MB of RAM!!
 

mckmas8808

Mckmaster uses MasterCard to buy Slave drives
As everyone knows by now, the infamous Cell CPU in the PS3 was really hard and time-consuming to code for. There is a YT video by Modern Vintage Gamer that goes into detail about what was involved. The amount of code required just to send one command was a lot more than a typical core would need.

We saw just how this affected the multiplatform games that were released, which ran a lot worse on the PS3 than on the 360 for the majority of the generation.
In response to the trouble developers were having with the Cell, Sony put a lot of effort into the ICE team to build the absolute best tools for taking advantage of the Cell and to help development of third-party games on the platform. From my understanding, the ICE team was drawn from Sony first-party teams such as Naughty Dog, Guerrilla Games and Santa Monica Studio.
By the end of the generation Sony's internal teams were putting out games that were amongst the most impressive of the generation.
Each Sony studio developed its own internal game engine, built from the ground up to take advantage of the parallel processing that the Cell offered.
As a result their current and recent projects are extremely well coded and efficient on multicore processors, and their engines keep up with the best of them, including id Tech and Unreal Engine.
The hard graft these studios had to do when stuck with the Cell has given them a skill set and coding tools that benefit them today.

As someone who loves the tech side of things, I wonder what it could have been if Sony had stuck with the Cell and improved its shortcomings, like making it out-of-order and streamlining the command requirements. No doubt it would have been more powerful than the Jaguar cores in the PS4.

While I understand why both Sony and MS moved to PC parts for their new consoles, I really miss the days of proprietary processors from Sony, Sega, etc.

This is my first thread on GAF, so go easy on me.

You think you miss the bolded........but trust me you don't!!!
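
For anyone wondering what "a lot of code just to send one command" looked like in practice: the SPEs couldn't touch main memory directly, so even reading a buffer meant issuing an explicit DMA into the 256KB local store and then waiting on a tag. A rough SPU-side sketch, assuming the Cell SDK's spu_mfcio.h interface (details approximate; real code would double-buffer so the next transfer overlaps with compute):

#include <spu_mfcio.h>   /* Cell SDK MFC/DMA intrinsics */
#include <stdint.h>

#define DMA_TAG 3

/* Staging buffer in the SPE's 256KB local store; DMA source, destination
   and size all want 16-byte (ideally 128-byte) alignment. */
static volatile uint8_t ls_buffer[16384] __attribute__((aligned(128)));

void fetch_chunk(uint64_t effective_addr, uint32_t size)
{
    /* Kick off the DMA from main (effective-address) memory into local store. */
    mfc_get(ls_buffer, effective_addr, size, DMA_TAG, 0, 0);

    /* Block until every transfer using DMA_TAG has completed. */
    mfc_write_tag_mask(1 << DMA_TAG);
    mfc_read_tag_status_all();

    /* Only now is ls_buffer safe to read. */
}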
 

damiank

Neo Member
I guess there is a big reason why nVidia only made one console GPU for MS and only one for Sony.
But AMD/ATI has already made several of them. Being a good partner in a venture like this is very important.
Nintendo switched from ArtX/AMD (GC -> Wii -> WiiU) to Nvidia. I wonder if Nintendo will Switch again to AMD, as the Steam Deck is emulating Switch just fine.

Also, that hypothetical Cell 2 for the PS4 could have been PowerXCell-based with architecture upgrades, but in a 4+8 combination instead of 1+8. With the Radeon from the actual PS4 and 8GB of RAM, devs could have used their existing 360 engines plus the optional stuff from the PS3 that ran on SPEs to offload the GPU here and there. But that's just hypothetical. Also, Sony itself could have reused its existing PSone and PStwo emus, moving the PS2 Classics emu to a full-blown emulator with support for every PS2 controller and accessory (which the current PS2_netemu can't do). For PS5 it could have been an 8+8 combination and the end of the line for this crap XD.
 

Fafalada

Fafracer forever
The Cell arch made a tradeoff. By removing things like the branch predictor from its SPEs and by using a simple in-order design, it saved a lot of die space that could be spent on more computational power.
SPEs don't have a branch predictor by design - it wasn't 'removed' - and as I pointed out earlier, they really didn't need one.
But the rest is not really specific to Cell or the SPEs. From 1995 to 2012, 8 out of 10 consoles released used in-order CPUs (and other in-order processors); 12 out of 15 if you count handhelds. It's only in the most recent decade that we've finally seen the switch to OoOE proper.
So yes - it was a trade-off, but it was one that almost every console made in pursuit of power efficiency.
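
And where a branch couldn't be removed outright, you leaned on static hints instead: the SPU ISA has an explicit branch-hint instruction, and as I understand it the Cell toolchain's GCC used the usual likely/unlikely annotations to lay out the fall-through path and place those hints. A hedged illustration in plain C (GCC builtins; the function and data names are made up for the example):

/* With no hardware predictor, the likely path is decided up front.
   __builtin_expect steers code layout so the hot path falls straight
   through; on the SPU the compiler can also emit an explicit branch hint. */
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

int count_bad_samples(const float *data, int n)
{
    int errors = 0;
    for (int i = 0; i < n; i++) {
        if (UNLIKELY(data[i] < 0.0f)) {
            /* rare error path: kept off the hot path */
            errors++;
        }
        /* hot path continues with no taken branch to stall on */
    }
    return errors;
}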
 
I can't understand where this narrative that they stumbled in the last few years is coming from. It's been hit after hit pretty much, not only since the console released but even in the months that preceded the release. Critically and commercially successful games, a sold-out console, huge service numbers, etc.

The last R&C was likely more successful than ever; R&C games didn't use to get this much attention or promotion before. GT7 did exceptionally well too. Returnal did great for a game like that.

You seem to be underselling a lot of things. Horizon was a massively successful new IP, and I think it's way too soon to assume the sequel didn't do as well as they hoped for. Releasing close to Elden Ring didn't help, but it's nothing that can't be overcome over time.

TLoU2 is probably the only recent game that might've sold below expectations (given that Sony doesn't update the numbers), but even that was still massively successful, and the multiplayer component hasn't even been released yet.

Horizon clearly hasn't sold as well as its predecessor.

It nose-dived on the charts immediately. The only question is whether it can have legs as the drought of big releases continues.
 
To provide some context: during Cell development, their budget was ~O(200M transistors). When you look back and compare it to the PS5, we're approaching 20,000M transistors. So the design paradigm to fit in the performance envelope is quite a bit different, and when you actually do the regression analyses and look at prediction, OoOE, branching, etc, these things aren't winners. Especially coming off the PS2, Cell was a more approachable design. Hofstee was outspoken about Cell being easier to approach.


For what it's worth - the original targets had Cell 4x more powerful, and the GPU substantially faster at non-compute workloads (but not much else), so what we got was not even all that exotic in the end.

Agreed. There is some ambiguity in what they had planned, but there was significant early interest in having a 1 TFLOP/s Cell processor around the time the patents were drawn up. It's likely they had a more aggressive lithography roadmap in mind - remember they were already producing EE+GSs on CMOS4 @ 90nm in FY2004. And then IBM's influences are a mixed bag: 90nm SOI, a guTS/Rivina-derived PPE, and the EIB, which was an elegant solution over a crossbar considering the performance/space trade-offs.


I remember when Cell dominated the Folding@Home rankings, because that code is highly parallel with few dependencies.
But in games it never pulled that far ahead of the competition.

It was never a fair competition. Use the same GPU and tell us the games would be remotely close if the only variable was Cell versus Xenon. Once they moved from the Toshiba design, they lost the memory architecture and programming paradigm that would have been interesting. It wasn't even a G8x-class GPU. As Jim Kahle has admitted, the nVidia interconnect 'problem' was a late addition for them.

Interesting thought experiment: if the PS3 had a similar-calibre GPU - easy to imagine, since GPUs were discrete back then, so the GPU side would have been a wash for developers - what do you think they could have done with ~200 GFLOP/s of sustained and pretty general FP computation?
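
For reference on where that ~200 GFLOP/s figure comes from (these are the standard public Cell numbers, so treat the breakdown as ballpark): each SPE can issue a 4-wide single-precision fused multiply-add every cycle at 3.2 GHz, i.e. 25.6 GFLOP/s, and the PS3 exposes 6 of its SPEs to games:

/* Back-of-envelope peak single-precision throughput for Cell at 3.2 GHz.
   4 SIMD lanes x 2 flops (fused multiply-add) = 8 flops per SPE per cycle. */
#include <stdio.h>

int main(void)
{
    const double clock_ghz     = 3.2;
    const double flops_per_clk = 4 * 2;                      /* 4 lanes, FMA = 2 flops */
    const double per_spe       = clock_ghz * flops_per_clk;  /* 25.6 GFLOP/s */

    printf("per SPE : %5.1f GFLOP/s\n", per_spe);
    printf("8 SPEs  : %5.1f GFLOP/s\n", 8 * per_spe);        /* ~204.8, full Cell     */
    printf("6 SPEs  : %5.1f GFLOP/s\n", 6 * per_spe);        /* ~153.6, PS3 for games */
    return 0;
}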
 
Horizon clearly hasn't sold as well as its predecessor.

It nose-dived on the charts immediately. The only question is whether it can have legs as the drought of big releases continues.
If you say so, let's see how long it takes Sony to make the numbers public.

If they take too long to talk about it, then it's likely it underperformed (but there is no doubt that it sold extremely well regardless; it's a very successful IP already).
 

SlimySnake

Member
If you say so, let's see how long it takes Sony to make the numbers public.

If they take too long to talk about it, then it's likely it underperformed (but there is no doubt that it sold extremely well regardless; it's a very successful IP already).
The first one sold what, 2.7 million copies in the first month and 20 million overall? You would expect it to shatter those numbers, especially considering it's cross-gen. I would've expected them to release those numbers by now.

What's even weirder is that we haven't seen GT7 numbers either.

We know Demon's Souls only sold 1.2 million in the first year. Ratchet, 1 million in the first month. Returnal, only 500k in the first three months. The new DLC only has 10,000 players on the leaderboards, apparently. What's up with these 1:20 attach ratios for a console so popular?

Spiderman Miles is pretty much the only PS5 game that continues to chart every month. So everyone who picks up a PS5 buys Miles and nothing else. Why? Could it be because it's only $50 whereas everything else is $70? Last gen, Sony's first party had an incredible run of games selling way better than their predecessors. KZ2 sold 2.1 million in 6 weeks. Infamous and BB sold 1 million in a month. Even DriveClub sold 2 million in 8 months. Then Uncharted 4, Horizon, GOW, Spiderman, Days Gone, TLOU2 and Ghosts just continued to outsell each other. Massive first-week sales. Incredible legs. It was insane. Then the PS5 hits and they all barely sell a million? What's going on here?
 
The first one sold what, 2.7 million copies in the first month and 20 million overall? It's definitely curious that a game that sold 20 million wouldn't sell more than 2.7 million in its first three months.

What's even weirder is that we haven't seen GT7 numbers either.

We know Demon's Souls only sold 1.2 million in the first year. Ratchet, 1 million in the first month. Returnal, only 500k in the first three months. The new DLC only has 10,000 players on the leaderboards, apparently. What's up with these 1:20 attach ratios for a console so popular?

Spiderman Miles is pretty much the only PS5 game that continues to chart every month. So everyone who picks up a PS5 buys Miles and nothing else. Why? Could it be because it's only $50 whereas everything else is $70? Last gen, Sony's first party had an incredible run of games selling way better than their predecessors. KZ2 sold 2.1 million in 6 weeks. Infamous and BB sold 1 million in a month. Even DriveClub sold 2 million in 8 months. Then Uncharted 4, Horizon, GOW, Spiderman, Days Gone, TLOU2 and Ghosts just continued to outsell each other. Massive first-week sales. Incredible legs. It was insane. Then the PS5 hits and they all barely sell a million? What's going on here?
Wait, do you know how much Horizon Forbidden West sold? I don't, and I don't know how someone could tell just from looking at chart positions.

GT7 just released; they said it had the best launch of the franchise, if I'm not mistaken. That alone is great news.

You honestly think Returnal is a flop? Sony even bought the studio after the game released; they clearly liked what they saw, and it's not like there were big expectations for the game.

Honestly you seem to be jumping to conclusions way too soon.
 

LordOfChaos

Member
As someone who loves the tech side of things, I wonder what it could have been if Sony had stuck with the Cell and improved its shortcomings, like making it out-of-order and streamlining the command requirements. No doubt it would have been more powerful than the Jaguar cores in the PS4.


Cell was in the end a pre-GPGPU design that aimed to make CPUs much more SIMDy per transistor, and when taken advantage of it did deliver that, with SIMD flops unparalleled for years after its release. However, GPUs quickly became better than general CPUs at the things Cell was good at.


Nowadays, if you want to rain a bunch of particles with real physics onto the screen, for instance, you do that on the GPU.


 
Cell was in the end a pre-GPGPU design that aimed to make CPUs much more SIMDy per transistor, and when taken advantage of it did deliver that, with SIMD flops unparalleled for years after its release. However, GPUs quickly became better than general CPUs at the things Cell was good at.

The obvious disclaimer is that, you're right, there has been an alignment of computation to dedicated substrate that has made Cell less needed today. But, consider:


I would suggest it was a potential platform, an architecture, that could have been extended and yielded interesting benefits.

Setting Sony's economics aside: the PlayStation 3 dedicated 258mm^2 (RSX) and 235mm^2 (Cell) to computation on its platform (~500mm^2 total). We're now getting ~300mm^2 with the PlayStation 5. Yet we praise Cerny.

If we still had that area, we'd be talking the midpoint between an RTX 2070 and 2080, so on the order of 10B transistors. With 75% of that dedicated to graphics, the other ~3B is free.

But, while on the Cell theme - there were plans to extend the design in width (4 PPEs) and length (up to 32 SPEs). Obviously the PPEs wouldn't be the same design; we could afford to use a more accommodating Power core. And there was talk that an SPE didn't have to be what we saw in Cell - they discussed having an APU that was basically additional PPEs. So you would have a heterogeneous processing environment linked by the EIB (which could be replaced by a crossbar, too) that could be tailor-made.

Would this have a niche and find utility in today's computational landscape? Maybe, maybe not. But it would be a lot more interesting from a theoretical standpoint than what Cerny has given us.
 

PaintTinJr

Member
To provide some context: during Cell development, their budget was ~O(200M transistors). When you look back and compare it to the PS5, we're approaching 20,000M transistors. So the design paradigm to fit in the performance envelope is quite a bit different, and when you actually do the regression analyses and look at prediction, OoOE, branching, etc, these things aren't winners. Especially coming off the PS2, Cell was a more approachable design. Hofstee was outspoken about Cell being easier to approach.




Agreed. There is some ambiguity in what they had planned, but there was significant early interest in having a 1 TFLOP/s Cell processor around the time the patents were drawn up. It's likely they had a more aggressive lithography roadmap in mind - remember they were already producing EE+GSs on CMOS4 @ 90nm in FY2004. And then IBM's influences are a mixed bag: 90nm SOI, a guTS/Rivina-derived PPE, and the EIB, which was an elegant solution over a crossbar considering the performance/space trade-offs.




It was never a fair competition. Use the same GPU and tell us the games would be remotely close if the only variable was Cell versus Xenon. Once they moved from the Toshiba design, they lost the memory architecture and programming paradigm that would have been interesting. It wasn't even a G8x-class GPU. As Jim Kahle has admitted, the nVidia interconnect 'problem' was a late addition for them.

Interesting thought experiment: if the PS3 had a similar-calibre GPU - easy to imagine, since GPUs were discrete back then, so the GPU side would have been a wash for developers - what do you think they could have done with ~200 GFLOP/s of sustained and pretty general FP computation?
As much as I agree with your wider points about the Cell BE, the shoehorned 11th-hour RSX was still more capable than the Xenos by about +40% more quads on screen - as the headline figure - with optimised wound quadrilateral strip geometry (1.1 billion polys/vertices versus about 400-700M polygons, going by ATI 980 Pro/X1600(?) tech specs plus a boost).

The 360 had the same technical nightmare of a first year as the PS3; it just had no competition from the PS2 to make its screen-tearing, frame pacing without fullscreen AA, or drops below native 1280x720 an issue we all remember - unlike the PS3 - and because the ATI Xenos had microarchitecture fast-path features such as early-Z, non-optimised screen processing favoured the 360. Polygon processing on the Xenos was almost as fast as with optimised wound quad strips - probably having some batching in hardware, so the one extra vert for an extra polygon was automatic, but still constrained by the max polygon count. Unlike on Nvidia cards, where you could get an extra polygon per additional vert, so the difference between optimised and not was 3 verts per polygon versus 1 vert per polygon, altering the max polygon throughput from 340M polygons/sec to 1 billion polygons/sec.

Then you had the RSX supporting HD Ready and Full HD triple buffering, full-precision Z-buffering with hardware-accelerated sRGB gamma correction - a proper standard dynamic range colour gamut - and 10-bit-per-channel RGB framebuffers, etc.

The only real weaknesses were that the alpha blending couldn't match the Xenos, with its higher-performance but too-small eDRAM, and that because the Xenon and Xenos shared unified RAM, the split 256MB of GDDR3 and 256MB of XDR in the PS3 became another problem to solve when moving data around. The ATI early-Z feature also provided an early-out for shading IIRC - like a precursor to variable rate shading - which was an additional automatic saving over the RSX for non-optimised rendering.

Overall the RSX was stronger, but it took far more work to get equal results (in vert + fragment throughput) compared to the competition.
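
To spell out the strip arithmetic behind those 340M vs 1 billion figures (the ~1G verts/sec setup rate here is just an assumed number for illustration): a plain triangle list costs 3 verts per polygon, while a long optimised strip amortises to roughly 1 vert per extra polygon, so the same vertex rate buys about 3x the polygons:

/* Illustrative only: how strip vs list vertex cost changes peak polygon
   throughput for a fixed vertex-setup rate (assumed ~1.0G verts/s). */
#include <stdio.h>

int main(void)
{
    const double verts_per_sec = 1.0e9; /* assumed setup rate, for illustration  */
    const double list_cost     = 3.0;   /* triangle list: 3 verts per polygon    */
    const double strip_cost    = 1.0;   /* long strip: ~1 vert per extra polygon */

    printf("list : %4.0fM polys/s\n", verts_per_sec / list_cost  / 1e6); /* ~333M  */
    printf("strip: %4.0fM polys/s\n", verts_per_sec / strip_cost / 1e6); /* ~1000M */
    return 0;
}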
 

LordOfChaos

Member
The obvious disclaimer is that, you're right, there has been an alignment of computation to dedicated substrate that has made Cell less needed today. But, consider:


I would suggest it was a potential platform, an architecture, that could have been extended and yielded interesting benefits.

Setting Sony's economics aside: the PlayStation 3 dedicated 258mm^2 (RSX) and 235mm^2 (Cell) to computation on its platform (~500mm^2 total). We're now getting ~300mm^2 with the PlayStation 5. Yet we praise Cerny.

If we still had that area, we'd be talking the midpoint between an RTX 2070 and 2080, so on the order of 10B transistors. With 75% of that dedicated to graphics, the other ~3B is free.

But, while on the Cell theme - there were plans to extend the design in width (4 PPEs) and length (up to 32 SPEs). Obviously the PPEs wouldn't be the same design; we could afford to use a more accommodating Power core. And there was talk that an SPE didn't have to be what we saw in Cell - they discussed having an APU that was basically additional PPEs. So you would have a heterogeneous processing environment linked by the EIB (which could be replaced by a crossbar, too) that could be tailor-made.

Would this have a niche and find utility in today's computational landscape? Maybe, maybe not. But it would be a lot more interesting from a theoretical standpoint than what Cerny has given us.


Absolutely no denying that the 7th gen was a lot more interesting than what we have now, given that we're still talking about and dissecting Cell to this day.

But it must be said that the get-up-and-go utility of PC-like hardware today is a big boon to developers. As for the argument that more transistors thrown at the problem would yield more power - of course, yeah, but that is a matter of the platform holders shifting away from the 7th gen's loss-making battle tanks going at each other towards hybrid sedans, not a mark against the architecture or Cerny.

Imo this argument also had more appeal in the 8th gen. The Jaguar CPUs were single-core dogs and not impressive even using all 7 (available) cores for SIMD. I would definitely like to see the what-if universe simulation with an extended Cell 2 (4 PPEs/32 SPEs) in there, but I've not been unhappy with the 9th-gen move to easily accessible, 7-core Zen 2 power, with unified memory making GPGPU even more viable than standalone PC cards and likely subsuming most of what a theoretical Cell could have done here.


Actually, before it launched I was quite taken with the idea of a "Cell assist engine": a full Cell processor in the PS5 for PS3 BC that developers could also tap for PS5 titles however they wanted, but alas it was only fantasy. 230 million transistors would be a pretty small addition.
 
Overall the RSX was stronger, but it took far more work to get equal results (in vert + fragment throughput) compared to the competition.

Sony, Toshiba and IBM formed STI in 2001 for a product in 2005. If a similarly long-running project with Toshiba had worked out, things would have been very different. I would even suggest that if a similar period of time had been invested with nVidia, we could have seen a G8x derivative that didn't have the shitty memory, bandwidth or command-processor issues, and that would have made these discussions moot. You make great arguments, but let's be honest, it was a hack job...


Absolutely no denying that the 7th gen was a lot more interesting than what we have now, given that we're still talking about and dissecting Cell to this day.

But it must be said that the get-up-and-go utility of PC-like hardware today is a big boon to developers. As for the argument that more transistors thrown at the problem would yield more power - of course, yeah, but that is a matter of the platform holders shifting away from the 7th gen's loss-making battle tanks going at each other towards hybrid sedans, not a mark against the architecture or Cerny.

I agree! This discussion held more water in the 8th generation.

Also, Sony's high-dimensional design space is overwhelmingly dominated by economic concerns, which I have the luxury of not paying attention to.

PS. Perhaps Panajev remembers this, but I think there was a post-launch interview years later with Hofstee or someone, who said that in hindsight, if they had realised OpenCL would come about, they would have developed a processor dedicated to that?! Sorry, I'm getting old and have forgotten so much; I work on computation that is much wetter now....
 
No, it was garbage and no amount of nostalgia will change that.
Yeah, it wasn't suited to a games console, and in hindsight Sony would have chosen a different path.
However, it made Sony's studios mad good at parallel compute, and their engines are really efficient at it. With Sony first-party games looking so good, rarely do they have major performance issues related to poor coding.
They tend to get every bit of performance and efficiency out of their hardware. A skill born out of fire lol
 

LordOfChaos

Member
'Member the PowerXCell 8i? This is two of them in an IBM Blade



'Member the SpursEngine accelerator cards?



There were even a few laptops with Cell accelerators



I 'member
 