The primary reason for making chips this big at present is to compute LLMs. Why have separate RAM in an LLM compute chip? It doesn't matter how wide you make the bus, it'll always be a bottleneck and a source of huge inefficiency.
The Von Neumann model of compute was great back when setup of ENIAC took days, and the run-times were shorter, but that's not the case with silicon ASICs and FPGAs.
For example, when Von Neumann got ahold of the ENIAC, he slowed it down by more than 60%. This is because his changes destroyed the inherent parallelism of the original hardware design.
It's time to back out of this premature optimization rabbit hole.
> The primary reason for making chips this big at present is to compute LLMs.
It also makes a lot of sense for HPC, simulation, and other workloads that have low data locality. A chip this big enables much faster point-to-point communication than a rack full of accelerators connected via copper cables would allow. The aggregate bandwidth of the interposer is enormous compared to NICs, and latencies are much lower.
> For example, when Von Neumann got ahold of the ENIAC, he slowed it down by more than 60%.
We'll need to rewrite a lot of software for that.