As I understand it, Nvidia's DGX GH200 supercomputer combines 256 Hopper GPUs with many Arm CPU cores, so it has 256 GPUs while the DGX H100 has only 8. Yet the quoted FP32 performance of the GH200 is 230.4 TFLOPS versus 66.9 TFLOPS for the H100, which is only about 3.44x faster. I know hardware performance can't simply scale linearly, and communication and coordination become harder as the number of cores grows. But that performance increment still seems incredibly small. Have I misunderstood these figures?
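To make the gap concrete, here is a quick arithmetic check of the figures quoted above (a sketch only; the TFLOPS numbers are the ones cited in the question, not verified against Nvidia's spec sheets):

```python
# Figures as quoted in the question (unverified assumptions).
gh200_fp32_tflops = 230.4   # quoted FP32 for the DGX GH200
h100_fp32_tflops = 66.9     # quoted FP32 for the DGX H100
gh200_gpus, h100_gpus = 256, 8

# Observed speedup from the quoted numbers vs. what naive
# linear scaling with GPU count would predict.
speedup = gh200_fp32_tflops / h100_fp32_tflops
gpu_ratio = gh200_gpus / h100_gpus

print(f"FP32 speedup from quoted figures: {speedup:.2f}x")   # ~3.44x
print(f"GPU-count ratio:                  {gpu_ratio:.0f}x")  # 32x
```

So the quoted throughput ratio (~3.44x) is roughly an order of magnitude below the 32x GPU-count ratio, which is the mismatch the question is asking about.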
- https://www.theregister.com/2023/05/29/nvidia_dgx_gh200_nvlink/ – ron Sep 18 '23 at 15:51
- In a press briefing ahead of CEO Jensen Huang's keynote, executives compared the GH200 to the biz's recently launched DGX H100 server, claiming up to 500x higher memory. However, the two are nothing alike. The DGX H100 is an 8U system with dual Intel Xeons, eight H100 GPUs, and about as many NICs. The DGX GH200 is a 24-rack cluster built on an all-Nvidia architecture, so the two aren't exactly comparable. The GH200 claims 144TB of unified [GPU] memory, which seems to be its main purpose; it's not meant to deliver a massive improvement in FP32 teraflops. – ron Sep 18 '23 at 15:55