During a tech day held yesterday in Sonoma, California, AMD demonstrated its Radeon Instinct product family for the first time. The new flagship, which goes by the name MI25, is most likely based on the Vega GPU.
Raja Koduri, head of AMD's Radeon Technologies Group, presented the new pixel accelerator himself. According to him, working samples of the new GPU have existed for a few weeks, and AMD is still optimizing the cards. Nevertheless, there were systems with running Radeon Instinct MI25 cards. One of those systems had Doom installed, running at 4K on the Vulkan API and pumping out 68 fps.
Koduri explicitly mentioned that Vega makes use of an “NCU”, which leaves a lot of room for new rumors. One theory comes from Computerbase, suggesting that “NCU” could stand for “New/Next Compute Unit”, which would imply that AMD has significantly overhauled the architecture.
On another note, it looks like the number included in the new naming scheme indicates FP32 performance. The MI25 would therefore score 25 TFLOPS, while the MI8 does 8.2 TFLOPS and the MI6 reaches 5.7 TFLOPS. Compared to Fiji this would mean that compute power has roughly tripled, which sounds rather unlikely. It could therefore be that AMD is using a feature that performs two FP16 calculations at the same time on a single FP32 ALU, in which case the 25 would refer to FP16 throughput rather than FP32. So far, only NVIDIA's GP100 GPU is capable of this.
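The packed-FP16 hypothesis is easy to sanity-check with a quick back-of-envelope calculation. Note that the Fiji baseline of 8.6 TFLOPS FP32 (Radeon R9 Fury X) is an assumption brought in here, not a figure from the slides:

```python
# Sanity check of the packed-FP16 hypothesis behind the "25" in MI25.
# The Fiji FP32 figure (8.6 TFLOPS, R9 Fury X) is an assumed baseline.

FIJI_FP32_TFLOPS = 8.6
MI25_QUOTED_TFLOPS = 25.0

# If the 25 TFLOPS were plain FP32, Vega would be roughly 3x Fiji:
naive_gain = MI25_QUOTED_TFLOPS / FIJI_FP32_TFLOPS
print(f"naive FP32 gain over Fiji: {naive_gain:.2f}x")  # ~2.91x

# If the 25 TFLOPS is packed FP16 (two FP16 ops per FP32 ALU per cycle),
# the underlying FP32 rate is half of the quoted number:
implied_fp32 = MI25_QUOTED_TFLOPS / 2
real_gain = implied_fp32 / FIJI_FP32_TFLOPS
print(f"implied FP32 rate: {implied_fp32} TFLOPS")      # 12.5 TFLOPS
print(f"FP32 gain over Fiji: {real_gain:.2f}x")         # ~1.45x
```

A ~1.45x generational FP32 uplift is far more plausible than a tripling, which is what makes the packed-FP16 reading attractive.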
Yet another slide mentioned a High Bandwidth Cache as well as a corresponding controller. Koduri didn't comment on this, yet it has already been officially confirmed that Vega will feature HBM2 memory. What exactly the HBM cache is there for remains unclear; presumably, an HBM cache would not be as fast as an L2 or L1 cache. Regarding the HBM2 implementation, one of the AMD employees unintentionally mentioned that the MI25 offers 512 GB/s of bandwidth. This suggests that HBM2 will be implemented using either two or four stacks. Two stacks would allow for 8, 16, 32 or 64 GB of VRAM, while four stacks could realize 16, 32, 64 or 128 GB. Nevertheless, 64 GB (two stacks) or 128 GB (four stacks) are unlikely, since no HBM2 memory with the required density is currently available.
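The capacity options above can be enumerated mechanically. The per-stack densities of 4, 8, 16 and 32 GB are simply read back from the article's own figures, and the per-stack bandwidth split assumes the 512 GB/s is shared evenly across stacks:

```python
# Enumerate the VRAM capacities implied by the figures in the text.
# Assumed per-stack HBM2 densities (4-32 GB) are derived from those
# figures; which densities actually ship is an open question.

densities_gb = [4, 8, 16, 32]   # assumed capacity per HBM2 stack
total_bandwidth_gbs = 512       # GB/s, as quoted for the MI25

for stacks in (2, 4):
    capacities = [stacks * d for d in densities_gb]
    per_stack_bw = total_bandwidth_gbs / stacks
    print(f"{stacks} stacks: {capacities} GB total, "
          f"{per_stack_bw:.0f} GB/s per stack")
# 2 stacks: [8, 16, 32, 64] GB total, 256 GB/s per stack
# 4 stacks: [16, 32, 64, 128] GB total, 128 GB/s per stack
```

The per-stack bandwidth is the interesting part: 256 GB/s per stack would mean two stacks running at full HBM2 speed, while four stacks would only need to run at half rate to reach the same total.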