1.1 Million Tokens per Second: Microsoft and NVIDIA Team Up to Set an AI Inference Record
Microsoft announced that its Azure ND GB300 v6 virtual machine has set a new industry record for inference throughput, reaching 1.1 million tokens per second on Meta's Llama 2 70B model. The ND GB300 v6 virtual machine runs on NVIDIA's Blackwell Ultra platform, specifically the NVIDIA GB300 NVL72 system, which combines 72 NVIDIA Blackwell Ultra GPUs and 36 NVIDIA Grace CPUs in a single rack-scale design. The virtual machine is optimized for inference workloads and offers 50% more GPU memory and a 16% higher thermal design power (TDP) than the previous generation. Microsoft CEO Satya Nadella wrote on social media: "This achievement is the result of our long-term collaboration with NVIDIA and our expertise in running artificial intelligence at scale."
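To put the headline number in per-GPU terms, the short sketch below divides the reported throughput by the GPU count. It assumes, as the article describes, that the whole 1.1 million tokens/s figure comes from one 72-GPU GB300 NVL72 system and that load is spread evenly. The helper name `tokens_per_gpu` is purely illustrative.

```python
# Figures taken from the article; assumes the record ran on one GB300 NVL72 system.
TOTAL_TOKENS_PER_SECOND = 1_100_000  # reported aggregate throughput (rounded)
GPUS_PER_NVL72 = 72                  # Blackwell Ultra GPUs in one GB300 NVL72


def tokens_per_gpu(total_tps: float, gpu_count: int) -> float:
    """Average throughput per GPU, assuming work is split evenly (hypothetical helper)."""
    if gpu_count <= 0:
        raise ValueError("gpu_count must be positive")
    return total_tps / gpu_count


per_gpu = tokens_per_gpu(TOTAL_TOKENS_PER_SECOND, GPUS_PER_NVL72)
print(f"{per_gpu:,.0f} tokens/s per GPU")  # → 15,278 tokens/s per GPU
```

Because the 1.1 million figure is itself rounded, the per-GPU value is only an approximate average.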

