Highlights
- Meta claims its new system will take the title of the world's fastest AI supercomputer when the build-out is completed in mid-2022.
- The new RSC handles computer vision workflows 20 times faster and trains large NLP models three times faster than Meta's previous system.
Meta, formerly known as Facebook, has built a massive Artificial Intelligence (AI) supercomputer, which it claims will become the fastest AI system in the world once fully built out by mid-2022.
The AI Research SuperCluster (RSC) is already being used to train large models in Natural Language Processing (NLP) and computer vision for research. The company said it aims to "one day" train models with trillions of parameters and build new AI systems that can power real-time voice translations for large groups of people.
According to a blog post by Nvidia, the two companies worked together to build the AI Research SuperCluster (RSC), which will be the largest NVIDIA DGX A100 customer system yet. The machine is expected to deliver five exaflops, or 5,000,000 teraflops, of AI performance across thousands of GPUs. According to Meta, the supercomputer's development was delayed by remote working and the chip and component supply chain constraints caused by the Covid-19 pandemic.
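As a quick sanity check on those units (a minimal sketch, not taken from Meta's or Nvidia's tooling): one exaflop is 10^18 floating-point operations per second, which is one million teraflops.

```python
# Back-of-the-envelope unit check: 1 exaflop = 10**18 FLOPS and 1 teraflop = 10**12 FLOPS,
# so one exaflop is one million teraflops.
EXAFLOP_IN_TERAFLOPS = 10**18 // 10**12   # 1,000,000

rsc_peak_exaflops = 5  # headline figure cited for RSC's AI performance
print(f"{rsc_peak_exaflops} exaflops = {rsc_peak_exaflops * EXAFLOP_IN_TERAFLOPS:,} teraflops")
# -> 5 exaflops = 5,000,000 teraflops
```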
This is not the company's first such machine. Back in 2017, Meta's Facebook AI Research lab built a supercomputer with 22,000 Nvidia V100 Tensor Core GPUs in a single cluster. It was a milestone, running 35,000 training jobs a day, and served as the company's main AI supercomputer.
More recently, in 2020, Facebook decided to increase its computing power and build a new supercomputer to tackle more advanced AI workloads. The current RSC system comprises 760 Nvidia DGX A100 systems, each containing eight A100 GPUs and two CPUs.
The resulting 6,080 GPUs are connected via an Nvidia Quantum 200 Gb/s InfiniBand fabric in a two-level Clos topology. For storage, the system has 175 petabytes of Pure Storage FlashArray, 46 petabytes of cache storage in Penguin Computing Altus systems, and 10 petabytes of Pure Storage FlashBlade.
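To make that scale concrete, here is a minimal sketch that tallies the figures quoted above; the per-node GPU and CPU counts come from the article, and the totals are simple arithmetic rather than output from any real system.

```python
# Illustrative tally of the RSC build-out described above. All inputs are the
# figures quoted in the article; nothing here queries a real cluster.
num_dgx_a100_nodes = 760
gpus_per_node = 8   # A100 GPUs per Nvidia DGX A100 system
cpus_per_node = 2

total_gpus = num_dgx_a100_nodes * gpus_per_node   # 6,080 GPUs
total_cpus = num_dgx_a100_nodes * cpus_per_node   # 1,520 CPUs

storage_petabytes = {
    "Pure Storage FlashArray": 175,
    "Penguin Computing Altus cache": 46,
    "Pure Storage FlashBlade": 10,
}

print(f"GPUs: {total_gpus:,}  CPUs: {total_cpus:,}")
print(f"Storage: {sum(storage_petabytes.values()):,} PB total")
# -> GPUs: 6,080  CPUs: 1,520
# -> Storage: 231 PB total
```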
Compared with Meta’s previous system, the RSC handles computer vision workflows 20 times faster. According to internal – and unverified – benchmarks, it runs the Nvidia Collective Communication Library (NCCL) more than nine times faster, and trains large scale NLP models three times faster.
Other interesting facts
The world’s fastest AI supercomputer, which is currently housed at the Department of Energy, is called Perlmutter supercomputer. It can run at up to four exaflops of AI performance. It features 6,159 Nvidia A100 GPUs and 1,536 AMD Epyc CPUs.
Italy’s Leonardo system, which features 3,500 Intel Sapphire Rapids CPUs and 14,000 GPUs, is set to overtake Perlmutter when it launches soon.
In Meta's words
“Developing the next generation of advanced AI will require powerful new computers capable of quintillions of operations per second,” said the company in a blog. “Today, Meta is announcing that we’ve designed and built the AI Research SuperCluster (RSC) — which we believe is among the fastest AI supercomputers running today and will be the fastest AI supercomputer in the world when it’s fully built out in mid-2022. Our researchers have already started using RSC to train large models in natural language processing (NLP) and computer vision for research, with the aim of one day training models with trillions of parameters.”
“We hope RSC will help us build entirely new AI systems that can, for example, power real-time voice translations to large groups of people, each speaking a different language, so they can seamlessly collaborate on a research project or play an AR game together,” the company continued in its blog. “Ultimately, the work done with RSC will pave the way toward building technologies for the next major computing platform — the metaverse, where AI-driven applications and products will play an important role.”