Highlights:
- The B200 comprises two distinct compute modules, also known as dies, manufactured with four-nanometer technology.
- The B200 cluster will be available in the first quarter of 2025. At approximately the same time, Oracle intends to add another new infrastructure option based on Nvidia’s GB200 NVL72 system to its public cloud.
Oracle Corp. plans to offer the world’s first zettascale computing cluster through its public cloud.
The cluster will offer artificial intelligence applications up to 2.4 zettaflops of performance, the company announced at its Oracle CloudWorld conference. One zettaflop corresponds to a trillion billion, or 10^21, computations per second. The speed of the world’s fastest supercomputers is typically measured in exaflops; an exaflop is three orders of magnitude smaller than a zettaflop.
The AI cluster is powered by Nvidia Corp.’s flagship Blackwell B200 graphics processing unit. Customers can provision the cluster with up to 131,072 B200 chips, the maximum number of GPUs Oracle plans to support, at which point it reaches its peak speed of 2.4 zettaflops. That is more than three times the number of graphics cards in Frontier, the world’s fastest supercomputer, which the U.S. Energy Department uses for scientific research.
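To put the headline figure in perspective, here is a quick back-of-envelope check. The assumption that the 2.4-zettaflop total refers to low-precision, sparsity-enabled throughput is ours, based on Nvidia's published B200 peaks, not a breakdown Oracle has provided.

```python
# Back-of-envelope check of the cluster arithmetic reported above.
ZETTA = 1e21  # 1 zettaflop = 10**21 floating-point operations per second
PETA = 1e15   # 1 petaflop  = 10**15

cluster_peak = 2.4 * ZETTA
gpu_count = 131_072  # maximum GPU count Oracle plans to support

per_gpu = cluster_peak / gpu_count
print(f"Implied per-GPU peak: {per_gpu / PETA:.1f} petaflops")
# Implied per-GPU peak: 18.3 petaflops
```

The roughly 18 petaflops per GPU that falls out of this division lines up with the B200's advertised peak at its lowest precision with sparsity enabled, which suggests the zettascale figure describes that best-case mode rather than the denser number formats most training runs use.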
The B200 comprises two distinct compute modules, known as dies, that are manufactured with four-nanometer technology and linked by an interconnect capable of moving up to 10 terabytes of data per second. Together, the two dies hold 208 billion transistors. The chip also carries 192 gigabytes of high-speed HBM3e memory.
A prominent feature of the chip is its microscaling capability. AI models process information as floating-point numbers, data units that each hold between four and 32 bits of information. The smaller the data unit, the faster it can be processed. The B200’s microscaling features accelerate calculations by compressing some floating-point numbers into smaller ones.
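Nvidia's microscaling formats are hardware features, but the underlying idea, one shared scale factor per small block of narrow elements, is easy to sketch. The simulation below is a minimal illustration of that principle, assuming 8-bit integer elements and 32-element blocks for readability; the actual Blackwell formats use even narrower floating-point elements, so this is not a model of the hardware's exact behavior.

```python
import numpy as np

BLOCK = 32      # elements that share one scale factor (an assumed size)
ELEM_MAX = 127  # largest magnitude an int8 element can hold

def mx_quantize(x: np.ndarray):
    """Quantize float32 values using one shared power-of-two scale per block."""
    blocks = x.reshape(-1, BLOCK)
    # Choose the smallest power-of-two scale that makes every element fit.
    peak = np.abs(blocks).max(axis=1, keepdims=True)
    scales = 2.0 ** np.ceil(np.log2(peak / ELEM_MAX))
    elems = np.round(blocks / scales).astype(np.int8)
    return elems, scales

def mx_dequantize(elems: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return elems.astype(np.float32) * scales

x = np.random.randn(4 * BLOCK).astype(np.float32)
elems, scales = mx_quantize(x)
error = np.abs(mx_dequantize(elems, scales) - x.reshape(-1, BLOCK)).max()
print(f"max round-trip error: {error:.4f}")
```

Storing one scale per block rather than per value is what makes the compression cheap: the narrow elements carry most of the information, while the shared scale preserves each block's dynamic range.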
Oracle’s B200-powered AI cluster will support two networking protocols: InfiniBand and RoCEv2, an enhanced form of Ethernet. Both offer kernel bypass, a feature that lets network traffic skip parts of the operating system’s networking stack on the way to its destination. As a result, data reaches the GPUs sooner, which speeds up processing.
The B200 cluster will become available in the first quarter of 2025. At approximately the same time, Oracle plans to add another new infrastructure option, based on Nvidia’s GB200 NVL72 system, to its public cloud. This liquid-cooled system includes 36 GB200 accelerators, each combining a central processing unit with two B200 graphics cards.
The GB200 is compatible with Nvidia’s SHARP networking technology. To coordinate their activity, AI chips need to share data with each other on a regular basis over the network that connects them. This kind of data movement uses some of the chips’ processing capacity. Because SHARP minimizes the amount of data that must be transferred over the network, fewer processing resources are needed, freeing up more GPU capability for AI applications.
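The saving is easiest to see with a toy traffic model. The sketch below compares a deliberately naive all-reduce, in which every GPU broadcasts its full gradient buffer to every peer, against switch-side aggregation of the kind SHARP performs. The GPU count comes from the NVL72 rack described above, while the 1 GiB buffer size is an arbitrary example; real collectives such as ring all-reduce fall between these two extremes.

```python
# Toy model of per-GPU network traffic for one reduction, not a model of
# Nvidia's actual SHARP protocol or of optimized collective libraries.

def naive_allreduce_sent_per_gpu(n_gpus: int, buf_bytes: int) -> int:
    # Each GPU sends its full buffer to every peer, then reduces locally.
    return (n_gpus - 1) * buf_bytes

def in_network_sent_per_gpu(buf_bytes: int) -> int:
    # One copy up to the aggregating switch, one reduced result back down.
    return 2 * buf_bytes

N = 72         # GPUs in a single GB200 NVL72 rack
BUF = 1 << 30  # hypothetical 1 GiB gradient buffer

print(f"naive all-to-all: {naive_allreduce_sent_per_gpu(N, BUF) / 2**30:.0f} GiB per GPU")
print(f"in-network:       {in_network_sent_per_gpu(BUF) / 2**30:.0f} GiB per GPU")
# naive all-to-all: 71 GiB per GPU
# in-network:       2 GiB per GPU
```

Beyond moving fewer bytes, the switch also performs the summation itself, which is the source of the freed-up GPU capacity described above.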
“We include supercluster monitoring and management APIs to enable you to quickly query for the status of each node in the cluster, understand performance and health, and allocate nodes to different workloads, greatly enhancing availability,” Mahesh Thiagarajan, Executive Vice President of Oracle Cloud Infrastructure, stated.
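As an illustration of the workflow that quote describes, the sketch below polls per-node health and flags nodes whose workloads should be reallocated. Everything in it, the endpoint, the response schema, and the authentication scheme, is a placeholder invented for the example; it is not Oracle’s actual OCI API.

```python
import requests  # third-party HTTP client: pip install requests

BASE_URL = "https://example.invalid/supercluster/v1"  # placeholder endpoint

def unhealthy_nodes(cluster_id: str, token: str) -> list[str]:
    """Return IDs of nodes whose (assumed) 'health' field is not 'OK'."""
    resp = requests.get(
        f"{BASE_URL}/clusters/{cluster_id}/nodes",
        headers={"Authorization": f"Bearer {token}"},  # hypothetical auth
        timeout=30,
    )
    resp.raise_for_status()
    # The 'nodes', 'id', and 'health' fields are a made-up schema.
    return [n["id"] for n in resp.json()["nodes"] if n["health"] != "OK"]

# Hypothetical usage:
# for node_id in unhealthy_nodes("demo-cluster", "API_TOKEN"):
#     print("reallocate workloads away from", node_id)
```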
Oracle is also improving support for other GPUs in Nvidia’s product portfolio on its cloud platform. The H200 chip, which sat atop Nvidia’s data center GPU lineup until the B200’s debut in March, will serve as the foundation for a new cloud cluster the company plans to introduce later this year. Users will be able to reserve up to 65,536 H200 processors on the cluster, for performance of 260 exaflops, or slightly more than a quarter of a zettaflop.
According to Thiagarajan, Oracle is also modernizing the storage architecture on its cloud to support the new AI clusters. Rapid data movement to and from storage is necessary for large-scale neural networks to perform computations.
“We will soon introduce a fully managed Lustre file service that can support dozens of terabits per second. To match the increased storage throughput, we’re increasing the OCI GPU Compute frontend network capacity from 100 Gbps in the H100 GPU-accelerated instances to 200 Gbps with H200 GPU-accelerated instances, and 400 Gbps per instance for B200 GPU and GB200 instances,” Thiagarajan added.