Highlights:

  • The B200 comprises two distinct compute modules, or dies, manufactured on a four-nanometer process.
  • The B200 cluster will be available in the first quarter of 2025. At approximately the same time, Oracle intends to add another new infrastructure option based on Nvidia’s GB200 NVL72 system to its public cloud.

Oracle Corp. plans to offer the world’s first zettascale computing cluster through its public cloud.

The cluster will offer artificial intelligence applications up to 2.4 zettaflops of performance, the company announced at its Oracle CloudWorld conference. One zettaflop equals a trillion billion, or 10²¹, computations per second. The speed of the world’s fastest supercomputers is typically measured in exaflops; an exaflop is three orders of magnitude smaller than a zettaflop.

The AI cluster is powered by Nvidia Corp.’s flagship Blackwell B200 graphics processing unit. Customers will be able to provision the cluster with up to 131,072 B200 chips, the maximum number of GPUs Oracle plans to support; at that scale, the cluster reaches its peak speed of 2.4 zettaflops. That is more than three times the number of graphics cards in Frontier, the world’s fastest supercomputer, which the U.S. Energy Department uses for scientific research.
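
For perspective, here is a quick back-of-the-envelope check of those figures in Python. The per-GPU number is inferred from Oracle’s stated totals, and the reference to sparse FP4 precision is an assumption about how the headline figure was calculated, not something Oracle specified.

```python
# Back-of-the-envelope check of Oracle's stated figures.
EXAFLOP = 10**18    # floating-point operations per second
ZETTAFLOP = 10**21  # one zettaflop = 1,000 exaflops

cluster_peak = 2.4 * ZETTAFLOP
gpu_count = 131_072

# Implied per-GPU throughput: about 18.3 petaflops, in the range Nvidia
# quotes for the B200 at low (sparse FP4) precision -- an assumption here.
per_gpu = cluster_peak / gpu_count
print(f"{per_gpu / 10**15:.1f} petaflops per GPU")      # 18.3
print(f"{cluster_peak / EXAFLOP:,.0f} exaflops total")  # 2,400
```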

The B200 comprises two distinct compute modules, or dies, manufactured on a four-nanometer process and holding a combined 208 billion transistors. The dies are linked by an interconnect that can move up to 10 terabytes of data per second. The B200 also carries 192 gigabytes of high-speed HBM3e memory.

A prominent characteristic of the chip is its microscaling capability. AI models process information as floating-point numbers, data units that each comprise four to 32 bits. The smaller the data unit, the faster it can be processed. The B200’s microscaling features can compress some floating-point numbers into smaller ones, accelerating calculations.
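
As a rough illustration of the idea, the sketch below quantizes data in small blocks that share a single power-of-two scale, storing the elements themselves at 4-bit precision. It is a minimal sketch of block-wise microscaling, not Nvidia’s actual on-chip MX format.

```python
import numpy as np

def mx_quantize(values: np.ndarray, block_size: int = 32, elem_bits: int = 4) -> np.ndarray:
    """Quantize values in blocks that share one power-of-two scale.

    A sketch of the microscaling idea: coarse per-block scaling plus
    very low-precision elements. Not Nvidia's exact on-chip format.
    """
    out = np.empty_like(values, dtype=np.float32)
    levels = 2 ** (elem_bits - 1) - 1  # e.g. 4-bit elements -> integers in [-7, 7]
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        # Shared scale: smallest power of two that covers the block's largest value.
        scale = 2.0 ** np.ceil(np.log2(np.abs(block).max() / levels + 1e-30))
        quantized = np.clip(np.round(block / scale), -levels, levels)
        out[start:start + block_size] = quantized * scale  # dequantized view
    return out

x = np.random.randn(64).astype(np.float32)
err = np.abs(x - mx_quantize(x)).max()
print(f"max quantization error: {err:.4f}")  # bounded by half of each block's scale
```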

Oracle’s B200-powered AI cluster will support two networking protocols: InfiniBand and RoCEv2, an enhanced form of Ethernet. Both offer kernel bypass, a feature that lets network traffic skip parts of the operating system’s networking stack that packets normally traverse. As a result, data reaches the GPUs faster, which speeds up processing.

The B200 cluster will be available in the first quarter of 2025. At approximately the same time, Oracle aims to add another new infrastructure option, based on Nvidia’s GB200 NVL72 system, to its public cloud. This liquid-cooled system combines 36 GB200 superchips, each pairing a central processing unit with two B200 graphics cards, for a total of 72 GPUs.

The GB200 is compatible with Nvidia’s SHARP networking technology. To coordinate their activity, AI chips must regularly exchange data over the network that connects them, and that data movement consumes some of their processing capacity. SHARP reduces the amount of data that must travel over the network by performing part of the aggregation work inside the network itself, which frees up more GPU capacity for AI applications.
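
The sketch below makes that traffic argument concrete with first-order estimates: in a conventional ring all-reduce, each GPU forwards roughly twice the payload, while SHARP-style in-switch aggregation lets each GPU send its contribution once. The formulas are textbook approximations, not measurements of Nvidia’s implementation.

```python
# First-order estimate of the bytes each GPU must send for one all-reduce
# of a gradient payload, with and without in-network aggregation.

def ring_allreduce_send_bytes(payload: int, n_gpus: int) -> float:
    # Classic ring all-reduce: each GPU forwards 2 * (N - 1) / N of the payload.
    return 2 * (n_gpus - 1) / n_gpus * payload

def in_network_send_bytes(payload: int) -> float:
    # SHARP-style switch aggregation: each GPU sends its contribution once;
    # the switches combine values in flight and return one reduced result.
    return float(payload)

payload = 10 * 2**30  # a hypothetical 10-GiB gradient exchange
for n in (8, 72, 1024):
    ring = ring_allreduce_send_bytes(payload, n) / 2**30
    sharp = in_network_send_bytes(payload) / 2**30
    print(f"{n:>4} GPUs: ring sends {ring:.1f} GiB/GPU, in-network {sharp:.1f} GiB/GPU")
```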

“We include supercluster monitoring and management APIs to enable you to quickly query for the status of each node in the cluster, understand performance and health, and allocate nodes to different workloads, greatly enhancing availability,” said Mahesh Thiagarajan, executive vice president of Oracle Cloud Infrastructure.

Oracle is also improving support for other GPUs in Nvidia’s product portfolio on its cloud platform. The H200 chip, which sat atop Nvidia’s data center GPU lineup until the B200’s debut in March, will serve as the foundation for a new cloud cluster the company intends to introduce later this year. Users will be able to reserve up to 65,536 H200 processors on the cluster, for performance of 260 exaflops, or a little more than a quarter of a zettaflop.
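
Applying the same arithmetic as above, the implied per-GPU figure below is derived from Oracle’s stated totals. It happens to sit near the roughly 4-petaflop sparse FP8 throughput Nvidia quotes for Hopper-class GPUs, though Oracle did not say which precision the 260-exaflop figure assumes.

```python
# Implied per-GPU throughput of the planned H200 cluster.
cluster_peak = 260 * 10**18  # 260 exaflops
gpu_count = 65_536

print(f"{cluster_peak / gpu_count / 10**15:.2f} petaflops per GPU")  # ~3.97
print(f"{cluster_peak / 10**21:.2f} zettaflops total")               # 0.26
```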

According to Thiagarajan, Oracle is also modernizing its cloud storage architecture to support the new AI clusters. Large-scale neural networks must move data to and from storage rapidly to perform computations.

“We will soon introduce a fully managed Lustre file service that can support dozens of terabits per second. To match the increased storage throughput, we’re increasing the OCI GPU Compute frontend network capacity from 100 Gbps in the H100 GPU-accelerated instances to 200 Gbps with H200 GPU-accelerated instances, and 400 Gbps per instance for B200 GPU and GB200 instances,” added Thiagarajan.
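
For scale, the conversion below restates those per-instance line rates in bytes per second, assuming the decimal units networking vendors use and ignoring protocol overhead.

```python
# Converting the quoted per-instance network capacity from gigabits
# to gigabytes per second (decimal units, protocol overhead ignored).
frontend_gbps = {"H100 instance": 100, "H200 instance": 200, "B200 / GB200 instance": 400}
for name, gbps in frontend_gbps.items():
    print(f"{name}: {gbps} Gbps ≈ {gbps / 8:.0f} GB/s")
```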