Exascale computing is here — what does this new era of computing mean and what are exascale supercomputers capable of?
Exascale computing can process over a quintillion operations every second — enabling supercomputers to perform complex simulations that were previously impossible. But how does it work?
Exascale computing is the latest milestone in cutting-edge supercomputing: high-powered systems capable of processing calculations at speeds currently impossible using any other method.
Exascale supercomputers are machines that run at the exaflop scale. The prefix "exa" denotes 1 quintillion, which is 1 x 10^18, or a one with 18 zeros after it. "Flops" stands for floating-point operations per second, a type of calculation used to benchmark computers for comparison purposes.
This means that an exascale computer can process at least 1 quintillion floating-point operations every second. By comparison, most home computers operate in the teraflop range (generally around 5 teraflops), processing only around 5 trillion (5 x 10^12) floating-point operations per second.
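To put that gap in perspective, here is a minimal back-of-the-envelope sketch in Python using the figures above; the 5-teraflop figure for a home computer is the article's rough estimate rather than a measurement.

```python
# Rough comparison of exascale vs. home-PC throughput (figures from the article).
EXASCALE_FLOPS = 1e18   # at least 1 quintillion floating-point operations per second
HOME_PC_FLOPS = 5e12    # roughly 5 teraflops for a typical home computer

# How long would a home PC need to match one second of exascale work?
seconds_needed = EXASCALE_FLOPS / HOME_PC_FLOPS
print(f"{seconds_needed:,.0f} seconds, or about {seconds_needed / 86_400:.1f} days")
# -> 200,000 seconds, roughly 2.3 days for a single second of exascale output
```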
"An exaflop is a billion billion operations per second. You can solve problems at either a much larger scale, such as a whole planet simulation, or you can do it at a much higher granularity," Gerald Kleyn, vice president of HPC & AI customer solutions for HPE, told Live Science.
The more floating-point operations a computer can process every second, the more powerful it is, enabling it to complete complex calculations much more quickly. Exascale computing is typically used for complex simulations, such as weather forecasting, modeling new types of medicine and virtually testing engine designs.
How many exascale computers are there, and what are they used for?
The first exascale computer, called Frontier, was launched by HPE in June 2022 with a recorded operating speed of 1.102 exaflops. That speed has since been surpassed by the current leader, El Capitan, which runs at 1.742 exaflops. At the time of publication, there are two exascale supercomputers in operation.
Exascale supercomputers were used during the COVID-19 pandemic to collect, process and analyze massive amounts of data. This enabled scientists to understand and model the virus's genetic code, while epidemiologists used the machines' computing power to predict the disease's spread across the population. These simulations were performed in a much shorter space of time than would have been possible using a high-performance office computer.
It is also worth noting that quantum computers are not the same as supercomputers. Instead of representing information using conventional bits, quantum computers tap into the quantum properties of qubits to solve problems too complex for any classical computer.
To work, exascale computing needs tens of thousands of advanced central processing units (CPUs) and graphics processing units (GPUs) packed into a confined space. The close proximity of the CPUs and GPUs is essential, as it reduces latency (the time it takes for data to travel between components) within the system. Each individual delay is minuscule, but when billions of calculations are being processed simultaneously, these tiny delays add up and can slow the overall system.
"The interconnect (network) ties the compute nodes (consisting of CPUs and GPUs and memory) together," Pekka Manninen, the director of science and technology at CSC, told Live Science. "The software stack then enables harnessing the joint compute power of the nodes into a single computing task."
Despite their components being packed as tightly as possible, exascale computers are still colossal devices. The Frontier supercomputer, for instance, has 74 cabinets, each weighing approximately 3.5 tonnes, and occupies more than 7,300 square feet (680 square meters) of floor space.
Why exascale computing is so challenging
Of course, packing so many components tightly together can cause problems. Computers typically require cooling to dissipate the waste heat, and the billions of calculations run by exascale computers every second can heat them up to potentially damaging temperatures.
"Bringing that many components together to operate as one thing is probably the most difficult path, because everything needs to function perfectly," Kleyn said. "As humans, we all know it's difficult enough just to get your family together for dinner, let alone getting 36,000 GPUs working together in synchronicity."
This means that heat management is vital in developing exascale supercomputers. Some are sited in cold environments, such as the Arctic, to help maintain ideal temperatures, while others use liquid cooling, racks of fans, or a combination of the two to keep temperatures down.
However, cooling systems add a further complication: energy management. Exascale computing requires massive amounts of power because of the sheer number of processors that need to be supplied.
Although exascale computing consumes a lot of energy, it can provide energy savings to a project in the long run. For example, instead of iteratively developing, building and testing new designs, the computers can be used to virtually simulate a design in a comparatively short space of time.
Why exascale computers are so prone to failure
Another issue facing exascale computing is reliability. The more components a system contains, the more complex it becomes. The average home computer is expected to suffer some sort of failure within three years, but in exascale computing, the time between failures is measured in hours.
This is because exascale machines rely on tens of thousands of CPUs and GPUs, all operating at high capacity. With so many components under heavy demand at once, it becomes probable that at least one of them will fail within hours, as the rough calculation below illustrates.
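The following toy probability calculation makes the point; the five-year per-component lifetime is an invented figure for illustration, while the 36,000-GPU count echoes Kleyn's example above.

```python
# Toy reliability model with an assumed per-component mean time between
# failures (MTBF); the 36,000 component count comes from Kleyn's example.
n_components = 36_000
component_mtbf_hours = 5 * 365 * 24        # assume each part fails once in ~5 years

p_fail_per_hour = 1 / component_mtbf_hours                # per-component hourly failure chance
p_any_fail = 1 - (1 - p_fail_per_hour) ** n_components    # chance at least one fails this hour
system_mtbf_hours = component_mtbf_hours / n_components   # machine-level MTBF

print(f"Chance of at least one failure in any given hour: {p_any_fail:.0%}")        # ~56%
print(f"Whole-machine mean time between failures: {system_mtbf_hours:.1f} hours")   # ~1.2
```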
Because failures are so frequent, exascale applications use checkpointing, periodically saving their progress so that a calculation can resume from the last saved state rather than starting over after a system failure.
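Here is a minimal checkpoint/restart sketch, assuming a toy loop and a local pickle file; production codes checkpoint enormous distributed states to parallel file systems, but the principle is the same: save often, and resume from the last good save after a crash.

```python
import os
import pickle

CHECKPOINT = "state.pkl"   # hypothetical checkpoint file

def load_state():
    """Resume from the last checkpoint if one exists, otherwise start fresh."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "total": 0.0}

def save_state(state):
    """Persist progress so a failure costs at most the work since the last save."""
    with open(CHECKPOINT, "wb") as f:
        pickle.dump(state, f)

state = load_state()
for step in range(state["step"], 1_000_000):
    state["total"] += step * 1e-6      # stand-in for one unit of simulation work
    state["step"] = step + 1
    if step % 100_000 == 0:
        save_state(state)              # periodic checkpoint

save_state(state)
print(f"Finished at step {state['step']}, total = {state['total']:.3f}")
```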
In order to mitigate the risk of failure and avoid unnecessary downtime, exascale computers use a diagnostic suite alongside monitoring systems. These systems provide continual oversight of the overall reliability of the system and identify components that are displaying signs of wear, flagging them for replacement before they cause outages.
"A diagnostic suite and a monitoring system shows us how the machine is working. We can drill into each individual component to see where it’s failing and have proactive alerts. Technicians are also constantly working on the machine, to replace failed components and keep it in an operational state," Kleyn said. "It takes a lot of tender loving care to keep these machines going."
The high operating speeds of exascale computers require specialist operating systems and applications to take full advantage of their processing power.
"We need to be able to parallelize the computational algorithm over millions of processing units, in a heterogeneous fashion (over nodes and within a node over the GPU or CPU cores)," Manninen. "Not all computing problems lend themselves to it. The communication between the different processes and threads needs to be orchestrated carefully; getting input and output implemented efficiently is challenging."
Because of the complexity of the simulations being performed, verifying the results can also be challenging. The output of an exascale computer cannot be checked by conventional office computers, at least not in a reasonable amount of time. Instead, applications use predicted error bars, which give a rough estimate of the range the results should fall within; anything outside those bars is discounted.
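In code, such a plausibility check might look like the sketch below; the expected value and error bar are invented numbers standing in for the output of a lower-fidelity model.

```python
# Accept a result only if it falls within the predicted error bars (values invented).
EXPECTED = 42.0     # rough estimate of what the result should be
ERROR_BAR = 1.5     # predicted uncertainty around that estimate

def plausible(result, expected=EXPECTED, tolerance=ERROR_BAR):
    """Discount results that land outside the predicted error bars."""
    return abs(result - expected) <= tolerance

for result in (41.2, 43.1, 57.9):
    verdict = "accepted" if plausible(result) else "discounted (outside error bars)"
    print(f"result {result}: {verdict}")
```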
Beyond exascale computing
According to Moore's Law, the number of transistors in an integrated circuit is expected to double roughly every two years. If this rate of development continues (and that is a big if, as it cannot go on forever), we could expect zettascale computing, processing on the order of 10^21 operations per second (a one with 21 zeros after it), in approximately 10 years.
Exascale computing excels at simultaneously processing massive numbers of calculations in a very short space of time, while quantum computing is beginning to solve incredibly complex problems that conventional computing would struggle with. Although quantum computers are currently not as powerful as exascale computers, it is predicted that they will eventually outpace them.
One possible development could be an amalgamation of quantum computing and supercomputers. This hybrid quantum/classical supercomputer would combine the computing power of quantum computers with the high-speed processing of classical computing. Scientists have started this process already, adding a quantum computer to the Fugaku supercomputer in Japan.
"As we continue to shrink these things down and improve our cooling capabilities and make them less expensive, it's going to be an opportunity to solve problems that we couldn’t solve before," Kleyn said.
Peter is a degree-qualified engineer and experienced freelance journalist, specializing in science, technology and culture. He writes for a variety of publications, including the BBC, Computer Weekly, IT Pro, the Guardian and the Independent. He has worked as a technology journalist for over ten years. Peter has a degree in computer-aided engineering from Sheffield Hallam University. He has worked in both the engineering and architecture sectors, with various companies, including Rolls-Royce and Arup.