The high-performance computing (HPC) sector sits at the frontier of computer science. HPC systems are clusters of compute nodes aggregated to behave as a single machine and have evolved over the years to deliver extraordinary amounts of computing power. They have been applied to solving some of the planet’s most fascinating and complex problems in space exploration, oil extraction, genomics, engineering, finance and weather prediction.

Historically, taking advantage of high-performance computing has meant owning your own infrastructure. Cloud-based solutions have been viewed as too expensive, too insecure and too slow for serious HPC. The latest estimates are that, of the $35 billion spent on HPC per year (according to Intersect360 Research/Supercomputing conference 2018), only $1.1 billion was spent on cloud-based compute.

However, although it’s a small amount relatively speaking, that $1.1 billion cloud spend represents a 44 percent increase over the prior year — a trend unlikely to stop anytime soon. Several cloud players now offer HPC infrastructure at scale that can address HPC challenges related to security, access and performance. AWS and Univa were able to demonstrate 1.1 million compute cores controlled by a single scheduler in autumn 2018.

Users need to combine the power of traditional on-premises solutions with the flexibility of the cloud, and this is the challenge facing anyone who must deliver robust, secure HPC. Although not a panacea for all use cases, the hybrid cloud approach to HPC is becoming an essential piece of a cost-effective HPC program.

Expanded capabilities for aerospace and defense

HPC is an essential tool in modern engineering. Aerospace and defense, automotive and maritime manufacturers, in particular, rely on computational fluid dynamics (CFD) simulations to analyze how product components interact with the liquids and gases that flow over and through them. The resulting turbulence can have a massive impact on the performance of airplanes, cars and ships. Similarly, finite element analysis (FEA) simulates physical interactions, such as how a car will behave in a crash or the effect of a bird strike on a jet engine.

By simulating how a component will perform before it is produced, engineers can make and evaluate modifications quickly and cheaply without expensive prototyping. Such insights help get products to market faster with fewer defects, resulting in major cost savings and improvements in quality.

However, these simulations often demand major computing hardware capacity, sometimes tens of thousands of cores for a single analysis, which drives up development costs. At the 2018 Supercomputing Conference, Pratt & Whitney Canada presented how it used a combination of on-premises and AWS cloud HPC resources across the life cycle of its aircraft engines, from product design to manufacturing to aftermarket services.

This approach involved the co-development of hybrid tools for HPC by DXC Technology and AWS. While DXC teamed with AWS and Advania at Pratt & Whitney, HPC works best with a vendor-neutral approach: multiple cloud suppliers can then be combined into a genuine multi-cloud capability, with each job running on the infrastructure best suited to it at that point in time.

The best approach to running hybrid cloud HPC effectively is to combine traditional on-premises capabilities with specialist and cloud capabilities that are controlled and supplied through a single service. This removes the need for users to adapt their HPC jobs to a specific infrastructure and allows a decision tool to route each job to the most appropriate infrastructure without further modification.
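To make the idea of such a decision tool concrete, the sketch below shows one simple routing policy in Python. It is purely illustrative, not any vendor's actual product: the `Job` fields, the capacity figure and the routing rules are all assumptions made for the example. A real decision tool would also weigh cost, queue wait times, data-transfer overhead and licensing.

```python
from dataclasses import dataclass

@dataclass
class Job:
    """Minimal description of an HPC job (illustrative fields only)."""
    name: str
    cores: int        # compute cores requested
    sensitive: bool   # True if data must stay on-premises

# Hypothetical free capacity on the on-premises cluster.
ON_PREM_FREE_CORES = 5_000

def route(job: Job) -> str:
    """Pick a target infrastructure for a job.

    Sensitive workloads always stay on-premises; otherwise jobs that
    fit the free on-premises capacity run there, and anything larger
    bursts to the cloud.
    """
    if job.sensitive:
        return "on-prem"
    if job.cores <= ON_PREM_FREE_CORES:
        return "on-prem"
    return "cloud"

# Example: a large CFD run bursts to the cloud, a sensitive FEA job stays local.
print(route(Job("cfd-wing", 40_000, False)))   # cloud
print(route(Job("fea-crash", 2_000, True)))    # on-prem
```

The point of keeping the policy behind a single function is exactly what the paragraph above describes: users submit the same job description regardless of where it ultimately runs, and the routing logic can evolve without touching the jobs themselves.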

This is an exciting time for IT. Cloud is coming of age and bringing new flexibility to HPC programs. While it is clear that on-premises capability will continue to be a part of major HPC operations for the foreseeable future, the benefits of supplementing this capacity with cloud-based and consumption models are equally clear. The hybrid approach removes scale capacity limitations, makes it possible to cope with variable loads, and ensures that the right infrastructure is used in every use case.