Author: Corin Kockenower, Senior Software Engineer
As we push high-performance computing into exascale (exaFLOPS: a billion billion operations a second), the Great Depression aphorism “use it up, wear it out, make it do, or do without” doesn’t seem to be rolling off the tongue of many technologists. Why?
After a brief conversation with Adaptive Computing’s founder and CTO, David Jackson, I think I understand why. David shared with me a few things he learned during a recent panel discussion.
With nearly 1000-fold performance and efficiency improvements in just the last decade, HPC clusters commissioned less than five years ago are being replaced to make room for next-generation clusters. While I understand the need for, and potential of, an exascale cluster, it seems terribly wasteful to decommission a five-year-old cluster that could be upgraded and tuned to meet the growing demands of an organization. Even if we assume the nature of workflows remains the same, a 50-fold increase in resource demand isn’t unheard of these days. Include Big Data workflows in the equation and the scale likely tips in favor of a new cluster.
When HPC clusters are commissioned, they are designed to balance compute, storage, and network resources. Upgrading or optimizing any one of them, instead of all of them together, will likely yield little or no net improvement in overall cluster performance, because the two untouched resources become the bottleneck.
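The balance argument can be illustrated with a toy bottleneck model. This is only a sketch under a simplifying assumption that is mine, not David's: overall throughput is capped by the slowest of the three subsystems. The `Cluster` class and its normalized capacities are hypothetical and do not describe any real cluster.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Cluster:
    # Normalized sustainable throughput of each subsystem (hypothetical units).
    compute: float
    storage: float
    network: float

    def throughput(self) -> float:
        # Toy assumption: the slowest subsystem caps the whole cluster.
        return min(self.compute, self.storage, self.network)


balanced = Cluster(compute=1.0, storage=1.0, network=1.0)

# Double only the compute capacity and leave storage and network alone.
compute_only = replace(balanced, compute=2.0)

# Double all three subsystems together.
all_three = Cluster(compute=2.0, storage=2.0, network=2.0)

print("balanced:    ", balanced.throughput())
print("compute only:", compute_only.throughput())
print("all three:   ", all_three.throughput())
```

In this model, doubling compute alone leaves throughput where it started, while doubling all three doubles it. Real workloads are not this clean, but the shape of the result is the point.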
Hardware cost is approximately 15–20% of the total cost of ownership for an HPC cluster. With the bulk of the cost lying outside the hardware purchase, a significant improvement in performance per watt, like the one seen in the last 3 to 5 years, can make replacing an HPC cluster economically compelling.
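A back-of-the-envelope calculation shows how this can play out. Only the 20% hardware share comes from the figures above. The five-year lifetime, the 50% energy share, and the 4x performance-per-watt gain are illustrative assumptions I made up for the arithmetic, and the function names are hypothetical.

```python
def annual_cost_keep(energy: float, other_opex: float) -> float:
    # The old cluster's hardware is treated as already paid off.
    return energy + other_opex


def annual_cost_replace(hardware: float, lifetime_years: float,
                        energy: float, perf_per_watt_gain: float,
                        other_opex: float) -> float:
    # New hardware is amortized evenly; the same workload needs less energy.
    return hardware / lifetime_years + energy / perf_per_watt_gain + other_opex


def break_even_gain(hardware: float, lifetime_years: float,
                    energy: float) -> float:
    # Gain at which energy savings exactly cover the amortized hardware cost.
    return energy / (energy - hardware / lifetime_years)


# Normalize the five-year total cost of ownership to 100 units.
HARDWARE = 20.0          # 20% of TCO (upper end of the range quoted above)
ENERGY_PER_YEAR = 10.0   # ASSUMPTION: energy is 50% of TCO over 5 years
OTHER_PER_YEAR = 6.0     # ASSUMPTION: remaining 30% (staff, facilities, etc.)
LIFETIME = 5.0           # ASSUMPTION: years
GAIN = 4.0               # ASSUMPTION: 4x performance per watt

keep = annual_cost_keep(ENERGY_PER_YEAR, OTHER_PER_YEAR)
swap = annual_cost_replace(HARDWARE, LIFETIME, ENERGY_PER_YEAR, GAIN,
                           OTHER_PER_YEAR)
print(f"keep: {keep}/yr, replace: {swap}/yr")
print(f"break-even gain: {break_even_gain(HARDWARE, LIFETIME, ENERGY_PER_YEAR):.2f}x")
```

Under these made-up numbers, replacing costs less per year than keeping the paid-off cluster running the same workload. Change the energy share or the gain and the answer can flip, which is why the hardware share of the total matters so much.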
The largest personal purchases you and I will likely make are on education, cars, and homes. My mistake was comparing car or home ownership to owning and operating an HPC cluster. Most people are painfully aware of the total cost of ownership of cars and homes. Sometimes we just want to junk a car or home and start over. In many ways it is easier and more exciting to start from scratch. However, it is generally more cost-efficient to buy a car or home, pay it off, and run it into the ground than it is to buy and sell every few years. The problem is that an HPC cluster doesn’t fit the same mold as a home or a car.
In my last blog post, I mentioned the On-Board Diagnostics II (OBD-II) protocol used in most modern vehicles. A cool new Kickstarter project, VOYO, uses the OBD-II port and optional Bluetooth relays to add security and performance/efficiency features to older model vehicles. In my opinion, it makes sense to invest in technology like VOYO to improve the performance, economy, and safety of your aging vehicles. However, there doesn’t seem to be a VOYO for an HPC cluster.
Are there any VOYO-like innovators in the HPC space that make it compelling to upgrade compute, storage, and network instead of replacing a cluster? If you know of any, leave a comment and share what you have found.