Built on the NVIDIA Hopper architecture, the NVIDIA H100 features a new Transformer Engine that enables it to deliver up to 9x faster AI training and up to 30x faster AI inference on large language models compared to the prior-generation A100. It's no wonder businesses are looking closely at how to migrate their workloads from A100 to H100.
As a cloud service provider, we've had early experience working with the new HGX H100 server platforms and have identified some key steps you'll want to take as you begin transitioning your workloads from the A100 to the new H100. We hope these points make your transition easier. If they do, please let us know in the comments below. Ok, let's get started:
Update Your Driver
First, update your NVIDIA driver to a version that supports CUDA 12 (R525 or later). Since our A100 systems all have NVSwitch, we prefer to update nvidia-fabricmanager to version 525+ by following this guide. Doing so automatically updates the driver as well.
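To sanity-check the result, you can query the installed driver and the highest CUDA version it supports programmatically. Here's a minimal sketch using the nvidia-ml-py (pynvml) bindings; the package choice and the version check are our suggestion, not part of NVIDIA's guide:

```python
# Minimal sketch: confirm the installed NVIDIA driver supports CUDA 12.
# Assumes the nvidia-ml-py bindings are installed (pip install nvidia-ml-py).
import pynvml

pynvml.nvmlInit()
print("Driver version:", pynvml.nvmlSystemGetDriverVersion())

# Reported as an integer, e.g. 12000 for CUDA 12.0.
cuda = pynvml.nvmlSystemGetCudaDriverVersion()
print(f"Max supported CUDA: {cuda // 1000}.{(cuda % 1000) // 10}")

assert cuda >= 12000, "Driver too old for CUDA 12; update it first."
pynvml.nvmlShutdown()
```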
Get Yourself CUDA 12
Next, download CUDA 12, or if you use containers, pull an NGC CUDA 12 container such as nvcr.io/nvidia/cuda:12.0.1-devel-ubuntu20.04, which can be found here: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cuda/tags. You will need the CUDA 12 toolkit to recompile any custom GPU operators or CUDA code so they will run on H100. And so you know, the arch flag for H100 is sm_90.
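If your custom operators are PyTorch CUDA extensions, the rebuild is typically just a matter of adding sm_90 to the build flags. Here's a hypothetical setup.py sketch; the my_op name and source files are placeholders for your own code:

```python
# Hypothetical setup.py for rebuilding a custom CUDA extension for H100.
# "my_op" and the source file names stand in for your own operator.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="my_op",
    ext_modules=[
        CUDAExtension(
            name="my_op",
            sources=["my_op.cpp", "my_op_kernel.cu"],
            extra_compile_args={
                "cxx": ["-O3"],
                # sm_90 is the compute architecture for H100 (Hopper).
                "nvcc": ["-O3", "-gencode=arch=compute_90,code=sm_90"],
            },
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)
```

For most extension builds, setting the environment variable TORCH_CUDA_ARCH_LIST="9.0" before building accomplishes the same thing.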
Prep Those 3rd Party Libraries and Frameworks
Update your 3rd party libraries and deep learning frameworks to versions that support CUDA 12. Be aware that CUDA 12 is still fairly new, and your 3rd party library or framework may not have a CUDA 12 release yet.
For example, PyTorch at this time does not officially support CUDA 12. However, NVIDIA has a version of PyTorch built against CUDA 12 in its NGC container repository (nvcr.io/nvidia/pytorch:23.02-py3), which can be found here: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch/tags. NVIDIA's NGC catalog provides access to GPU-accelerated software that speeds up end-to-end workflows with performance-optimized containers, pre-trained AI models, and industry-specific SDKs that can be deployed anywhere. Best of all, NVIDIA's container repository may already have a container with your favorite library or framework working with CUDA 12. Just be sure to check the container's release notes to verify that CUDA 12 support has been added.
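Once you're inside the container (or running your own CUDA 12 build), a few lines of PyTorch confirm that the framework sees both CUDA 12 and the H100:

```python
# Quick sanity check: confirm the framework's CUDA build and that an
# H100 (compute capability 9.0) is visible. Assumes an H100 is attached.
import torch

print(torch.__version__)              # framework build
print(torch.version.cuda)             # should report 12.x
print(torch.cuda.get_device_name(0))
major, minor = torch.cuda.get_device_capability(0)
assert (major, minor) == (9, 0), "Expected an H100 (sm_90) device."
```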
Do Some Research
You're going to want to do a little research on H100 and the new 8-bit floating-point (FP8) format unique to it. NVIDIA has a brand-new library called Transformer Engine that leverages this feature on H100 to accelerate Transformer models for both training and inference. Adding it to your code can drastically improve training and inference performance, as shown here: https://blogs.nvidia.com/blog/2022/03/22/h100-transformer-engine/
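To give you a feel for what adoption looks like, here's a minimal sketch using Transformer Engine's PyTorch API; the layer sizes are arbitrary, and the recipe settings are just illustrative defaults:

```python
# Minimal FP8 sketch with NVIDIA Transformer Engine (sizes are arbitrary).
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Drop-in replacement for torch.nn.Linear that can run its GEMMs in FP8.
layer = te.Linear(1024, 1024, bias=True).cuda()
x = torch.randn(2048, 1024, device="cuda")

# DelayedScaling is Transformer Engine's standard FP8 scaling recipe;
# the HYBRID format uses E4M3 for the forward pass, E5M2 for gradients.
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

# Run the forward pass under the FP8 autocast context; the backward
# pass is invoked outside it, per the library's documented pattern.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
y.sum().backward()
```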
Get To Testing
The last thing to do is some actual testing on an NVIDIA H100 Tensor Core GPU. Cirrascale offers fully managed, NVIDIA GPU-based clusters competitively priced below traditional cloud service providers. These bare-metal, non-virtualized servers are completely dedicated to you, without the contention or performance issues typically introduced by virtualization overhead.
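Once you have time on an H100, a quick micro-benchmark makes a good first smoke test before you port a full workload. Here's a minimal sketch that times a large half-precision matmul with CUDA events; the matrix sizes and iteration counts are arbitrary:

```python
# Illustrative micro-benchmark: time a large FP16 matmul on the GPU.
import torch

a = torch.randn(8192, 8192, dtype=torch.float16, device="cuda")
b = torch.randn(8192, 8192, dtype=torch.float16, device="cuda")

# Warm up so the timing excludes one-time kernel/library initialization.
for _ in range(3):
    a @ b
torch.cuda.synchronize()

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
for _ in range(10):
    a @ b
end.record()
torch.cuda.synchronize()

ms = start.elapsed_time(end) / 10          # average milliseconds per matmul
tflops = 2 * 8192**3 / (ms / 1e3) / 1e12   # 2*N^3 FLOPs per N x N matmul
print(f"avg matmul time: {ms:.2f} ms (~{tflops:.1f} TFLOPS)")
```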
Our flat-rate, no-surprises billing model means we can provide you with a price up to 30% lower than other cloud service providers, depending on the instance type. We also don't nickel-and-dime you with data transfer charges: there are no ingress or egress fees, so you never receive a supplemental bill for moving data into or out of our cloud.
Hopefully, we've helped you think through the first steps you can take to make your transition from A100 to H100 easier.