Unlock the Power of Mistral AI with Red Hat OpenShift AI and NVIDIA DGX H100
I will guide you through deploying Red Hat OpenShift AI on the NVIDIA DGX H100 system and running the Mistral AI model. This blog post details the process of deploying and managing a fully automated MLOps solution for a large language model (LLM), presented in three main parts:
- Deploying OpenShift
- Installing OpenShift AI
- Running the Mistral AI model with the Hugging Face Text Generation Inference toolkit
The NVIDIA DGX H100 marks a pivotal moment in the evolution of artificial intelligence (AI) infrastructure. Powered by the revolutionary NVIDIA H100 Tensor Core GPU, this powerhouse delivers an unprecedented 32 petaFLOPS of FP8 AI performance, surpassing previous generations by a staggering 9x. NVIDIA H100 GPUs with TensorRT-LLM allow you to convert model weights into a new FP8 format easily and compile models to take advantage of optimized FP8 kernels automatically. NVIDIA Hopper Transformer Engine technology makes this performance gain possible without changing any model code. This remarkable leap in computing capability is complemented by a suite of cutting-edge advancements, including dual Intel Xeon CPUs with a total of 112 cores, 2TB of RAM, 8 x H100 GPUs connected through NVSwitch for a total of 640GB of vRAM, PCIe Gen5 interconnects, NVIDIA ConnectX-7 network interface cards, and NDR InfiniBand for ultra-fast AI training and inferencing.
The NVIDIA DGX H100 is more than just a powerful machine; it’s an accelerator for innovation, enabling researchers, scientists, and developers to push the boundaries of AI and unlock once unimaginable solutions. As we embark on this new era of AI-powered innovation, the NVIDIA DGX H100 stands at the forefront, empowering us to address the world’s most pressing challenges and shape a brighter future for all.
First look at the DGX H100 BMC
The NVIDIA DGX H100 BMC, or Baseboard Management Controller, is a hardware component that provides out-of-band management capabilities for the DGX H100 system. This means that you can access and control the system even if it is turned off or the operating system is not booted. The BMC can be used to perform a variety of tasks, such as:
- Monitoring system health and status
- Configuring system settings
- Remotely powering on, off, or restarting the system
- Accessing the system’s console and troubleshooting issues
- Performing firmware updates and maintenance tasks
You need to connect the BMC (out-of-band system management) 1 GbE RJ45 interface and allocate one IP address:
To use the BMC, we will use a web browser with the allocated IP address. Once connected, you will be able to view and manage the system.
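The BMC also exposes a Redfish API, so the same out-of-band checks can be scripted from a workstation; for example, listing the managed systems (the credentials and BMC IP below are placeholders, and the exact resource paths depend on the BMC firmware):
curl -k -u admin:PASSWORD https://BMC_IP/redfish/v1/Systems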
First, we log into the BMC:
We can see the health status of the server, the uptime and access to all the administration tools:
We are pleased to see the 8 x NVIDIA H100 GPUs listed in the GPU Information menu:
We can find the proper Vendor ID (VID) and Device ID (DID); we can cross-check them with lspci, as shown after this list:
- GPU PCI VID: 0x10de
- GPU PCI DID: 0x2330
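Once an operating system is running on the node, these IDs can be cross-checked with lspci (10de is the NVIDIA vendor ID and 2330 the H100 SXM5 device ID):
lspci -nn -d 10de:2330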
Prepare the OpenShift discovery ISO file
The OpenShift Assisted Installer streamlines the deployment and management of OpenShift clusters with its user-friendly web interface, automated tasks, and centralized management console. It facilitates the installation of OpenShift clusters on bare metal or cloud platforms, ensuring consistency and reducing errors. Its integration with the Red Hat Hybrid Cloud Console further enhances the management experience.
The NVIDIA DGX H100 server was certified for Red Hat OpenShift in October 2023.
OpenShift AI Self-Managed normally requires a minimum of two worker nodes; we will deploy only one Single Node OpenShift for this simple test lab.
We can connect to our Red Hat Hybrid Cloud Console here: https://cloud.redhat.com
Log in with your Red Hat account.
Go to the console and the Clusters menu:
https://console.redhat.com/openshift/
Click on Create cluster.
Click on the Datacenter tab.
Click on Bare Metal (x86_64).
Pick Interactive.
Choose your Cluster name and Base domain.
You just need to check Install single node OpenShift (SNO).
We pick the release OpenShift 4.14.2.
Click on Next.
Click on Next.
We can add one host by clicking on Add host.
Paste your SSH public key, click on “Generate Discovery ISO”, and then “Download Discovery ISO”.
A discovery image is a small Linux operating system image that is used to gather information about the nodes that will be part of an OpenShift cluster. The Assisted Installer uses the discovery image to collect data about the nodes, such as their CPU, memory, storage, and network configuration. This information is then used to assess the nodes’ compatibility with OpenShift and to generate the installation configuration file.
You have one Discovery ISO file of 106MB:
egallen@laptop ~ % ls -lah ~/Downloads/dc57ceea-8ef4-41a4-AAAAA-AAAAAAAAAA-discovery.iso
-rw-r--r--@ 1 egallen staff 106M Dec 2 19:01 /Users/egallen/Downloads/dc57ceea-8ef4-41a4-AAAAA-AAAAAAAAAA-discovery.iso
Boot with the OpenShift ISO file
When a node is booted from the discovery image, it will first connect to the Assisted Installer. The Assisted Installer will then send a set of instructions to the node, which the node will execute. These instructions will cause the node to perform a series of hardware and network tests. The results of these tests will be sent back to the Assisted Installer, which will use them to generate the installation configuration file.
We will boot the DGX H100 with one Discovery ISO image.
The server is currently in the power off state in the Power Control menu:
To see the terminal, we can click on Remote Control, then on the Launch H5Viewer button.
Click on CD Image Browse File, pick your downloaded ISO file, click Media Boost, and then the Start Media button.
We can power on the server.
Press F11.
We can see the menu to select the boot device.
Pick UEFI: AMI Virtual CDROM0 1.00.
Pick RHEL CoreOS (Live) from the GRUB menu.
We can see in the BMC KVM menu that my browser has pushed 123MB, and the RHEL CoreOS from the discovery image is booting.
We can see the discovery image prompt:
The hardware discovery is in progress.
After one minute, we can see the host inventory in the console.
Click on Next.
We install the system on one NVMe drive only (we will use another NVMe disk for LVM with OpenShift later).
Click on Next.
Click on Next.
Click on Install cluster.
The node is rebooting.
Connect to the OpenShift Console
OpenShift is installed and you can find the administrator credentials in the Red Hat Hybrid Cloud Console.
We can retrieve the:
- OpenShift console URL
- kubeadmin password
- kubeconfig file
If you have no domain name server configured, you can also copy/paste the /etc/hosts content by clicking on “not able to access the Web Console?”
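The entries look roughly like this (a sketch; <node-ip> stands for the node's IP address, and any other *.apps route follows the same pattern):
<node-ip> api.dgxh100.redhat.com
<node-ip> oauth-openshift.apps.dgxh100.redhat.com
<node-ip> console-openshift-console.apps.dgxh100.redhat.com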
We can log in.
The OpenShift cluster is up and the status is presented in the Console home page.
We can see the node with control-plane and worker roles, 224 CPUs (112 physical cores with hyper-threading), and 2TB of RAM.
Set up your OpenShift Command Line Interface
Download the oc command: click on ? in the top right corner, then on Command Line Tools, and pick the archive for your system and architecture. I'm taking Mac for ARM 64.
I can unzip this archive.
egallen@laptop ~ % sudo unzip ~/Downloads/oc.zip -d /usr/local/bin
Archive: /Users/egallen/Downloads/oc.zip
extracting: /usr/local/bin/oc
You have to authorize this binary; launch it once:
egallen@laptop ~ % /usr/local/bin/oc
zsh: killed /usr/local/bin/oc
(For Mac users, allow the binary in the System Settings of macOS, under Privacy & Security, and click on Allow Anyway.)
In the top right corner of the OpenShift Console, we can click on ?, Copy login command, and Display Token, then copy the token.
We can run the oc login command:
egallen@laptop ~ % oc login --token=sha256~XXXXXXXXXXXXXXXXXXXXXXXXXXX --server=https://api.dgxh100.redhat.com:6443
WARNING: Using insecure TLS client config. Setting this option is not supported!
Logged into "https://api.dgxh100.redhat.com:6443" as "kube:admin" using the token provided.
You have access to 69 projects, the list has been suppressed. You can list all projects with 'oc projects'
Using project "default".
The client is at version 4.14:
egallen@laptop ~ % oc version
Client Version: 4.14.0-202311021650.p0.g9b1e0d2.assembly.stream-9b1e0d2
Kustomize Version: v5.0.1
Server Version: 4.14.2
Kubernetes Version: v1.27.6+f67aeb3
Our OpenShift 4.14.2 cluster is ready to use:
egallen@laptop ~ % oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.14.2 True False 10m Cluster version is 4.14.2
Update the OpenShift cluster
[ You may not need to update your cluster; this step is not mandatory. ]
We will update the cluster to the latest OpenShift 4.14 release.
We will update from OpenShift 4.14.2 to OpenShift 4.14.3.
Click on Select a version.
Click on Update.
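Alternatively, the same update could be triggered from the CLI; running oc adm upgrade without arguments lists the available target versions (a sketch, assuming the cluster is already on a 4.14 update channel):
oc adm upgrade --to=4.14.3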
The update is in progress:
egallen@laptop ~ % oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.14.2 True True 17m Working towards 4.14.3: 635 of 860 done (73% complete)
After an automatic reboot, we are running the latest OpenShift 4.14 release available as of this writing:
egallen@laptop ~ % oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.14.3 True False 15s Cluster version is 4.14.3
Our Single Node OpenShift is up to date \o/.
Check the presence of the H100 GPUs
Because we provided our public key to the OpenShift installer, we can SSH to the node (connecting directly with SSH is not a good practice for administering production OpenShift clusters):
egallen@laptop ~ % ssh core@dgxh100.redhat.com
Red Hat Enterprise Linux CoreOS 414.92.202311150705-0
Part of OpenShift 4.14, RHCOS is a Kubernetes native operating system
managed by the Machine Config Operator (`clusteroperator/machine-config`).
WARNING: Direct SSH access to machines is not recommended; instead,
make configuration changes via `machineconfig` objects:
https://docs.openshift.com/container-platform/4.14/architecture/architecture-rhcos.html
We can run an lspci command and list the GPUs:
[core@dgxh100 ~]$ lspci | grep H100
1b:00.0 3D controller: NVIDIA Corporation GH100[H100 SXM5 80GB] (rev a1)
43:00.0 3D controller: NVIDIA Corporation GH100[H100 SXM5 80GB] (rev a1)
52:00.0 3D controller: NVIDIA Corporation GH100[H100 SXM5 80GB] (rev a1)
61:00.0 3D controller: NVIDIA Corporation GH100[H100 SXM5 80GB] (rev a1)
9d:00.0 3D controller: NVIDIA Corporation GH100[H100 SXM5 80GB] (rev a1)
c3:00.0 3D controller: NVIDIA Corporation GH100[H100 SXM5 80GB] (rev a1)
d1:00.0 3D controller: NVIDIA Corporation GH100[H100 SXM5 80GB] (rev a1)
df:00.0 3D controller: NVIDIA Corporation GH100[H100 SXM5 80GB] (rev a1)
Install the Node Feature Discovery operator
We can now install the NVIDIA GPU drivers and the NVIDIA device plugin; the first step is the Node Feature Discovery operator.
The Node Feature Discovery (NFD) Operator is a Kubernetes operator that automates the process of discovering and labeling node features in a Kubernetes cluster. It scans the hardware and software configuration of each node in the cluster to identify the available features. It then labels the node with these features, which can be used by other operators and applications to determine which nodes are suitable for running specific workloads. This can help to improve the efficiency of resource allocation and the overall performance of the cluster.
NFD will label the node with the NVIDIA GPU PCIe vendor ID (we have only one node for now, but this operator is required for the NVIDIA GPU Operator installation).
You should pick the Node Feature Discovery operator tagged Red Hat (not the one tagged Community).
Click on Install.
Click on Install.
Click on Install.
Click on View Operator.
You can now create an instance: click on Create instance for the NodeFeatureDiscovery.
Once NFD has labeled the node, we can check the NFD labels with this command:
egallen@laptop ~ % oc describe node dgxh100.redhat.com | grep feature.node.kubernetes.io
...
We can check that the NVIDIA PCI vendor ID label is present:
egallen@laptop ~ % oc describe node dgxh100.redhat.com | grep feature.node.kubernetes.io/pci-10de.present=true
feature.node.kubernetes.io/pci-10de.present=true
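Because this is a standard Kubernetes node label, we could also use it as a selector, for example to list the GPU-capable nodes (only one node in this lab):
oc get nodes -l feature.node.kubernetes.io/pci-10de.present=true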
Install the NVIDIA GPU Operator
The NVIDIA GPU Operator manages NVIDIA GPUs on Kubernetes clusters. It automates the provisioning, configuring, and monitoring of NVIDIA GPUs, ensuring that GPUs are available and correctly configured for application use. The NVIDIA GPU Operator also supports NVIDIA’s GPUDirect technologies, which enable applications to communicate directly with GPUs without relying on the CPU, improving application performance. The NVIDIA GPU Operator is a valuable tool for any Kubernetes cluster that uses NVIDIA GPUs. It can help to simplify the management of GPUs, improve application performance, and reduce the overall complexity of operating a Kubernetes cluster with GPUs.
For now, we have the PCIe node labels but no NVIDIA drivers, no NVIDIA device plugin, and no GPU monitoring. Before the NVIDIA GPU Operator installation, the NVIDIA GPU resources are not exposed to the Kubernetes scheduler:
egallen@laptop ~ % oc describe node | grep nvidia.com/gpu
egallen@laptop ~ %
We can now install the NVIDIA GPU Operator.
Search for NVIDIA GPU Operator in Operators > OperatorHub.
Click on NVIDIA GPU Operator.
Click on Install.
Click on Install.
Click on View Operator.
Click on Create Instance in the ClusterPolicy.
You can keep the default values.
Click on Create.
You can check the NVIDIA GPU Operator installation progress by listing the pods running in the nvidia-gpu-operator project:
egallen@laptop ~ % oc get pods -n nvidia-gpu-operator
NAME READY STATUS RESTARTS AGE
gpu-feature-discovery-lqqzt 0/1 Init:0/1 0 85s
gpu-operator-68f86df569-5r5hs 1/1 Running 0 6m32s
nvidia-container-toolkit-daemonset-p8lnz 0/1 Init:0/1 0 85s
nvidia-dcgm-exporter-98l6p 0/1 Init:0/2 0 85s
nvidia-dcgm-nxnrs 0/1 Init:0/1 0 85s
nvidia-device-plugin-daemonset-6dm6m 0/1 Init:0/1 0 85s
nvidia-driver-daemonset-414.92.202311150705-0-fkjpm 1/2 Running 0 2m6s
nvidia-node-status-exporter-6njsm 1/1 Running 0 2m6s
nvidia-operator-validator-jd6ff 0/1 Init:0/4 0 85s
The installation is completed:
egallen@laptop ~ % oc get pods -n nvidia-gpu-operator
NAME READY STATUS RESTARTS AGE
gpu-feature-discovery-lqqzt 1/1 Running 0 3m48s
gpu-operator-68f86df569-5r5hs 1/1 Running 0 8m55s
nvidia-container-toolkit-daemonset-p8lnz 1/1 Running 0 3m48s
nvidia-cuda-validator-ms9nc 0/1 Completed 0 48s
nvidia-dcgm-exporter-98l6p 1/1 Running 0 3m48s
nvidia-dcgm-nxnrs 1/1 Running 0 3m48s
nvidia-device-plugin-daemonset-6dm6m 1/1 Running 0 3m48s
nvidia-driver-daemonset-414.92.202311150705-0-fkjpm 2/2 Running 0 4m29s
nvidia-mig-manager-v7vkm 1/1 Running 0 21s
nvidia-node-status-exporter-6njsm 1/1 Running 0 4m29s
nvidia-operator-validator-jd6ff 1/1 Running 0 3m48s
We can see that the last step, with the nvidia-operator-validator, reports a successful installation:
egallen@laptop ~ % oc logs nvidia-operator-validator-jd6ff -n nvidia-gpu-operator
Defaulted container "nvidia-operator-validator" out of: nvidia-operator-validator, driver-validation (init), toolkit-validation (init), cuda-validation (init), plugin-validation (init)
all validations are successful
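We can also read the overall state reported by the ClusterPolicy resource; it should report ready once all components are deployed (the exact status field depends on the GPU Operator version, so treat this as a sketch):
oc get clusterpolicy gpu-cluster-policy -o jsonpath='{.status.state}{"\n"}'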
We are good: we have 8 GPUs ready to be used:
egallen@laptop ~ % oc describe node | egrep 'Capacity|nvidia.com/gpu:|Allocatable:'
Capacity:
nvidia.com/gpu: 8
Allocatable:
nvidia.com/gpu: 8
Basic UBI base image test
Testing one UBI pod with 1 x H100 GPU
nvidia-smi is a command-line tool that provides comprehensive information about NVIDIA GPUs, enabling users to monitor and optimize GPU performance, track usage, maintain optimal temperature and fan speed, and benchmark applications to identify performance bottlenecks.
We will run a simple CUDA UBI pod to check the NVIDIA GPU status with the nvidia-smi command.
We apply the pod spec:
egallen@laptop ~ % cat <<EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: command-nvidia-smi
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: nvcr.io/nvidia/cuda:12.1.0-base-ubi8
      command: ["/bin/sh","-c"]
      args: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1 # requesting 1 GPU
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
EOF
We can check the logs; we have one H100 allocated as requested:
egallen@laptop ~ % oc get pods
NAME READY STATUS RESTARTS AGE
command-nvidia-smi 0/1 Completed 0 7s
egallen@laptop ~ % oc logs command-nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12 Driver Version: 535.104.12 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA H100 80GB HBM3 On | 00000000:52:00.0 Off | 0 |
| N/A 31C P0 72W / 700W | 2MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
egallen@laptop ~ % oc delete pod command-nvidia-smi
pod "command-nvidia-smi" deleted
Testing one UBI pod with 8 x H100 GPUs
We can try to schedule a pod with 8 x H100 GPUs:
egallen@laptop ~ % cat <<EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: command-nvidia-smi
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: nvcr.io/nvidia/cuda:12.1.0-base-ubi8
      command: ["/bin/sh","-c"]
      args: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 8 # requesting 8 GPUs
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
EOF
We can check the logs; we have 8 x H100 allocated as requested:
egallen@laptop ~ % oc get pods
NAME READY STATUS RESTARTS AGE
command-nvidia-smi 0/1 Completed 0 34s
egallen@laptop ~ % oc logs command-nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12 Driver Version: 535.104.12 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA H100 80GB HBM3 On | 00000000:1B:00.0 Off | 0 |
| N/A 25C P0 72W / 700W | 2MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA H100 80GB HBM3 On | 00000000:43:00.0 Off | 0 |
| N/A 27C P0 71W / 700W | 2MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 2 NVIDIA H100 80GB HBM3 On | 00000000:52:00.0 Off | 0 |
| N/A 31C P0 72W / 700W | 2MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 3 NVIDIA H100 80GB HBM3 On | 00000000:61:00.0 Off | 0 |
| N/A 29C P0 71W / 700W | 2MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 4 NVIDIA H100 80GB HBM3 On | 00000000:9D:00.0 Off | 0 |
| N/A 26C P0 71W / 700W | 2MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 5 NVIDIA H100 80GB HBM3 On | 00000000:C3:00.0 Off | 0 |
| N/A 25C P0 70W / 700W | 2MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 6 NVIDIA H100 80GB HBM3 On | 00000000:D1:00.0 Off | 0 |
| N/A 29C P0 73W / 700W | 2MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 7 NVIDIA H100 80GB HBM3 On | 00000000:DF:00.0 Off | 0 |
| N/A 31C P0 72W / 700W | 2MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
egallen@laptop ~ % oc delete pod command-nvidia-smi
pod "command-nvidia-smi" deleted
Set up a Persistent Volume with the LVM Operator
Red Hat OpenShift Data Foundation is recommended as the software-defined storage for an entire cluster. Because we are running a Single Node OpenShift, we will use the OpenShift LVM Operator instead. The OpenShift LVM Operator is a tool that automates the creation, management, and extension of Logical Volume Manager (LVM) volumes on OpenShift clusters. It enables users to provision and manage storage resources for their applications efficiently. The Operator simplifies storage provisioning by creating and managing LVM volumes using custom resource definitions (CRDs). This eliminates the need for manual configuration and reduces the risk of errors. Additionally, the Operator provides a centralized view of all LVM volumes in the cluster, making it easy to monitor and troubleshoot storage issues.
Red Hat OpenShift AI will require one Persistent Volume.
We have no PV available for now:
egallen@laptop ~ % oc get pv
No resources found
We can check the disks available on the server; we are only using /dev/nvme0n1:
[core@dgxh100 ~]$ sudo lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
nvme0n1 259:0 0 1.7T 0 disk
├─nvme0n1p1 259:1 0 1M 0 part
├─nvme0n1p2 259:2 0 127M 0 part
├─nvme0n1p3 259:3 0 384M 0 part /boot
└─nvme0n1p4 259:4 0 1.7T 0 part /var/lib/kubelet/pods/190e9f63-c1ac-458f-8708-b2f7f110da38/volume-subpaths/nvidia-mig-manager-entrypoint/nvidia-mig-manager/0
/var/lib/kubelet/pods/d1a57079-19e1-4d4b-8df3-ea16c16dae8c/volume-subpaths/nvidia-device-plugin-entrypoint/nvidia-device-plugin/0
/var/lib/kubelet/pods/311e95ae-d509-47b9-8a10-e4804533cd04/volume-subpaths/init-config/init-pod-nvidia-node-status-exporter/1
/var/lib/kubelet/pods/a1ebc821-3891-4a2d-82ae-549076f3fe12/volume-subpaths/nvidia-container-toolkit-entrypoint/nvidia-container-toolkit-ctr/0
/run/nvidia/driver/etc/hosts
/run/nvidia/driver/mnt/shared-nvidia-driver-toolkit
/run/nvidia/driver/host-etc/os-release
/run/nvidia/driver/var/log
/run/nvidia/driver/dev/termination-log
/var/lib/kubelet/pods/c065cb71-7f77-444b-a9d7-0e0cf2b02a22/volume-subpaths/nginx-conf/monitoring-plugin/1
/var
/sysroot/ostree/deploy/rhcos/var
/usr
/etc
/
/sysroot
nvme1n1 259:5 0 1.7T 0 disk
nvme2n1 259:6 0 3.5T 0 disk
nvme4n1 259:7 0 3.5T 0 disk
nvme5n1 259:8 0 3.5T 0 disk
nvme3n1 259:9 0 3.5T 0 disk
nvme6n1 259:10 0 3.5T 0 disk
nvme7n1 259:11 0 3.5T 0 disk
nvme8n1 259:12 0 3.5T 0 disk
nvme9n1 259:13 0 3.5T 0 disk
We will use /dev/nvme4n1 for the LVM Storage operator.
In the OpenShift Console, we can go to Operators > OperatorHub and search for lvm.
Click on LVM Storage.
Click on Install.
The LVM Operator is installed.
Click on Create LVMCluster.
I'm calling my LVMCluster data-lvmcluster, and clicking on Create.
The operator is properly installed:
egallen@laptop ~ % oc get csv -n openshift-storage -o custom-columns=Name:.metadata.name,Phase:.status.phase
Name Phase
lvms-operator.v4.14.1 Succeeded
Go to Operators > Installed Operators and click on LVM Storage.
The target disk was a plain, empty device before the LVMCluster configured it:
[core@dgxh100 ~]$ sudo lsblk /dev/nvme4n1
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
nvme4n1 259:7 0 3.5T 0 disk
The Volume Group has been created:
[core@dgxh100 ~]$ sudo pvs
PV VG Fmt Attr PSize PFree
/dev/nvme4n1 vg1 lvm2 a-- 3.49t <357.70g
[core@dgxh100 ~]$ sudo lsblk /dev/nvme4n1
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
nvme4n1 259:7 0 3.5T 0 disk
├─vg1-thin--pool--1_tmeta 253:0 0 1.6G 0 lvm
│ └─vg1-thin--pool--1 253:2 0 3.1T 0 lvm
└─vg1-thin--pool--1_tdata 253:1 0 3.1T 0 lvm
└─vg1-thin--pool--1 253:2 0 3.1T 0 lvm
The storageclass is available:
egallen@laptop ~ % oc get storageclass
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
lvms-vg1 (default) topolvm.io Delete WaitForFirstConsumer true 27m
We can see that the volume snapshot class is created:
egallen@laptop ~ % oc get volumesnapshotclass
NAME DRIVER DELETIONPOLICY AGE
lvms-vg1 topolvm.io Delete 28m
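As an optional sanity check (the PVC name and size below are arbitrary), we could create a small PVC against the lvms-vg1 StorageClass. Because the binding mode is WaitForFirstConsumer, it will stay Pending until a pod consumes it:
cat <<EOF | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: lvms-test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: lvms-vg1
  resources:
    requests:
      storage: 10Gi
EOF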
We can see the lvmvolumegroup:
egallen@laptop ~ % oc get lvmvolumegroup -A
NAMESPACE NAME AGE
openshift-storage vg1 30m
The LVMVolumeGroup vg1 resource is created:
egallen@laptop ~ % oc get lvmvolumegroup vg1 -o yaml -n openshift-storage
apiVersion: lvm.topolvm.io/v1alpha1
kind: LVMVolumeGroup
metadata:
  creationTimestamp: "2023-12-02T22:46:11Z"
  finalizers:
    - lvm.openshift.io/lvmvolumegroup
  generation: 1
  name: vg1
  namespace: openshift-storage
  ownerReferences:
    - apiVersion: lvm.topolvm.io/v1alpha1
      blockOwnerDeletion: true
      controller: true
      kind: LVMCluster
      name: data-lvmcluster
      uid: 2ca190ad-5212-424a-be5d-062575d24b1d
  resourceVersion: "72769"
  uid: 946ff64d-53bc-4116-9307-1cbf6bed946d
spec:
  default: true
  deviceSelector:
    paths:
      - /dev/nvme4n1
  thinPoolConfig:
    name: thin-pool-1
    overprovisionRatio: 10
    sizePercent: 90
Install Red Hat OpenShift AI
Red Hat OpenShift AI is a flexible, scalable MLOps platform with tools to build, deploy, and manage AI-enabled applications. Built using open-source technologies, it provides trusted, operationally consistent capabilities for teams to experiment, serve models, and deliver innovative apps. OpenShift AI (previously called Red Hat OpenShift Data Science) supports the full lifecycle of AI/ML experiments and models, on-premise and in the public cloud.
We can now install Red Hat OpenShift AI.
We can go to Operators > OperatorHub.
We can click on Red Hat OpenShift Data Science.
Click on Install.
Click on Install.
Click on Create DataScienceCluster.
Click on Create.
The DataScienceCluster is Ready.
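We can also check it from the CLI (the exact columns printed depend on the operator version):
oc get datasciencecluster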
We have a new menu available in the top right corner to launch Red Hat OpenShift AI.
Click on Log in with OpenShift.
We can configure the storage for the image registry in non-production clusters
You must configure storage for the Image Registry Operator. For non-production clusters, you can set the image registry to an empty directory. If you do so, all images are lost if you restart the registry.
You could have this error if you don’t make this configuration on SNO with the LVM Operator:
Error: InvalidImageName
Failed to apply default image tag ":2023.2": couldn't parse image reference ":2023.2": invalid reference format
Configure these two options only for non-production clusters.
Set the image registry storage to an empty directory:
egallen@laptop ~ % oc patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"storage":{"emptyDir":{}}}}'
config.imageregistry.operator.openshift.io/cluster patched
Change the Image Registry Operator's managementState from Removed to Managed:
egallen@laptop ~ % oc patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"managementState":"Managed"}}'
config.imageregistry.operator.openshift.io/cluster patched
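We can quickly verify the resulting configuration and watch the registry pod come up:
oc get configs.imageregistry.operator.openshift.io cluster -o jsonpath='{.spec.managementState}{"\n"}'
oc get pods -n openshift-image-registry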
PyTorch notebook
The PyTorch notebook is a pre-installed Jupyter notebook environment designed for using PyTorch, a machine learning library, on Red Hat OpenShift AI. It provides a user-friendly interface, ease of use, productivity boost, and tight integration with Red Hat OpenShift AI.
Launch a PyTorch notebook with 8 x NVIDIA H100 GPUs
We can launch a notebook with 8 x NVIDIA H100 GPUs.
We click on Launch application.
We choose 8 accelerators and the X Large container size, and click on Start server.
The Jupyter Notebook server is started.
We choose the Python 3.9 notebook.
If we run the !nvidia-smi command in the notebook, we can see the 8 x H100 GPUs.
We can test inferencing with the opt125m model from Facebook.
Launch a PyTorch notebook with 2 x NVIDIA H100 GPUs
Before launching it, we can check that no GPU requests are in progress:
egallen@laptop ~ % oc describe node | egrep "Name:|Roles:|Capacity|nvidia.com/gpu|Allocatable:|Requests +Limits"
Name: dgxh100.redhat.com
Roles: control-plane,master,worker
nvidia.com/gpu-driver-upgrade-state=upgrade-done
nvidia.com/gpu.compute.major=9
nvidia.com/gpu.compute.minor=0
nvidia.com/gpu.count=8
nvidia.com/gpu.deploy.container-toolkit=true
nvidia.com/gpu.deploy.dcgm=true
nvidia.com/gpu.deploy.dcgm-exporter=true
nvidia.com/gpu.deploy.device-plugin=true
nvidia.com/gpu.deploy.driver=true
nvidia.com/gpu.deploy.gpu-feature-discovery=true
nvidia.com/gpu.deploy.mig-manager=true
nvidia.com/gpu.deploy.node-status-exporter=true
nvidia.com/gpu.deploy.nvsm=
nvidia.com/gpu.deploy.operator-validator=true
nvidia.com/gpu.family=hopper
nvidia.com/gpu.machine=DGXH100
nvidia.com/gpu.memory=81559
nvidia.com/gpu.present=true
nvidia.com/gpu.product=NVIDIA-H100-80GB-HBM3
nvidia.com/gpu.replicas=1
nvidia.com/gpu-driver-upgrade-enabled: true
Capacity:
nvidia.com/gpu: 8
Allocatable:
nvidia.com/gpu: 8
Resource Requests Limits
nvidia.com/gpu 0 0
We launch a notebook with 2 H100 GPUs:
After launching the notebook:
egallen@laptop ~ % oc describe node | egrep "Name:|Roles:|Capacity|nvidia.com/gpu|Allocatable:|Requests +Limits"
Name: dgxh100.redhat.com
Roles: control-plane,master,worker
nvidia.com/gpu-driver-upgrade-state=upgrade-done
nvidia.com/gpu.compute.major=9
nvidia.com/gpu.compute.minor=0
nvidia.com/gpu.count=8
nvidia.com/gpu.deploy.container-toolkit=true
nvidia.com/gpu.deploy.dcgm=true
nvidia.com/gpu.deploy.dcgm-exporter=true
nvidia.com/gpu.deploy.device-plugin=true
nvidia.com/gpu.deploy.driver=true
nvidia.com/gpu.deploy.gpu-feature-discovery=true
nvidia.com/gpu.deploy.mig-manager=true
nvidia.com/gpu.deploy.node-status-exporter=true
nvidia.com/gpu.deploy.nvsm=
nvidia.com/gpu.deploy.operator-validator=true
nvidia.com/gpu.family=hopper
nvidia.com/gpu.machine=DGXH100
nvidia.com/gpu.memory=81559
nvidia.com/gpu.present=true
nvidia.com/gpu.product=NVIDIA-H100-80GB-HBM3
nvidia.com/gpu.replicas=1
nvidia.com/gpu-driver-upgrade-enabled: true
Capacity:
nvidia.com/gpu: 8
Allocatable:
nvidia.com/gpu: 8
Resource Requests Limits
nvidia.com/gpu 2 2
Dig into PyTorch device list
We can list in the notebook the devices seen by PyTorch.
[*] for device_id in range(0,8):
        print(f'device name [',device_id,']:', torch.cuda.get_device_name(device_id))
device name [ 0 ]: NVIDIA H100 80GB HBM3
device name [ 1 ]: NVIDIA H100 80GB HBM3
device name [ 2 ]: NVIDIA H100 80GB HBM3
device name [ 3 ]: NVIDIA H100 80GB HBM3
device name [ 4 ]: NVIDIA H100 80GB HBM3
device name [ 5 ]: NVIDIA H100 80GB HBM3
device name [ 6 ]: NVIDIA H100 80GB HBM3
device name [ 7 ]: NVIDIA H100 80GB HBM3
Basic PyTorch Benchmark
torchbenchmark/models contains copies of popular or exemplary workloads that have been modified to expose a standardized API for benchmark drivers. PyTorch Benchmark contains a miniature version of train/test data and a dependency install script.
First load the PyTorch benchmark module:
[*] !pip install pytorch-benchmark
[*] import torch
from torchvision.models import efficientnet_b0
from pytorch_benchmark import benchmark
CNN Efficientnet-b0 model image classification
CPU image classification with the efficientnet-b0 model
We can start with a basic CPU benchmark with the image classification model efficientnet-b0. We will run 1000 inferences with PyTorch, with a batch size of 1 or 8.
The DGX H100 has a total of 112 CPU cores.
[*]: model = efficientnet_b0().to("cpu") # Model device sets benchmarking device
sample = torch.randn(8, 3, 224, 224) # (B, C, H, W)
results = benchmark(model, sample, num_runs=1000)
Results for CPU:
| Batch size | Time to process | Iterations/second |
| --- | --- | --- |
| 1 | 03:38 | 4.57 |
| 8 | 03:29 | 4.77 |
Time to run: 7 minutes 50 seconds for 1000 runs
GPU image classification with the efficientnet-b0 model
We can continue with a GPU benchmark with the image classification model efficientnet-b0. We will run 1000 inferences with PyTorch, with a batch size of 1 or 8.
[ ]: model = efficientnet_b0().to("cuda") # Model device sets benchmarking device
sample = torch.randn(8, 3, 224, 224) # (B, C, H, W)
results = benchmark(model, sample, num_runs=1000)
Results for GPU (CUDA):
| Batch size | Time to process | Iterations/second |
| --- | --- | --- |
| 1 | 00:03 | 327.63 |
| 8 | 00:03 | 279.89 |
Time to run: 6 seconds for 1000 runs.
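For a batch size of 1, that is roughly a 70x speedup over the CPU run (327.63 vs. 4.57 iterations/second).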
NVIDIA MIG configuration
MIG mixed strategy switch
NVIDIA’s Multi-Instance GPU (MIG) technology allows a single physical GPU to be divided into multiple GPU MIG devices, enabling multiple workloads to run simultaneously on a single GPU, isolating workloads for improved performance and security, and optimizing resource utilization to maximize GPU performance and efficiency. It is a valuable tool for organizations seeking to optimize their GPU resources and reduce costs.
We will start by enabling a MIG mixed strategy.
For each MIG configuration, you have to pick a Strategy type and a MIG configuration label.
We will test one mixed strategy with the label all-balanced on one NVIDIA DGX H100 server with 8 x H100 80GB GPUs.
For this test we are using one Single Node OpenShift on one DGX H100 server.
By default, MIG is disabled, with the single strategy:
egallen@laptop ~ % oc describe node | grep nvidia.com/mig
nvidia.com/mig.capable=true
nvidia.com/mig.config=all-disabled
nvidia.com/mig.config.state=success
nvidia.com/mig.strategy=single
The mig-manager is running and will reconfigure the MIG layout of the hardware:
egallen@laptop ~ % oc -n nvidia-gpu-operator get ds
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
gpu-feature-discovery 1 1 1 1 1 nvidia.com/gpu.deploy.gpu-feature-discovery=true 41h
nvidia-container-toolkit-daemonset 1 1 1 1 1 nvidia.com/gpu.deploy.container-toolkit=true 41h
nvidia-dcgm 1 1 1 1 1 nvidia.com/gpu.deploy.dcgm=true 41h
nvidia-dcgm-exporter 1 1 1 1 1 nvidia.com/gpu.deploy.dcgm-exporter=true 41h
nvidia-device-plugin-daemonset 1 1 1 1 1 nvidia.com/gpu.deploy.device-plugin=true 41h
nvidia-driver-daemonset-414.92.202311150705-0 1 1 1 1 1 feature.node.kubernetes.io/system-os_release.OSTREE_VERSION=414.92.202311150705-0,nvidia.com/gpu.deploy.driver=true 41h
nvidia-mig-manager 1 1 1 1 1 nvidia.com/gpu.deploy.mig-manager=true 41h
nvidia-node-status-exporter 1 1 1 1 1 nvidia.com/gpu.deploy.node-status-exporter=true 41h
nvidia-operator-validator 1 1 1 1 1 nvidia.com/gpu.deploy.operator-validator=true 41h
We will apply the mixed strategy with the MIG configuration label all-balanced. Each of the H100 GPUs should enable these MIG profiles:
- 2 x 1g.10gb
- 1 x 2g.20gb
- 1 x 3g.40gb
With 8 x H100 GPUs, we will have on the cluster:
- 16 x 1g.10gb (8 x 2)
- 8 x 2g.20gb (8 x 1)
- 8 x 3g.40gb (8 x 1)
We prepare the variables:
egallen@laptop ~ % NODE_NAME=dgxh100.redhat.com
egallen@laptop ~ % STRATEGY=mixed
egallen@laptop ~ % MIG_CONFIGURATION=all-balanced
Apply the strategy:
egallen@laptop ~ % oc patch clusterpolicy/gpu-cluster-policy --type='json' -p='[{"op": "replace", "path": "/spec/mig/strategy", "value": "'$STRATEGY'"}]'
clusterpolicy.nvidia.com/gpu-cluster-policy patched
Label the node with the MIG type:
egallen@laptop ~ % oc label node/$NODE_NAME nvidia.com/mig.config=$MIG_CONFIGURATION --overwrite
node/dgxh100.redhat.com labeled
Check the logs:
egallen@laptop ~ % oc -n nvidia-gpu-operator logs ds/nvidia-mig-manager --all-containers -f --prefix
[pod/nvidia-mig-manager-v7vkm/nvidia-mig-manager] time="2023-12-04T15:46:34Z" level=debug msg="Running pre-apply-config hook"
[pod/nvidia-mig-manager-v7vkm/nvidia-mig-manager] time="2023-12-04T15:46:34Z" level=debug msg="Applying MIG device configuration..."
[pod/nvidia-mig-manager-v7vkm/nvidia-mig-manager] time="2023-12-04T15:46:35Z" level=debug msg="Walking MigConfig for (device-filter=[0x232110DE 0x233A10DE], devices=all)"
[pod/nvidia-mig-manager-v7vkm/nvidia-mig-manager] time="2023-12-04T15:46:35Z" level=debug msg="Walking MigConfig for (device-filter=[0x233010DE 0x233110DE 0x232210DE 0x20B210DE 0x20B510DE 0x20F310DE 0x20F510DE], devices=all)"
[pod/nvidia-mig-manager-v7vkm/nvidia-mig-manager] time="2023-12-04T15:46:35Z" level=debug msg=" GPU 0: 0x233010DE"
[pod/nvidia-mig-manager-v7vkm/nvidia-mig-manager] time="2023-12-04T15:46:35Z" level=debug msg=" MIG capable: true\n"
[pod/nvidia-mig-manager-v7vkm/nvidia-mig-manager] time="2023-12-04T15:46:35Z" level=debug msg=" Updating MIG config: map[1g.10gb:2 2g.20gb:1 3g.40gb:1]"
[pod/nvidia-mig-manager-v7vkm/nvidia-mig-manager] time="2023-12-04T15:46:36Z" level=debug msg=" GPU 1: 0x233010DE"
[pod/nvidia-mig-manager-v7vkm/nvidia-mig-manager] time="2023-12-04T15:46:36Z" level=debug msg=" MIG capable: true\n"
[pod/nvidia-mig-manager-v7vkm/nvidia-mig-manager] time="2023-12-04T15:46:36Z" level=debug msg=" Updating MIG config: map[1g.10gb:2 2g.20gb:1 3g.40gb:1]"
[pod/nvidia-mig-manager-v7vkm/nvidia-mig-manager] time="2023-12-04T15:46:37Z" level=debug msg=" GPU 2: 0x233010DE"
[pod/nvidia-mig-manager-v7vkm/nvidia-mig-manager] time="2023-12-04T15:46:37Z" level=debug msg=" MIG capable: true\n"
[pod/nvidia-mig-manager-v7vkm/nvidia-mig-manager] time="2023-12-04T15:46:37Z" level=debug msg=" Updating MIG config: map[1g.10gb:2 2g.20gb:1 3g.40gb:1]"
[pod/nvidia-mig-manager-v7vkm/nvidia-mig-manager] time="2023-12-04T15:46:37Z" level=debug msg=" GPU 3: 0x233010DE"
[pod/nvidia-mig-manager-v7vkm/nvidia-mig-manager] time="2023-12-04T15:46:37Z" level=debug msg=" MIG capable: true\n"
[pod/nvidia-mig-manager-v7vkm/nvidia-mig-manager] time="2023-12-04T15:46:37Z" level=debug msg=" Updating MIG config: map[1g.10gb:2 2g.20gb:1 3g.40gb:1]"
[pod/nvidia-mig-manager-v7vkm/nvidia-mig-manager] time="2023-12-04T15:46:38Z" level=debug msg=" GPU 4: 0x233010DE"
[pod/nvidia-mig-manager-v7vkm/nvidia-mig-manager] time="2023-12-04T15:46:38Z" level=debug msg=" MIG capable: true\n"
[pod/nvidia-mig-manager-v7vkm/nvidia-mig-manager] time="2023-12-04T15:46:38Z" level=debug msg=" Updating MIG config: map[1g.10gb:2 2g.20gb:1 3g.40gb:1]"
[pod/nvidia-mig-manager-v7vkm/nvidia-mig-manager] time="2023-12-04T15:46:39Z" level=debug msg=" GPU 5: 0x233010DE"
[pod/nvidia-mig-manager-v7vkm/nvidia-mig-manager] time="2023-12-04T15:46:39Z" level=debug msg=" MIG capable: true\n"
[pod/nvidia-mig-manager-v7vkm/nvidia-mig-manager] time="2023-12-04T15:46:39Z" level=debug msg=" Updating MIG config: map[1g.10gb:2 2g.20gb:1 3g.40gb:1]"
[pod/nvidia-mig-manager-v7vkm/nvidia-mig-manager] time="2023-12-04T15:46:39Z" level=debug msg=" GPU 6: 0x233010DE"
[pod/nvidia-mig-manager-v7vkm/nvidia-mig-manager] time="2023-12-04T15:46:39Z" level=debug msg=" MIG capable: true\n"
[pod/nvidia-mig-manager-v7vkm/nvidia-mig-manager] time="2023-12-04T15:46:39Z" level=debug msg=" Updating MIG config: map[1g.10gb:2 2g.20gb:1 3g.40gb:1]"
[pod/nvidia-mig-manager-v7vkm/nvidia-mig-manager] time="2023-12-04T15:46:40Z" level=debug msg=" GPU 7: 0x233010DE"
[pod/nvidia-mig-manager-v7vkm/nvidia-mig-manager] time="2023-12-04T15:46:40Z" level=debug msg=" MIG capable: true\n"
[pod/nvidia-mig-manager-v7vkm/nvidia-mig-manager] time="2023-12-04T15:46:40Z" level=debug msg=" Updating MIG config: map[1g.10gb:2 2g.20gb:1 3g.40gb:1]"
Check the status:
egallen@laptop ~ % oc describe node | grep nvidia.com/mig.config
nvidia.com/mig.config=all-balanced
nvidia.com/mig.config.state=success
We can check the nvidia-smi output:
% cat <<EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: command-nvidia-smi
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: nvcr.io/nvidia/cuda:12.1.0-base-ubi8
      command: ["/bin/sh","-c"]
      args: ["nvidia-smi"]
EOF
We can see the 32 MIG devices instead of 8 GPUs:
egallen@laptop ~ % oc logs command-nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12 Driver Version: 535.104.12 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA H100 80GB HBM3 On | 00000000:1B:00.0 Off | On |
| N/A 25C P0 71W / 700W | N/A | N/A Default |
| | | Enabled |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA H100 80GB HBM3 On | 00000000:43:00.0 Off | On |
| N/A 26C P0 70W / 700W | N/A | N/A Default |
| | | Enabled |
+-----------------------------------------+----------------------+----------------------+
| 2 NVIDIA H100 80GB HBM3 On | 00000000:52:00.0 Off | On |
| N/A 31C P0 72W / 700W | N/A | N/A Default |
| | | Enabled |
+-----------------------------------------+----------------------+----------------------+
| 3 NVIDIA H100 80GB HBM3 On | 00000000:61:00.0 Off | On |
| N/A 29C P0 71W / 700W | N/A | N/A Default |
| | | Enabled |
+-----------------------------------------+----------------------+----------------------+
| 4 NVIDIA H100 80GB HBM3 On | 00000000:9D:00.0 Off | On |
| N/A 26C P0 71W / 700W | N/A | N/A Default |
| | | Enabled |
+-----------------------------------------+----------------------+----------------------+
| 5 NVIDIA H100 80GB HBM3 On | 00000000:C3:00.0 Off | On |
| N/A 25C P0 70W / 700W | N/A | N/A Default |
| | | Enabled |
+-----------------------------------------+----------------------+----------------------+
| 6 NVIDIA H100 80GB HBM3 On | 00000000:D1:00.0 Off | On |
| N/A 29C P0 73W / 700W | N/A | N/A Default |
| | | Enabled |
+-----------------------------------------+----------------------+----------------------+
| 7 NVIDIA H100 80GB HBM3 On | 00000000:DF:00.0 Off | On |
| N/A 31C P0 72W / 700W | N/A | N/A Default |
| | | Enabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| MIG devices: |
+------------------+--------------------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG |
| | | ECC| |
|==================+================================+===========+=======================|
| 0 2 0 0 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 |
| | 0MiB / 65535MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 0 3 0 1 | 11MiB / 20096MiB | 32 0 | 2 0 2 0 2 |
| | 0MiB / 32767MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 0 9 0 2 | 5MiB / 9984MiB | 16 0 | 1 0 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 0 10 0 3 | 5MiB / 9984MiB | 16 0 | 1 0 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 1 2 0 0 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 |
| | 0MiB / 65535MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 1 3 0 1 | 11MiB / 20096MiB | 32 0 | 2 0 2 0 2 |
| | 0MiB / 32767MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 1 9 0 2 | 5MiB / 9984MiB | 16 0 | 1 0 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 1 10 0 3 | 5MiB / 9984MiB | 16 0 | 1 0 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 2 2 0 0 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 |
| | 0MiB / 65535MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 2 3 0 1 | 11MiB / 20096MiB | 32 0 | 2 0 2 0 2 |
| | 0MiB / 32767MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 2 9 0 2 | 5MiB / 9984MiB | 16 0 | 1 0 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 2 10 0 3 | 5MiB / 9984MiB | 16 0 | 1 0 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 3 2 0 0 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 |
| | 0MiB / 65535MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 3 3 0 1 | 11MiB / 20096MiB | 32 0 | 2 0 2 0 2 |
| | 0MiB / 32767MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 3 9 0 2 | 5MiB / 9984MiB | 16 0 | 1 0 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 3 10 0 3 | 5MiB / 9984MiB | 16 0 | 1 0 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 4 1 0 0 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 |
| | 0MiB / 65535MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 4 5 0 1 | 11MiB / 20096MiB | 32 0 | 2 0 2 0 2 |
| | 0MiB / 32767MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 4 13 0 2 | 5MiB / 9984MiB | 16 0 | 1 0 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 4 14 0 3 | 5MiB / 9984MiB | 16 0 | 1 0 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 5 1 0 0 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 |
| | 0MiB / 65535MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 5 5 0 1 | 11MiB / 20096MiB | 32 0 | 2 0 2 0 2 |
| | 0MiB / 32767MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 5 13 0 2 | 5MiB / 9984MiB | 16 0 | 1 0 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 5 14 0 3 | 5MiB / 9984MiB | 16 0 | 1 0 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 6 2 0 0 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 |
| | 0MiB / 65535MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 6 3 0 1 | 11MiB / 20096MiB | 32 0 | 2 0 2 0 2 |
| | 0MiB / 32767MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 6 9 0 2 | 5MiB / 9984MiB | 16 0 | 1 0 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 6 10 0 3 | 5MiB / 9984MiB | 16 0 | 1 0 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 7 2 0 0 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 |
| | 0MiB / 65535MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 7 3 0 1 | 11MiB / 20096MiB | 32 0 | 2 0 2 0 2 |
| | 0MiB / 32767MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 7 9 0 2 | 5MiB / 9984MiB | 16 0 | 1 0 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 7 10 0 3 | 5MiB / 9984MiB | 16 0 | 1 0 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
% oc delete pod command-nvidia-smi
pod "command-nvidia-smi" deleted
We can see 32 MIG devices as expected: 16 x 1g.10gb + 8 x 2g.20gb + 8 x 3g.40gb.
egallen@laptop ~ % oc describe node | egrep "Name:|Roles:|Capacity|nvidia.com/gpu:|nvidia.com/mig-.* |Allocatable:|Requests +Limits"
Name: dgxh100.redhat.com
Roles: control-plane,master,worker
Capacity:
nvidia.com/gpu: 0
nvidia.com/mig-1g.10gb: 16
nvidia.com/mig-2g.20gb: 8
nvidia.com/mig-3g.40gb: 8
Allocatable:
nvidia.com/gpu: 0
nvidia.com/mig-1g.10gb: 16
nvidia.com/mig-2g.20gb: 8
nvidia.com/mig-3g.40gb: 8
Resource Requests Limits
nvidia.com/mig-1g.10gb 0 0
nvidia.com/mig-2g.20gb 0 0
nvidia.com/mig-3g.40gb 0 0
In OpenShift AI, an accelerator profile defines the specification of an accelerator. Before you can use an accelerator in OpenShift AI, your OpenShift instance must contain the associated accelerator profile.
This specific configuration is only required with the mixed MIG strategy, because with the single strategy the nvidia.com/gpu resource name is still used.
We need to change the accelerator profile in OpenShift AI because we are using the mixed MIG strategy.
Accelerator profile documentation
In the OpenShift Container Platform web console, in the Administrator perspective, click Administration → CustomResourceDefinitions.
In the search bar, enter acceleratorprofile to search by name.
The CustomResourceDefinitions page reloads to display the search results.
Click the AcceleratorProfile custom resource definition (CRD).
A details page for the custom resource definition (CRD) opens.
Click the Instances tab. Click Create AcceleratorProfile.
The Create AcceleratorProfile page opens with an embedded YAML editor.
We see the existing AcceleratorProfile:
apiVersion: dashboard.opendatahub.io/v1
kind: AcceleratorProfile
metadata:
  creationTimestamp: '2023-12-02T22:01:49Z'
  generation: 1
  managedFields:
    - apiVersion: dashboard.opendatahub.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:spec':
          .: {}
          'f:displayName': {}
          'f:enabled': {}
          'f:identifier': {}
          'f:tolerations': {}
      manager: unknown
      operation: Update
      time: '2023-12-02T22:01:49Z'
  name: migrated-gpu
  namespace: redhat-ods-applications
  resourceVersion: '54196'
  uid: 3c34bfc5-f6b6-407b-a9d8-f52d5155843f
spec:
  displayName: NVIDIA GPU
  enabled: true
  identifier: nvidia.com/gpu
  tolerations:
    - effect: NoSchedule
      key: nvidia.com/gpu
      operator: Exists
We replace the identifier key nvidia.com/gpu with nvidia.com/mig-1g.10gb (and the toleration key accordingly):
apiVersion: dashboard.opendatahub.io/v1
kind: AcceleratorProfile
metadata:
  creationTimestamp: '2023-12-02T22:01:49Z'
  generation: 2
  managedFields:
    - apiVersion: dashboard.opendatahub.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:spec':
          .: {}
          'f:displayName': {}
          'f:enabled': {}
      manager: unknown
      operation: Update
      time: '2023-12-02T22:01:49Z'
    - apiVersion: dashboard.opendatahub.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:spec':
          'f:identifier': {}
          'f:tolerations': {}
      manager: Mozilla
      operation: Update
      time: '2023-12-04T21:39:51Z'
  name: migrated-gpu
  namespace: redhat-ods-applications
  resourceVersion: '1259992'
  uid: 3c34bfc5-f6b6-407b-a9d8-f52d5155843f
spec:
  displayName: NVIDIA GPU
  enabled: true
  identifier: nvidia.com/mig-1g.10gb
  tolerations:
    - effect: NoSchedule
      key: nvidia.com/mig-1g.10gb
      operator: Exists
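The same change could also be applied from the CLI with a merge patch (a sketch; if the singular resource name does not resolve on your cluster, use the full acceleratorprofiles.dashboard.opendatahub.io resource name):
oc -n redhat-ods-applications patch acceleratorprofile migrated-gpu --type merge -p '{"spec":{"identifier":"nvidia.com/mig-1g.10gb","tolerations":[{"effect":"NoSchedule","key":"nvidia.com/mig-1g.10gb","operator":"Exists"}]}}'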
When we schedule a notebook with 2 GPUs, we can see that two nvidia.com/mig-1g.10gb resources are used:
egallen@laptop ~ % oc describe node | egrep "Name:|Roles:|Capacity|nvidia.com/gpu:|nvidia.com/mig-.* |Allocatable:|Requests +Limits"
Name: dgxh100.redhat.com
Roles: control-plane,master,worker
Capacity:
nvidia.com/gpu: 0
nvidia.com/mig-1g.10gb: 16
nvidia.com/mig-2g.20gb: 8
nvidia.com/mig-3g.40gb: 8
Allocatable:
nvidia.com/gpu: 0
nvidia.com/mig-1g.10gb: 16
nvidia.com/mig-2g.20gb: 8
nvidia.com/mig-3g.40gb: 8
Resource Requests Limits
nvidia.com/mig-1g.10gb 2 2
nvidia.com/mig-2g.20gb 0 0
nvidia.com/mig-3g.40gb 0 0
MIG single strategy switch
We will test one single strategy with the label all-3g.40gb on one NVIDIA DGX H100 server with 8 x H100 80GB GPUs.
We are using one Single Node OpenShift for this test on one DGX H100 server.
We are starting from a mixed strategy with the label all-balanced, with 32 devices available:
Check before:
egallen@laptop ~ % oc describe node | egrep "Name:|Roles:|Capacity|nvidia.com/gpu:|nvidia.com/mig-.* |Allocatable:|Requests +Limits"
Name: dgxh100.redhat.com
Roles: control-plane,master,worker
Capacity:
nvidia.com/gpu: 0
nvidia.com/mig-1g.10gb: 16
nvidia.com/mig-2g.20gb: 8
nvidia.com/mig-3g.40gb: 8
Allocatable:
nvidia.com/gpu: 0
nvidia.com/mig-1g.10gb: 16
nvidia.com/mig-2g.20gb: 8
nvidia.com/mig-3g.40gb: 8
Resource Requests Limits
nvidia.com/mig-1g.10gb 0 0
nvidia.com/mig-2g.20gb 0 0
nvidia.com/mig-3g.40gb 0 0
Prepare the variables:
egallen@laptop ~ % NODE_NAME=dgxh100.redhat.com
egallen@laptop ~ % STRATEGY=single
egallen@laptop ~ % MIG_CONFIGURATION=all-3g.40gb
We apply the strategy:
egallen@laptop ~ % oc patch clusterpolicy/gpu-cluster-policy --type='json' -p='[{"op": "replace", "path": "/spec/mig/strategy", "value": "'$STRATEGY'"}]'
We label the node with the MIG type:
egallen@laptop ~ % oc label node/$NODE_NAME nvidia.com/mig.config=$MIG_CONFIGURATION --overwrite
Check the status:
egallen@laptop ~ % oc describe node | grep gpu.count
nvidia.com/gpu.count=0
egallen@laptop ~ % oc describe node | egrep "Name:|Roles:|Capacity|nvidia.com/gpu:|nvidia.com/mig-.* |Allocatable:|Requests +Limits"
Name: dgxh100.redhat.com
Roles: control-plane,master,worker
Capacity:
nvidia.com/gpu: 16
nvidia.com/mig-1g.10gb: 0
nvidia.com/mig-2g.20gb: 0
nvidia.com/mig-3g.40gb: 0
Allocatable:
nvidia.com/gpu: 16
nvidia.com/mig-1g.10gb: 0
nvidia.com/mig-2g.20gb: 0
nvidia.com/mig-3g.40gb: 0
Resource Requests Limits
nvidia.com/mig-1g.10gb 0 0
nvidia.com/mig-2g.20gb 0 0
nvidia.com/mig-3g.40gb 0 0
Test one nvidia-smi command:
egallen@laptop ~ % cat <<EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: command-nvidia-smi
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: nvcr.io/nvidia/cuda:12.1.0-base-ubi8
      command: ["/bin/sh","-c"]
      args: ["nvidia-smi"]
EOF
We can see the 16 MIG devices with 40GB of vRAM available:
egallen@laptop ~ % oc logs command-nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12 Driver Version: 535.104.12 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA H100 80GB HBM3 On | 00000000:1B:00.0 Off | On |
| N/A 25C P0 75W / 700W | N/A | N/A Default |
| | | Enabled |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA H100 80GB HBM3 On | 00000000:43:00.0 Off | On |
| N/A 27C P0 74W / 700W | N/A | N/A Default |
| | | Enabled |
+-----------------------------------------+----------------------+----------------------+
| 2 NVIDIA H100 80GB HBM3 On | 00000000:52:00.0 Off | On |
| N/A 32C P0 75W / 700W | N/A | N/A Default |
| | | Enabled |
+-----------------------------------------+----------------------+----------------------+
| 3 NVIDIA H100 80GB HBM3 On | 00000000:61:00.0 Off | On |
| N/A 30C P0 74W / 700W | N/A | N/A Default |
| | | Enabled |
+-----------------------------------------+----------------------+----------------------+
| 4 NVIDIA H100 80GB HBM3 On | 00000000:9D:00.0 Off | On |
| N/A 27C P0 75W / 700W | N/A | N/A Default |
| | | Enabled |
+-----------------------------------------+----------------------+----------------------+
| 5 NVIDIA H100 80GB HBM3 On | 00000000:C3:00.0 Off | On |
| N/A 25C P0 73W / 700W | N/A | N/A Default |
| | | Enabled |
+-----------------------------------------+----------------------+----------------------+
| 6 NVIDIA H100 80GB HBM3 On | 00000000:D1:00.0 Off | On |
| N/A 30C P0 77W / 700W | N/A | N/A Default |
| | | Enabled |
+-----------------------------------------+----------------------+----------------------+
| 7 NVIDIA H100 80GB HBM3 On | 00000000:DF:00.0 Off | On |
| N/A 31C P0 76W / 700W | N/A | N/A Default |
| | | Enabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| MIG devices: |
+------------------+--------------------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG |
| | | ECC| |
|==================+================================+===========+=======================|
| 0 1 0 0 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 |
| | 0MiB / 65535MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 0 2 0 1 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 |
| | 0MiB / 65535MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 1 1 0 0 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 |
| | 0MiB / 65535MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 1 2 0 1 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 |
| | 0MiB / 65535MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 2 1 0 0 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 |
| | 0MiB / 65535MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 2 2 0 1 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 |
| | 0MiB / 65535MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 3 1 0 0 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 |
| | 0MiB / 65535MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 3 2 0 1 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 |
| | 0MiB / 65535MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 4 1 0 0 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 |
| | 0MiB / 65535MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 4 2 0 1 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 |
| | 0MiB / 65535MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 5 1 0 0 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 |
| | 0MiB / 65535MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 5 2 0 1 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 |
| | 0MiB / 65535MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 6 1 0 0 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 |
| | 0MiB / 65535MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 6 2 0 1 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 |
| | 0MiB / 65535MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 7 1 0 0 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 |
| | 0MiB / 65535MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 7 2 0 1 | 16MiB / 40448MiB | 60 0 | 3 0 3 0 3 |
| | 0MiB / 65535MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
Because we are using the single MIG strategy, the slices are still exposed as nvidia.com/gpu, so we can keep an OpenShift AI accelerator profile that uses the nvidia.com/gpu resource key.
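For reference, such an accelerator profile could look like the minimal sketch below. This assumes the AcceleratorProfile custom resource provided by the OpenShift AI dashboard (dashboard.opendatahub.io/v1) in the redhat-ods-applications namespace; the profile name and display name are hypothetical, so adjust them to your installation:
apiVersion: dashboard.opendatahub.io/v1
kind: AcceleratorProfile
metadata:
  name: h100-mig-3g-40gb        # hypothetical name
  namespace: redhat-ods-applications
spec:
  displayName: H100 MIG 3g.40gb
  enabled: true
  identifier: nvidia.com/gpu     # single MIG strategy keeps this resource key
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule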
Run Mistral AI inference
The Mistral-7B-v0.2 model only requires about 16GB of GPU RAM for inference; the DGX H100 is not mandatory, but it helps scale the number of inferences per second.
The Mixtral-8x7B-v0.1 model requires around 100GB of GPU RAM, so the DGX H100 makes it practical to run this more demanding model.
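As a rough sanity check on those figures: 7 billion parameters × 2 bytes (FP16) ≈ 14GB of weights, which lands close to 16GB once activations and the KV cache are added; Mixtral-8x7B has roughly 47 billion parameters, so 47B × 2 bytes ≈ 94GB, hence the ~100GB figure.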
We create a new project, llm-on-openshift:
egallen@laptop ~ % oc new-project llm-on-openshift
Now using project "llm-on-openshift" on server "https://api.dgxh100.redhat.com:6443".
Creating a PVC
Create a PersistentVolumeClaim called models-cache, which the inference server will use to cache the downloaded model weights.
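The original post does not show the PVC manifest; a minimal sketch could look like this (the 200Gi size and the use of the default storage class are assumptions, so size it for the models you plan to cache):
egallen@laptop ~ % cat << EOF > pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: models-cache
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 200Gi   # assumption: enough for Mistral-7B plus headroom
EOF
egallen@laptop ~ % oc create -f pvc.yaml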
Creating the deployment
We will modify a Kubernetes deployment yaml from Guillaume Moutier, available here.
We prepare the deployment yaml:
egallen@laptop ~ % cat << EOF > deployment.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  name: hf-text-generation-inference-server
  labels:
    app: hf-text-generation-inference-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hf-text-generation-inference-server
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: hf-text-generation-inference-server
    spec:
      restartPolicy: Always
      schedulerName: default-scheduler
      affinity: {}
      terminationGracePeriodSeconds: 120
      securityContext: {}
      containers:
        - resources:
            limits:
              cpu: '8'
              memory: 128Gi
              nvidia.com/gpu: '1'  # one MIG 3g.40gb slice, exposed as nvidia.com/gpu by the single strategy
            requests:
              cpu: '8'
              nvidia.com/gpu: '1'
          readinessProbe:
            httpGet:
              path: /health
              port: http
              scheme: HTTP
            timeoutSeconds: 5
            periodSeconds: 30
            successThreshold: 1
            failureThreshold: 3
          terminationMessagePath: /dev/termination-log
          name: server
          livenessProbe:
            httpGet:
              path: /health
              port: http
              scheme: HTTP
            timeoutSeconds: 8
            periodSeconds: 100
            successThreshold: 1
            failureThreshold: 3
          env:
            - name: MODEL_ID
              value: mistralai/Mistral-7B-Instruct-v0.1
            - name: MAX_INPUT_LENGTH
              value: '1024'
            - name: MAX_TOTAL_TOKENS
              value: '2048'
            - name: HUGGINGFACE_HUB_CACHE
              value: /models-cache
            - name: HUGGING_FACE_HUB_TOKEN
              value: 'hf_IDAAAAAAAAAAAA'  # placeholder: use your own Hugging Face token
            - name: PORT
              value: '3000'
            - name: HOST
              value: '0.0.0.0'
          securityContext:
            capabilities:
              drop:
                - ALL
            runAsNonRoot: true
            allowPrivilegeEscalation: false
            seccompProfile:
              type: RuntimeDefault
          ports:
            - name: http
              containerPort: 3000
              protocol: TCP
          imagePullPolicy: IfNotPresent
          startupProbe:
            httpGet:
              path: /health
              port: http
              scheme: HTTP
            timeoutSeconds: 1
            periodSeconds: 30
            successThreshold: 1
            failureThreshold: 24
          volumeMounts:
            - name: models-cache
              mountPath: /models-cache
            - name: shm
              mountPath: /dev/shm  # shared memory for the inference server
          terminationMessagePolicy: File
          image: 'ghcr.io/huggingface/text-generation-inference:1.1.0'
      volumes:
        - name: models-cache
          persistentVolumeClaim:
            claimName: models-cache
        - name: shm
          emptyDir:
            medium: Memory
            sizeLimit: 1Gi
      dnsPolicy: ClusterFirst
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 1
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
EOF
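The original transcript goes straight to checking the status; for completeness, creating the deployment from the file above would be:
egallen@laptop ~ % oc create -f deployment.yaml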
Checking the deployment status:
egallen@laptop ~ % oc get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
hf-text-generation-inference-server 1/1 1 1 2m53s
egallen@laptop ~ % oc get pods
NAME READY STATUS RESTARTS AGE
hf-text-generation-inference-server-7449c5f6c7-khx2m 1/1 Running 0 2m56s
egallen@laptop ~ % oc logs hf-text-generation-inference-server-7449c5f6c7-khx2m -f
...
{"timestamp":"2023-12-18T11:04:12.913335Z","level":"INFO","message":"Connected","target":"text_generation_router","filename":"router/src/main.rs","line_number":247}
{"timestamp":"2023-12-18T11:04:12.913335Z","level":"WARN","message":"Invalid hostname, defaulting to 0.0.0.0","target":"text_generation_router","filename":"router/src/main.rs","line_number":252}
We can validate that the inference-server pod can access the GPU:
egallen@laptop ~ % oc rsh hf-text-generation-inference-server-7449c5f6c7-khx2m
$ nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA H100 80GB HBM3 On | 00000000:DF:00.0 Off | On |
| N/A 34C P0 118W / 700W | N/A | N/A Default |
| | | Enabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| MIG devices: |
+------------------+--------------------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG |
| | | ECC| |
|==================+================================+===========+=======================|
| 0 1 0 0 | 39046MiB / 40320MiB | 60 0 | 3 0 3 0 3 |
| | 3MiB / 65535MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
Creating the service
We prepare the service yaml file:
egallen@laptop ~ % cat << EOF > service.yaml
kind: Service
apiVersion: v1
metadata:
  name: hf-text-generation-inference-server
  labels:
    app: hf-text-generation-inference-server
spec:
  clusterIP: None
  ipFamilies:
    - IPv4
  ports:
    - name: http
      protocol: TCP
      port: 3000
      targetPort: http
  type: ClusterIP
  ipFamilyPolicy: SingleStack
  sessionAffinity: None
  selector:
    app: hf-text-generation-inference-server
EOF
We create the service:
egallen@laptop ~ % oc create -f service.yaml
service/hf-text-generation-inference-server created
Check the service status:
egallen@laptop ~ % oc get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
hf-text-generation-inference-server ClusterIP None <none> 3000/TCP 7s
egallen@laptop ~ % oc describe service hf-text-generation-inference-server
Name: hf-text-generation-inference-server
Namespace: llm-on-openshift
Labels: app=hf-text-generation-inference-server
Annotations: <none>
Selector: app=hf-text-generation-inference-server
Type: ClusterIP
IP Family Policy: SingleStack
IP Families: IPv4
IP: None
IPs: None
Port: http 3000/TCP
TargetPort: http/TCP
Endpoints: 10.128.0.102:3000
Session Affinity: None
Events: <none>
Creating the route
We prepare the route yaml file:
egallen@laptop ~ % cat << EOF > route.yaml
kind: Route
apiVersion: route.openshift.io/v1
metadata:
  name: hf-text-generation-inference-server
  labels:
    app: hf-text-generation-inference-server
spec:
  to:
    kind: Service
    name: hf-text-generation-inference-server
    weight: 100
  port:
    targetPort: http
  tls:
    termination: edge
  wildcardPolicy: None
EOF
egallen@laptop ~ % oc create -f route.yaml
route.route.openshift.io/hf-text-generation-inference-server created
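We can retrieve the exposed hostname used in the curl commands below (a convenience command, not in the original transcript):
egallen@laptop ~ % oc get route hf-text-generation-inference-server -o jsonpath='{.spec.host}'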
Testing the model
We can now query the Mistral AI model with simple prompts and curl commands.
The Mistral API provides a safe mode to enforce guardrails. With local models, you can prepend your messages with the following system prompt: “Always assist with care, respect, and truth. Respond with utmost utility yet securely. Avoid harmful, unethical, prejudiced, or negative content. Ensure replies promote fairness and positivity.”
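As a sketch, prepending that guardrail prompt to a request against the same endpoint could look like this; the wording is the system prompt quoted above, and wrapping the system and user text inside a single [INST] block is an assumption about the chat template:
egallen@laptop ~ % curl https://hf-text-generation-inference-server-llm-on-openshift.apps.dgxh100.redhat.com/generate \
-X POST \
--insecure \
-d '{"inputs":"<s>[INST]Always assist with care, respect, and truth. Respond with utmost utility yet securely. Avoid harmful, unethical, prejudiced, or negative content. Ensure replies promote fairness and positivity. Should I use the Go Programming Language or Rust for a website ?[/INST]","parameters":{"max_new_tokens":200}}' \
-H 'Content-Type: application/json'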
Here are some Mistral-7B inference tests:
“Golang or Rust for a website”:
egallen@laptop ~ % curl https://hf-text-generation-inference-server-llm-on-openshift.apps.dgxh100.redhat.com/generate \
-X POST \
--insecure \
-d '{"inputs":"<s>[INST]Should I use the Go Programming Language or Rust for a website ?[/INST]","parameters":{"max_new_tokens":1000}}' \
-H 'Content-Type: application/json' | jq . | sed 's/\\n/\n/g'
{
"generated_text": " Both Go and Rust are great programming languages for building websites, but the choice ultimately depends on your specific needs and preferences.
Go, also known as Golang, is a relatively new programming language that was developed by Google. It is known for its simplicity, speed, and concurrency. Go is a good choice if you need to build a scalable and high-performance website that can handle a large number of concurrent users.
Rust, on the other hand, is a systems programming language that was developed by Mozilla. It is known for its safety, speed, and concurrency. Rust is a good choice if you need to build a website that requires low-level systems programming, such as a web server or a content delivery network.
In general, if you are building a simple website that doesn't require a lot of concurrency or low-level systems programming, either Go or Rust could be a good choice. However, if you need to build a website that requires high performance, scalability, and low-level systems programming, Rust might be the better choice."
}
“Kubernetes distribution”:
egallen@laptop ~ % curl https://hf-text-generation-inference-server-llm-on-openshift.apps.dgxh100.redhat.com/generate \
-X POST \
--insecure \
-d '{"inputs":"<s>[INST]What is the most widely used commercial Kubernetes distribution?[/INST]","parameters":{"max_new_tokens":25}}' \
-H 'Content-Type: application/json' | jq . | sed 's/\\n/\n/g'
{
"generated_text":"
## Answer (1)
The most widely used commercial Kubernetes distribution is Red Hat OpenShift.
"}%
“Python coding”
egallen@laptop ~ % curl https://hf-text-generation-inference-server-llm-on-openshift.apps.dgxh100.redhat.com/generate \
-X POST \
--insecure \
-d '{"inputs":"<s>[INST]Write a basic python function that can generate fibbonaci sequence[/INST]","parameters":{"max_new_tokens":1000}}' \
-H 'Content-Type: application/json' | jq . | sed 's/\\n/\n/g'
{
"generated_text": " Here is a simple Python function that generates the Fibonacci sequence:
```python
def fibonacci(n):
if n <= 0:
return []
elif n == 1:
return [0]
elif n == 2:
return [0, 1]
else:
fib_seq = fibonacci(n-1)
fib_seq.append(fib_seq[-1] + fib_seq[-2])
return fib_seq
```
This function takes in an integer `n` as an argument, which represents the number of terms to generate in the Fibonacci sequence. If `n` is less than or equal to 0, the function returns an empty list. If `n` is equal to 1, the function returns [0]. If `n` is equal to 2, the function returns [0, 1]. For any other value of `n`, the function first calls itself with the argument `n-1`, and appends the sum of the last two elements in the returned list to the end of the list. This process continues until the desired number of terms have been generated."
}
Test in French:
egallen@egallen-mac test % curl https://hf-text-generation-inference-server-llm-on-openshift.apps.dgxh100.redhat.com/generate \
-X POST \
--insecure \
-d '{"inputs":"<s>[INST]Quelle est la liste des Présidents de la Cinquième République ?[/INST]","parameters":{"max_new_tokens":1000}}' \
-H 'Content-Type: application/json' | jq . | sed 's/\\n/\n/g'
{
"generated_text": " Voici la liste des présidents de la Cinquième République française depuis sa création en 1958 :
1. Charles de Gaulle (1958-1969)
2. Georges Pompidou (1969-1974)
3. Valéry Giscard d'Estaing (1974-1981)
4. François Mitterrand (1981-1995)
5. Jacques Chirac (1995-2007)
6. Nicolas Sarkozy (2007-2012)
7. François Hollande (2012-2017)
8. Emmanuel Macron (2017-en cours)"
}
Are you a poet?
egallen@egallen-mac test % curl https://hf-text-generation-inference-server-llm-on-openshift.apps.dgxh100.redhat.com/generate \
-X POST \
--insecure \
-d '{"inputs":"<s>[INST]Write a poem about the sun[/INST]","parameters":{"max_new_tokens":1000}}' \
-H 'Content-Type: application/json' | jq . | sed 's/\\n/\n/g'
{
"generated_text": " The Sun, the source of all light,
A ball of fire burning bright,
It rises in the east and sets in the west,
A daily cycle that never rests.
Its warmth embraces the earth,
A gentle touch that brings forth,
Life and growth in every form,
From the tallest tree to the smallest worm.
Its rays reach out to the sky,
A canvas of colors that never die,
A painting of beauty and wonder,
A sight that leaves us all in awe and thunder.
The Sun, a star that shines so bright,
A beacon of hope and light,
A symbol of life and love,
Its power and majesty we can't help but adore.
So let us bask in its glory,
And let its warmth and light tell a story,
For the Sun, is a gift from above,
A treasure that we should cherish and love."
}
Conclusion
This blog post provides a comprehensive guide to deploying OpenShift AI on the DGX H100 system for running large-scale machine learning applications. It covers the steps from preparation to deployment, including setting up the OpenShift cluster, installing the necessary operators, and creating persistent storage. It also includes examples of using PyTorch to run image classification tasks. Finally, it shows how to set up MIG devices and run Mistral AI inference.
In addition to the instructions provided in the article, here are some additional tips for deploying OpenShift AI on the DGX H100 system:
- Optimize your OpenShift cluster configuration. This includes allocating sufficient resources to the nodes in the cluster and ensuring that the network bandwidth is adequate to support the data transfer requirements of your AI workloads.
- Use the RDMA feature (GPUDirect RDMA) to improve data transfer performance. It lets GPUs exchange data directly with network adapters and peer GPUs over the PCIe bus, bypassing the host CPU and system memory.
- Use NVIDIA’s optimized libraries for TensorFlow and PyTorch. These libraries are specifically designed to take advantage of the NVIDIA GPU architecture and can improve performance significantly.
You can now effectively deploy OpenShift AI on the DGX H100 system and run large-scale machine learning applications with impressive performance and efficiency.