NVIDIA vGPU software and license server with RHOSP 15
This article describes the steps to try out and download the NVIDIA GRID software:
- Create a NVIDIA account
- Redeem your Product Activation Key (PAK)
- Download packages
- Prepare the VM and operating system of the license server based on RHEL 7.7
- Download the Virtual GPU license Manager for Linux
- Install the NVIDIA vGPU license Server
- Registering the License Server and Getting License Files
- Import the License Server file
- Launch an instance on a RHOSP 15 platform
- Instance status
- Compute node status
- Check the license status with CLI
- Check the license status in the dashboard
To enable NVIDIA GRID for Red Hat OpenStack Platform, you will need (example for RHOSP 15, which is based on RHEL 8):
- one package for the RHOSP Compute KVM host (bare metal): NVIDIA-vGPU-rhel-8.0-430.46.x86_64.rpm
- one script for the guest instance (virtual machine): NVIDIA-Linux-x86_64-430.46-grid.run
- one NVIDIA GRID license server installed on your network
Trial files can be downloaded here:
https://www.nvidia.com/en-us/data-center/resources/nvidia-enterprise-account/
You can contact a sales representative if you want to buy the vGPU software stack: https://www.nvidia.com/en-us/contact/sales/#assistance
Official NVIDIA documentation is here:
https://docs.nvidia.com/grid/ls/latest/grid-license-server-user-guide/index.html
Create a NVIDIA account
NVIDIA account creation page:
https://www.nvidia.com/en-us/data-center/resources/nvidia-enterprise-account/
Create your account:
Confirmation message:
Click “SET PASSWORD” in the NVIDIA email:
Confirmation message:
Login:
Portal:
Redeem your Product Activation Key (PAK)
Click on “Redeem”:
Your licenses are available in your portal:
Download packages
Go to “Product Information” > Current Release “9.1” > “NVIDIA Virtual GPU Software”:
Confirm “Software Terms and Conditions”:
For RHOSP 13, choose “NVIDIA vGPU for RHEL KVM 7.7”.
For RHOSP 14, choose “NVIDIA vGPU for RHEL KVM 7.7”.
For RHOSP 15, choose “NVIDIA vGPU for RHEL KVM 8.0”.
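The version mapping above can be captured in a small helper, for example in a download or provisioning script. This is only a sketch: the function name vgpu_archive_for_rhosp is hypothetical, and the archive names are the ones shown in this article for release 9.1, so they will differ for other releases.

```shell
# Map a RHOSP major version to the vGPU archive named in this article.
# Only RHOSP 13, 14 and 15 are covered; anything else returns an error.
vgpu_archive_for_rhosp() {
    case "$1" in
        13|14) echo "NVIDIA-GRID-RHEL-7.7-430.46-431.79.zip" ;;
        15)    echo "NVIDIA-GRID-RHEL-8.0-430.46-431.79.zip" ;;
        *)     echo "unsupported RHOSP version: $1" >&2; return 1 ;;
    esac
}

# Example: print the archive to use for RHOSP 15
vgpu_archive_for_rhosp 15
```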
RHOSP 13 and RHOSP 14 based on RHEL 7 vGPU software
Content of the RHEL 7 archive:
$ unzip NVIDIA-GRID-RHEL-7.7-430.46-431.79.zip
Archive: NVIDIA-GRID-RHEL-7.7-430.46-431.79.zip
inflating: 430.46-431.79-grid-gpumodeswitch-user-guide.pdf
inflating: 430.46-431.79-grid-licensing-user-guide.pdf
inflating: 430.46-431.79-grid-software-quick-start-guide.pdf
inflating: 430.46-431.79-grid-vgpu-release-notes-red-hat-el-kvm.pdf
inflating: 430.46-431.79-grid-vgpu-user-guide.pdf
inflating: 430.46-431.79-whats-new-vgpu.pdf
inflating: 431.79_grid_win10_server2016_server2019_64bit_international.exe
inflating: 431.79_grid_win7_win8_server2008R2_server2012R2_64bit_international.exe
inflating: NVIDIA-Linux-x86_64-430.46-grid.run
inflating: NVIDIA-vGPU-rhel-7.7-430.46.x86_64.rpm
RHOSP 15 based on RHEL 8 vGPU software
Content of the RHEL 8 archive:
$ unzip NVIDIA-GRID-RHEL-8.0-430.46-431.79.zip
Archive: NVIDIA-GRID-RHEL-8.0-430.46-431.79.zip
inflating: 430.46-431.79-grid-gpumodeswitch-user-guide.pdf
inflating: 430.46-431.79-grid-licensing-user-guide.pdf
inflating: 430.46-431.79-grid-software-quick-start-guide.pdf
inflating: 430.46-431.79-grid-vgpu-release-notes-red-hat-el-kvm.pdf
inflating: 430.46-431.79-grid-vgpu-user-guide.pdf
inflating: 430.46-431.79-whats-new-vgpu.pdf
inflating: 431.79_grid_win10_server2016_server2019_64bit_international.exe
inflating: 431.79_grid_win7_win8_server2008R2_server2012R2_64bit_international.exe
inflating: NVIDIA-Linux-x86_64-430.46-grid.run
inflating: NVIDIA-vGPU-rhel-8.0-430.46.x86_64.rpm
Prepare the VM and operating system of the license server based on RHEL 7.7
License server documentation is here: https://docs.nvidia.com/grid/ls/latest/grid-license-server-user-guide/index.html
We will first deploy a VM that the OSP instances can reach over the network (source: NVIDIA documentation).
Download the latest RHEL 7 qcow2 image from this page after logging in with your RHN account: https://access.redhat.com/downloads/content/69/ver=/rhel---7/7.7/x86_64/product-software
At the time of writing, it is “Red Hat Enterprise Linux 7.7 Update KVM Guest Image (20190924)”:
Download the image:
[egallen@kvmhost0 ~]$ wget "https://access.cdn.redhat.com/content/origin/files/sha256/XXXX/rhel-server-7.7-update-1-x86_64-kvm.qcow2?user=XXXX&_auth_=XXXX"
Create your VM image:
[egallen@kvmhost0 ~]$ sudo qemu-img create -f qcow2 -o preallocation=metadata /var/lib/libvirt/images/vgpu-license-server.qcow2 200G;
Formatting '/var/lib/libvirt/images/vgpu-license-server.qcow2', fmt=qcow2 size=214748364800 cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
Expand your RHEL 7.7 image in the new qcow2:
[egallen@kvmhost0 ~]$ sudo virt-resize --expand /dev/sda1 /data/inetsoft/rhel-server-7.7-update-1-x86_64-kvm.qcow2 /var/lib/libvirt/images/vgpu-license-server.qcow2
[ 0.0] Examining /data/inetsoft/rhel-server-7.7-update-1-x86_64-kvm.qcow2
◓ 25% ⟦▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════⟧ --:--
100% ⟦▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒⟧ --:--
**********
Summary of changes:
/dev/sda1: This partition will be resized from 7.8G to 200.0G. The
filesystem xfs on /dev/sda1 will be expanded using the ‘xfs_growfs’
method.
**********
[ 34.3] Setting up initial partition table on /var/lib/libvirt/images/vgpu-license-server.qcow2
[ 34.5] Copying /dev/sda1
100% ⟦▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒⟧ 00:00
[ 59.0] Expanding /dev/sda1 using the ‘xfs_growfs’ method
Resize operation completed with no errors. Before deleting the old disk,
carefully check that the resized disk boots and works correctly.
Remove cloud-init and set your root password:
[egallen@kvmhost0 ~]$ sudo virt-customize -a /var/lib/libvirt/images/vgpu-license-server.qcow2 --uninstall cloud-init --root-password password:XXXXXXXX
[ 0.0] Examining the guest ...
[ 2.8] Setting a random seed
[ 2.8] Setting the machine ID in /etc/machine-id
[ 2.8] Uninstalling packages: cloud-init
[ 6.5] Setting passwords
[ 8.3] Finishing off
The recommended minimum configuration is 2 CPU cores and 4 Gbytes of RAM.
A high-end configuration of 4 or more CPU cores with 16 Gbytes of RAM is suitable for handling up to 150,000 licensed clients.
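As a rough sanity check before creating the VM, the two sizing tiers quoted above can be encoded in a helper. This is a minimal sketch: the function name license_server_sizing and the tier labels are made up for illustration, and "Gbytes" is treated as whole gigabytes.

```shell
# Classify a VM size against the figures quoted above:
# minimum is 2 cores / 4 GB, high-end is 4 or more cores / 16 GB.
license_server_sizing() {
    cores=$1
    ram_gb=$2
    if [ "$cores" -ge 4 ] && [ "$ram_gb" -ge 16 ]; then
        echo "high-end"
    elif [ "$cores" -ge 2 ] && [ "$ram_gb" -ge 4 ]; then
        echo "minimum"
    else
        echo "below-minimum"
    fi
}

# Example: 4 vCPUs and 16384 MB of RAM, converted to GB
license_server_sizing 4 $((16384 / 1024))
```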
[egallen@kvmhost0 ~]$ sudo virt-install --ram 16384 --vcpus 4 --os-variant rhel7 \
--disk path=/var/lib/libvirt/images/vgpu-license-server.qcow2,device=disk,bus=virtio,format=qcow2 \
--graphics vnc,listen=0.0.0.0 --noautoconsole \
--network bridge=br0 \
--name vgpu-license-server --dry-run \
--print-xml > /tmp/vgpu-license-server.xml;
[egallen@kvmhost0 ~]$ sudo virsh define --file /tmp/vgpu-license-server.xml
Domain vgpu-license-server defined from /tmp/vgpu-license-server.xml
[egallen@kvmhost0 ~]$ sudo virsh start vgpu-license-server
Domain vgpu-license-server started
Get the DHCP IP:
[egallen@kvmhost0 ~]$ sudo virsh console vgpu-license-server
Connected to domain vgpu-license-server
Escape character is ^]
Red Hat Enterprise Linux Server 7.7 (Maipo)
Kernel 3.10.0-1062.1.2.el7.x86_64 on an x86_64
unused login: root
Password:
[root@unused ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:c1:6d:ba brd ff:ff:ff:ff:ff:ff
inet 10.10.10.220/24 brd 10.10.10.255 scope global noprefixroute dynamic eth0
valid_lft 86378sec preferred_lft 86378sec
inet6 2620:52:0:27a8:5054:ff:fec1:6dba/64 scope global noprefixroute dynamic
valid_lft 2591978sec preferred_lft 604778sec
inet6 fe80::5054:ff:fec1:6dba/64 scope link noprefixroute
valid_lft forever preferred_lft forever
Log out.
Log in with an SSH client:
laptop:~ egallen$ ssh root@10.10.10.220
The authenticity of host '10.10.10.220 (10.10.10.220)' can't be established.
ECDSA key fingerprint is SHA256:moQgFDyF8+JcXXXXXXZ249CS65IQsuU.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.10.10.220' (ECDSA) to the list of known hosts.
root@10.10.10.220's password:
Last login: Sat Oct 19 11:05:33 2019
[root@unused ~]#
Create your user:
[root@unused ~]# adduser egallen
[root@unused ~]# passwd egallen
Changing password for user egallen.
New password:
BAD PASSWORD: The password fails the dictionary check - it does not contain enough DIFFERENT characters
Retype new password:
passwd: all authentication tokens updated successfully.
[root@unused ~]# echo "egallen ALL=(root) NOPASSWD:ALL" | tee -a /etc/sudoers.d/egallen
egallen ALL=(root) NOPASSWD:ALL
Set hostname:
[root@unused ~]# hostnamectl set-hostname vgpu-license-server.lan.redhat.com
[root@unused ~]# exit
logout
Connection to 10.10.10.220 closed.
laptop:~ egallen$ ssh root@10.10.10.220
Last login: Sat Oct 19 11:07:18 2019 from XXX-XX-XX.XXXX.redhat.com
[root@vgpu-license-server ~]# logout
Connection to 10.10.10.220 closed.
Copy your ssh key:
laptop:~ egallen$ ssh-copy-id egallen@10.10.10.220
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
egallen@10.10.10.220's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'egallen@10.10.10.220'"
and check to make sure that only the key(s) you wanted were added.
Create a DNS entry on your laptop:
laptop:~ egallen$ echo "10.10.10.220 vgpu-license-server" | sudo tee -a /etc/hosts
Password:
10.10.10.220 vgpu-license-server
Log in:
laptop:~ egallen$ ssh vgpu-license-server
The authenticity of host 'vgpu-license-server (10.10.10.220)' can't be established.
ECDSA key fingerprint is SHA256:moQgFXXXXXXXXXX49CS65IQsuU.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'vgpu-license-server' (ECDSA) to the list of known hosts.
[egallen@vgpu-license-server ~]$
Check your OS release:
[egallen@vgpu-license-server ~]$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.7 (Maipo)
Register your server:
[egallen@vgpu-license-server ~]$ sudo subscription-manager register --username myrhnlogin
Registering to: subscription.rhsm.redhat.com:443/subscription
Password:
The system has been registered with ID: XXXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXX
The registered system name is: vgpu-license-server.lan.redhat.com
WARNING
The yum/dnf plugins: /etc/yum/pluginconf.d/subscription-manager.conf, /etc/yum/pluginconf.d/product-id.conf were automatically enabled for the benefit of Red Hat Subscription Management. If not desired, use "subscription-manager config --rhsm.auto_enable_yum_plugins=0" to block this behavior.
Pick your pool ID:
sudo subscription-manager list --available
Attach the pool ID:
[egallen@vgpu-license-server ~]$ sudo subscription-manager attach --pool=XXXXXXXXXXXXXXXXXXXXXXX
Successfully attached a subscription for: Employee SKU
Disable all repositories:
[egallen@vgpu-license-server ~]$ sudo subscription-manager repos --disable='*'
Enable the RHEL 7 repository:
[egallen@vgpu-license-server ~]$ sudo subscription-manager repos --enable=rhel-7-server-rpms
Install tmux:
[egallen@vgpu-license-server ~]$ sudo yum install tmux -y
[egallen@vgpu-license-server ~]$ tmux
Upgrade your system and reboot:
[egallen@vgpu-license-server ~]$ sudo yum upgrade -y
[egallen@vgpu-license-server ~]$ sudo systemctl reboot
Install Java:
[egallen@vgpu-license-server ~]$ sudo yum install java -y
...
Complete!
[egallen@vgpu-license-server ~]$ java -version
openjdk version "1.8.0_232"
OpenJDK Runtime Environment (build 1.8.0_232-b09)
OpenJDK 64-Bit Server VM (build 25.232-b09, mixed mode)
Install Tomcat:
[egallen@vgpu-license-server ~]$ sudo yum install tomcat tomcat-webapps -y
Enable the Tomcat service at boot:
[egallen@vgpu-license-server ~]$ sudo systemctl enable tomcat.service
Created symlink from /etc/systemd/system/multi-user.target.wants/tomcat.service to /usr/lib/systemd/system/tomcat.service.
Start the tomcat service:
[egallen@vgpu-license-server ~]$ sudo systemctl start tomcat.service
Check that the service is reachable at http://your_IP_address:8080/ :
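The same reachability check can be done from a terminal instead of a browser. The sketch below assumes curl is installed on the machine running the check; the helper names tomcat_url and check_tomcat are hypothetical, and 8080 is the Tomcat port used in this article.

```shell
# Build the Tomcat URL for a host; the port defaults to 8080.
tomcat_url() {
    printf 'http://%s:%s/\n' "$1" "${2:-8080}"
}

# Probe the URL with curl.
# -f: fail on HTTP errors, -s: silent, -S: still print errors,
# --max-time 5: give up after five seconds.
check_tomcat() {
    if curl -fsS --max-time 5 -o /dev/null "$(tomcat_url "$1" "$2")"; then
        echo "reachable"
    else
        echo "unreachable"
        return 1
    fi
}

# Example (not run here): check_tomcat 10.10.10.220
```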
Download the Virtual GPU license Manager for Linux
In the portal, download the Virtual GPU Software by following this path: NVIDIA SOFTWARE LICENSING CENTER > PRODUCT INFORMATION : SOFTWARE > Current Release “9.1” > “NVIDIA Virtual GPU Software” > “2019.05 license Manager for Linux”
You will get this file: NVIDIA-ls-linux-2019.05.0.26416627.zip
Install the NVIDIA vGPU license Server
Unzip your license server archive:
[egallen@vgpu-license-server ~]$ unzip NVIDIA-ls-linux-2019.05.0.26416627.zip
Archive: NVIDIA-ls-linux-2019.05.0.26416627.zip
inflating: grid-license-server-release-notes.pdf
inflating: grid-license-server-user-guide.pdf
inflating: grid-software-quick-start-guide.pdf
inflating: setup.bin
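Add execute permission to the install binary:
[egallen@vgpu-license-server ~]$ chmod +x setup.bin
Run the installer as root in console mode (the NVIDIA license server user guide uses sudo ./setup.bin -i console):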
Preparing to install
Extracting the installation resources from the installer archive...
Configuring the installer for this system's environment...
Launching installer...
===============================================================================
License Server (created with InstallAnywhere)
-------------------------------------------------------------------------------
Preparing CONSOLE Mode Installation...
===============================================================================
Introduction
------------
InstallAnywhere will guide you through the installation of License Server.
It is strongly recommended that you quit all programs before continuing with
this installation.
Respond to each prompt to proceed to the next step in the installation. If
you want to change something on a previous step, type 'back'.
You may cancel this installation at any time by typing 'quit'.
PRESS <ENTER> TO CONTINUE:
DO YOU ACCEPT THE TERMS OF THIS LICENSE AGREEMENT? (Y/N): Y
ENTER AN ABSOLUTE PATH, OR PRESS <ENTER> TO ACCEPT THE DEFAULT: /usr/local/nvidia
INSTALL FOLDER IS: /usr/local/nvidia IS THIS CORRECT? (Y/N): Y
Enter local Tomcat server path: /usr/share/tomcat
ENTER A COMMA-SEPARATED LIST OF NUMBERS REPRESENTING THE DESIRED CHOICES, OR PRESS <ENTER> TO ACCEPT THE DEFAULT:
===============================================================================
Pre-Installation Summary
------------------------
Please Review the Following Before Continuing:
Product Name:
License Server
Install Folder:
/usr/local/nvidia
Link Folder:
/root/NVIDIA Corporation/License Server
Disk Space Information (for Installation Target):
Required: 199,627,035 Bytes
Available: 211,963,322,368 Bytes
PRESS <ENTER> TO CONTINUE:
===============================================================================
Installing...
-------------
[==================|==================|==================|==================]
[------------------|------------------|------------------|------------------]
Executing NVIDIA License Server Installation Script...
Starting NVIDIA License Server
Opening License Server Port 7070 in Firewall
Starting Tomcat Service
===============================================================================
Install Complete
----------------
License Server has been successfully installed to:
/usr/local/nvidia
PRESS <ENTER> TO EXIT THE INSTALLER:
Your license server is now available at http://vgpu-license-server:8080/
Change the admin password:
[egallen@vgpu-licence-server enterprise]$ ./nvidialsadmin.sh -server http://localhost:7070 -authorize admin Admin@123 -users -edit admin MyNewPassword
User authentication succeeded.
User [admin]'s password and/or roles have been edited successfully.
Export the password:
[egallen@vgpu-licence-server enterprise]$ export MY_NVIDIA_PASSWORD="MyNewPassword"
Check rights:
[egallen@vgpu-licence-server enterprise]$ ./nvidialsadmin.sh -server http://localhost:7070 -authorize admin $MY_NVIDIA_PASSWORD -users
User authentication succeeded.
=======================================================================================
Username Roles
=======================================================================================
producer ROLE_PRODUCER, ROLE_DROPCLIENT, ROLE_READ, ROLE_RESERVATIONS
admin ROLE_ADMIN, ROLE_DROPCLIENT, ROLE_RESERVATIONS, ROLE_READ
Check the server status:
[egallen@vgpu-licence-server enterprise]$ ./nvidialsadmin.sh -server http://localhost:7070 -authorize admin $MY_NVIDIA_PASSWORD -status
User authentication succeeded.
Copyright (c) 2015-2018 Flexera LLC. All Rights Reserved.
(version) Version : 2019.02
(buildVersion) Build Version : 244401
The server is in active state.
Server: http://localhost:7070/ active
Backup Server: Not configured
Check the default config:
[egallen@vgpu-licence-server enterprise]$ ./nvidialsadmin.sh -server http://localhost:7070 -authorize admin $MY_NVIDIA_PASSWORD -config
User authentication succeeded.
General License Server Information
-----------------------------------
(license_server_url) IP : 127.0.0.1
(host_name) Host Name : localhost
(publisher_name) Publisher Name : nvidia
(host_id) Binding ID : Not Configured (Not Configured)
(license_server_port) Port : 7070
(licensing.backup.uri) Backup URI : Not Configured
Licensing Policy Details:
-----------------------------------
(licensing.borrowIntervalMax) Borrow Interval Maximum : NOT_CONFIGURED
(licensing.clientExpiryTimer) Client Expiry Timer Interval : 2s
(licensing.hostIdValidationInterval) Host ID Validation Interval : 2m
(licensing.allowVirtualClients) Allow Virtual Clients : true
(licensing.allowVirtualServer) Allow Virtual Server : true
(licensing.defaultBorrowGranularity) Default Borrow Granularity : MINUTE
(licensing.borrowInterval) Default Borrow Interval : NOT_CONFIGURED
(licensing.renewInterval) Default Renew Interval : 16
(licensing.registrationRequired) Registration Required :
(licensing.responseLifetime) Response Lifetime : 1d
(licensing.disableVirtualMachineCheck)
Disable Virtual Machine Check : false
Server Sync Settings :
-----------------------------------
Security Related Settings
-----------------------------------
(security.enabled) REST Security enabled : true
Log Settings :
-----------------------------------
(logging.threshold) Log level to record log messages : ERROR
(logging.directory) Directory where logs will be stored : /var/opt/flexnetls/nvidia/logs
Capability Related Details
-----------------------------------
-----------------------------------
Check the default license available:
[egallen@vgpu-licence-server enterprise]$ ./nvidialsadmin.sh -server http://localhost:7070 -authorize admin $MY_NVIDIA_PASSWORD -licenses
User authentication succeeded.
(license_server_url) License Server : 127.0.0.1:7070
(no_of_features) Number of features : 0
(no_of_client) Number of clients : 0
Check the default features:
[egallen@vgpu-licence-server enterprise]$ ./nvidialsadmin.sh -server http://localhost:7070 -authorize admin $MY_NVIDIA_PASSWORD -features
User authentication succeeded.
================================================================================
Name Count Version Type Expiration
================================================================================
Total number of features : 0
Show licenses:
[egallen@vgpu-licence-server enterprise]$ ./nvidialsadmin.sh -server http://localhost:7070 -authorize admin $MY_NVIDIA_PASSWORD -licenses -verbose
User authentication succeeded.
=======================================================================================
Feature ID Feature Name Feature Version Feature Count Used/Available
=======================================================================================
=======================================================================================
Device Information:
-------------------------------------------------------------
Device Name Feature Registered(Used Count)
-------------------------------------------------------------
=======================================================================================
Total feature count : 0
Total feature count used : 0
Total uncounted features : 0
=======================================================================================
Registering the License Server and Getting License Files
Get your server MAC address:
[egallen@vgpu-licence-server enterprise]$ cat /sys/class/net/eth0/address | sed 's/://g' | sed 's/[a-z]/\U&/g'
525400C1XXXX
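The pipeline above relies on the GNU sed \U extension. A more portable variant using tr could look like the sketch below; the function name mac_to_host_id is hypothetical, and the sample MAC address is the one shown earlier in the ip a output.

```shell
# Convert a colon-separated MAC address into the host ID format
# used above: colons removed, hex digits in upper case.
mac_to_host_id() {
    printf '%s\n' "$1" | tr -d ':' | tr 'a-f' 'A-F'
}

# Example with the eth0 address from the "ip a" output
mac_to_host_id "52:54:00:c1:6d:ba"
```

On the license server itself, the function can be fed directly from the interface: mac_to_host_id "$(cat /sys/class/net/eth0/address)".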
You can also find this unique id in your dashboard “Server host ID”:
Import the License Server file
Check license status before:
Click on “Map Add-Ons” and map 64 licenses:
Check license status after:
Check your license status:
[egallen@vgpu-licence-server enterprise]$ ./nvidialsadmin.sh -server http://localhost:7070 -authorize admin $MY_NVIDIA_PASSWORD -licenses -verbose
User authentication succeeded.
=======================================================================================
Feature ID Feature Name Feature Version Feature Count Used/Available
=======================================================================================
1 Quadro-Virtual-DWS 5.0 0/64
2 GRID-Virtual-Apps 3.0 0/64
=======================================================================================
Device Information:
-------------------------------------------------------------
Device Name Feature Registered(Used Count)
-------------------------------------------------------------
=======================================================================================
Total feature count : 128
Total feature count used : 0
Total uncounted features : 0
=======================================================================================
Launch an instance on a RHOSP 15 platform
(overcloud) [stack@accelab-director ~]$ openstack server create --flavor m1.small-gpu --image rhel8 --security-group web --nic net-id=internal0 --key-name lambda instance0
+-------------------------------------+-----------------------------------------------------+
| Field | Value |
+-------------------------------------+-----------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | |
| OS-EXT-SRV-ATTR:host | None |
| OS-EXT-SRV-ATTR:hypervisor_hostname | None |
| OS-EXT-SRV-ATTR:instance_name | |
| OS-EXT-STS:power_state | NOSTATE |
| OS-EXT-STS:task_state | scheduling |
| OS-EXT-STS:vm_state | building |
| OS-SRV-USG:launched_at | None |
| OS-SRV-USG:terminated_at | None |
| accessIPv4 | |
| accessIPv6 | |
| addresses | |
| adminPass | AN2aSRHL32eZ |
| config_drive | |
| created | 2019-10-19T22:04:26Z |
| flavor | m1.small-gpu (dbcb3b87-3206-450f-927c-5d709ab48a21) |
| hostId | |
| id | e543816d-1377-421b-a401-14e1108adfae |
| image | rhel8 (42b4c71e-3994-4501-a281-35f12a1f4af4) |
| key_name | lambda |
| name | instance0 |
| progress | 0 |
| project_id | 8998a44fcb9d4cb2aaaa1893e54f74f9 |
| properties | |
| security_groups | name='6c36fcea-cc1e-4bc8-95bd-9f556f9d8327' |
| status | BUILD |
| updated | 2019-10-19T22:04:26Z |
| user_id | 54d15ffdd5384c229a11d9f0824a3763 |
| volumes_attached | |
+-------------------------------------+-----------------------------------------------------+
Attach one floating IP:
(overcloud) [stack@accelab-director ~]$ FLOATING_IP_ID=$( openstack floating ip list -f value -c ID --status 'DOWN' | head -n 1 ) ; openstack server add floating ip instance0 $FLOATING_IP_ID
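The one-liner above passes an empty ID to the second command when no floating IP is in DOWN status. A slightly more defensive version is sketched below; the function name attach_free_floating_ip is hypothetical, and only the two openstack subcommands already used above are called.

```shell
# Attach the first floating IP in DOWN status to a server,
# and stop with an error if none is available.
attach_free_floating_ip() {
    server=$1
    fip_id=$(openstack floating ip list -f value -c ID --status DOWN | head -n 1)
    if [ -z "$fip_id" ]; then
        echo "no floating IP in DOWN status, create one first" >&2
        return 1
    fi
    openstack server add floating ip "$server" "$fip_id"
}

# Example (not run here): attach_free_floating_ip instance0
```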
Check that the instance has launched:
(overcloud) [stack@accelab-director ~]$ openstack server list
+--------------------------------------+-----------+--------+----------------------------------------+-------+--------------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+-----------+--------+----------------------------------------+-------+--------------+
| e543816d-1377-421b-a401-14e1108adfae | instance0 | ACTIVE | internal0=172.31.0.148, 192.168.168.46 | rhel8 | m1.small-gpu |
+--------------------------------------+-----------+--------+----------------------------------------+-------+--------------+
Instance status
Connect via ssh into the new VM:
(overcloud) [stack@accelab-director ~]$ ssh cloud-user@192.168.168.46
Activate the web console with: systemctl enable --now cockpit.socket
This system is not registered to Red Hat Insights. See https://cloud.redhat.com/
To register this system, run: insights-client --register
Last login: Sat Oct 19 18:27:20 2019 from 192.168.168.2
[cloud-user@instance0 ~]$
Check drivers:
[cloud-user@instance0 ~]$ nvidia-smi
Sat Oct 19 18:31:40 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.46 Driver Version: 430.46 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GRID V100-1Q On | 00000000:00:05.0 Off | 0 |
| N/A N/A P0 N/A / N/A | 80MiB / 1014MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Check nvidia-gridd logs:
[cloud-user@instance0 ~]$ sudo cat /var/log/messages | grep nvidia-gridd
Oct 19 18:33:45 instance0 nvidia-gridd[12584]: License acquired successfully. (Info: http://10.10.10.220:7070/request; Quadro-Virtual-DWS,5.0)
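For scripted checks, the same log message can be turned into an exit status. This is a minimal sketch that assumes the message text stays exactly as shown above; the function name license_acquired is hypothetical.

```shell
# Read syslog lines on stdin and succeed only if nvidia-gridd
# reported that a license was acquired.
license_acquired() {
    grep -q 'nvidia-gridd.*License acquired successfully'
}

# Example (not run here):
#   sudo cat /var/log/messages | license_acquired && echo "licensed"
```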
Check nvidia-gridd service status:
[cloud-user@instance0 ~]$ sudo systemctl status nvidia-gridd
● nvidia-gridd.service - NVIDIA Grid Daemon
Loaded: loaded (/usr/lib/systemd/system/nvidia-gridd.service; enabled; vendor preset: disabled)
Active: active (running) since Sat 2019-10-19 18:33:40 EDT; 1min 36s ago
Process: 12580 ExecStopPost=/bin/rm -rf /var/run/nvidia-gridd (code=exited, status=0/SUCCESS)
Process: 12581 ExecStart=/usr/bin/nvidia-gridd (code=exited, status=0/SUCCESS)
Main PID: 12584 (nvidia-gridd)
Tasks: 4 (limit: 26213)
Memory: 35.3M
CGroup: /system.slice/nvidia-gridd.service
└─12584 /usr/bin/nvidia-gridd
Oct 19 18:33:40 instance0 systemd[1]: Started NVIDIA Grid Daemon.
Oct 19 18:33:45 instance0 nvidia-gridd[12584]: License acquired successfully. (Info: http://10.10.10.220:7070/request; Quadro-Virtual-DWS,5.0)
Check the configuration used by the nvidia-gridd service:
[cloud-user@instance0 ~]$ grep -v '^#' /etc/nvidia/gridd.conf | sed '/^$/d'
ServerAddress=10.10.10.220
ServerPort=7070
FeatureType=2
EnableUI=TRUE
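The ServerAddress and ServerPort values should match the license request URL seen in the nvidia-gridd log (http://10.10.10.220:7070/request). The sketch below rebuilds that URL from a gridd.conf file so the two can be compared; the function name gridd_endpoint is hypothetical, and the /request path is taken from the log line in this article rather than from NVIDIA documentation.

```shell
# Print the license request URL implied by a gridd.conf file.
# Fails if ServerAddress or ServerPort is missing.
gridd_endpoint() {
    conf=${1:-/etc/nvidia/gridd.conf}
    awk -F= '
        /^[[:space:]]*#/      { next }        # skip comment lines
        $1 == "ServerAddress" { addr = $2 }
        $1 == "ServerPort"    { port = $2 }
        END {
            if (addr == "" || port == "") exit 1
            printf "http://%s:%s/request\n", addr, port
        }
    ' "$conf"
}

# Example (not run here): gridd_endpoint /etc/nvidia/gridd.conf
```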
Compute node status
Check the vGPU status on the RHOSP 15 KVM compute node, starting with the OS and RHOSP releases:
[heat-admin@overcloud-novacompute-0 ~]$ cat /etc/redhat-release ; cat /etc/rhosp-release
Red Hat Enterprise Linux release 8.0 (Ootpa)
Red Hat OpenStack Platform release 15.0.0 Beta (Stein)
List running libvirt VMs:
[heat-admin@overcloud-novacompute-0 ~]$ sudo virsh list
Id Name State
----------------------------------------------------
2 instance-0000000e running
Check mdev associated to the VM:
[heat-admin@overcloud-novacompute-0 ~]$ sudo virsh dumpxml instance-0000000e | grep mdev
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='off'>
Check the nvidia-smi vgpu status:
[heat-admin@overcloud-novacompute-0 ~]$ nvidia-smi vgpu
Sat Oct 19 22:42:04 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.46 Driver Version: 430.46 |
|---------------------------------+------------------------------+------------+
| GPU Name | Bus-Id | GPU-Util |
| vGPU ID Name | VM ID VM Name | vGPU-Util |
|=================================+==============================+============|
| 0 Tesla V100-PCIE-16GB | 00000000:3B:00.0 | 0% |
+---------------------------------+------------------------------+------------+
| 1 Tesla V100-PCIE-16GB | 00000000:D8:00.0 | 0% |
| 3251634194 GRID V100-1Q | e543... instance-0000000e | 0% |
+---------------------------------+------------------------------+------------+
Check the nvidia-smi general status:
[heat-admin@overcloud-novacompute-0 ~]$ nvidia-smi
Sat Oct 19 22:41:13 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.46 Driver Version: 430.46 CUDA Version: N/A |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-PCIE... On | 00000000:3B:00.0 Off | Off |
| N/A 31C P0 26W / 250W | 39MiB / 16383MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-PCIE... On | 00000000:D8:00.0 Off | 0 |
| N/A 35C P0 25W / 250W | 1060MiB / 16383MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 1 316556 C+G vgpu 1010MiB |
+-----------------------------------------------------------------------------+
The detailed vGPU query shows the OpenStack instance ID “e543816d-1377-421b-a401-14e1108adfae” as the VM UUID, along with the libvirt name:
[heat-admin@overcloud-novacompute-0 ~]$ nvidia-smi vgpu -q
GPU 00000000:3B:00.0
Active vGPUs : 0
GPU 00000000:D8:00.0
Active vGPUs : 1
vGPU ID : 3251634194
VM UUID : e543816d-1377-421b-a401-14e1108adfae
VM Name : instance-0000000e
vGPU Name : GRID V100-1Q
vGPU Type : 105
vGPU UUID : ec89f3f7-0351-47fd-9ec2-bb90753885d2
Guest Driver Version : 430.46
License Status : Licensed
Accounting Mode : Disabled
ECC Mode : Enabled
Accounting Buffer Size : 4000
Frame Rate Limit : 60 FPS
FB Memory Usage
Total : 1024 MiB
Used : 0 MiB
Free : 1024 MiB
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
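When several vGPUs are active, the license state per VM can be pulled out of this report with awk. The sketch below assumes the "VM Name" and "License Status" field labels stay as printed above by driver 430.46; the function name vgpu_license_report is hypothetical.

```shell
# Read "nvidia-smi vgpu -q" output on stdin and print
# "<VM name> <license status>" for each active vGPU.
vgpu_license_report() {
    awk '
        /VM Name/        { sub(/^[^:]*:[ \t]*/, ""); name = $0 }
        /License Status/ { sub(/^[^:]*:[ \t]*/, ""); print name, $0 }
    '
}

# Example (not run here): nvidia-smi vgpu -q | vgpu_license_report
```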
[heat-admin@overcloud-novacompute-0 ~]$ nvidia-smi vgpu -u
# GPU vGPU sm mem enc dec
# Idx Id % % % %
0 - - - - -
1 3251634194 0 0 0 0
0 - - - - -
1 3251634194 0 0 0 0
0 - - - - -
1 3251634194 0 0 0 0
0 - - - - -
1 3251634194 0 0 0 0
...
[heat-admin@overcloud-novacompute-0 ~]$ nvidia-smi vgpu -c
GPU 00000000:3B:00.0
GRID V100-1Q
GRID V100-2Q
GRID V100-4Q
GRID V100-8Q
GRID V100-16Q
GRID V100-1A
GRID V100-2A
GRID V100-4A
GRID V100-8A
GRID V100-16A
GRID V100-1B
GRID V100-1B4
GRID V100-2B
GRID V100-2B4
GRID V100-4C
GRID V100-8C
GRID V100-16C
GPU 00000000:D8:00.0
GRID V100-1Q
Check the license status with CLI
[egallen@vgpu-licence-server enterprise]$ ./nvidialsadmin.sh -server http://localhost:7070 -authorize admin $MY_NVIDIA_PASSWORD -licenses
User authentication succeeded.
(license_server_url) License Server : 127.0.0.1:7070
(no_of_features) Number of features : 2
(no_of_client) Number of clients : 1
[egallen@vgpu-licence-server enterprise]$ ./nvidialsadmin.sh -server http://localhost:7070 -authorize admin $MY_NVIDIA_PASSWORD -licenses -verbose
User authentication succeeded.
=======================================================================================
Feature ID Feature Name Feature Version Feature Count Used/Available
=======================================================================================
1 Quadro-Virtual-DWS 5.0 1/63
2 GRID-Virtual-Apps 3.0 0/64
=======================================================================================
Device Information:
-------------------------------------------------------------
Device Name Feature Registered(Used Count)
-------------------------------------------------------------
FA163ED6243E Quadro-Virtual-DWS(1)
=======================================================================================
Total feature count : 128
Total feature count used : 1
Total uncounted features : 0
=======================================================================================
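The Used/Available column of this report is convenient for monitoring. A minimal parsing sketch follows, assuming the table layout shown above where the last field of each feature row has the form used/available; the function name license_usage is hypothetical.

```shell
# Read the "-licenses -verbose" report on stdin and print one line
# per feature with its used and available license counts.
license_usage() {
    awk '$NF ~ /^[0-9]+\/[0-9]+$/ {
        split($NF, n, "/")
        printf "%s used=%s available=%s\n", $2, n[1], n[2]
    }'
}

# Example (not run here):
#   ./nvidialsadmin.sh -server http://localhost:7070 \
#       -authorize admin "$MY_NVIDIA_PASSWORD" -licenses -verbose | license_usage
```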
Check the license status in the dashboard
License status before the boot:
We launch the OpenStack instance:
(overcloud) [stack@accelab-director ~]$ openstack server create --flavor m1.small-gpu --image rhel8 --security-group web --nic net-id=internal0 --key-name lambda instance0
We can see the started instance:
Detail of the instance started: