Erwan Gallen

Oct 19, 2019 16 min read

NVIDIA vGPU software and license server with RHOSP 15

This post walks through the steps to obtain a trial of the NVIDIA GRID (vGPU) software, install the license server, and check licensing from a RHOSP 15 instance:

Create a NVIDIA account
Redeem your Product Activation Key (PAK)
Download packages
Prepare the VM and operating system of the license server based on RHEL 7.7
Download the Virtual GPU license Manager for Linux
Install the NVIDIA vGPU license Server
Registering the License Server and Getting License Files
Import the License Server file
Launch an instance on a RHOSP 15 platform
Instance status
Compute node status
Check the license status with CLI
Check the license status in the dashboard

To enable NVIDIA GRID for Red Hat OpenStack Platform, you will need (example for RHOSP 15):

one package for the RHOSP Compute KVM host (bare metal):
NVIDIA-vGPU-rhel-8.0-430.46.x86_64.rpm
and one script for the guest instance (virtual machine):
NVIDIA-Linux-x86_64-430.46-grid.run
one NVIDIA GRID license server installed on your network

Trial files can be downloaded here:
https://www.nvidia.com/en-us/data-center/resources/nvidia-enterprise-account/

You can contact a sales representative if you want to buy the vGPU software stack: https://www.nvidia.com/en-us/contact/sales/#assistance

Official NVIDIA documentation is here:
https://docs.nvidia.com/grid/ls/latest/grid-license-server-user-guide/index.html

Create a NVIDIA account

NVIDIA account creation page:
https://www.nvidia.com/en-us/data-center/resources/nvidia-enterprise-account/

Create your account; a confirmation message is displayed.

Click “SET PASSWORD” in the NVIDIA email; a second confirmation message is displayed.

The portal is then available.

Redeem your Product Activation Key (PAK)

Click on “Redeem”.

Your licenses are then available in your portal.

Download packages

Go to “Product Information” > Current Release “9.1” > “NVIDIA Virtual GPU Software”.

Confirm the “Software Terms and Conditions”, then choose your product:

For RHOSP 13, choose “NVIDIA vGPU for RHEL KVM 7.7”.
For RHOSP 14, choose “NVIDIA vGPU for RHEL KVM 7.7”.
For RHOSP 15, choose “NVIDIA vGPU for RHEL KVM 8.0”.
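
The release-to-package mapping above can be captured in a small helper, which may be handy in deployment scripts. This is only a sketch: the function name `vgpu_host_rpm` is hypothetical, and the file names are the 430.46 packages shown in this post, so other vGPU releases will differ.

```shell
# Hypothetical helper: print the KVM host package for a RHOSP major release.
# File names are the 430.46 builds used in this post (assumption for other
# releases: the names will change).
vgpu_host_rpm() {
    case "$1" in
        13|14) echo "NVIDIA-vGPU-rhel-7.7-430.46.x86_64.rpm" ;;
        15)    echo "NVIDIA-vGPU-rhel-8.0-430.46.x86_64.rpm" ;;
        *)     echo "unsupported RHOSP release: $1" >&2; return 1 ;;
    esac
}

vgpu_host_rpm 15
```

Returning a non-zero status for unknown releases lets a calling script stop early instead of installing the wrong package.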


RHOSP 13 and RHOSP 14 based on RHEL 7 vGPU software

Content of the RHEL 7 archive:

$ unzip NVIDIA-GRID-RHEL-7.7-430.46-431.79.zip 
Archive:  NVIDIA-GRID-RHEL-7.7-430.46-431.79.zip
  inflating: 430.46-431.79-grid-gpumodeswitch-user-guide.pdf  
  inflating: 430.46-431.79-grid-licensing-user-guide.pdf  
  inflating: 430.46-431.79-grid-software-quick-start-guide.pdf  
  inflating: 430.46-431.79-grid-vgpu-release-notes-red-hat-el-kvm.pdf  
  inflating: 430.46-431.79-grid-vgpu-user-guide.pdf  
  inflating: 430.46-431.79-whats-new-vgpu.pdf  
  inflating: 431.79_grid_win10_server2016_server2019_64bit_international.exe  
  inflating: 431.79_grid_win7_win8_server2008R2_server2012R2_64bit_international.exe  
  inflating: NVIDIA-Linux-x86_64-430.46-grid.run  
  inflating: NVIDIA-vGPU-rhel-7.7-430.46.x86_64.rpm

RHOSP 15 based on RHEL 8 vGPU software

Content of the RHEL 8 archive:

$ unzip NVIDIA-GRID-RHEL-8.0-430.46-431.79.zip
Archive:  NVIDIA-GRID-RHEL-8.0-430.46-431.79.zip
  inflating: 430.46-431.79-grid-gpumodeswitch-user-guide.pdf  
  inflating: 430.46-431.79-grid-licensing-user-guide.pdf  
  inflating: 430.46-431.79-grid-software-quick-start-guide.pdf  
  inflating: 430.46-431.79-grid-vgpu-release-notes-red-hat-el-kvm.pdf  
  inflating: 430.46-431.79-grid-vgpu-user-guide.pdf  
  inflating: 430.46-431.79-whats-new-vgpu.pdf  
  inflating: 431.79_grid_win10_server2016_server2019_64bit_international.exe  
  inflating: 431.79_grid_win7_win8_server2008R2_server2012R2_64bit_international.exe  
  inflating: NVIDIA-Linux-x86_64-430.46-grid.run  
  inflating: NVIDIA-vGPU-rhel-8.0-430.46.x86_64.rpm

Prepare the VM and operating system of the license server based on RHEL 7.7

License server documentation is here: https://docs.nvidia.com/grid/ls/latest/grid-license-server-user-guide/index.html

We will first deploy a VM that the OSP instances can reach over the network.
[Figure: NVIDIA GRID license server. Source: NVIDIA documentation]

Download the latest RHEL 7 qcow2 image available on this page after your RHN login: https://access.redhat.com/downloads/content/69/ver=/rhel---7/7.7/x86_64/product-software

At the time of writing, it is “Red Hat Enterprise Linux 7.7 Update KVM Guest Image (20190924)”.

Download the image:

[egallen@kvmhost0 ~]$ wget "https://access.cdn.redhat.com/content/origin/files/sha256/XXXX/rhel-server-7.7-update-1-x86_64-kvm.qcow2?user=XXXX&_auth_=XXXX"

Create your vm image:

[egallen@kvmhost0 ~]$ sudo qemu-img create -f qcow2 -o preallocation=metadata /var/lib/libvirt/images/vgpu-license-server.qcow2 200G;
Formatting '/var/lib/libvirt/images/vgpu-license-server.qcow2', fmt=qcow2 size=214748364800 cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16

Expand the RHEL 7.7 image into the new qcow2 file:

[egallen@kvmhost0 ~]$ sudo virt-resize --expand /dev/sda1 /data/inetsoft/rhel-server-7.7-update-1-x86_64-kvm.qcow2 /var/lib/libvirt/images/vgpu-license-server.qcow2
[   0.0] Examining /data/inetsoft/rhel-server-7.7-update-1-x86_64-kvm.qcow2
 100% ⟦▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒⟧ --:--
**********

Summary of changes:

/dev/sda1: This partition will be resized from 7.8G to 200.0G.  The 
filesystem xfs on /dev/sda1 will be expanded using the ‘xfs_growfs’ 
method.

**********
[  34.3] Setting up initial partition table on /var/lib/libvirt/images/vgpu-license-server.qcow2
[  34.5] Copying /dev/sda1
 100% ⟦▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒⟧ 00:00
[  59.0] Expanding /dev/sda1 using the ‘xfs_growfs’ method

Resize operation completed with no errors.  Before deleting the old disk, 
carefully check that the resized disk boots and works correctly.

Remove cloud-init and set your root password:

[egallen@kvmhost0 ~]$ sudo virt-customize -a /var/lib/libvirt/images/vgpu-license-server.qcow2 --uninstall cloud-init --root-password password:XXXXXXXX
[   0.0] Examining the guest ...
[   2.8] Setting a random seed
[   2.8] Setting the machine ID in /etc/machine-id
[   2.8] Uninstalling packages: cloud-init
[   6.5] Setting passwords
[   8.3] Finishing off

The recommended minimum configuration is 2 CPU cores and 4 GB of RAM.
A high-end configuration of 4 or more CPU cores with 16 GB of RAM is suitable for handling up to 150,000 licensed clients.
The VM defined below uses the high-end sizing (4 vCPUs and 16 GB of RAM):

[egallen@kvmhost0 ~]$ sudo virt-install --ram 16384 --vcpus 4 --os-variant rhel7 \
--disk path=/var/lib/libvirt/images/vgpu-license-server.qcow2,device=disk,bus=virtio,format=qcow2 \
--graphics vnc,listen=0.0.0.0 --noautoconsole \
--network bridge=br0 \
--name vgpu-license-server --dry-run \
--print-xml > /tmp/vgpu-license-server.xml;

[egallen@kvmhost0 ~]$ sudo virsh define --file /tmp/vgpu-license-server.xml
Domain vgpu-license-server defined from /tmp/vgpu-license-server.xml

[egallen@kvmhost0 ~]$ sudo virsh start vgpu-license-server
Domain vgpu-license-server started

Get the DHCP IP:

[egallen@kvmhost0 ~]$ sudo virsh console vgpu-license-server
Connected to domain vgpu-license-server
Escape character is ^]

Red Hat Enterprise Linux Server 7.7 (Maipo)
Kernel 3.10.0-1062.1.2.el7.x86_64 on an x86_64

unused login: root
Password: 
[root@unused ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:c1:6d:ba brd ff:ff:ff:ff:ff:ff
    inet 10.10.10.220/24 brd 10.10.10.255 scope global noprefixroute dynamic eth0
       valid_lft 86378sec preferred_lft 86378sec
    inet6 2620:52:0:27a8:5054:ff:fec1:6dba/64 scope global noprefixroute dynamic 
       valid_lft 2591978sec preferred_lft 604778sec
    inet6 fe80::5054:ff:fec1:6dba/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
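
If you only need the IPv4 address rather than the full `ip a` listing, the one-line output mode of `ip` is easier to parse. A minimal sketch, assuming the interface is `eth0` as in the output above; the function name `first_ipv4` is illustrative.

```shell
# Illustrative helper: read `ip -o -4 addr show` output on stdin and print
# the first IPv4 address without its prefix length.
first_ipv4() {
    awk '$3 == "inet" { split($4, a, "/"); print a[1]; exit }'
}

# On the license server VM you would run:
#   ip -o -4 addr show dev eth0 | first_ipv4
# Demonstration on a sample line in the same one-line format:
echo '2: eth0    inet 10.10.10.220/24 brd 10.10.10.255 scope global dynamic eth0' | first_ipv4
```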

Log out of the console, then connect over SSH from your laptop:

laptop:~ egallen$ ssh root@10.10.10.220
The authenticity of host '10.10.10.220 (10.10.10.220)' can't be established.
ECDSA key fingerprint is SHA256:moQgFDyF8+JcXXXXXXZ249CS65IQsuU.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.10.10.220' (ECDSA) to the list of known hosts.
root@10.10.10.220's password: 
Last login: Sat Oct 19 11:05:33 2019
[root@unused ~]#

Create your user:

[root@unused ~]# adduser egallen
[root@unused ~]# passwd egallen
Changing password for user egallen.
New password: 
BAD PASSWORD: The password fails the dictionary check - it does not contain enough DIFFERENT characters
Retype new password: 
passwd: all authentication tokens updated successfully.
[root@unused ~]# echo "egallen ALL=(root) NOPASSWD:ALL" | tee -a /etc/sudoers.d/egallen
egallen ALL=(root) NOPASSWD:ALL

Set hostname:

[root@unused ~]# hostnamectl set-hostname vgpu-license-server.lan.redhat.com
[root@unused ~]# exit
logout
Connection to 10.10.10.220 closed.
laptop:~ egallen$ ssh root@10.10.10.220
Last login: Sat Oct 19 11:07:18 2019 from XXX-XX-XX.XXXX.redhat.com
[root@vgpu-license-server ~]# logout
Connection to 10.10.10.220 closed.

Copy your ssh key:

laptop:~ egallen$ ssh-copy-id egallen@10.10.10.220
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
egallen@10.10.10.220's password: 

Number of key(s) added:        1

Now try logging into the machine, with:   "ssh 'egallen@10.10.10.220'"
and check to make sure that only the key(s) you wanted were added.

Add a hosts entry on your laptop:

laptop:~ egallen$ echo "10.10.10.220 vgpu-license-server" | sudo tee -a /etc/hosts
Password:
10.10.10.220 vgpu-license-server
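
Running the `tee -a` command twice adds the entry twice. A possible idempotent variant is sketched below; `add_hosts_entry` is a hypothetical helper, and the demonstration writes to a temporary file rather than `/etc/hosts`, which would need root.

```shell
# Hypothetical helper: append "ADDR NAME" to a hosts file only if NAME is not
# already listed as a host name on a non-comment line.
add_hosts_entry() {
    addr="$1"
    name="$2"
    file="${3:-/etc/hosts}"
    # Compare NAME against every field after the address; exit 0 when found.
    if awk -v n="$name" '!/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s %s\n' "$addr" "$name" >> "$file"
}

# Demonstration on a temporary file: the second call changes nothing.
demo_hosts=$(mktemp)
add_hosts_entry 10.10.10.220 vgpu-license-server "$demo_hosts"
add_hosts_entry 10.10.10.220 vgpu-license-server "$demo_hosts"
cat "$demo_hosts"
rm -f "$demo_hosts"
```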

laptop:~ egallen$ ssh vgpu-license-server
The authenticity of host 'vgpu-license-server (10.10.10.220)' can't be established.
ECDSA key fingerprint is SHA256:moQgFXXXXXXXXXX49CS65IQsuU.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'vgpu-license-server' (ECDSA) to the list of known hosts.
[egallen@vgpu-license-server ~]$

Check your OS release:

[egallen@vgpu-license-server ~]$ cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.7 (Maipo)

Register the system with Red Hat Subscription Management:

[egallen@vgpu-license-server ~]$ sudo subscription-manager register --username myrhnlogin
Registering to: subscription.rhsm.redhat.com:443/subscription
Password: 
The system has been registered with ID: XXXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXX
The registered system name is: vgpu-license-server.lan.redhat.com

WARNING

The yum/dnf plugins: /etc/yum/pluginconf.d/subscription-manager.conf, /etc/yum/pluginconf.d/product-id.conf were automatically enabled for the benefit of Red Hat Subscription Management. If not desired, use "subscription-manager config --rhsm.auto_enable_yum_plugins=0" to block this behavior.

Pick your pool ID:

[egallen@vgpu-license-server ~]$ sudo subscription-manager list --available

Attach the pool ID:

[egallen@vgpu-license-server ~]$ sudo subscription-manager attach --pool=XXXXXXXXXXXXXXXXXXXXXXX
Successfully attached a subscription for: Employee SKU

Disable all repositories:

[egallen@vgpu-license-server ~]$ sudo subscription-manager repos --disable='*'

Enable the RHEL 7 repository:

[egallen@vgpu-license-server ~]$ sudo subscription-manager repos --enable=rhel-7-server-rpms

Install tmux:

[egallen@vgpu-license-server ~]$ sudo yum install tmux -y
[egallen@vgpu-license-server ~]$ tmux

Upgrade your system and reboot:

[egallen@vgpu-license-server ~]$ sudo yum upgrade -y
[egallen@vgpu-license-server ~]$ sudo systemctl reboot

Install Java:

[egallen@vgpu-license-server ~]$ sudo yum install java -y
...
Complete!

[egallen@vgpu-license-server ~]$ java -version
openjdk version "1.8.0_232"
OpenJDK Runtime Environment (build 1.8.0_232-b09)
OpenJDK 64-Bit Server VM (build 25.232-b09, mixed mode)

Install Tomcat:

[egallen@vgpu-license-server ~]$ sudo yum install tomcat tomcat-webapps -y

Enable the Tomcat service at boot:

[egallen@vgpu-license-server ~]$  sudo systemctl enable tomcat.service
Created symlink from /etc/systemd/system/multi-user.target.wants/tomcat.service to /usr/lib/systemd/system/tomcat.service.

Start the Tomcat service:

[egallen@vgpu-license-server ~]$ sudo systemctl start tomcat.service

Check that the service is reachable at http://your_IP_address:8080/ (the default Tomcat page should be displayed).
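
The same reachability check can be scripted instead of done in a browser. The sketch below assumes `curl` is installed on the machine running the check; `wait_for_http` is an illustrative name, not part of Tomcat or the NVIDIA tooling.

```shell
# Illustrative helper: poll URL until it answers with HTTP 200, giving up
# after TRIES attempts (default 30, two seconds apart).
wait_for_http() {
    url="$1"
    tries="${2:-30}"
    while [ "$tries" -gt 0 ]; do
        # -w '%{http_code}' prints only the status code; fall back to 000
        # when curl itself fails (for example, connection refused).
        code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url") || code="000"
        if [ "$code" = "200" ]; then
            return 0
        fi
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep 2
        fi
    done
    return 1
}

# Example call, to be run once Tomcat has been started:
#   wait_for_http http://10.10.10.220:8080/ && echo "Tomcat is up"
```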

Download the Virtual GPU license Manager for Linux

In the portal, download the license server from: NVIDIA SOFTWARE LICENSING CENTER > PRODUCT INFORMATION : SOFTWARE > Current Release “9.1” > “NVIDIA Virtual GPU Software” > “2019.05 License Manager for Linux”.

You will get this file: NVIDIA-ls-linux-2019.05.0.26416627.zip

Install the NVIDIA vGPU license Server

Unzip your license server archive:

[egallen@vgpu-license-server ~]$ unzip NVIDIA-ls-linux-2019.05.0.26416627.zip 
Archive:  NVIDIA-ls-linux-2019.05.0.26416627.zip
  inflating: grid-license-server-release-notes.pdf  
  inflating: grid-license-server-user-guide.pdf  
  inflating: grid-software-quick-start-guide.pdf  
  inflating: setup.bin

Add execute permission to the installer, then run it in console mode as root:

[egallen@vgpu-license-server ~]$ chmod +x setup.bin
[egallen@vgpu-license-server ~]$ sudo ./setup.bin -i console
Preparing to install
Extracting the installation resources from the installer archive...
Configuring the installer for this system's environment...

Launching installer...

===============================================================================
License Server                                   (created with InstallAnywhere)
-------------------------------------------------------------------------------

Preparing CONSOLE Mode Installation...




===============================================================================
Introduction
------------

InstallAnywhere will guide you through the installation of License Server.

It is strongly recommended that you quit all programs before continuing with 
this installation.

Respond to each prompt to proceed to the next step in the installation.  If 
you want to change something on a previous step, type 'back'.

You may cancel this installation at any time by typing 'quit'.

PRESS <ENTER> TO CONTINUE:
DO YOU ACCEPT THE TERMS OF THIS LICENSE AGREEMENT? (Y/N): Y
ENTER AN ABSOLUTE PATH, OR PRESS <ENTER> TO ACCEPT THE DEFAULT: /usr/local/nvidia
INSTALL FOLDER IS: /usr/local/nvidia IS THIS CORRECT? (Y/N): Y
Enter local Tomcat server path: /usr/share/tomcat
ENTER A COMMA-SEPARATED LIST OF NUMBERS REPRESENTING THE DESIRED CHOICES, OR PRESS <ENTER> TO ACCEPT THE DEFAULT: 

===============================================================================
Pre-Installation Summary
------------------------

Please Review the Following Before Continuing:

Product Name:
    License Server

Install Folder:
    /usr/local/nvidia

Link Folder:
    /root/NVIDIA Corporation/License Server

Disk Space Information (for Installation Target): 
    Required:      199,627,035 Bytes
    Available: 211,963,322,368 Bytes

PRESS <ENTER> TO CONTINUE: 

===============================================================================
Installing...
-------------

 [==================|==================|==================|==================]
 [------------------|------------------|------------------|------------------]

Executing NVIDIA License Server Installation Script...
 
Starting NVIDIA License Server
 
Opening License Server Port 7070 in Firewall
 
Starting Tomcat Service
 


===============================================================================
Install Complete
----------------

License Server has been successfully installed to:

   /usr/local/nvidia

PRESS <ENTER> TO EXIT THE INSTALLER:

Your license server is now available at http://vgpu-license-server:8080/

Change the admin password:

[egallen@vgpu-licence-server enterprise]$ ./nvidialsadmin.sh -server http://localhost:7070 -authorize admin Admin@123 -users -edit admin MyNewPassword
User authentication succeeded.
User [admin]'s password and/or roles have been edited successfully.

Export the password:

[egallen@vgpu-licence-server enterprise]$ export MY_NVIDIA_PASSWORD="MyNewPassword"

List the users and their roles:

[egallen@vgpu-licence-server enterprise]$ ./nvidialsadmin.sh  -server http://localhost:7070 -authorize admin $MY_NVIDIA_PASSWORD -users
User authentication succeeded.

=======================================================================================
Username            Roles
=======================================================================================
producer            ROLE_PRODUCER, ROLE_DROPCLIENT, ROLE_READ, ROLE_RESERVATIONS
admin               ROLE_ADMIN, ROLE_DROPCLIENT, ROLE_RESERVATIONS, ROLE_READ

Check the server status:

[egallen@vgpu-licence-server enterprise]$ ./nvidialsadmin.sh  -server http://localhost:7070 -authorize admin $MY_NVIDIA_PASSWORD -status
User authentication succeeded.

Copyright (c) 2015-2018 Flexera LLC. All Rights Reserved.

(version) Version            : 2019.02
(buildVersion) Build Version : 244401

The server is in active state.

Server: http://localhost:7070/ active
Backup Server: Not configured

Check the default config:

[egallen@vgpu-licence-server enterprise]$ ./nvidialsadmin.sh  -server http://localhost:7070 -authorize admin $MY_NVIDIA_PASSWORD -config
User authentication succeeded.
General License Server Information 
-----------------------------------
(license_server_url) IP                  : 127.0.0.1
(host_name) Host Name                    : localhost
(publisher_name) Publisher Name          : nvidia
(host_id) Binding ID                     : Not Configured (Not Configured)
(license_server_port) Port               : 7070
(licensing.backup.uri) Backup URI         : Not Configured

Licensing Policy Details:
-----------------------------------
(licensing.borrowIntervalMax) Borrow Interval Maximum            : NOT_CONFIGURED
(licensing.clientExpiryTimer) Client Expiry Timer Interval       : 2s
(licensing.hostIdValidationInterval) Host ID Validation Interval  : 2m
(licensing.allowVirtualClients) Allow Virtual Clients             : true
(licensing.allowVirtualServer) Allow Virtual Server               : true
(licensing.defaultBorrowGranularity) Default Borrow Granularity   : MINUTE
(licensing.borrowInterval) Default Borrow Interval                : NOT_CONFIGURED
(licensing.renewInterval) Default Renew Interval                  : 16
(licensing.registrationRequired) Registration Required            : 
(licensing.responseLifetime) Response Lifetime                    : 1d
(licensing.disableVirtualMachineCheck)
    Disable Virtual Machine Check                                 : false

Server Sync Settings :
-----------------------------------

Security Related Settings
-----------------------------------
(security.enabled) REST Security enabled                             : true

Log Settings :
-----------------------------------
(logging.threshold) Log level to record log messages            : ERROR
(logging.directory) Directory where logs will be stored         : /var/opt/flexnetls/nvidia/logs


Capability Related Details
-----------------------------------
-----------------------------------

Check the licenses available by default (none yet):

[egallen@vgpu-licence-server enterprise]$ ./nvidialsadmin.sh  -server http://localhost:7070 -authorize admin $MY_NVIDIA_PASSWORD -licenses 
User authentication succeeded.

(license_server_url) License Server    : 127.0.0.1:7070

(no_of_features) Number of features    : 0

(no_of_client) Number of clients       : 0

Check the default features:

[egallen@vgpu-licence-server enterprise]$ ./nvidialsadmin.sh  -server http://localhost:7070 -authorize admin $MY_NVIDIA_PASSWORD -features 
User authentication succeeded.

================================================================================
Name              Count           Version         Type              Expiration     
================================================================================


Total number of features : 0

Show licenses:

[egallen@vgpu-licence-server enterprise]$ ./nvidialsadmin.sh  -server http://localhost:7070 -authorize admin $MY_NVIDIA_PASSWORD -licenses -verbose
User authentication succeeded.
=======================================================================================
Feature ID      Feature Name           Feature Version   Feature Count Used/Available
=======================================================================================
=======================================================================================

Device Information:

-------------------------------------------------------------
Device Name                   Feature Registered(Used Count)
-------------------------------------------------------------
=======================================================================================

        Total feature count           : 0
        Total feature count used      : 0
        Total uncounted features      : 0
=======================================================================================

Registering the License Server and Getting License Files

Get your server's MAC address, without separators and in upper case:

[egallen@vgpu-licence-server enterprise]$ cat /sys/class/net/eth0/address | sed 's/://g' | sed 's/[a-z]/\U&/g'
525400C1XXXX
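
The `\U` replacement used above is a GNU sed extension. If that matters in your environment, the same conversion can be sketched with `tr`; `mac_to_hostid` is an illustrative name and the MAC address in the example is made up.

```shell
# Illustrative helper: strip the colons from a MAC address and upper-case it.
mac_to_hostid() {
    printf '%s\n' "$1" | tr -d ':' | tr '[:lower:]' '[:upper:]'
}

# On the license server you would run:
#   mac_to_hostid "$(cat /sys/class/net/eth0/address)"
# Demonstration with a made-up address:
mac_to_hostid 52:54:00:ab:cd:ef
```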

You can also find this unique ID in the license server dashboard, under “Server host ID”.

Import the License Server file

Check the license status beforehand.

Click on “Map Add-Ons” and map 64 licenses.

Check the license status afterwards.

Check your license status:

[egallen@vgpu-licence-server enterprise]$ ./nvidialsadmin.sh  -server http://localhost:7070 -authorize admin $MY_NVIDIA_PASSWORD -licenses -verbose
User authentication succeeded.
=======================================================================================
Feature ID      Feature Name           Feature Version   Feature Count Used/Available
=======================================================================================
1               Quadro-Virtual-DWS            5.0                  0/64
2               GRID-Virtual-Apps             3.0                  0/64
=======================================================================================

Device Information:

-------------------------------------------------------------
Device Name                   Feature Registered(Used Count)
-------------------------------------------------------------
=======================================================================================

        Total feature count           : 128
        Total feature count used      : 0
        Total uncounted features      : 0
=======================================================================================
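
For monitoring, the feature rows of this report can be reduced to one line per feature. The sketch below assumes the column layout shown above (feature ID first, `used/available` last); `license_usage` is a hypothetical helper, not an option of `nvidialsadmin.sh`.

```shell
# Hypothetical parser: read the `-licenses -verbose` report on stdin and print
# "name used available" for each feature row.
license_usage() {
    awk '$1 ~ /^[0-9]+$/ && $NF ~ /^[0-9]+\/[0-9]+$/ {
        split($NF, c, "/")
        print $2, c[1], c[2]
    }'
}

# Example call on the license server:
#   ./nvidialsadmin.sh -server http://localhost:7070 \
#       -authorize admin "$MY_NVIDIA_PASSWORD" -licenses -verbose | license_usage
```

Only rows that start with a numeric feature ID and end with a `used/available` pair are printed, so the separators and totals are skipped.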

Launch an instance on a RHOSP 15 platform

(overcloud) [stack@accelab-director ~]$ openstack server create --flavor m1.small-gpu --image rhel8 --security-group web --nic net-id=internal0 --key-name lambda instance0
+-------------------------------------+-----------------------------------------------------+
| Field                               | Value                                               |
+-------------------------------------+-----------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                              |
| OS-EXT-AZ:availability_zone         |                                                     |
| OS-EXT-SRV-ATTR:host                | None                                                |
| OS-EXT-SRV-ATTR:hypervisor_hostname | None                                                |
| OS-EXT-SRV-ATTR:instance_name       |                                                     |
| OS-EXT-STS:power_state              | NOSTATE                                             |
| OS-EXT-STS:task_state               | scheduling                                          |
| OS-EXT-STS:vm_state                 | building                                            |
| OS-SRV-USG:launched_at              | None                                                |
| OS-SRV-USG:terminated_at            | None                                                |
| accessIPv4                          |                                                     |
| accessIPv6                          |                                                     |
| addresses                           |                                                     |
| adminPass                           | AN2aSRHL32eZ                                        |
| config_drive                        |                                                     |
| created                             | 2019-10-19T22:04:26Z                                |
| flavor                              | m1.small-gpu (dbcb3b87-3206-450f-927c-5d709ab48a21) |
| hostId                              |                                                     |
| id                                  | e543816d-1377-421b-a401-14e1108adfae                |
| image                               | rhel8 (42b4c71e-3994-4501-a281-35f12a1f4af4)        |
| key_name                            | lambda                                              |
| name                                | instance0                                           |
| progress                            | 0                                                   |
| project_id                          | 8998a44fcb9d4cb2aaaa1893e54f74f9                    |
| properties                          |                                                     |
| security_groups                     | name='6c36fcea-cc1e-4bc8-95bd-9f556f9d8327'         |
| status                              | BUILD                                               |
| updated                             | 2019-10-19T22:04:26Z                                |
| user_id                             | 54d15ffdd5384c229a11d9f0824a3763                    |
| volumes_attached                    |                                                     |
+-------------------------------------+-----------------------------------------------------+

Attach one floating IP:

(overcloud) [stack@accelab-director ~]$ FLOATING_IP_ID=$( openstack floating ip list -f value -c ID --status 'DOWN' | head -n 1 ) ; openstack server add floating ip instance0 $FLOATING_IP_ID

Check that the instance is launched:

(overcloud) [stack@accelab-director ~]$ openstack server list
+--------------------------------------+-----------+--------+----------------------------------------+-------+--------------+
| ID                                   | Name      | Status | Networks                               | Image | Flavor       |
+--------------------------------------+-----------+--------+----------------------------------------+-------+--------------+
| e543816d-1377-421b-a401-14e1108adfae | instance0 | ACTIVE | internal0=172.31.0.148, 192.168.168.46 | rhel8 | m1.small-gpu |
+--------------------------------------+-----------+--------+----------------------------------------+-------+--------------+

Instance status

Connect via ssh into the new VM:

(overcloud) [stack@accelab-director ~]$ ssh cloud-user@192.168.168.46
Activate the web console with: systemctl enable --now cockpit.socket

This system is not registered to Red Hat Insights. See https://cloud.redhat.com/
To register this system, run: insights-client --register

Last login: Sat Oct 19 18:27:20 2019 from 192.168.168.2
[cloud-user@instance0 ~]$

Check drivers:

[cloud-user@instance0 ~]$ nvidia-smi 
Sat Oct 19 18:31:40 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.46       Driver Version: 430.46       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID V100-1Q        On   | 00000000:00:05.0 Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |     80MiB /  1014MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Check nvidia-gridd logs:

[cloud-user@instance0 ~]$ sudo cat /var/log/messages | grep nvidia-gridd
Oct 19 18:33:45 instance0 nvidia-gridd[12584]: License acquired successfully. (Info: http://10.10.10.220:7070/request; Quadro-Virtual-DWS,5.0)
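
This log check can be turned into a pass/fail test for automation. A minimal sketch, assuming the message text shown above; `gridd_license_ok` is an illustrative name.

```shell
# Illustrative check: read log lines on stdin and succeed only if the most
# recent nvidia-gridd message reports an acquired license.
gridd_license_ok() {
    grep 'nvidia-gridd' | tail -n 1 | grep -q 'License acquired successfully'
}

# Example call on the instance:
#   sudo cat /var/log/messages | gridd_license_ok && echo "licensed"
```

Looking only at the last nvidia-gridd line avoids reporting success on a stale message from an earlier boot.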

Check nvidia-gridd service status:

[cloud-user@instance0 ~]$ sudo systemctl status nvidia-gridd
● nvidia-gridd.service - NVIDIA Grid Daemon
   Loaded: loaded (/usr/lib/systemd/system/nvidia-gridd.service; enabled; vendor preset: disabled)
   Active: active (running) since Sat 2019-10-19 18:33:40 EDT; 1min 36s ago
  Process: 12580 ExecStopPost=/bin/rm -rf /var/run/nvidia-gridd (code=exited, status=0/SUCCESS)
  Process: 12581 ExecStart=/usr/bin/nvidia-gridd (code=exited, status=0/SUCCESS)
 Main PID: 12584 (nvidia-gridd)
    Tasks: 4 (limit: 26213)
   Memory: 35.3M
   CGroup: /system.slice/nvidia-gridd.service
           └─12584 /usr/bin/nvidia-gridd

Oct 19 18:33:40 instance0 systemd[1]: Started NVIDIA Grid Daemon.
Oct 19 18:33:45 instance0 nvidia-gridd[12584]: License acquired successfully. (Info: http://10.10.10.220:7070/request; Quadro-Virtual-DWS,5.0)

Check the configuration used by the nvidia-gridd service:

[cloud-user@instance0 ~]$ grep -v '^#' /etc/nvidia/gridd.conf | sed '/^$/d'
ServerAddress=10.10.10.220
ServerPort=7070
FeatureType=2
EnableUI=TRUE
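
To read a single setting from a script, for example to compare `ServerAddress` with the license server IP, a small reader can be sketched as follows. `gridd_conf_get` is a hypothetical helper, and the simple `key=value` layout is the one shown above.

```shell
# Hypothetical helper: print the value of KEY from a gridd.conf-style file,
# ignoring comment lines. Returns non-zero when the key is missing or empty.
gridd_conf_get() {
    key="$1"
    file="${2:-/etc/nvidia/gridd.conf}"
    awk -F= -v k="$key" '
        !/^[[:space:]]*#/ && $1 == k { v = substr($0, index($0, "=") + 1) }
        END { if (v != "") print v; else exit 1 }
    ' "$file"
}

# Example call on the instance:
#   gridd_conf_get ServerAddress
```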

Compute node status

Check the vGPU status on the RHOSP 15 KVM compute node:

[heat-admin@overcloud-novacompute-0 ~]$ cat /etc/redhat-release ; cat /etc/rhosp-release 
Red Hat Enterprise Linux release 8.0 (Ootpa)
Red Hat OpenStack Platform release 15.0.0 Beta (Stein)

List running libvirt VMs:

[heat-admin@overcloud-novacompute-0 ~]$ sudo virsh list
 Id    Name                           State
----------------------------------------------------
 2     instance-0000000e              running

Check mdev associated to the VM:

[heat-admin@overcloud-novacompute-0 ~]$ sudo virsh dumpxml instance-0000000e | grep mdev
    <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='off'>

Check the nvidia-smi vgpu status:

[heat-admin@overcloud-novacompute-0 ~]$ nvidia-smi vgpu 
Sat Oct 19 22:42:04 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.46                 Driver Version: 430.46                    |
|---------------------------------+------------------------------+------------+
| GPU  Name                       | Bus-Id                       | GPU-Util   |
|      vGPU ID     Name           | VM ID     VM Name            | vGPU-Util  |
|=================================+==============================+============|
|   0  Tesla V100-PCIE-16GB       | 00000000:3B:00.0             |   0%       |
+---------------------------------+------------------------------+------------+
|   1  Tesla V100-PCIE-16GB       | 00000000:D8:00.0             |   0%       |
|      3251634194  GRID V100-1Q   | e543...  instance-0000000e   |      0%    |
+---------------------------------+------------------------------+------------+

Check the nvidia-smi general status:

[heat-admin@overcloud-novacompute-0 ~]$ nvidia-smi 
Sat Oct 19 22:41:13 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.46       Driver Version: 430.46       CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  On   | 00000000:3B:00.0 Off |                  Off |
| N/A   31C    P0    26W / 250W |     39MiB / 16383MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE...  On   | 00000000:D8:00.0 Off |                    0 |
| N/A   35C    P0    25W / 250W |   1060MiB / 16383MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    1    316556    C+G   vgpu                                        1010MiB |
+-----------------------------------------------------------------------------+

The detailed query shows the OpenStack instance ID “e543816d-1377-421b-a401-14e1108adfae” (VM UUID) and the libvirt name (VM Name):

[heat-admin@overcloud-novacompute-0 ~]$ nvidia-smi vgpu -q
GPU 00000000:3B:00.0
    Active vGPUs                    : 0

GPU 00000000:D8:00.0
    Active vGPUs                    : 1
    vGPU ID                         : 3251634194
        VM UUID                     : e543816d-1377-421b-a401-14e1108adfae
        VM Name                     : instance-0000000e
        vGPU Name                   : GRID V100-1Q
        vGPU Type                   : 105
        vGPU UUID                   : ec89f3f7-0351-47fd-9ec2-bb90753885d2
        Guest Driver Version        : 430.46
        License Status              : Licensed
        Accounting Mode             : Disabled
        ECC Mode                    : Enabled
        Accounting Buffer Size      : 4000
        Frame Rate Limit            : 60 FPS
        FB Memory Usage
            Total                   : 1024 MiB
            Used                    : 0 MiB
            Free                    : 1024 MiB
        Utilization
            Gpu                     : 0 %
            Memory                  : 0 %
            Encoder                 : 0 %
            Decoder                 : 0 %
        Encoder Stats
            Active Sessions         : 0
            Average FPS             : 0
            Average Latency         : 0
        FBC Stats
            Active Sessions         : 0
            Average FPS             : 0
            Average Latency         : 0
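To map vGPUs to OpenStack instances from a script, the output above can be reduced to a few fields per vGPU. The following is a rough sketch that assumes the layout shown above (a "GPU" line per physical GPU, then one "vGPU ID" entry per vGPU); the function name `parse_vgpu_query` and the choice of fields are illustrative.

```python
# Fields kept for each vGPU; nested statistics are ignored.
WANTED = ("VM UUID", "VM Name", "vGPU Name", "License Status")


def parse_vgpu_query(text):
    """Return one dict per vGPU found in `nvidia-smi vgpu -q` style text."""
    vgpus = []
    gpu = None
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("GPU "):
            # New physical GPU section, e.g. "GPU 00000000:D8:00.0".
            gpu = line.split(None, 1)[1]
            current = None
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "vGPU ID":
            current = {"GPU": gpu, "vGPU ID": value}
            vgpus.append(current)
        elif current is not None and key in WANTED:
            current[key] = value
    return vgpus


# Excerpt of the output shown above.
SAMPLE_QUERY = """\
GPU 00000000:3B:00.0
    Active vGPUs                    : 0

GPU 00000000:D8:00.0
    Active vGPUs                    : 1
    vGPU ID                         : 3251634194
        VM UUID                     : e543816d-1377-421b-a401-14e1108adfae
        VM Name                     : instance-0000000e
        vGPU Name                   : GRID V100-1Q
        License Status              : Licensed
"""

for vgpu in parse_vgpu_query(SAMPLE_QUERY):
    print(vgpu["VM UUID"], vgpu["License Status"])
```

The VM UUID can then be compared with the ID column of `openstack server list`.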

Check the vGPU utilization:

[heat-admin@overcloud-novacompute-0 ~]$ nvidia-smi vgpu -u
# GPU       vGPU    sm   mem   enc   dec
# Idx         Id     %     %     %     %
    0          -     -     -     -     -
    1 3251634194     0     0     0     0
    0          -     -     -     -     -
    1 3251634194     0     0     0     0
    0          -     -     -     -     -
    1 3251634194     0     0     0     0
    0          -     -     -     -     -
    1 3251634194     0     0     0     0
...

List the vGPU types that can currently be created on each GPU:

[heat-admin@overcloud-novacompute-0 ~]$ nvidia-smi vgpu -c
GPU 00000000:3B:00.0
    GRID V100-1Q   
    GRID V100-2Q   
    GRID V100-4Q   
    GRID V100-8Q   
    GRID V100-16Q  
    GRID V100-1A   
    GRID V100-2A   
    GRID V100-4A   
    GRID V100-8A   
    GRID V100-16A  
    GRID V100-1B   
    GRID V100-1B4  
    GRID V100-2B   
    GRID V100-2B4  
    GRID V100-4C   
    GRID V100-8C   
    GRID V100-16C  

GPU 00000000:D8:00.0
    GRID V100-1Q

Check the license status with CLI

[egallen@vgpu-licence-server enterprise]$ ./nvidialsadmin.sh  -server http://localhost:7070 -authorize admin $MY_NVIDIA_PASSWORD -licenses
User authentication succeeded.

(license_server_url) License Server    : 127.0.0.1:7070

(no_of_features) Number of features    : 2

(no_of_client) Number of clients       : 1

[egallen@vgpu-licence-server enterprise]$ ./nvidialsadmin.sh  -server http://localhost:7070 -authorize admin $MY_NVIDIA_PASSWORD -licenses -verbose
User authentication succeeded.
=======================================================================================
Feature ID      Feature Name           Feature Version   Feature Count Used/Available
=======================================================================================
1               Quadro-Virtual-DWS            5.0                  1/63
2               GRID-Virtual-Apps             3.0                  0/64
=======================================================================================

Device Information:

-------------------------------------------------------------
Device Name                   Feature Registered(Used Count)
-------------------------------------------------------------
FA163ED6243E                        Quadro-Virtual-DWS(1)

=======================================================================================

        Total feature count           : 128
        Total feature count used      : 1
        Total uncounted features      : 0
=======================================================================================
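The feature table above can also be read by a script, for example to alert when licenses run low. In this sample the last column appears to be used/remaining rather than used/total, since 1 + 63 + 0 + 64 equals the reported total of 128. The sketch below relies on that reading and on the column layout shown above; the function name `parse_features` is hypothetical.

```python
import re

# Matches rows such as "1   Quadro-Virtual-DWS   5.0   1/63".
FEATURE_ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(\d+)/(\d+)\s*$")


def parse_features(text):
    """Return one dict per feature row of the -licenses -verbose table."""
    features = []
    for line in text.splitlines():
        match = FEATURE_ROW.match(line)
        if match:
            features.append({
                "id": int(match.group(1)),
                "name": match.group(2),
                "version": match.group(3),
                "used": int(match.group(4)),
                "available": int(match.group(5)),
            })
    return features


# Excerpt of the output shown above.
SAMPLE_TABLE = """\
Feature ID      Feature Name           Feature Version   Feature Count Used/Available
=======================================================================================
1               Quadro-Virtual-DWS            5.0                  1/63
2               GRID-Virtual-Apps             3.0                  0/64
=======================================================================================
"""

features = parse_features(SAMPLE_TABLE)
total = sum(f["used"] + f["available"] for f in features)
print(total)
```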

Check the license status in the dashboard

License status before the boot:

License status before the boot

We launch the OpenStack instance:

(overcloud) [stack@accelab-director ~]$ openstack server create --flavor m1.small-gpu --image rhel8 --security-group web --nic net-id=internal0 --key-name lambda instance0

We can see that the instance has started:

License status after the boot

Details of the started instance:
License detail
