I am trying to install my new Tesla M6 GPU card in my ESXi 6.0 host. I installed the NVIDIA software on the host and that looks OK. I can also add the GPU card to my VM in vCenter, but when I try to power on the VM, I get the following error:
Could not initialize plugin '/usr/lib64/vmware/plugin/libnvidia-vgx.so' for vGPU 'grid_m6-8q'
I tried every other GPU profile and get the same error for each one. I updated my ESXi 6.0 host to Update 2 and that didn't seem to have any effect. I am running host driver version 352.83.
I have also checked the compatibility list for this Tesla M6 card, and my Cisco B200 M4 is listed there as compatible.
For troubleshooting, I have run the following commands:
[root@VH1:~] nvidia-smi
Tue Apr 5 21:47:42 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.83     Driver Version: 352.83         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M6            On   | 0000:81:00.0     Off |                    0 |
| N/A   47C    P8    16W / 100W |     30MiB /  7679MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Looks good to me?
I have also run:
[root@VH1:~] dmesg | grep -E "NVRM|nvidia"
2016-04-04T17:05:16.417Z cpu3:33421)Loading module nvidia ...
2016-04-04T17:05:16.423Z cpu3:33421)Elf: 1865: module nvidia has license NVIDIA
NVRM: vmk_MemPoolCreate passed for 4194304 pages.
NVRM: loading NVIDIA UNIX x86_64 Kernel Module 352.83 Sun Feb 7 20:16:36 PST 2016
2016-04-04T17:05:16.754Z cpu3:33421)Device: 191: Registered driver 'nvidia' from 20
2016-04-04T17:05:16.754Z cpu3:33421)Mod: 4943: Initialization of nvidia succeeded with module ID 20.
2016-04-04T17:05:16.754Z cpu3:33421)nvidia loaded successfully.
2016-04-04T17:05:17.553Z cpu29:33420)Device: 326: Found driver nvidia for device 0x36554304c6e6377b
NVRM: nvidia_associate vmgfx0
2016-04-04T17:08:15.323Z cpu2:35277)IntrCookie: 1915: cookie 0x3d moduleID 20 <nvidia> exclusive, flags 0x1d
Any other ideas on this? Or anything else to try?
Thanks in advance