NVIDIA driver displaying odd BIOS and UUID on GRID K2

I have a number of servers with NVIDIA GRID K2 Tesla cards installed.

Initially these were working fine, but I recently upgraded the kernel driver and found that CUDA-based applications no longer detect any GPUs.

On closer inspection, the files at /proc/driver/nvidia/gpus/*/information no longer report a valid GPU UUID or Video BIOS version. Instead I get the following, whereas a working node shows normal details (no ?s).

Bus Location:    0000:89:00.0
Model:           GRID K2
IRQ:             46
GPU UUID:        GPU-????????-????-????-????-????????????
Video BIOS:      ??.??.??.??.??
Bus Type:        PCIe
DMA Size:        37 bits
DMA Mask:        0x1fffffffff
Bus Location:    0000:8a:00.0
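
To compare nodes quickly, the check above could be scripted. This is only a minimal sketch: it assumes each information file uses the `Key: value` layout shown above and that the files live under the `/proc/driver/nvidia` tree. The function names (`parse_information`, `is_unreadable`, `scan`) are made up for illustration.

```python
import glob


def parse_information(text):
    """Turn 'Key: value' lines from an information file into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as
        # '0000:89:00.0' keep their own colons.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def is_unreadable(info):
    """True if the UUID or Video BIOS field contains '?' placeholders."""
    return any("?" in info.get(field, "") for field in ("GPU UUID", "Video BIOS"))


def scan(pattern="/proc/driver/nvidia/gpus/*/information"):
    """Return (path, bus location) for every GPU reporting placeholders."""
    bad = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            info = parse_information(f.read())
        if is_unreadable(info):
            bad.append((path, info.get("Bus Location", "unknown")))
    return bad


if __name__ == "__main__":
    for path, bus in scan():
        print(f"{path}: GPU at {bus} reports placeholder UUID/BIOS")
```

Running this on each netbooted node would show whether every GPU on a node is affected or only some of them.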

I have tried cold rebooting the machines into the previous known-working configuration (these servers are netbooted), and the problem persists with the old drivers as well.

What could be going wrong here? Are the cards toast?
