Bus-Id: The GPU’s PCI bus ID in “domain:bus:device.function” form (hexadecimal). It can be used to filter the stats of a particular device (see the pynvml sketch after this list).
Disp.A: Display Active is a flag indicating whether the GPU has memory allocated for a display, i.e. whether a display is initialized on the GPU. Here, “Off” indicates that no display is using the GPU device.
Memory-Usage: Denotes how much of the GPU’s total memory is currently allocated. Note that TensorFlow, or Keras with the TensorFlow backend, allocates nearly all of the GPU memory at launch even when it doesn’t need it (a sketch at the end of this section shows one way to limit this).
Volatile Uncorr. ECC: ECC stands for Error Correcting Code, which protects data transfers by detecting and correcting errors. NVIDIA GPUs report counts of ECC errors; the volatile counter tracks errors that have occurred since the driver was last loaded.
GPU-Util: Indicates GPU utilization, i.e. the percentage of time over the past sample period during which one or more kernels were executing on the GPU.
Compute M.: The compute mode of the specific GPU, which controls how it may be shared; it resets to Default after each reboot. The “Default” value allows multiple clients (processes) to access the GPU at the same time.
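As a rough illustration of how these device-level fields can be read programmatically, the sketch below uses the pynvml bindings (the nvidia-ml-py package) to query one GPU and print the values behind the Bus-Id, Memory-Usage, and GPU-Util columns. The choice of GPU index 0 and the defensive bytes/str handling are assumptions for illustration; the nvidia-smi manual page remains the authoritative reference.

```python
# A minimal sketch assuming the pynvml bindings (pip install nvidia-ml-py).
# GPU index 0 is an arbitrary choice; the busId handling is defensive because
# different pynvml versions return bytes or str.
import pynvml

pynvml.nvmlInit()
try:
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)   # NVML index 0

    pci = pynvml.nvmlDeviceGetPciInfo(handle)       # backs the Bus-Id column
    bus_id = pci.busId.decode() if isinstance(pci.busId, bytes) else pci.busId

    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # backs Memory-Usage
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # backs GPU-Util

    print(f"Bus-Id      : {bus_id}")
    print(f"Memory-Usage: {mem.used // 1024**2}MiB / {mem.total // 1024**2}MiB")
    print(f"GPU-Util    : {util.gpu}%")
finally:
    pynvml.nvmlShutdown()
```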
GPU: Indicates the GPU index, which is useful in a multi-GPU setup for telling which process is using which GPU. This index corresponds to the NVML index of the device.
PID: The ID of the process using the GPU.
Type: Refers to the type of processes such as “C” (Compute), “G” (Graphics), and “C+G” (Compute and Graphics context).
Process Name: Self-explanatory
GPU Memory Usage: The memory of the specific GPU used by each process (a pynvml sketch illustrating this follows the list).
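The per-process rows can also be read through NVML. The sketch below, again assuming the pynvml bindings and GPU index 0, lists the PID and GPU memory of each compute (“C”) process; graphics (“G”) processes would be queried with nvmlDeviceGetGraphicsRunningProcesses instead.

```python
# A minimal sketch (assuming pynvml) that mirrors the process table for GPU 0:
# one line per compute ("C") process with its GPU memory usage.
import pynvml

pynvml.nvmlInit()
try:
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    for proc in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
        # usedGpuMemory can be None when the driver cannot report it
        mem_mib = proc.usedGpuMemory // 1024**2 if proc.usedGpuMemory else 0
        print(f"PID {proc.pid}: {mem_mib} MiB")
finally:
    pynvml.nvmlShutdown()
```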
Other metrics and detailed descriptions are given on the nvidia-smi manual page.
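Regarding the Memory-Usage note above: if you would rather TensorFlow not claim almost all GPU memory at startup, enabling memory growth is one common option. The snippet below is a sketch assuming TensorFlow 2.x; with it, nvidia-smi reports only the memory the process actually allocates.

```python
# A sketch assuming TensorFlow 2.x: ask TF to grow GPU memory on demand
# instead of reserving (nearly) all of it at startup. Must run before any
# GPU has been initialized by the program.
import tensorflow as tf

for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)
```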