Highest scored 'nvidia' questions

14 votes

1 answer

3k views

What are actual Tesla M60 models used by AWS?

Wikipedia says that the Tesla M60 has 2x8 GB RAM (whatever it means) and TDP 225–300 W. I use an EC2 instance (g3s.xlarge) which is supposed to have a Tesla M60. But nvidia-smi command says it has ...

hans

242

asked Mar 12, 2019 at 0:26

7 votes

1 answer

6k views

Google Kubernetes Engine node pool does not autoscale from 0 nodes

I am trying to run a machine learning job on GKE, and need to use a GPU. I created a node pool with Tesla K80, as described in this walkthrough. I set the minimum node size to 0, and hoped that the ...

anna_hope

173

asked Apr 9, 2019 at 16:23

5 votes

1 answer

8k views

Why is my CUDA GPU-Util ~70% when there are "No running processes found"?

After configuring a system with 2 Tesla K80 cards, I noticed when running nvidia-smi that one of the 4 GPUs was under heavy load despite there being "No running processes found". Why is this happening ...

Steven C. Howell

671

asked Sep 26, 2016 at 18:56

5 votes

1 answer

105 views

The GPU usage provided by nvidia-smi command is very different from GPU metrics from guest OS

I'm working on a project that can monitor virtual machines' vGPU usage. The hypervisor is vCenter; we have NVIDIA A16 cards installed on the vCenter hosts and have assigned A16 vGPUs to a couple of Windows VMs ...

zb2939

51

asked Aug 31 at 16:12

4 votes

2 answers

2k views

8 GPU machine freezes

We have a SuperMicro GPU server with: 2x Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz; 512GB memory; more than enough disk space; X10DRG-O+-CPU (BIOS Version: 2.0a [current]); X9DRG-O-PCIE PCI-E expander ...

pks

41

asked Feb 8, 2017 at 11:51

4 votes

0 answers

212 views

Erase GPU memory

We have Nvidia GPU cards that can be used by different users in an OpenStack environment. A first user creates a VM with access to a GPU card, then deletes the VM when done. Another user then creates ...

J. Chorin

41

asked Aug 8, 2018 at 15:07

3 votes

1 answer

341 views

Dell PowerEdge R7525 + Nvidia A16

We have a PowerEdge R7525 server with an NVIDIA A16 graphics card on Debian 11, but we see about 50% lower GPU performance than on other servers. I suspect it's the missing "Above 4G decoding" ...

Aotor

31

asked Aug 28 at 10:22

3 votes

2 answers

14k views

NVIDIA-SMI can't communicate with NVIDIA driver

Problem description: I am trying to set up a CentOS 7 GPU (Nvidia Tesla K80) instance on Google Cloud to execute CUDA work. Unfortunately, I can't seem to properly install/configure drivers. Indeed,...

Elouan Keryell-Even

493

asked Dec 4, 2018 at 15:36

2 votes

2 answers

3k views

Install Display Card In ProLiant DL580 Gen8 Server

We have a ProLiant DL580 Gen8 server and want to install a Gigabyte GeForce GTX 980 Ti display card in a PCIe slot. When we connect the 8-pin power socket, the server will not turn on, and when the power socket is not ...

MTSS

123

asked Apr 26, 2016 at 4:16

2 votes

4 answers

5k views

Nvidia driver breaks vncserver on CentOS 7.4, is there a work around?

CentOS Linux release 7.4.1708 (Core); uname -r output: 3.10.0-693.2.2.el7.x86_64; NVidia driver: NVIDIA-Linux-x86_64-375.66.run. When using the Nvidia graphics card driver with the Nvidia GeForce GT ...

Edward_178118

965

asked Oct 15, 2017 at 9:23

2 votes

1 answer

5k views

Pod is stuck in PodInitializing status when an initContainer is OOMKilled

I have the following on-prem Kubernetes environment: OS: Red Hat Enterprise Linux release 8.6 (Ootpa); Kubernetes: 1.23.7 (single-node, built with kubeadm); NVIDIA driver: 515.65.01; nvidia-container-...

Daigo

373

asked Aug 30, 2022 at 3:06

2 votes

2 answers

4k views

Alternative to nvidia-settings GpuPowerMizerMode in Ubuntu?

We have a Ubuntu 20.04 server with Nvidia GPUs and want to change the Power Mode / GpuPowerMizerMode to Prefer Maximum Performance. One way to do this is nvidia-settings -a "[gpu:0]/...

Rug Olgebort

21

asked Feb 25, 2021 at 14:04

2 votes

1 answer

647 views

Installing NVIDIA Drivers for Diskless Environment

I'm trying to set up a cluster of 8 computers plus a main file server. Ideally, I'd like to set this up in a pxe-boot, quasi-diskless/quasi-stateless environment (i.e. the only local storage is /var, ...

Travis DePrato

70

asked Jan 15, 2017 at 23:20

2 votes

1 answer

263 views

What socket(s) does the aux power for a GPU come from in a PowerEdge T550?

Server: Dell PowerEdge T550 Tower Server; PSU: Single, Hot Plug, Non-Redundant Power Supply (1+0), 1100W, Mixed Mode Titanium; GPU: NVIDIA A40; Photos: https://www.reddit.com/user/bigboyserver/comments/...

bigboyserver

23

asked Jan 23 at 18:35

2 votes

1 answer

140 views

Access Denied on NVIDIA GRID 7.2 Driver

I am trying to set up an NVIDIA Tesla T4 GPU and use its RTX functionality in a raytracing application (Bakery for Unity3D). But every time I launch the app, Bakery tells me it could not find the ...

omacha

63

asked Apr 2, 2019 at 12:01

2 votes

1 answer

6k views

Failed to initialize NVML: Unknown Error - Not able to complete NVIDIA Tesla P100 Grid Setup on the vSphere Host Server with Vmware ESXI 6.7

I am unable to set up NVIDIA Tesla P100 GRID on a vSphere host server running VMware ESXi 6.7 on a Dell EMC PowerEdge R740. When I try to run the nvidia-smi command, I get the following ...

Sarath Zacharia

31

asked Mar 8, 2019 at 10:08

2 votes

1 answer

1k views

Executing Cuda script in LXC container results in "cuda error: no CUDA-capable device is detected"

I followed the following instructions in order to set up CUDA inside an LXC container. When I try to execute the sample ./deviceQuery script inside the container, the following error is returned: $ ./...

Greg

1,657

asked Dec 22, 2015 at 14:50

2 votes

0 answers

75 views

NVIDIA Grid / Gaming drivers licensing issues AWS EC2

I'm following https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/install-nvidia-driver.html#nvidia-gaming-driver in order to install NVIDIA Gaming drivers to unlock higher resolutions on AWS EC2 ...

Tommy B.

1,423

asked Sep 1 at 2:57

2 votes

0 answers

1k views

GCP VM: nvidia-container-cli: initialization error: driver error: timed out: unknown

Lately my GCP VM of multiple GPUs throws the following error when I try to run my container: docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container ...

ben0it8

121

asked Jan 19, 2021 at 15:33

2 votes

0 answers

414 views

"Getting devices ready" on Windows 10 while booting VM/iSCSI on another machine than initially set up

TL;DR version: virtual Windows instance reinstalls GPU drivers while switching to other hosts despite the fact it's getting the same hardware all the time. I'm trying to avoid it / shorten its time ...

Domel

21

asked Nov 22, 2019 at 13:42

2 votes

0 answers

992 views

nvidia-smi must be run by root before it can be used by regular users

On a newly built Ubuntu 16.04 machine, running nvidia-smi fails as a regular user $ nvidia-smi NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest ...

hanxue

1,377

asked Jul 19, 2019 at 2:16

2 votes

0 answers

32 views

Specify a GPU to use at launch

I am currently working with an Azure GPU VM (NV6, using an Nvidia M60 graphics card). I'm running my benchmark on this VM without any issue for the moment. Now I'm doing the same benchmark on an NV12, which has ...

Turgal

121

asked Feb 5, 2019 at 14:41

2 votes

0 answers

2k views

libGL error: dlopen /usr/lib64/dri/nouveau_dri.so failed on CentOS 6.6 [closed]

I'm having problems using the nouveau driver for my Nvidia GeForce 9100. Xorg starts up and works fine, I am able to use everything, although in /var/log/Xorg.0.log I have: $ cat /var/log/Xorg.0.log ...

Leo

121

asked Jun 17, 2015 at 16:16

1 vote

2 answers

2k views

Ganglia's GPU Nvidia module: do we need to patch the ganglia-webfrontend?

I am trying to add the GPU Nvidia module in ganglia (/ganglia/gmond_python_modules/gpu/nvidia/). Do we need to apply the ganglia_web.patch patch? If I do not apply the patch, I don't see any GPU ...

Franck Dernoncourt

1,072

asked Apr 21, 2016 at 3:56

1 vote

1 answer

2k views

Unable to use gpu in azure windows server 2016

I am trying to run a GPU-intensive application (Lumion) on the Azure cloud. Image used: Windows Server 2016. Hardware: NV6_Promo with 1xK80 GPU. Any application, when launched, runs without using ...

Nithin Jose

149

asked May 11, 2019 at 16:29

1 vote

1 answer

2k views

apt-get bricked by nvidia drivers

I was updating my machine while some drivers crashed. After the reboot my X server was broken and I have reinstalled it. Now apt-get is stuck with this error: ╭─phra at kali in /home/phra ╰─λ sudo ...

phra

41

asked Oct 17, 2017 at 14:27

1 vote

1 answer

369 views

GLX is compiled with wrong version (Display resolution and hardware acceleration stopped working)

I have two Ubuntu 14 Desktops (identical). Both were working fine until Friday. Some updates appear to have been performed on the non-working machine... 2019-03-18 02:29:32 install linux-base:all &...

BurningKrome

535

asked Mar 18, 2019 at 10:56

1 vote

1 answer

1k views

Ubuntu server 20.04 LTS - Installing nvidia & cuda installs gnome as well

I have a GPU server which requires CUDA, for example for machine learning tasks. Unfortunately, as soon as I install the NVIDIA drivers and CUDA, apparently a variant of GNOME is installed as well. ...

Julian Bechtold

123

asked Sep 10, 2021 at 17:19

1 vote

1 answer

477 views

GPU server freezes during GPU idling

We have a new Supermicro Server AS-4124GS-TNR equipped with eight NVIDIA RTX A6000. The OS is Ubuntu 20.04.2, the NVIDIA driver version is 460.73.01 (no Nouveau driver used), the CUDA Version is 11.2. ...

user776206

13

asked Jul 14, 2021 at 7:39

1 vote

1 answer

111 views

Is the Pod Resources API disabled on Google Kubernetes Engine?

Problem Summary: We're using DCGM Exporter to collect metrics about GPU workloads. When deployed on GKE, the exporter does not return GPU information about other pods or containers (when it's expected ...

Ash

121

asked May 5, 2021 at 17:31

1 vote

1 answer

575 views

GKE can't schedule newly created pods that demand GPU on newly added nodes with GPUs

When adding new pool nodes with GPUs, Google Kubernetes Engine can't schedule newly created pods that demand a GPU on these new nodes. This should be automatic, but apparently not for GPU resources; new pods stay ...

Elras

21

asked Jul 17, 2020 at 8:19

1 vote

1 answer

198 views

Google Cloud - Monitor running on Microsoft Display Driver instead of NVIDIA K80 GPU

My Google Cloud instance is running on the Microsoft Display Driver instead of the GPU. I tried to install Hyper-V, but Google Cloud processors don't support it. Please help, I need to run Unity, but can't ...

Mirkual Sen

11

asked Oct 6, 2018 at 8:32

1 vote

1 answer

111 views

nvidia driver not present on debian bullseye after installing cuda

I'm trying to get NVIDIA GPU drivers and related software installed / upgraded on a Debian bullseye system and am having trouble. I tried following the instructions for installing CUDA, but when I get ...

Gary Aitken

137

asked Sep 30 at 21:54

1 vote

0 answers

191 views

Hyper-V GPU Passthrough with NVidia A100 no display

I am currently trying to get some NVidia A100 GPUs to work on our Hyper-V Hypervisor. I managed to setup the GPU Passthrough to a VM but the problem is I don't get a video display. I assume the ...

C0dR

161

asked Aug 23 at 7:14

1 vote

1 answer

351 views

Does a defunct process still allocate resources in the system?

I have a production machine (Ubuntu 18.04) that runs processes in GPU using Nvidia. A certain process has allocated memory and is now defunct, leaving the GPUs basically unusable. ps -o ppid= -p ...

Marco Montevechi Filho

13

asked Apr 18 at 15:33

1 vote

0 answers

96 views

Linux: cuda (pytorch) does not allocate available vram

I am trying out pixray/clipit but cuda fails to allocate the remaining 1GiB of my graphics card. My graphics card is "Nvidia GTX 1660 super" which has the same amount of RAM as the "...

france1

23

asked Aug 27, 2022 at 7:30

1 vote

1 answer

2k views

Xorg not starting in GKE with GPU : (EE) no screens found(EE)

I am trying to run an Xorg server that uses the GPU inside Google Kubernetes Engine. I followed this guide (https://cloud.google.com/kubernetes-engine/docs/how-to/gpus#ubuntu) to set up a GKE cluster with ...

krish211

11

asked May 26, 2021 at 12:22

1 vote

1 answer

546 views

slurm nvidia-docker ignores CUDA_VISIBLE_DEVICES

I have a problem running nvidia-docker containers on a slurm cluster. Inside the container all GPUs are visible, so it basically ignores the CUDA_VISIBLE_DEVICES env variable set by slurm. Outside the ...

JohnA.Zoidberg

13

asked Mar 21, 2021 at 18:26

1 vote

1 answer

2k views

Misbehaving NVLINK with 2080 ti cards?

I am running into problems with nvlink'd RTX videocards, and I wonder if someone more experienced with this tech could kindly look at the output below and tell me if there is a problem? Using a pair ...

anon

asked May 8, 2020 at 14:34

1 vote

1 answer

182 views

Virtualisation primary GPU

The server is running Proxmox VE. My goal is to use any GPU in a VM, so I blacklisted nvidia, nouveau, radeon, and amdgpu to ensure all GPUs can be assigned to the VFIO driver. I've added ...

J Mustermann

11

asked Oct 22, 2019 at 9:21

1 vote

0 answers

414 views

hypervisor.cpuid.v0 or hidden state='on' equivalent in Hyper-V

I'd like to hide from a VM that it's being virtualized on Hyper-V. I've set ExposeVirtualizationExtensions : True, but it doesn't seem to have the same effect. The goal is to pass a nvidia geforce ...

Ryan Lewkowicz

61

asked Jun 19, 2019 at 12:41

1 vote

1 answer

1k views

ESXi Tesla passthrough enabled but not assignable

I am facing an issue with an ESXi ( 6.7.0 Update 1 ) and the passthrough of a GPU card (NVIDIA Tesla P4). The GPU card is listed in the "Passthrough capable" PCI Devices section as "Enabled / Needs ...

JohnLoopM

161

asked Jan 22, 2019 at 8:33

1 vote

0 answers

388 views

Simultaneous usage of Nvidia and AMD GPUs

I have a server which hosts three different GPU platforms: Onboard GPU, Nvidia and AMD GPUs. I have not installed X server, as I do not intend to bring the desktop up. I always use ssh and use the ...

Arya Mz

111

asked Oct 19, 2018 at 15:58

1 vote

1 answer

219 views

Nvidia Pascal architecture: DMA Size / maximum amount of host system RAM?

We are planning to build a pair of multi-GPU Linux servers for machine learning and data science tasks. Per our requirements, we need to put a lot of RAM in these machines; we're planning on 24x 64GiB ...

mvoelske

111

asked Jul 12, 2016 at 10:48

1 vote

0 answers

278 views

nvidia driver displaying odd bios,uuid under Grid K2

I have a number of servers that have NVIDIA GRID K2 Tesla cards in them. Initially these were working fine, but I recently upgraded the kernel driver and have found a problem where CUDA based apps were ...

hookenz

14.5k

asked Jun 15, 2015 at 3:34

0 votes

1 answer

749 views

Reverting yum update

I needed to update NVidia driver on a CentOS 6.9 and decided to update a bit more. So I did sudo yum update and rebooted. Unfortunately that caused problems with NVidia that were worse than before. I ...

Michael

1,723

asked May 17, 2017 at 0:04

0 votes

1 answer

1k views

Containerd failed to start after Nvidia Config

I've followed this official tutorial to allow a bare-metal k8s cluster to have GPU access. However, I received errors while doing so. Kubernetes 1.21 containerd 1.4.11 and Ubuntu 20.04.3 LTS (GNU/Linux 5....

XPLOT1ON

107

asked Dec 1, 2021 at 11:37

0 votes

1 answer

11k views

CentOS 7 w/Gnome hangs on boot after Nvidia driver installation?

There is a lot of information available on these topics separately, but I haven't been able to find an answer to what I feel is a really common situation. I have 2 Nvidia GTX 1080s in a server with ...

Locane

429

asked Sep 22, 2016 at 21:49

0 votes

1 answer

707 views

Cannot view bios/boot screen on Dell T7500 with Quadro 4000 card

I have a Dell T7500 with a Quadro 4000 card. I have just attached a new Philips 328E1CA via DisplayPort. The new monitor only has DisplayPort and HDMI inputs. The monitor specs are here: https://...

Ginger

103

asked Jul 15, 2020 at 19:19

0 votes

1 answer

275 views

How can I find out if my Azure VM is running on DGX-1?

I am trying to reset the GPU of my Azure virtual machine (NVIDIA GPU Cloud Image running on Standard NV6 running Ubuntu 16.04.1) to get reproducible results on a deep learning algorithm. I found this ...

miguelmorin

249

asked Feb 15, 2019 at 11:53
