Questions tagged [gpu]

The tag has no usage guidance.

33 votes
7 answers
23k views

Does a server need a GPU?

Do I need a GPU on a text- and console-only server? By "no GPU" I mean neither an iGPU nor a dGPU. I'm going to be using SSH, so I don't need a display output. I'm using Linux, but the OS shouldn't affect the answer.
tymur999's user avatar
  • 495
5 votes
1 answer
105 views

The GPU usage provided by nvidia-smi command is very different from GPU metrics from guest OS

I'm working on a project that monitors virtual machines' vGPU usage. The hypervisor is vCenter, we have NVIDIA A16 cards installed on the vCenter hosts, and we assigned A16 vGPUs to a couple of Windows VMs ...
zb2939's user avatar
  • 51
4 votes
1 answer
2k views

Kubernetes: How can I find out which pods are scheduled on GPUs?

I have three NVIDIA GPUs in my cluster, and many pods are running in it. How can I find out which of these pods are scheduled on GPUs and how many GPUs each of them uses? I used this link to enable the ...
Nader's user avatar
  • 153
4 votes
1 answer
11k views

Use passthrough GPU in KVM/QEMU and display in host OS in a window

I'm new to KVM/QEMU. I have used VirtualBox to run Windows 10 in a virtual machine on my Arch host system (a laptop with both integrated and discrete GPUs). Being dissatisfied with the video ...
brett's user avatar
  • 141
4 votes
1 answer
1k views

What is the best metric for auto-scaling GPU instances for machine learning inference in the cloud?

We have an API in AWS with a GPU instance that does inference. We have an auto-scaler set up with a minimum and maximum number of instances, but aren’t sure which metric (GPU/CPU usage, RAM usage, ...
elwray14's user avatar
3 votes
2 answers
7k views

GCP does not have enough resources available to fulfill the request for about a month

I've been trying to start my existing GCP VM, which has an NVIDIA T4 GPU attached to it, for almost a month now. It worked fine before, but now I am constantly getting the error ...
masus04's user avatar
  • 131
3 votes
1 answer
4k views

GPU Acceleration on a Windows Server without virtualization over RDP

I'm trying to find out if it's possible to run a Windows Server with one GPU that is shared between all RDP clients, so that people could create a session on the server, start some program with a UI ...
ridilculous's user avatar
2 votes
3 answers
3k views

Why are GPUs accessible from docker containers running on Linux hosts, but not on Windows or MacOS hosts?

Recent versions of docker (or any version of nvidia-docker) allow direct(?) access to the host GPU from within docker containers, with full access to CUDA APIs. This is very convenient when deploying ...
Will's user avatar
  • 229
2 votes
1 answer
5k views

Pod is stuck in PodInitializing status when an initContainer is OOMKilled

I have the following on-prem Kubernetes environment: OS: Red Hat Enterprise Linux release 8.6 (Ootpa) Kubernetes: 1.23.7 (single-node, built with kubeadm) NVIDIA driver: 515.65.01 nvidia-container-...
Daigo's user avatar
  • 373
2 votes
1 answer
263 views

What socket(s) does the aux power for a GPU come from in a PowerEdge T550?

Server: Dell PowerEdge T550 Tower Server PSU: Single, Hot Plug, Non-Redundant Power Supply (1+0), 1100W, Mixed Mode Titanium GPU: NVIDIA A40 Photos: https://www.reddit.com/user/bigboyserver/comments/...
bigboyserver's user avatar
2 votes
1 answer
2k views

Considerations using consumer class (high-end) GPU in server?

Motivation: First of all, even though I have some knowledge of computer science, software development and Linux server administration, I have never looked into server hardware and I am a total "newbie"...
Adrian Maire's user avatar
2 votes
0 answers
4k views

Slurm srun cannot allocate resources for GPUs - Invalid generic resource specification

I am able to launch a job on a GPU server the traditional way (using CPU and MEM as consumables): ~ srun -c 1 --mem 1M -w serverGpu1 hostname serverGpu1 but trying to use the GPUs will give an error: ...
user324810's user avatar
2 votes
0 answers
1k views

Make Headless Server Use Hardware Acceleration

I have a headless CentOS 8 server with an AMD GPU. I want to use hardware acceleration, but when I run OpenGL programs with xvfb-run (e.g. glxinfo), the system reports that I am using software rendering. How ...
2 votes
0 answers
2k views

Quota 'GPUS_ALL_REGIONS' exceeded. Limit: 0.0 globally [duplicate]

I am trying to create a VM instance with NVIDIA K80 GPUs in asia-east1, so I requested a quota increase and the team adjusted the quota. However, when I try to create the VM instance by ...
Tushar Shah's user avatar
1 vote
1 answer
2k views

Unable to use GPU on Azure Windows Server 2016

I am trying to run a GPU-intensive application (Lumion) on the Azure cloud. Image used - Windows Server 2016. Hardware - NV6_Promo with 1xK80 GPU. Any application, when launched, runs without using ...
Nithin Jose's user avatar
1 vote
1 answer
301 views

How do you disable hardware-accelerated GPU scheduling via the command line in Windows 10/11?

I need to disable, programmatically, hardware-accelerated GPU scheduling in Windows if it's enabled. Searching, I was pointed at the HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers - HwSchMode ...
Jason Floyd's user avatar
  • 1,812
1 vote
1 answer
1k views

GCP: Cannot create any VM with GPU -> No capacity

I subscribed to GCP and received the $300 credits. Then I upgraded my account to "paid account". Next, I increased the limit for multiple VM types with GPU, in multiple regions, and received ...
cloud_IaaS's user avatar
1 vote
1 answer
297 views

Why can't the GPUs communicate in a multi-GPU server?

This is a Dell PowerEdge r750xa server with 4 Nvidia A40 GPUs, intended for AI applications. While the GPUs work well individually, multi-GPU training jobs or indeed any multi-GPU computational ...
isarandi's user avatar
  • 341
1 vote
1 answer
55 views

Always-available GPU servers

I need several GPU servers for rendering. The region doesn't matter. Preemptible instances are OK for me, I think. It is important that at any time I should be able to run a couple of instances with a GPU. ...
Sergey Kozlov's user avatar
1 vote
0 answers
40 views

Mount a remote GPU locally on Rocky Linux

I am doing some work on a server which has 1 TB RAM, 100 CPUs, and 1000 TB storage. My work involves very large datasets. One specific job I am trying to run would benefit immensely from a GPU; it is ML ...
donkey's user avatar
  • 63
1 vote
0 answers
191 views

Hyper-V GPU Passthrough with NVIDIA A100: no display

I am currently trying to get some NVIDIA A100 GPUs to work on our Hyper-V hypervisor. I managed to set up GPU passthrough to a VM, but the problem is that I don't get a video display. I assume the ...
C0dR's user avatar
  • 161
1 vote
0 answers
2k views

GKE Node auto-provisioning not scaling up with limits defined

I want to use GKE node auto-provisioning to create a node-pool with GPU on demand (that is when I start a Job that needs GPU resources). Going with the GCP tutorial I've set up a cluster with enabled ...
przemys's user avatar
  • 11
1 vote
1 answer
2k views

Xorg not starting in GKE with GPU: (EE) no screens found(EE)

I am trying to run an Xorg server that uses the GPU inside Google Kubernetes Engine. I followed this guide (https://cloud.google.com/kubernetes-engine/docs/how-to/gpus#ubuntu) to set up a GKE cluster with ...
krish211's user avatar
1 vote
0 answers
366 views

Service Creation failed AWS ECS Service with elastic inference accelerator (EIA)

I am trying to create a service in my ECS cluster to run a task that uses elastic inference accelerator (EIA). However, when I try to create the service, I get the following error: I have read the ...
toing_toing's user avatar
1 vote
0 answers
147 views

Mounting virtual GPUs on different machines

I have multiple computers connected over a VPN, each computer having one or more Nvidia GPUs. All computers run Linux. I'm wondering if it's possible to mount them over network such that they appear ...
Susmit Agrawal's user avatar
1 vote
1 answer
2k views

Misbehaving NVLink with 2080 Ti cards?

I am running into problems with NVLinked RTX video cards, and I wonder if someone more experienced with this tech could kindly look at the output below and tell me if there is a problem? Using a pair ...
1 vote
0 answers
113 views

How Do You Run a GPU Task on Windows Server 2016 Remotely?

From my understanding, remotely executing GPU tasks is a little tricky because of the way sessions work on Windows. Session 0 does not have access to GPU drivers, and usually when you remotely execute ...
Syed Aman's user avatar
1 vote
0 answers
179 views

DDA device out of resource(Error 12) in Hyper-V VM

We're 'borrowing' a server from an IDC that has two Tesla V100 GPUs in it. It runs Windows Server 2016. Since it has some impressive graphics capabilities, we were looking for ways to run graphic ...
iCore's user avatar
  • 11
0 votes
1 answer
1k views

Compatible AMD GPU for the Dell PowerEdge r710 [closed]

I'm a complete noob when it comes to server virtualization and GPUs, so please bear with me. Are there any decent AMD GPUs that are easily compatible with the Dell PowerEdge r710? I'm looking for ...
GNULinuxOnboard's user avatar
0 votes
1 answer
255 views

Dell PowerEdge R720XD - Fans Ramp Up When Video Card is Installed [closed]

I have a Dell PowerEdge R720XD with 2x Xeon E5-2640 v2 (16 cores total), 16 GB of RAM, and 2x 3 TB hard drives. I have Windows 10 running on the machine. When I connect a video card to the system, the ...
Rowan McJimsey's user avatar
0 votes
1 answer
98 views

[GCP] Why can't I see the GPU selection button on my VM instance?

I want to use a GPU in a Windows VM. However, the GPU selection button is not active when creating the instance. I have requested a quota for a GPU (K80) and was allocated 1. Of course, I received ...
Jay Kim's user avatar
0 votes
1 answer
114 views

Resolved - Only 1 node out of 2 has registered on my node pool if I have a GPU activated on my cluster

I have a managed Kubernetes cluster with 1 GPU (Tesla K80) activated in west1-b and west1-d (each zone has this GPU model enabled, and my quota is OK). Each time that I create a node pool with 2 nodes only ...
Baptiste GAILLET's user avatar
0 votes
0 answers
77 views

How can I enable OpenGL on Windows Server 2022?

I am trying to use an application requiring OpenGL > 2.1 on Windows Server. I have read on multiple questions here about the challenges of getting OpenGL to work through a remote desktop session, ...
Maxime's user avatar
  • 101
0 votes
0 answers
42 views

OpenStack: Can't launch instances when GPU passthrough is enabled

I installed OpenStack using DevStack in a single-node configuration on Ubuntu 22.04 (Jammy) LTS. I followed the following tutorial to set up GPU passthrough on my OpenStack: https://superuser.openinfra....
dmat_pravaig's user avatar
0 votes
0 answers
57 views

nvidia-smi missing GPU with SR-IOV disabled in BIOS

I have an HPE server (DL385) with 3 NVIDIA A100s in it. It is running Ubuntu 22.04 with kernel version 6 (I have also tried version 5). By default, it was in energy-saving mode in the BIOS. When I tried to ...
Guillaume Lechantre's user avatar
0 votes
0 answers
148 views

"DirectX 12 is not supported on your system" on VM with NVIDIA L4

I have a virtual machine on Google Cloud with an NVIDIA L4 GPU to run 3D applications. I installed the NVIDIA drivers according to the documentation, but every time I run an application, it doesn't ...
Canal HardCorePlay's user avatar
0 votes
0 answers
29 views

Building server for research team

This is kind of a big task for me: I have an engineering background with computer science knowledge, but I'm no expert in servers and networking. I want to build a GPU-enabled computing platform that ...
Frmrz's user avatar
  • 1
0 votes
0 answers
17 views

Will compute nodes with A100 80GB (2x on node1) and A100 40GB (2x on node2) work in Red Hat OpenShift cluster?

I think the answer should be yes; however, these parts/cards are expensive, so I would like to hear from experts who have done this kind of thing. Will MIG be supported on this?
techele's user avatar
0 votes
0 answers
88 views

Interpretation of output of nvidia-smi and lspci | grep -i nvidia

I am very new to GPU servers. I submitted a Slurm job and then checked "nvidia-smi", which gave the following output. Then I ran "lspci | grep -i nvidia", where I got ...
Jaichand Patel's user avatar
0 votes
0 answers
48 views

AMD driver doesn't load after rebooting VM with GPU passthrough on virt-manager

I'm having an issue with GPU passthrough, using virt-manager on Debian 11. It's a very specific question, but I hope that someone can help me with this. I have an RX 5500 XT GPU, and I pass it through ...
Joabeslopes's user avatar
0 votes
0 answers
40 views

How to size compute/GPU/storage/network for generative AI or LLM? [duplicate]

I would like to provision compute (servers), GPUs (say 2 A100 80GB or H100), storage and network (maybe 100GbE) to run the OpenApaca 7B (https://huggingface.co/openlm-research/open_llama_7b) model. How ...
techele's user avatar
0 votes
0 answers
1k views

How do I fix amdgpu and amdgpu-dkms packages not installing?

I'm trying to install the amdgpu package and it throws a bunch of errors: Reading package lists... Done Building dependency tree... Done Reading state information... Done The following packages were ...
Mach's user avatar
  • 1
0 votes
1 answer
453 views

LM-Sensors - Intel Arc A750 + Dell R720 + Debian 11 (Kernel 6.2.2)

I have a Dell R720 at home running virtualized services under KVM. I have recently added an Intel Arc A750, which I've passed through to a Debian 11 VM (with a Q35 machine model), where I have Jellyfin ...
KiralyCraft's user avatar
0 votes
0 answers
375 views

How to monitor a Windows GPU in Zabbix

I am using Zabbix 6.0 and I want to monitor the GPU usage, temperature and other GPU-related metrics of my Windows host. How can I do that? I have Windows 10 with the agent installed.
biplab 's user avatar
0 votes
0 answers
89 views

How do I troubleshoot intermittent node/kubelet reboots on GKE?

I am running workloads on a spot GPU node pool and intermittently getting 'NodeNotReady' followed by a reboot/restart of the node (and loss of the workload pod), however the node does not go ...
Rupert Lloyd's user avatar
0 votes
1 answer
429 views

Can each GPU be used on Kubernetes as dedicated to a specific Pod?

I have the following environment: Pods: Pod0, Pod1 (launched as a k8s Job) GPUs: GPU0, GPU1 GPU0 is dedicated to Pod0, and GPU1 is dedicated to Pod1. There can be multiple Pod0s and Pod1s at the same ...
Daigo's user avatar
  • 373
0 votes
0 answers
178 views

CloudStack and GPU support

Is there any documentation on GPU and vGPU support in CloudStack version 4.14 and later? For example, how can I find out whether CloudStack supports the NVIDIA Quadro RTX series?
for1401's user avatar
0 votes
1 answer
298 views

How to identify the slot of a faulty GPU card in a server using Ubuntu commands?

I have a question. Is it possible to identify which slot holds a broken GPU card from within the Ubuntu operating system? We have a SuperMicro GPU server in which there are about 8 GPU cards for AI ...
Herman's user avatar
  • 1
0 votes
0 answers
178 views

Is there any software VGA dummy solution?

To cut a long story short, I run DaVinci Resolve on an Intel HD 4600-based server from the Hetzner auction. None of them come with any remote KVM or use any VGA dummy plug, for every server ...
Rinaldo Jonathan's user avatar
0 votes
0 answers
28 views

Server not starting after adding graphics card

First of all, I want to clarify that I am pretty new to server topics. My server runs plain Ubuntu. I bought an NVIDIA Quadro K4000. I installed it in the server and connected the PCIe power cable. - ...
Ninto 1's user avatar