I have an environment running SQL Server on a Windows VMWare platform using a SAN with SSDs set up in RAID 6, and using Veeam for server backups and LiteSpeed for SQL Server backups.
I've had a problem several times over the past year where sometimes the database slows to a crawl, and my Avg. Disk Queue Length is high, but my Disk Bytes/sec is much lower than it should be able to go.
Here's the Performance Monitor on the database server. When this problem happens, the Avg. Disk Queue Length is always in the range of several hundred, and the Disk Bytes/sec stays around 5-15 MB/sec. During normal operation (when this problem isn't happening), Disk Bytes/sec goes as high a 900 MB/sec or so.
In the time since this problem started happening, I have replaced the SAN hardware -- including the switches. But the problem continues on the new hardware.
My theory has been that this isn't a SQL Server problem, because if the problem was that SQL Server was saturating the disk I/O, I should see much higher Disk Bytes/sec. But whenever this problem happens, Disk Bytes/sec is always very low.
I thought maybe it was the backup software -- either running on the database server or running on another server that's making use of the same VMWare/SAN -- but neither the server backups nor the SQL Server backups seem to be running while this problem is happening.
My last thought is this is a problem with VMWare, but I've contacted them and so far they haven't been able to help.
Rebooting the database server fixes the problem. Sometimes the problem will happen again within a day, and sometimes the problem doesn't happen again for months. Whenever the problem happens, I'm not aware of anything outside of the normal workload running on the database.
What could be causing this problem where the disk throughput slows to around 1% of what it should be capable of?