
I have been trying to track down why my ghettoVCB backups, run from the ESXi host, are so slow.

I'm currently backing up my VMs with ghettoVCB from the host OS to an NFS share on TrueNAS.
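
For context, the share is mounted on the host as an NFS datastore and ghettoVCB is pointed at it, roughly like this (the NAS hostname, export path and datastore name below are placeholders, not my actual values):

esxcli storage nfs add -H truenas.local -s /mnt/tank/esxi-backups -v backup-nfs
# ghettoVCB.conf
VM_BACKUP_VOLUME=/vmfs/volumes/backup-nfs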

When I copy files to the NFS share from a guest (Ubuntu 20.04) on the ESXi machine I get about 257MB/s, which is about right as I have a dedicated 2.5Gb link between the NAS and the ESXi host:

su@test:/mnt/guest$ time sh -c "dd if=/dev/zero of=test bs=1MB count=1024 && sync"
1024+0 records in
1024+0 records out
1024000000 bytes (1.0 GB, 977 MiB) copied, 3.98153 s, 257 MB/s

real    0m4.470s
user    0m0.002s
sys     0m0.619s
 
Guest NFS Mount Options:
rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,local_lock=none,

When I try to copy to the same NFS share from the ESXi host itself, the throughput is much lower, working out at about 45MB/s:

/vmfs/volumes/9043e582-0376fe3e] time sh -c "dd if=/dev/zero of=./test bs=1MB count=1024 && sync"
1024+0 records in
1024+0 records out
real    0m 22.70s
user    0m 0.00s
sys     0m 0.00s

ESXi NFS Mount Options
I can't seem to find a way to see the mount options ESXi uses.
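
The closest I have found is listing the NFS datastores from the shell, which shows the server, share and version but (as far as I can tell) not the rsize/wsize-style options:

esxcli storage nfs list      # NFS v3 datastores
esxcli storage nfs41 list    # NFS 4.1 datastores
esxcfg-nas -l                # older equivalent listing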

One thing I did note is that turning off sync on the ZFS dataset on the server sped ESXi writes up to 146MB/s, which is still a lot lower than from the guest OS.
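
For anyone reproducing that test, it can be done from the TrueNAS shell (the pool/dataset name below is a placeholder; the GUI exposes the same setting):

zfs get sync tank/esxi-backups           # show the current value
zfs set sync=disabled tank/esxi-backups  # test only; risks data loss on power failure
zfs set sync=standard tank/esxi-backups  # revert afterwards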

My assumption is that ESXi is being super safe and ensuring everything is synced 100%. Does anyone know if this is the case, and does anyone have any tips on improving the performance of the backup?

  • Testing ESXi performance with dd or file operations within the ESXi shell does not really measure anything, as this shell has some resource restrictions. A better way would be to install a VM with its disks on NFS, give this VM a good deal of CPU/memory, and do your dd tests within it on a local disk. Your results may vary dramatically. Apr 18 at 11:23
  • Yes, so the guest OS in my test above is running in a VM on the same machine and writing to the NFS share, and it does perform well (250MB/s). Due to how ghettoVCB works, I need the NFS speed to be decent at the host level, which is where the backups are taken and stored before being moved to the NFS store. As you say, it could be that the ESXi hypervisor OS is just not designed to get decent performance to NFS at the hypervisor level, but I would find that surprising, as ESXi would have the same limitation when running virtual machines off the same NFS.
    – Rtype
    Apr 18 at 12:06
  • You can try to mimic the behaviour of Veeam by attaching disks from the machine to be backed up to another machine via the CLI and then backing up from there. But transfer speed limitations over the network from within the ESXi shell are an old and long-standing issue. Apr 18 at 12:33
  • Thanks. If you post that as an answer I will accept it.
    – Rtype
    Apr 18 at 13:36

2 Answers


What you see is absolutely normal and not fixable as is. VMware ESXi “by design” has no disk cache, unlike the guest OS inside a VM, which really does have one! So when you copy your file (which is a crude test in itself; you should be using more sophisticated benchmarks) from inside a VM, you're saturating your network because your pipelined sequential read is faster than the network itself. The ESXi host, by contrast, has to read the data (slowly, there's no read-ahead) into mmap()-ed shared storage/network memory buffers, initiate a stateless NFS write, read the disk again, and so on in a loop. If you launch Wireshark you'll see the guest VM's Tx traffic is steady, while the host OS transmits in spikes.
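
If you want to confirm that on the host side, ESXi has built-in capture tools you can run from the shell (the uplink and vmkernel names below are just examples, adjust to your setup):

pktcap-uw --uplink vmnic1 -o /tmp/host-nfs.pcap   # capture on the physical uplink
tcpdump-uw -i vmk1 -w /tmp/vmk-nfs.pcap           # capture on the NFS vmkernel port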

As a workaround you might want to get a caching RAID controller with beefy on-board memory, or throw in a second node, build a cluster and configure vSAN (VMUG pricing is quite affordable for vSphere+vSAN). VMware vSAN will cache local disks at this level, below VMFS, so you'll saturate your 2.5Gb link again.


The caching RAID controller or vSAN cluster suggested in the other answer is a worthwhile option. Alternatively, look at StarWind VSAN, which works at block level and should give you better performance. It also supports mdadm RAID, which might make sense to try.
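
If you try the mdadm RAID route, a simple mirror for the backing disks is only a couple of commands (device names below are placeholders):

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
cat /proc/mdstat    # watch the initial resync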

  • Do you want to run it inside a virtual machine in a loopback or what? Jun 21 at 10:39

