EDIT: this is the full benchmark I ran on my pool: https://gist.github.com/thegabriele97/9d82ddfbf0f4ec00dbcebc4d6cda29b3.
Hi! I have been running into this issue since I started my homelab adventure a couple of months ago, so I am still very much a noob, sorry for that.
Today I decided to dig into what is happening and why, but I need your help to understand it better.
My homelab is a Proxmox setup with three 1 TB HDDs in raidz1 (ZFS) (I know the downsides of this and I made that decision anyway) and 8 GB of RAM, of which 3.5 GB are assigned to a VM. The rest is used by some LXC containers.
During heavy workloads (e.g. copying a file, downloading something via torrent/JDownloader) everything becomes very slow and other services start to be unresponsive due to the high IO delay.
I decided to test the three individual disks with this command:
fio --ioengine=libaio --filename=/dev/sda --size=4G --time_based --name=fio --group_reporting --runtime=10 --direct=1 --sync=1 --iodepth=1 --rw=randread --bs=4k --numjobs=32
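For reference, this is the sequential-read variant I could also run for comparison (the only changes from the command above are --rw=read, --bs=1M and --numjobs=1; I am not including its output here):
fio --ioengine=libaio --filename=/dev/sda --size=4G --time_based --name=fio --group_reporting --runtime=10 --direct=1 --sync=1 --iodepth=1 --rw=read --bs=1M --numjobs=1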
And more or less all three of them (sda, sdb, sdc) give results like this for the random-read test:
Jobs: 32 (f=32): [r(32)][100.0%][r=436KiB/s][r=109 IOPS][eta 00m:00s]
fio: (groupid=0, jobs=32): err= 0: pid=3350293: Sat Jun 24 11:07:02 2023
read: IOPS=119, BW=479KiB/s (490kB/s)(4968KiB/10378msec)
slat (nsec): min=4410, max=40660, avg=12374.56, stdev=5066.56
clat (msec): min=17, max=780, avg=260.78, stdev=132.27
lat (msec): min=17, max=780, avg=260.79, stdev=132.27
clat percentiles (msec):
| 1.00th=[ 26], 5.00th=[ 50], 10.00th=[ 80], 20.00th=[ 140],
| 30.00th=[ 188], 40.00th=[ 230], 50.00th=[ 264], 60.00th=[ 296],
| 70.00th=[ 326], 80.00th=[ 372], 90.00th=[ 430], 95.00th=[ 477],
| 99.00th=[ 617], 99.50th=[ 634], 99.90th=[ 768], 99.95th=[ 785],
| 99.99th=[ 785]
bw ( KiB/s): min= 256, max= 904, per=100.00%, avg=484.71, stdev= 6.17, samples=639
iops : min= 64, max= 226, avg=121.14, stdev= 1.54, samples=639
lat (msec) : 20=0.32%, 50=4.91%, 100=8.13%, 250=32.85%, 500=49.68%
lat (msec) : 750=3.86%, 1000=0.24%
cpu : usr=0.01%, sys=0.00%, ctx=1246, majf=11, minf=562
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=1242,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=479KiB/s (490kB/s), 479KiB/s-479KiB/s (490kB/s-490kB/s), io=4968KiB (5087kB), run=10378-10378msec
Disk stats (read/write):
sda: ios=1470/89, merge=6/7, ticks=385624/14369, in_queue=405546, util=96.66%
Am I wrong, or are these very bad results? Why? The three identical HDDs are this model: https://smarthdd.com/database/APPLE-HDD-HTS541010A9E662/JA0AB560/
I hope you can help me. Thank you!
I don't know about fio, but I normally use the iotop command to identify which process is doing too many I/O operations.
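For example (the exact flags depend on your iotop version, but these are the ones I usually reach for):
iotop -oPa
-o shows only processes that are actually doing I/O, -P lists processes instead of individual threads, and -a shows accumulated totals since iotop started, which makes the heavy hitters easy to spot.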
I have a zfs raid1 with 5 disks, and had some very bad performance. I used atop to figure out that one disk was the problem. I replaced that disk, resynced, and now performance is as expected.
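In your case, something like zpool iostat -v <yourpool> 5 (substitute your actual pool name) prints per-disk operations and bandwidth every 5 seconds, and if your OpenZFS version supports it, adding -l shows per-disk latency too. If one of the three disks consistently lags behind the other two, that disk is probably the culprit.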
What recordsize have you set on your dataset? If you are not doing a lot of small writes, or you can tolerate the fragmentation, it is better to set it to 1M.
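Something like this, replacing tank/data with your actual dataset (note that the new recordsize only applies to files written after the change):
zfs get recordsize tank/data
zfs set recordsize=1M tank/data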
Also,
…8 GB of RAM, of which 3.5 are assigned to a VM…
A default ZFS installation reserves up to half of the total system memory for its ARC. In your case that means 4 GB, and your VM is taking 3.5 GB. Are you running anything else? Also, is the memory assignment to the VM dynamic? ZFS will release a portion of the reserved RAM when overall memory pressure gets high, and that will have an adverse impact on read performance.
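If you want to make the split explicit instead of letting the ARC and the VM fight over RAM, you can cap the ARC size. On Proxmox this is usually done with a ZFS module option; the 2 GiB value below is just an example, pick whatever fits your workload:
echo "options zfs zfs_arc_max=2147483648" >> /etc/modprobe.d/zfs.conf
update-initramfs -u
Then reboot, or apply it immediately at runtime with:
echo 2147483648 > /sys/module/zfs/parameters/zfs_arc_max
You can check the current ARC size and hit rate with arc_summary.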