Poor performance on 0.6.3 #3261
Thanks for the reply. I would agree synthetic benchmarks are not always representative, but I am convinced I am hitting some fundamental flaw or bug in ZFS. Are you saying that because I am using raidz this could be causing my poor performance?
A raidZ vdev exposes roughly the read IOPS of a single disk, so the layout of your pool will give you ~600 random IOPS (assuming they are evenly distributed over all 5 vdevs). Read seeks for cold data (like directory listings) can saturate the available seek capacity of your system. Maybe take a look at your drives using dstat --disk-util (or something else that shows you how busy your drives are).
Mirror setups are faster for reads, since every disk can service reads independently, while raidz most likely needs to read from several spindles to get the data. So if you had structured your pool as 3-way mirrors (near the same redundancy as raidZ3; and since mirror resilvers are faster, because all remaining drives can supply data and the new drive will most likely be 100% busy writing, my stance is that you can get away with one redundant drive less) you would only have 18 drives worth of space (compared to 40 in your current setup), but ~2160 writes/s or ~6480 reads/s instead of the ~600 you have now (roughly 3.5-10.5 times more IOPS).
Since you didn't specify how big the disks are: if you have <1TB drives and use mirrors, you could add one SSD (a relatively cheap Samsung or the like) to each mirror to boost reads/s by several orders of magnitude, since ZFS will favour faster devices for reads. Sadly the full potential of this isn't achieved, because ZFS doesn't yet care whether a disk is spinning media or not, so it will fetch metadata from the slow drives too, which delays any read that needs information from that metadata. Another option is to put the hot data on a mirror (or HDD+SSD mirror vdev) pool and use a raidZ pool for archival (send/recv). Adding SLOG devices (fast SSDs) could help in case you're blocked by fsync calls.
It comes down to cheap (per TB of storage) vs. fast (IOPS) vs. reliable (redundancy).
Nevertheless, I would suggest you upgrade to the newly released 0.6.4; the release notes are longer than both my arms (in terms of improvements/fixes), so some bottlenecks will most likely have been removed compared to 0.6.3.
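As a rough sketch of the drive-busyness check suggested above (the 5-second interval and the iostat alternative are just examples, not anything from the original setup):
dstat --disk-util 5
iostat -x 5
If the utilization column sits near 100% on most spindles while the workload runs, the pool is seek-bound as described.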
@behlendorf thanks for the suggestion. I just upgraded my server to it last night and have started running some tests on it today. Mind you, this is my enterprise tier-3 storage NAS box for 150 of my users, so I was a bit cautious about upgrading; I did some tests on another server first and had no issues with the upgrade on CentOS 6.6 (zpool upgrade worked fine as well). I just did some quick tests on the largish zpool I have set up with 3TB 7200RPM drives, (5) 8+3 vdevs, with the following settings. Seems like @GregorKopka is indicating that 3-way mirrors would be better for random IO than what I am currently getting, which is an interesting setup, but I am not sure I can take the hit on space with only 18 drives' worth of usable space vs 40. But it might be something worth testing to see if it fills some use cases down the road.
With the default of sharenfs=on it took over 27 minutes to untar a source copy of gcc-4.9.2.tar.gz. What I noticed was that the untar was keeping the LOG device very busy, so my first thought was that even though my NFS client is mounted async, ZFS was still doing sync writes as the data came in. I then re-exported my dataset with sharenfs="async,rw", remounted my client, and I am seeing better NFS client performance for obvious reasons; the untar took no more than a minute or so. I am just surprised to see how poor sync writes are in ZFS in comparison to other filesystems I have tested against. I guess I could also run with sync=disabled as well.
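For reference, a minimal sketch of the property changes described above; the dataset name tank/export is a placeholder for whatever dataset is actually exported:
zfs set sharenfs="async,rw" tank/export
zfs set sync=disabled tank/export
zpool iostat -v tank 5
sync=disabled acknowledges writes before they reach stable storage, so it only makes sense where losing the last few seconds of writes on a crash is acceptable; zpool iostat -v shows how busy the log device is while a workload runs.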
@kpande Intel SSD, behind a CCISS controller in a RAID1 mirror. A simple randrw test in fio pushes around 5k IOPS with a 4k block size, and a sequential write around 250k IOPS without an issue. I wouldn't say this is the fastest SSD available to me, but it is still pretty decent performance. As a comparison, with other (newer) SSDs in systems across my environment I can easily hit 5-10k IOPS, so I can see how this could bottleneck my sync writes a little, but I am not totally convinced. I guess the fastest way to prove your theory would be to disable the ZIL (sync=disabled) or find faster disks to try and see if that helps.
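Roughly the kind of fio run being described, for anyone who wants to reproduce it; the directory and file size are placeholders, not the exact parameters used here:
fio --directory=/mnt/ssd --name=randrw-4k --rw=randrw --bs=4k --size=4G --numjobs=1 --time_based --runtime=60 --group_reporting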
@kpande actually in this system I am using OCZ Talos 2 drives.
500GB of SLOG is a waste. About fsync performance: since you're on 0.6.4 now and still having problems with it, could you update the OP title to reflect the actual problem for easier tracking (so the issue isn't closed by accident, now that 0.6.3 is history)?
Hey Gregor, thanks for the info there. I ended up throwing in an 8GB Zeus drive as a SLOG and I am seeing much better performance now on single-threaded sync writes. I think this is slightly better overall, except the metadata operations over NFS are still not great.
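For anyone following along, a hedged sketch of adding a SLOG like that; the pool name and device paths are placeholders, and a mirrored log is the safer variant if two devices are available:
zpool add tank log /dev/disk/by-id/<zeus-drive>
zpool add tank log mirror /dev/disk/by-id/<zeus-1> /dev/disk/by-id/<zeus-2>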
Currently running 0.6.3 on CentOS 6.6, where I am seeing some pretty poor performance with various workloads on my ZFSoL setup. I am using a slightly older hardware setup here: a 60-bay JBOD with 6Gb/s connectors and all NL-SAS 3.5" 7200 RPM drives. I have two SAS cards connected to the JBOD in an active/active setup across the paths. Previously this host was running Solaris-based ZFS; we converted it over to ZFSoL and our performance has not been very good.
A few things I have tried:
Below is the fio command I have tried (seq, randrw, etc., with a 4k block size to match the kernel page size). I guess my confusion here stems from the fact that my local RAID1 mirror outperforms 60 spindles by a factor of 4-5x.
fio --directory=/foo --name=abc --rw=read --bs=4k --size=10G --numjobs=1 --time_based --runtime=60
Any suggested tuning in ZFS that I am overlooking?