Slow write performance with zfs 0.8 #8836
Did you test on the same kernel version? Looks like #8793.
The values I am showing here are all from kernel 4.19. I have a few numbers for kernel 5.0 which basically confirm the kernel 4.19 numbers: no significant difference by kernel version. But the zfs version makes a big difference. Write IOPS are down to 70% with zfs 0.8: the average is 249 write IOPS with zfs 0.7.13 and 175 with 0.8.
Since you're using 4.19.46, this is probably #8793 as mentioned above. The symbol export that allowed SIMD-accelerated checksums was removed from the 4.19 branch with 4.19.38. Maybe set checksum=off and compare?
If this is caused by the lack of SIMD support then you should be able to see the same drop in performance using 0.7.13 and the 4.19.46 kernel. It would be good to know either way.
I did two runs with checksum=off and it does NOT make a difference. Write performance is still down to about 70%. My benchmark numbers for version 0.7.13 are from kernels 4.19.42, 4.19.34, 4.19.28 and 4.19.26 (following the Manjaro Testing upgrades). The benchmark numbers for version 0.8 are only for kernel 4.19.46. Are you suggesting that this is a kernel regression?
Since you achieved the expected performance using 0.7.13 and the 4.19.42 kernel, that should rule out the kernel's SIMD changes as a cause. Further investigation is going to be needed to determine exactly why you're seeing a drop in write performance.
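For reference, ZoL exposes a fletcher4 benchmark kstat; if only the scalar implementations are usable there, checksumming runs without SIMD (path as in the 0.7/0.8 module, shown here as a sketch):

```sh
# Show the checksum implementations the module benchmarked at load time.
# The "fastest" row names the one actually selected; if it only ever picks
# "scalar"/"superscalar", no SIMD implementation was available to the module.
cat /proc/spl/kstat/zfs/fletcher_4_bench
```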
The zfs manpages say to consider changing the recordsize to match the workload.
The system is always idle when I do the tests. I have been doing this for a while now. Unfortunately I have only kept the logs since March of this year, but the results have always been comparable as long as I can remember, even with recordsize 128k. Of course there is always some variance in the values, but a performance decrease of 30% is a significant change.
Look at the history of the pool with zpool history.
There is nothing in the history other than the regular import or snapshot commands.
I did some more tests, also with another pool. The other pool is a raidz2 with 6 drives in an external USB case. The interesting finding for me is that this pool (zf1) is NOT showing performance differences. But I certainly see write performance issues with the internal pool (zstore). I compared the output of "zfs get all" for both zstore and zf1 and there is no important difference other than mountpoint and such; the basic parameters are all the same. I also double-checked that checksum=on/off does not make a difference. Once again some results for zstore: old (good) values with zfs 0.7.13:
new (bad) values with zfs 0.8.0:
Let zfs report what is happening on the pool and on each vdev, and monitor the zfs processes while the workload is running: cache info, memory status, etc. Also make sure your ashift value is accurate.
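As a concrete example, these are common commands for the purpose described above (not necessarily the ones originally meant):

```sh
# Per-vdev bandwidth and IOPS while the benchmark runs, sampled every 5 seconds.
zpool iostat -v zstore 5

# ARC size, hit rate and memory status summary (arc_summary.py on older releases).
arc_summary

# Check the ashift recorded in the vdev labels against the drives' physical
# sector size (ashift=12 corresponds to 4K sectors).
zdb -C zstore | grep ashift
```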
NB, running zpool iostat with a short interval (e.g. < zfs_txg_timeout) is almost always a waste of effort. A better solution is to use one of the telemetry collectors, telegraf or node_exporter, to collect the data and forward it to a TSDB, like influxdb or prometheus, and analyze it with tools like grafana.
@richardelling Could a telemetry collector, TSDB and an analysis tool be implemented in zfs itself, since working with iostat is a waste of effort and difficult to grok? I would like to know that all tools and features in zfs are useful and that I can use them to gain meaningful information from zfs. I have installed telegraf, which is just pulling information from /proc/spl/kstat/zfs.
No, it is a really bad idea and goes counter to the UNIX philosophy. Today ZFS makes stats available, but reading them is not a free operation. Designing a monitoring system needs to meet very different business requirements, so it is best to have integration with the best-in-class monitoring systems. I only mentioned a few of the popular open source tools; there are many more tools in the market.
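As an example, telegraf already ships a zfs input plugin that reads the existing stats and forwards them, so nothing needs to be built into ZFS itself. A minimal drop-in config sketch (the output plugin for your TSDB is configured separately):

```sh
# Enable the telegraf zfs input plugin; on Linux it reads /proc/spl/kstat/zfs,
# and poolMetrics adds per-pool I/O statistics.
cat > /etc/telegraf/telegraf.d/zfs.conf <<'EOF'
[[inputs.zfs]]
  poolMetrics = true
EOF
systemctl restart telegraf
```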
For what it's worth, I've also seen huge performance decreases on my pool. Write speed has throttled down to 30MB/s from 600MB/s+. If you've got a reasonable method for me to collect performance data I will also assist in this.
All drives are direct-attached to a Dell PERC H310 controller in IT mode.
@Setsuna-Xero do I understand correctly that you see this performance drop for both 0.8.0-rc3 and the 0.8.0 tag?
@behlendorf I will be moving this array to another server with a 4.19.41 kernel as soon as the drive cages arrive, however.
It is striking to me that @Setsuna-Xero is also seeing the performance drop with a RAID10 setup. Could it be that the RAID level makes the difference? I have another pool, a RAIDZ2, which is not showing a performance drop.
I'm getting 2-3MB/s with cp/cq and tar. rsync gets an order of magnitude more, at right around 30MB/s.
I benchmarked sequential writes on a 6-disk RAIDZ2 (all HDD) using Proxmox 6 with ZFS 0.8.1 and kernel 5.0. The array struggled to maintain even single-disk sequential speed, around 200MB/sec. An older ZoL build (0.7.13 with an older kernel) shows more than double the speed with the same configuration, around 450MB/sec.
The 0.6.x branch was spinning like a tornado. Has somebody compiled and tested the master branch with commit e5db313?
Documenting this in case it helps. It seems clear that this is related to the lack of SIMD: higher RAID-Z levels use a lot of CPU, and scalar performance isn't enough. From cat /proc/spl/kstat/zfs/vdev_raidz_bench (the "scalar" row) on a Xeon 4108, gen_p (RAID-Z) is 1.13GB/sec. SIMD makes everything 5-7x faster, so restoring SIMD should help this problem.
@amissus What version did you test and show those results for? 0.7.19 does not exist.
I'm sorry; version 0.7.13 has the expected performance for me, and >= 0.8 has degraded and unstable performance.
What exactly am I reading here? [root@hostname~]# cat /proc/spl/kstat/zfs/vdev_raidz_bench
18 0 0x01 -1 0 5551518943 1459035087503366
implementation gen_p gen_pq gen_pqr
original 383443168 135674622 67712690
scalar 1682391699 530611710 228126033
fastest scalar scalar scalar
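Roughly: each row is a parity implementation and the columns are the measured throughput (bytes per second) for single-, double- and triple-parity generation; the "fastest" row shows which implementation was selected for each. Which implementation is actually in use can also be checked via a module parameter (a sketch, assuming the 0.8 parameter name):

```sh
# The active raidz implementation is shown in brackets; if only "original" and
# "scalar" are listed, the SIMD code paths are unavailable on this kernel.
cat /sys/module/zfs/parameters/zfs_vdev_raidz_impl
```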
@behlendorf this issue was created on 30 May and the fix landed in the master branch on 12 Jul. This is a very important case for us users. When do you plan to do the next ZFS release with this commit? What is the project policy for releases?
It seems zfs 0.8.2 was released, but without the fix in e5db313 😢
Does this issue affect kernel 3.10.0-1062.1.1.el7.x86_64 as well?
@DannCos 3.10.0-1062.1.1.el7.x86_64 is not affected by this issue.
I decided to conduct some tests under CentOS 7 (with 3.10.0-1062.1.1.el7.x86_64). The reason I decided to do this was simply that I was replacing a storage server (one running 0.7, one running 0.8) and I experienced slow read performance on the new system. Old server:
New server:
Both servers use about 20TB of storage and store 280 million files. The old server would restore a 1GB backup with 100k files in about 1.5 minutes, whereas the new one would take 17 minutes for the same folder. Note: writes seem to be decent on both systems; reads are the main thing affected. Both tests were performed on an idle system right after rebooting (to ensure that no cache got hit), with atime turned off, lz4 compression turned on, and dedup off. This made me search, and I found this thread regarding performance issues, so I wanted to test out various versions of ZoL as well as ZFS on FreeBSD 12. For this I set up another machine:
All tests below use the same zpool create parameters. The test directory structure is 11294 megabytes and 311153 inodes. It's also worth noting that the only data stored on the pool is the test directory structure, nothing else; whether performance becomes worse as the dataset grows, I don't know (hopefully it doesn't). Backup/restore is performed using rsync over a local network (1 gigabit/s) with no other communication happening. ZFS 0.6 (Installed via Ubuntu 16.04):
ZFS 0.7 (Installed via CentOS 7.7 using zfs-release.el7_6):
ZFS 0.8 (Installed via CentOS 7.7 using zfs-release.el7_7):
Backup times (writing to ZFS) seem to stay pretty consistent in my case, likely also being limited by the 1 gigabit link between machines; the average is about 2 minutes and 10 seconds, or about 700mbps. What surprises me about the drop between 0.7 and 0.8 is the read performance, especially for raidz2: from 3 minutes and 18 seconds to 5 minutes and 54 seconds. That's a 78% increase in restoration time. Just for fun, I gave FreeBSD 12 a try:
Whether it performs better under FreeBSD 11.x I haven't had the time to test yet. I'd expect performance to be roughly the same on the same hardware. My tests still do not explain the massive slowdown I experience between the two real systems with more powerful hardware; hopefully adding more memory to a system (64 vs 128GB) shouldn't make performance worse. I know this issue is mainly about write performance; however, I do find it important that read performance gets mentioned as well. It makes me believe that there may be some other regression between 0.7 and 0.8, other than SIMD, that affects overall performance. If people want me to test with some other settings, I'm more than happy to do so. Ideally, I want my backup server to remain snappy so that restores, when needed, can actually be performed rather quickly.
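A sketch of how such a timed rsync backup/restore can be run (placeholder host and paths, assuming SSH access between the machines):

```sh
# "Backup": write the test tree onto the pool over the 1 Gbit link.
time rsync -a backuphost:/data/testtree/ /tank/testtree/

# "Restore": read the tree back off the pool and push it to the other machine.
time rsync -a /tank/testtree/ backuphost:/data/restore/
```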
@lucasRolff could you do a benchmark of the 0.8.3 version, which was released a few days ago?
I just did a test with 0.8.3 and kernel 5.4.14. I do see better IOPS. Average of 7 runs:
This is certainly better than what I had before (#8836 (comment)). The read speed is very good, at the same level as 0.7.13 or better. But the write speed is still behind 0.7.13.
@mabod out of curiosity, how did you run the benchmark? I just want to compare results.
I explained it in this thread. It is a fio benchmark. The fio option files are in this thread too.
@interduo - I moved my backup servers to 100% SSD storage and am (sadly) using a hardware RAID 6 :) Eventually I'll give ZFS another try on spinning disks and see how it performs.
It does not sound like SIMD is the only problem here.
Did you do your tests on the 0.8.4 release? Could you post results?
I cannot compare my test results anymore because I have replaced all 4 HDDs in that RAID10 in the meantime. Sorry.
@interduo I was thinking that the SIMD issue would only really affect RAID-Z/compression/encryption but not a mirror, so it might be something else. I'm not sure if I can quickly run a few tests; if I can, I'll update.
I'm a bit late to this party... For those of us building our own kernel for private use, is it possible to avoid "the SIMD issue" by reintroducing the symbols that are no longer exported and, if so, how would you do that? I've been running 0.8.4 on 4.14.23 for about a week now (with the impression that reads seem a bit faster compared to 0.7.12, writing probably slower, judging from compiler job durations). I'm building kernel 4.19.133 as we speak so now would be a good time to restore those SIMD exports... |
Thanks!
The commit message surprises me a bit: I would have expected that checksumming uses the crc32 intrinsic from SSE4 (4.2 IIRC) and that's not being mentioned. Good thing even my slow beater (N3150) has AES and AVX!
Edit: the name surprises me too ... suggesting the patch was already needed in the 4.14 kernel.
@RJVB it is, but not in 4.14.0; the change was made as a backport to some later version, I can't remember which one right now.
Saw that. I have been wondering if there's a compelling reason to migrate to a 5.x kernel, beyond "latest is always greatest" or features I didn't know I couldn't do without... Either way it seemed smart to live with the latest 4.x kernel for a while first.
Now just to be certain: can I assume that the re-exported functions will be picked up automagically during the ZFS 0.8.4 (dkms) kernel module build (I don't see any NixOS patches to ZFS)?
@RJVB yes, OpenZFS checks each kernel capability individually during the build process, regardless of the kernel version.
And then one of the kernel modules simply fails to build: #10601 :-/
I take it this patch has been tested with ZFS? After working around the build failure I could finally boot a VM into my new 4.19 kernel with the ZFS 0.8.4 kmods ready to roll. The VM runs under VirtualBox, using "raw disk" access to actual external drives connected via USB3. When I imported a pool (created recently by splitting off a dedicated mirror vdev from my main Linux rig's root pool) I discovered it had a number of corrupted items. I don't know if the corruption occurred during the previous time I'd used that pool, or during import. The identified items were all directories, curiously (in a dataset that has copies=1 because it has its own registry that doubles as an online backup), and the errors could be cleared by making an identical copy. Can I suppose that every single directory on (at least) every single dataset with copies=1 would have been affected if this were due to an issue with my kernel patches *) or the workarounds I applied to get the ZFS kmods to build? *) I also use the ConKolivas patches (which I had to refactor for 4.19.133) and a patch to make zswap use B-Trees.
mark
I am closing this issue. It was for version 0.8.0, which has been obsolete for a long time.
System information
Describe the problem you're observing
I do frequent fio benchmarks with my pool "zstore" and just realized that write performance is dropping with zfs version 0.8.
With zfs version 0.7.13 I typically got around 230-250 write IOPS:
With zfs version 0.8 I only get 160-190 write IOPS:
The read IOPS seem to be the same in the range of 260-280. Where is this write performance difference coming from?
Here are the pool details:
The zfs recordsize is 1M. No compression, no dedup.
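Those properties can be confirmed with a one-liner (dataset name as above):

```sh
# Confirm the dataset properties relevant to the benchmark.
zfs get recordsize,compression,dedup zstore
```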
Describe how to reproduce the problem
I am using the following fio option files for read and write with a SIZE of 32G:
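A minimal sketch of equivalent option files (job layout, paths and ioengine here are assumptions, not necessarily the originals):

```sh
# Hypothetical stand-ins for the two fio job files: 32G at 1M block size,
# buffered I/O (direct=0), since ZFS on Linux did not do true O_DIRECT.
cat > zstore-write.fio <<'EOF'
[global]
directory=/zstore/fio
size=32G
bs=1M
ioengine=psync

[seq-write]
rw=write
EOF

cat > zstore-read.fio <<'EOF'
[global]
directory=/zstore/fio
size=32G
bs=1M
ioengine=psync

[seq-read]
rw=read
EOF

fio zstore-write.fio   # reports write IOPS and bandwidth
fio zstore-read.fio    # reports read IOPS and bandwidth
```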