Increase Linux pipe buffer on zfs recv size to the maximum system size #3171
Do we have any performance data on how much (if any) this helps? Also, what is the normal default pipe size, and what is the maximum (1048576 on my system)? It looks like this functionality was added in 2.6.35, so we'll probably also need an autoconf check for this to avoid breaking the build.
@behlendorf We have performance data for mbuffer from @pyavdr in #1161, but we don't have any performance data for this specific patch. We might be able to get a volunteer from #1161 to get data for us. If it is alright with you, I would prefer to avoid an autotools check by exploiting Linux's stable userland/kernel API boundary: check for ENOSYS at runtime and add some preprocessor definitions such as the following:
This would allow us to build the binaries on a system whose userland predates the introduction of Would this approach be alright?
I'd be very cautious about defining F_SETPIPE_SZ/F_GETPIPE_SZ ourselves. We really can't safely assume those ids aren't already being used on a specific platform. They may already be defined for another purpose, in which case we can't assume we'll get ENOSYS. Since this is just an optimization I think we should just wrap the entire block in an Also just for reference these values are defined slightly differently on my platform. The end result is the same, but the header is a little different.
@behlendorf The code sample that you copied is from the kernel headers, which define F_LINUX_SPECIFIC_BASE as an offset whose value is equal to F_SETLEASE. The values should be identical on your platform. The method of computation that I picked just happened to be different because I liked the idea of (ab)using F_GETLEASE for symmetry. We should always be able to obtain the correct values in this way. If you look at glibc's headers, you will find that glibc hard codes these values on all Linux platforms:
It is possible that systems with these options in the headers won't have them in the kernel and vice versa. Consequently, there should be plenty of binaries in the wild that assume these values and if written properly, will assume an Is your concern that there exists a Linux architecture on which these values are different (glibc would contradict this) or that there is a distribution in which these values have been changed? If it is the former, that does not appear to be an issue. If it is the latter, it should be safe to (ab)use Are you certain that we should not do runtime detection here?
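For readers following the runtime-detection argument, here is a minimal sketch of the idea. It is not the code from the patch. It assumes the Linux values 1031 and 1032 for F_SETPIPE_SZ and F_GETPIPE_SZ (F_LINUX_SPECIFIC_BASE, which is 1024, plus 7 and 8), restricts the fallback definitions to Linux, and uses a hypothetical helper name, `try_set_pipe_size`. Any failure is treated as "keep the default pipe size", so the particular errno an older kernel returns does not matter to the caller.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* lets glibc expose F_SETPIPE_SZ if it has it */
#endif
#include <errno.h>
#include <fcntl.h>

/*
 * ASSUMPTION: fallback values for an old userland on Linux only.
 * 1031 and 1032 are F_LINUX_SPECIFIC_BASE (1024) + 7 and + 8.
 * They are deliberately not defined on other platforms, where the
 * same numbers could mean something else.
 */
#if defined(__linux__)
#ifndef F_SETPIPE_SZ
#define F_SETPIPE_SZ 1031
#endif
#ifndef F_GETPIPE_SZ
#define F_GETPIPE_SZ 1032
#endif
#endif

/*
 * Hypothetical helper: ask the kernel to resize the pipe behind fd.
 * Returns 0 on success, -1 if the request is unsupported or refused.
 */
static int
try_set_pipe_size(int fd, unsigned long size)
{
#ifdef F_SETPIPE_SZ
    /* A kernel without this command fails the call; we just report it. */
    if (fcntl(fd, F_SETPIPE_SZ, size) == -1)
        return (-1);
    return (0);
#else
    (void) fd;
    (void) size;
    errno = ENOSYS;             /* not available on this platform */
    return (-1);
#endif
}
```

Because the resize is only an optimization, a caller can ignore the return value and continue with whatever pipe size the kernel provides.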
I just pushed one last try at runtime detection to demonstrate what I have in mind. This one should pass the builders. I know that some people run newer kernels with older userlands, so it will be unfortunate if we cannot do things this way.
@behlendorf I overlooked your earlier question. The normal pipe size is 64KB on Linux.
Re-reading my comment I wasn't very clear. No, I'm actually not concerned about different Linux architectures defining this differently. I was worried about non-Linux, non-GNU platforms which might define this differently. I'm happy to see you and @lundman came to basically the solution I was going to suggest. Although I'd suggest wrapping this in From
Or maybe more correctly
@behlendorf Changing this to While glibc hides non-standard extensions behind
@ryao I'm not set on adding
I noticed when reviewing documentation that it is possible for userspace to use fcntl(fd, F_SETPIPE_SZ, (unsigned long) size) to change the kernel pipe buffer size on Linux to increase the pipe size up to the value specified in /proc/sys/fs/pipe-max-size. There are users using mbuffer to improve zfs recv performance when piping over the network, so it seems advantageous to integrate such functionality directly into the zfs recv tool. This avoids the addition of two buffers and two copies (one for the buffer mbuffer adds and another for the additional pipe), so it should be more efficient. This could have been made configurable and/or this could have changed the value back to the original after we were done with the file descriptor, but I do not see a strong case for doing either, so I went with a simple implementation. Closes openzfs#1161 Signed-off-by: Richard Yao <[email protected]>
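The approach this description outlines can be sketched roughly as follows. This is an illustrative sketch and not the merged patch: the helper names `read_pipe_max_size` and `grow_pipe_buffer` are hypothetical, and the whole resize is compiled only when the headers provide F_SETPIPE_SZ and F_GETPIPE_SZ, in line with the `#ifdef` wrapping suggested in the review. The resize is best effort, so the caller simply keeps the default pipe size if anything fails.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* glibc hides F_SETPIPE_SZ behind this */
#endif
#include <fcntl.h>
#include <stdio.h>

/*
 * Hypothetical helper: read the system-wide pipe size limit.
 * Returns the limit in bytes, or -1 if it cannot be read.
 */
static long
read_pipe_max_size(void)
{
    FILE *f = fopen("/proc/sys/fs/pipe-max-size", "r");
    long max = -1;

    if (f == NULL)
        return (-1);
    if (fscanf(f, "%ld", &max) != 1)
        max = -1;
    (void) fclose(f);
    return (max);
}

/*
 * Hypothetical helper: try to grow the pipe behind fd to the system
 * maximum. Returns the pipe size now in effect, or -1 if fd is not a
 * pipe or the platform lacks these fcntl commands.
 */
static int
grow_pipe_buffer(int fd)
{
#if defined(F_SETPIPE_SZ) && defined(F_GETPIPE_SZ)
    long max = read_pipe_max_size();

    /* Best effort: ignore failure and keep the current size. */
    if (max > 0)
        (void) fcntl(fd, F_SETPIPE_SZ, (unsigned long)max);
    return (fcntl(fd, F_GETPIPE_SZ));
#else
    (void) fd;
    return (-1);
#endif
}
```

A receiver reading a stream from standard input would call something like `grow_pipe_buffer(STDIN_FILENO)` once before it starts reading. When standard input is a file or socket instead of a pipe, the call fails harmlessly.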
From fcntl(2):
Maybe it'd be a good idea to allow privileged users to set it to a higher value if they wish so.
@thegreatgazoo that's not a bad thought but let's avoid complicating this further until we have a real use case for that. I've merged the refreshed patch to master, we'll see in practice if this optimization helps. 5c3f61e Increase Linux pipe buffer size on 'zfs receive'
I noticed when reviewing documentation that it is possible for userspace
to use fcntl(fd, F_SETPIPE_SZ, (unsigned long) size) to change the
kernel pipe buffer size on Linux to increase the pipe size up to the
value specified in /proc/sys/fs/pipe-max-size. There are users using
mbuffer to improve zfs recv performance when piping over the network, so
it seems advantageous to integrate such functionality directly into the
zfs recv tool. This avoids the addition of two buffers and two copies
(one for the buffer mbuffer adds and another for the additional pipe),
so it should be more efficient. This could have been made configurable
and/or this could have changed the value back to the original (had we
read it) after we were done with the file descriptor, but I do not see a
strong case for doing either, so I went with a simple implementation.
Closes #1161
Signed-off-by: Richard Yao [email protected]