File system locks when many concurrent threads are open #608
Comments
Which version of ZFS and which distribution?
Here is the system info (output was truncated in this capture):

    max@s0:~$ uname -a
    max@s0:~$ sudo zpool list
    max@s0:~$ sudo zpool status
    errors: No known data errors
    max@s0:~$ sudo zfs list
    max@s0:~$ zfs get all tank/biouml-shared

Regular hard drive test: real 0m0.016s
ZFS: real 0m2.446s

    oneadmin@s0:/tank/biouml-shared/tmp$ time ls -lahs > test.time.txt
    real 0m8.420s

    max@s0:~$ zpool iostat 5
    tank  6.37T  2.69T  307  194  30.2M  783K

I performed these tests while the system was still somewhat responsive; it was worse before.

    max@s0:~$ sudo lsof | grep tank | wc -l

I will provide the tests again when the system is under heavy load.
My system is now well loaded. You can see the performance here:

    time ls -lahs realigned/
    real 4m36.037s
    sys  0m0.008s

    zpool iostat 5
    tank  6.39T  2.67T  320  203  29.3M  956K

    max@s0:/var/lib/one/var$ sudo lsof | grep tank | wc -l
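To make the `ls` latency numbers above repeatable, a small timing harness helps. This is a minimal sketch, assuming a Linux system with GNU `date` (`%N` nanoseconds); the directory and iteration count are placeholders, not paths from this thread — point the first argument at the dataset under test.

```shell
#!/bin/sh
# Hypothetical helper: time `ls -lahs` over a directory several times
# and report the worst-case latency in milliseconds.
time_listing() {
    dir=${1:-/tmp}    # placeholder; use the pool mountpoint on a real test
    count=${2:-3}
    worst=0
    i=0
    while [ "$i" -lt "$count" ]; do
        start=$(date +%s%N)
        ls -lahs "$dir" > /dev/null
        end=$(date +%s%N)
        elapsed=$(( (end - start) / 1000000 ))   # nanoseconds -> milliseconds
        if [ "$elapsed" -gt "$worst" ]; then
            worst=$elapsed
        fi
        i=$((i + 1))
    done
    echo "$worst"
}

time_listing /tmp 3
```

Running this periodically while the pool is under load gives a worst-case metadata latency figure that can be compared across kernel/ZFS versions.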
The system ignores writes when several concurrent reads are present. In the next test I have several reading streams and several writing streams (cp command).

    max@s0:/var/lib/one/var$ sudo zpool iostat 5
    tank  6.39T  2.67T  322  201  29.5M  950K
A longer log:

    max@s0:/var/lib/one/var$ sudo zpool iostat 5
    tank  6.39T  2.67T  322  201  29.5M  950K
I think the priority of writes should be higher than that of reads: you can theoretically read an unlimited amount of data, but writes are usually limited. I have a data analysis workflow which reads data, stores it in RAM, and writes it back. The program (I am not its developer) has independent threads for reading and writing. What happens is that memory usage keeps growing because the program cannot write anything. In addition, this behaviour completely blocks the machine.
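The mixed-load scenario described above can be reproduced with a simple generator. This is a sketch under assumptions: the target directory, file sizes, and stream counts are hypothetical placeholders (scale them up on a real pool), and it mirrors the cp-based test rather than the original analysis program.

```shell
#!/bin/sh
# Sketch: run several concurrent sequential readers alongside a couple of
# writers, so `zpool iostat` and `ls` latency can be observed under mixed load.
TARGET=${TARGET:-/tmp/zfs-load-test}   # placeholder; use the pool mountpoint
mkdir -p "$TARGET"

# Create a source file to read back (16 MiB here; use much more on a real pool).
dd if=/dev/urandom of="$TARGET/source.bin" bs=1M count=16 2>/dev/null

# Several concurrent readers...
for r in 1 2 3 4; do
    dd if="$TARGET/source.bin" of=/dev/null bs=1M 2>/dev/null &
done

# ...and a couple of writers, mirroring the cp-based test above.
for w in 1 2; do
    cp "$TARGET/source.bin" "$TARGET/copy.$w" &
done

wait
ls -l "$TARGET"
```

While this runs, `zpool iostat 5` in another terminal shows whether write throughput collapses as the reader count grows, which is the behaviour reported here.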
Pull request #660 was merged. Do you still have this problem against the latest code?
Hard to say. I am using the stable version now. It has a bug with removing big files, but I don't do that often. I stopped using the daily builds two weeks ago because they were unstable; I had to reboot the server 3 times per day.
Closing this issue as stale. If you're still observing similar issues with the latest code, please go ahead and open a new issue.
There are changes to vfs_getattr() in torvalds/linux@a528d35. The new interface is:

    int vfs_getattr(const struct path *path, struct kstat *stat,
                    u32 request_mask, unsigned int query_flags)

The request_mask argument indicates which field(s) the caller intends to use. Fields the caller does not specify via request_mask may be set in the returned struct anyway, but their values may be approximate.

The query_flags argument indicates whether the filesystem must update the attributes from the backing store.

This patch uses the query_flags which result in vfs_getattr() behaving the same as it did with the 2-argument version which the kernel provided before Linux 4.11.

Members blksize and blocks are now always the same size regardless of arch. They match the size of the equivalent members in vnode_t.

The configure checks are modified to ensure that the appropriate vfs_getattr() interface is used. A more complete fix, removing the ZFS dependency on vfs_getattr() entirely, is deferred as it is a much larger project.

Reviewed-by: Chunwei Chen <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes openzfs#608
In Linux 4.11, torvalds/linux@2a1f062, signal handling related functions were moved from sched.h into sched/signal.h. Add configure checks to detect this and include the new file where needed. Reviewed-by: Chunwei Chen <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes openzfs#608
Before kernel 2.6.29, credentials were embedded in task_structs, and zfs had cases where one thread would need to refer to the credential of another thread, forcing it to take a hold on the foreign thread's task_struct to ensure it was not freed. Since 2.6.29, the credential has been moved out of the task_struct into a cred_t.

In addition, the mainline kernel originally did not export __put_task_struct(), but the RHEL5 kernel did, according to openzfs/spl@e811949a570. As of 2.6.39 the mainline kernel exports it.

There is no longer zfs code that takes or releases holds on a task_struct, and so there is no longer any reference to __put_task_struct(). This affects the Linux 4.11 kernel because the prototype for __put_task_struct() is in a new include file (linux/sched/task.h), and so the config check failed to detect the exported symbol.

Remove the unnecessary stub and the corresponding config check. This works on kernels since the oldest one currently supported, 2.6.32 as shipped with CentOS/RHEL.

Reviewed-by: Chunwei Chen <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes openzfs#608
`cargo update`

Update our direct dependencies to the latest versions (per `cargo outdated`), except for `azure*`, which will require changes to our code. Update the allowed licenses to include the Unicode license, which is actually FSF approved but not marked as such in the SPDX metadata.

Note: this requires rustc 1.61; the product uses 1.63. Run `rustup default 1.63` on your laptop to switch to it.
The sysinfo crate changed the meaning of `System::total_memory()` from returning kilobytes to returning bytes. This makes the agent think that the system has 1024x the amount of RAM that it really does, and we try to use more memory than exists. The problem was introduced by PR openzfs#608. This commit changes our code to interpret the new meaning of the return value correctly.
Hi,
I have an issue with File system accessibility on heavy load.
I am using 5 disks in RAID-Z with compression gzip-5 and dedup on. The host has 64 cores and 256 GB RAM.
When I run processes which use many open file handles, the system becomes completely unresponsive.
As an example, "ls" can take 1 minute.
Regards,
Max
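A side note on the dedup-on configuration described above: the deduplication table (DDT) must largely fit in RAM for the pool to stay responsive. The sketch below is a back-of-the-envelope estimate, not something from this thread; the 320-bytes-per-entry figure and the 128 KiB average block size are commonly quoted approximations and should be treated as assumptions.

```shell
#!/bin/sh
# Rough DDT memory estimate: allocated_bytes / avg_block_size * bytes_per_entry.
# All figures are approximations, not measured values from this system.
ddt_estimate_gib() {
    pool_tib=$1          # allocated data, in TiB
    block_kib=${2:-128}  # assumed average block size in KiB (recordsize default)
    per_entry=320        # approximate in-core DDT entry size, in bytes
    # entries = (TiB * 2^30 KiB) / block_kib; RAM = entries * per_entry bytes
    echo $(( pool_tib * 1024 * 1024 * 1024 / block_kib * per_entry \
             / 1024 / 1024 / 1024 ))
}

# ~6 TiB allocated (as in the zpool list output above), 128 KiB blocks:
ddt_estimate_gib 6    # prints 15 (GiB)
```

With 256 GB of RAM this particular pool should fit its DDT comfortably, but the estimate grows quickly if the average block size is smaller than assumed.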