File system locks when many concurent threads are opened #608

Closed
mikhmv opened this issue Mar 19, 2012 · 10 comments
Labels
Type: Performance Performance improvement or performance problem

@mikhmv

mikhmv commented Mar 19, 2012

Hi,
I have an issue with file system responsiveness under heavy load.
I am using 5 disks in RAID-Z with gzip-5 compression and dedup on. The host has 64 cores and 256 GB of RAM.
When I run processes that keep many files open, the system becomes completely unresponsive.

For example, "ls" can take 1 minute.

Regards,
Max

@ryao
Contributor

ryao commented Mar 19, 2012

Which version of ZFS and which distribution?

@mikhmv
Author

mikhmv commented Mar 19, 2012

Here is the system info:

max@s0:~$ uname -a
Linux s0 3.2.0-19-generic #30-Ubuntu SMP Fri Mar 16 16:27:15 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

max@s0:~$ dpkg -s zfs-dkms
Package: zfs-dkms
Status: install ok installed
Priority: extra
Section: kernel
Installed-Size: 9437
Maintainer: Darik Horn [email protected]
Architecture: amd64
Source: zfs-linux
Version: 0.6.0.54-0ubuntu1~precise1

max@s0:~$ dpkg -s zfsutils
Package: zfsutils
Status: install ok installed
Priority: extra
Section: admin
Installed-Size: 696
Maintainer: Darik Horn [email protected]
Architecture: amd64
Source: zfs-linux
Version: 0.6.0.54-0ubuntu1~precise1
max@s0:~$ dpkg -s libuutil1
Package: libuutil1
Status: install ok installed
Priority: extra
Section: libs
Installed-Size: 147
Maintainer: Darik Horn [email protected]
Architecture: amd64
Source: zfs-linux
Version: 0.6.0.54-0ubuntu1~precise1

max@s0:~$ dpkg -s libzfs1
Package: libzfs1
Status: install ok installed
Priority: extra
Section: libs
Installed-Size: 307
Maintainer: Darik Horn [email protected]
Architecture: amd64
Source: zfs-linux
Version: 0.6.0.54-0ubuntu1~precise1

max@s0:~$ dpkg -s libzpool1
Package: libzpool1
Status: install ok installed
Priority: extra
Section: libs
Installed-Size: 1122
Maintainer: Darik Horn [email protected]
Architecture: amd64
Source: zfs-linux
Version: 0.6.0.54-0ubuntu1~precise1

max@s0:~$ dpkg -s zfs-auto-snapshot
Package: zfs-auto-snapshot
Status: install ok installed
Priority: extra
Section: admin
Installed-Size: 67
Maintainer: Darik Horn [email protected]
Architecture: all
Version: 1.0.8-0ubuntu1~precise1

max@s0:~$ sudo zpool list
[sudo] password for max:
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
tank  9.06T  6.37T  2.69T    70%  1.01x  ONLINE  -

max@s0:~$ sudo zpool status
pool: tank
state: ONLINE
scan: scrub canceled on Thu Mar 15 20:19:58 2012
config:

    NAME        STATE     READ WRITE CKSUM
    tank        ONLINE       0     0     0
      raidz1-0  ONLINE       0     0     0
        d1      ONLINE       0     0     0
        d2      ONLINE       0     0     0
        d3      ONLINE       0     0     0
        d4      ONLINE       0     0     0
        d5      ONLINE       0     0     0

errors: No known data errors

max@s0:~$ sudo zfs list
NAME                 USED  AVAIL  REFER  MOUNTPOINT
tank                5.11T  2.04T   256K  /tank
tank/Irina           310G  2.04T   310G  /tank/Irina
tank/OpenNebula      866G  2.04T   863G  /tank/OpenNebula
tank/biouml-shared  3.94T  2.04T  3.79T  /tank/biouml-shared

max@s0:~$ zfs get all tank/biouml-shared
exportfs: could not open /var/lib/nfs/.etab.lock for locking: errno 13 (Permission denied)
NAME                PROPERTY              VALUE                SOURCE
tank/biouml-shared  type                  filesystem           -
tank/biouml-shared  creation              Sun Feb 5 8:29 2012  -
tank/biouml-shared  used                  3.94T                -
tank/biouml-shared  available             2.04T                -
tank/biouml-shared  referenced            3.79T                -
tank/biouml-shared  compressratio         1.08x                -
tank/biouml-shared  mounted               yes                  -
tank/biouml-shared  quota                 none                 default
tank/biouml-shared  reservation           none                 default
tank/biouml-shared  recordsize            128K                 default
tank/biouml-shared  mountpoint            /tank/biouml-shared  default
tank/biouml-shared  sharenfs              off                  local
tank/biouml-shared  checksum              on                   default
tank/biouml-shared  compression           gzip                 local
tank/biouml-shared  atime                 on                   default
tank/biouml-shared  devices               on                   default
tank/biouml-shared  exec                  on                   default
tank/biouml-shared  setuid                on                   default
tank/biouml-shared  readonly              off                  default
tank/biouml-shared  zoned                 off                  default
tank/biouml-shared  snapdir               hidden               default
tank/biouml-shared  aclinherit            restricted           default
tank/biouml-shared  canmount              on                   default
tank/biouml-shared  xattr                 on                   default
tank/biouml-shared  copies                1                    default
tank/biouml-shared  version               5                    -
tank/biouml-shared  utf8only              off                  -
tank/biouml-shared  normalization         none                 -
tank/biouml-shared  casesensitivity       sensitive            -
tank/biouml-shared  vscan                 off                  default
tank/biouml-shared  nbmand                off                  default
tank/biouml-shared  sharesmb              off                  default
tank/biouml-shared  refquota              none                 default
tank/biouml-shared  refreservation        none                 default
tank/biouml-shared  primarycache          all                  default
tank/biouml-shared  secondarycache        all                  default
tank/biouml-shared  usedbysnapshots       154G                 -
tank/biouml-shared  usedbydataset         3.79T                -
tank/biouml-shared  usedbychildren        0                    -
tank/biouml-shared  usedbyrefreservation  0                    -
tank/biouml-shared  logbias               latency              default
tank/biouml-shared  dedup                 on                   inherited from tank
tank/biouml-shared  mlslabel              none                 default
tank/biouml-shared  sync                  standard             default
tank/biouml-shared  refcompressratio      1.04x                -

Regular hard drive test:
max@s0:~$ time echo test zfs speed > test.txt

real 0m0.016s
user 0m0.000s
sys 0m0.000s

ZFS:
oneadmin@s0:/tank/biouml-shared/tmp-tools$ time echo test zfs speed > test.txt

real 0m2.446s
user 0m0.000s
sys 0m0.000s

oneadmin@s0:/tank/biouml-shared/tmp$ time ls -lahs > test.time.txt

real 0m8.420s
user 0m0.000s
sys 0m0.040s

max@s0:~$ zpool iostat 5
exportfs: could not open /var/lib/nfs/.etab.lock for locking: errno 13 (Permission denied)
              capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
tank        6.37T  2.69T    307    194  30.2M   783K
tank        6.37T  2.69T    353    121  31.8M   700K
tank        6.37T  2.69T    248    209  14.0M  1.22M
tank        6.37T  2.69T    219    226  13.9M  1.38M
tank        6.37T  2.69T    333     97  35.4M   544K
tank        6.37T  2.69T    245    334  11.4M  1.86M
tank        6.37T  2.69T    232    128  15.5M   662K
tank        6.37T  2.69T    317     50  38.2M   110K

I ran these tests while the system was somewhat responsive. It was worse before.

max@s0:~$ sudo lsof | grep tank| wc -l
124

I will run the tests again when the system is under heavy load.
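
A minimal sketch of how the write and listing timings above could be collected repeatably, assuming Python 3 is available on the host. `time_small_write` and `time_listing` are hypothetical helper names, not part of ZFS or its tools, and the directory to test (for example a path on the pool) is supplied by the caller.

```python
import os
import tempfile
import time


def time_small_write(directory, payload=b"test zfs speed\n", sync=False):
    """Return the seconds taken to create and write one small file."""
    start = time.perf_counter()
    # mkstemp picks a unique name, so no existing file is overwritten.
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(payload)
            if sync:
                fh.flush()
                os.fsync(fh.fileno())  # optionally wait for stable storage
        return time.perf_counter() - start
    finally:
        os.remove(path)  # clean up; not included in the measured time


def time_listing(directory):
    """Return the seconds taken to list a directory and stat every entry."""
    start = time.perf_counter()
    with os.scandir(directory) as entries:
        for entry in entries:
            entry.stat(follow_symlinks=False)  # roughly what `ls -l` needs
    return time.perf_counter() - start
```

Calling both helpers in a loop while the heavy workload is running, once against a directory on the pool and once against a directory on a regular disk, would give a comparison like the one above. The write helper does not call fsync by default, to stay close to the `echo` test.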

@mikhmv
Author

mikhmv commented Mar 20, 2012

My system is now heavily loaded. You can see the performance here:

time ls -lahs realigned/
total 2.4G
39K drwx------ 3 oneadmin cloud 11 Mar 20 14:30 .
14K drwx------ 7 oneadmin cloud 7 Mar 20 02:07 ..
512 -rw------- 1 oneadmin cloud 0 Mar 20 14:30 5173N_sorted_dedup_rg_dd2_kar.chr14.ra.bam
280M -rw------- 1 oneadmin cloud 282M Mar 20 14:30 5173N_sorted_dedup_rg_dd2_kar.chr15.ra.bam
65K -rw------- 1 oneadmin cloud 113K Mar 19 21:02 5173N_sorted_dedup_rg_dd2_kar.chr22.ra.bai
1.9G -rw------- 1 oneadmin cloud 5.3G Mar 19 21:02 5173N_sorted_dedup_rg_dd2_kar.chr22.ra.bam
7.0K -rw------- 1 oneadmin cloud 5 Mar 19 21:02 5173N_sorted_dedup_rg_dd2_kar.chr22.ra.bam.done
7.0K -rw------- 1 oneadmin cloud 2.4K Mar 20 02:50 5173N_sorted_dedup_rg_dd2_kar.chrM.ra.bai
259M -rw------- 1 oneadmin cloud 259M Mar 20 02:50 5173N_sorted_dedup_rg_dd2_kar.chrM.ra.bam
7.0K -rw------- 1 oneadmin cloud 5 Mar 20 03:38 5173N_sorted_dedup_rg_dd2_kar.chrM.ra.bam.done
39K drwx------ 2 oneadmin cloud 32 Mar 20 14:08 logs

real 4m36.037s
user 0m0.004s
sys 0m0.008s

zpool iostat 5
              capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
tank        6.39T  2.67T    320    203  29.3M   956K
tank        6.39T  2.67T    396    108  44.0M   829K
tank        6.39T  2.67T    407    109  45.7M   829K
tank        6.39T  2.67T    464     98  52.2M   548K


max@s0:/var/lib/one/var$ sudo lsof | grep tank| wc -l
99

@mikhmv
Author

mikhmv commented Mar 20, 2012

The system ignores writes when several concurrent reads are in progress.

In the next test I have several read streams and several write streams (cp commands).

max@s0:/var/lib/one/var$ sudo zpool iostat 5
[sudo] password for max:
              capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
tank        6.39T  2.67T    322    201  29.5M   950K
tank        6.39T  2.67T    473      0  54.9M      0
tank        6.39T  2.67T    506      0  58.7M      0
tank        6.39T  2.67T    390      0  42.4M      0
tank        6.39T  2.67T    357      0  38.7M      0
tank        6.39T  2.67T    195      0  15.8M      0
tank        6.39T  2.67T    297      0  30.2M      0
tank        6.39T  2.67T    409      0  45.4M      0

@mikhmv
Author

mikhmv commented Mar 20, 2012

longer log:

max@s0:/var/lib/one/var$ sudo zpool iostat 5
[sudo] password for max:
              capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
tank        6.39T  2.67T    322    201  29.5M   950K
tank        6.39T  2.67T    473      0  54.9M      0
tank        6.39T  2.67T    506      0  58.7M      0
tank        6.39T  2.67T    390      0  42.4M      0
tank        6.39T  2.67T    357      0  38.7M      0
tank        6.39T  2.67T    195      0  15.8M      0
tank        6.39T  2.67T    297      0  30.2M      0
tank        6.39T  2.67T    409      0  45.4M      0
tank        6.39T  2.67T    458      0  52.5M      0
tank        6.39T  2.67T    391      0  43.9M      0
tank        6.39T  2.67T    214      0  16.5M      0
tank        6.39T  2.67T    400      0  42.5M      0
tank        6.39T  2.67T    237      0  20.0M      0
tank        6.39T  2.67T    335      0  34.9M      0
tank        6.39T  2.67T    316      0  31.7M      0
tank        6.39T  2.67T    345      0  36.0M      0
tank        6.39T  2.67T    173      0  10.5M      0
tank        6.39T  2.67T    227      0  19.3M      0
tank        6.39T  2.67T    371      0  39.4M      0
tank        6.39T  2.67T    277      0  25.8M      0
tank        6.39T  2.67T    314      0  29.9M      0
tank        6.39T  2.67T    299      0  30.2M      0
tank        6.39T  2.67T    232      0  19.0M      0
tank        6.39T  2.67T    277      0  27.5M      0
tank        6.39T  2.67T    243      0  21.8M      0
tank        6.39T  2.67T    306      0  30.9M      0
tank        6.39T  2.67T    245      0  22.6M      0
tank        6.39T  2.67T    265      0  22.6M      0
tank        6.39T  2.67T    377      0  39.5M      0
tank        6.39T  2.67T    165      0  9.42M      0
tank        6.39T  2.67T    230      0  17.9M      0
tank        6.39T  2.67T    185      0  12.0M      0
tank        6.39T  2.67T    318      0  31.3M      0
tank        6.39T  2.67T    414      0  43.7M      0
tank        6.39T  2.67T    319      0  30.0M      0
tank        6.39T  2.67T    284      0  26.1M      0
tank        6.39T  2.67T    204      0  13.9M      0
tank        6.39T  2.67T    390      0  42.2M      0
tank        6.39T  2.67T    413      0  44.2M      0
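
The stall in the log above can be quantified by counting consecutive samples whose write-operations column is zero. The following is a rough Python sketch, assuming the seven-column `zpool iostat` layout shown in this thread and that the K/M/G/T suffixes are powers of 1024; `parse_value`, `parse_iostat`, and `longest_write_stall` are hypothetical helper names.

```python
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_value(text):
    """Convert a value such as '950K', '54.9M' or '0' to a plain number."""
    if text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)


def parse_iostat(lines, pool="tank"):
    """Yield (read_ops, write_ops, read_bw, write_bw) for each sample row."""
    for line in lines:
        fields = line.split()
        # Skip headers, blank lines, and anything that is not a pool row.
        if len(fields) != 7 or fields[0] != pool:
            continue
        yield tuple(parse_value(f) for f in fields[3:7])


def longest_write_stall(samples):
    """Return the longest run of consecutive samples with zero write ops."""
    longest = current = 0
    for _read_ops, write_ops, _read_bw, _write_bw in samples:
        current = current + 1 if write_ops == 0 else 0
        longest = max(longest, current)
    return longest
```

Multiplying the result by the sampling interval (5 seconds in the runs above) gives an approximate duration for which no writes reached the pool.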

@mikhmv
Author

mikhmv commented Mar 21, 2012

I think writes should have higher priority than reads, since you can theoretically read an unlimited amount of data, but writes are usually limited.

I have a data analysis workflow that reads data, stores it in RAM, and writes it back. The program (I did not develop it) has independent threads for reading and writing. What happens is that memory usage keeps growing because the program cannot write anything. In addition, this behaviour completely blocks the machine.
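
One common way to keep memory bounded in a reader/writer program like the one described above is to put a fixed-size queue between the two threads, so the reader blocks when the writer falls behind. The sketch below is only an illustration in Python; it is not the program discussed here, whose internals are not known, and `run_pipeline` and its parameters are hypothetical names.

```python
import queue
import threading


def run_pipeline(chunks, write, max_buffered=4):
    """Read chunks in one thread and write them in another via a bounded queue.

    Returns the largest queue size observed right after a chunk was queued.
    """
    buffer = queue.Queue(maxsize=max_buffered)
    done = object()  # sentinel marking the end of the input
    peak = 0

    def reader():
        nonlocal peak
        for chunk in chunks:
            buffer.put(chunk)  # blocks while the buffer is full
            peak = max(peak, buffer.qsize())
        buffer.put(done)

    def writer():
        while True:
            chunk = buffer.get()
            if chunk is done:
                break
            write(chunk)  # a slow write here throttles the reader

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak
```

With an unbounded queue (`maxsize=0`) the reader would never block, which matches the growing memory usage described above when writes stall.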

@ryao
Contributor

ryao commented Apr 19, 2012

@mikhmv Pull request #660 might solve your problem.

@ryao
Contributor

ryao commented May 17, 2012

Pull request #660 was merged. Do you still have this problem with the latest code?

@mikhmv
Author

mikhmv commented May 17, 2012

Hard to say. I am using the stable version now. It has a bug with removing big files, but I don't do that often. I stopped using the daily build because it was unstable (2 weeks ago). I had to reboot the server 3 times per day.

@behlendorf
Contributor

Closing this issue as stale. If you're still observing similar issues with the latest code please go ahead and open a new issue.

behlendorf pushed a commit to behlendorf/zfs that referenced this issue May 21, 2018
There are changes to vfs_getattr() in torvalds/linux@a528d35.  The new
interface is:

int vfs_getattr(const struct path *path, struct kstat *stat,
               u32 request_mask, unsigned int query_flags)

The request_mask argument indicates which field(s) the caller intends to
use.  Fields the caller does not specify via request_mask may be set in
the returned struct anyway, but their values may be approximate.

The query_flags argument indicates whether the filesystem must update
the attributes from the backing store.

This patch uses the query_flags which result in vfs_getattr behaving the same
as it did with the 2-argument version which the kernel provided before
Linux 4.11.

Members blksize and blocks are now always the same size regardless of
arch.  They match the size of the equivalent members in vnode_t.

The configure checks are modified to ensure that the appropriate
vfs_getattr() interface is used.

A more complete fix, removing the ZFS dependency on vfs_getattr()
entirely, is deferred as it is a much larger project.

Reviewed-by: Chunwei Chen <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Olaf Faaland <[email protected]>
Closes openzfs#608
behlendorf pushed a commit to behlendorf/zfs that referenced this issue May 21, 2018
In Linux 4.11, torvalds/linux@2a1f062, signal handling related functions
were moved from sched.h into sched/signal.h.

Add configure checks to detect this and include the new file where
needed.

Reviewed-by: Chunwei Chen <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Olaf Faaland <[email protected]>
Closes openzfs#608
behlendorf pushed a commit to behlendorf/zfs that referenced this issue May 21, 2018
Before kernel 2.6.29 credentials were embedded in task_structs, and zfs had
cases where one thread would need to refer to the credential of another thread,
forcing it to take a hold on the foreign thread's task_struct to ensure it was
not freed.

Since 2.6.29, the credential has been moved out of the task_struct into a
cred_t.

In addition, the mainline kernel originally did not export __put_task_struct()
but the RHEL5 kernel did, according to openzfs/spl@e811949a570.  As of
2.6.39 the mainline kernel exports it.

There is no longer zfs code that takes or releases holds on a task_struct, and
so there is no longer any reference to __put_task_struct().

This affects the linux 4.11 kernel because the prototype for
__put_task_struct() is in a new include file (linux/sched/task.h) and so the
config check failed to detect the exported symbol.

Removing the unnecessary stub and corresponding config check.  This works on
kernels since the oldest one currently supported, 2.6.32 as shipped with
Centos/RHEL.

Reviewed-by: Chunwei Chen <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Olaf Faaland <[email protected]>
Closes openzfs#608
pcd1193182 pushed a commit to pcd1193182/zfs that referenced this issue Sep 26, 2023
`cargo update`

Update our direct dependencies to the latest versions (per
`cargo outdated`), except for `azure*`, which will require changes to
our code.

Update allowed licenses to allow the Unicode license, which is actually
FSF approved but not marked as such in the SPDX metadata.

Note: requires rustc 1.61, the product uses 1.63.
Run `rustup default 1.63` on your laptop to switch to it.
pcd1193182 pushed a commit to pcd1193182/zfs that referenced this issue Sep 26, 2023
The sysinfo crate changed the meaning of `System::total_memory()`, from
returning kilobytes to returning bytes.  This makes the agent think that
the system has 1024x the amount of RAM that it really does, and we try
to use more memory than exists.

The problem was introduced by PR openzfs#608

This commit changes our code to interpret the new meaning of the return
value correctly.
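
The effect of the unit change can be shown with a little arithmetic. The snippet below is a Python illustration rather than the agent's Rust code; `to_bytes` is a hypothetical helper, the 16 GiB host is an invented example, and the factor of 1024 is taken from the commit message above.

```python
KIB = 1024


def to_bytes(reported, unit_is_kilobytes):
    """Convert a reported memory total to bytes."""
    return reported * KIB if unit_is_kilobytes else reported


# Example host with 16 GiB of RAM (an invented figure for illustration).
actual_bytes = 16 * 1024**3

old_api_value = actual_bytes // KIB  # older behaviour: value in kilobytes
new_api_value = actual_bytes         # newer behaviour: value in bytes

# Caller written for the old behaviour, run against the new one.
stale_result = to_bytes(new_api_value, unit_is_kilobytes=True)
# Caller updated to treat the value as bytes.
fixed_result = to_bytes(new_api_value, unit_is_kilobytes=False)

print(stale_result // actual_bytes)  # how many times too large the stale total is
print(fixed_result == actual_bytes)
```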