parallel msync deadlock #12702

Open
shaan1337 opened this issue Oct 28, 2021 · 4 comments
Labels
Bot: Not Stale (override for the stale bot)
Type: Defect (incorrect behavior, e.g. crash, hang)

Comments

@shaan1337
Contributor

shaan1337 commented Oct 28, 2021

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Ubuntu |
| Distribution Version | 20.04 |
| Kernel Version | 5.11.0-1021-gcp (Google Cloud) |
| Architecture | x86_64 |
| OpenZFS Version | zfs-0.8.3-1ubuntu12.12, zfs-kmod-2.0.2-1ubuntu5.1 |

| Type | Version/Name |
| --- | --- |
| Distribution Name | Ubuntu |
| Distribution Version | 20.04 |
| Kernel Version | 5.4.0-1045-aws (AWS EC2) |
| Architecture | x86_64 |
| OpenZFS Version | zfs-2.1.1-1, zfs-kmod-2.1.1-1 |

Describe the problem you're observing

In an attempt to reproduce #12662, I've come across a reproducible deadlock triggered by calling msync in parallel on the same file.

Although the stack trace looks similar to #12662, I'm not sure whether it's the same issue or whether the two are related at all. Both wait for a page to leave the writeback state, but in #12662 the wait is only temporary. In this case it appears to be a deadlock, and the system needs to be rebooted.

Describe how to reproduce the problem

repro.zip

$ gcc repro.c -o repro -lpthread
$ ./repro

You may need to run the application a few times for the issue to occur. If it's not happening, you can also replace for(int i=0;i<10;i++){ with for(;;){ and it should occur more predictably.
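The attached repro.c is not reproduced in this thread. For readers without the attachment, here is a minimal sketch of what "msync in parallel to the same file" can look like. It is not the original reproducer. The function name `run_parallel_msync`, the `MAP_SIZE` and `MAX_THREADS` constants, and the choice to give each thread its own slice of the mapping are all assumptions made for illustration. It is not confirmed to trigger the deadlock.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define MAP_SIZE (1 << 20)  /* hypothetical mapping size: 1 MiB */
#define MAX_THREADS 64      /* hypothetical upper bound on worker threads */

struct worker_args {
    char *map;       /* shared mapping of the file */
    size_t offset;   /* start of this thread's slice */
    size_t length;   /* length of this thread's slice */
    int iterations;  /* number of dirty + msync rounds */
    int failed;      /* set to 1 if msync reports an error */
};

/* Dirty this thread's slice, then flush the WHOLE mapping with msync.
 * The slices are disjoint, so the threads only contend inside msync. */
static void *worker(void *p)
{
    struct worker_args *a = p;
    for (int i = 0; i < a->iterations; i++) {
        memset(a->map + a->offset, i & 0xff, a->length);
        if (msync(a->map, MAP_SIZE, MS_SYNC) != 0) {
            perror("msync");
            a->failed = 1;
            break;
        }
    }
    return NULL;
}

/* Returns 0 if every msync call succeeded, -1 on any error. */
int run_parallel_msync(const char *path, int nthreads, int iterations)
{
    assert(path != NULL);
    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return -1; }
    if (ftruncate(fd, MAP_SIZE) != 0) { perror("ftruncate"); close(fd); return -1; }

    char *map = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); close(fd); return -1; }

    pthread_t tids[MAX_THREADS];
    struct worker_args args[MAX_THREADS];
    size_t slice = MAP_SIZE / (size_t)nthreads;
    int started = 0, rc = 0;

    for (int t = 0; t < nthreads; t++) {
        args[t] = (struct worker_args){ map, (size_t)t * slice, slice, iterations, 0 };
        if (pthread_create(&tids[t], NULL, worker, &args[t]) != 0) { rc = -1; break; }
        started++;
    }
    for (int t = 0; t < started; t++) {
        pthread_join(tids[t], NULL);
        if (args[t].failed) rc = -1;
    }

    munmap(map, MAP_SIZE);
    close(fd);
    return rc;
}
```

Calling `run_parallel_msync("writer.chk", 4, 10)` from a `main` function and building with `-lpthread` gives a bounded run comparable to the 10-iteration loop described above; a much larger iteration count approximates the for(;;) variant. The file name writer.chk is taken from this report, while the thread count of 4 is arbitrary.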

Include any warning/errors/backtraces from the system logs

The application hangs and after a few minutes, the following can be seen in the dmesg output:
dmesg.log

An attempt to read the file writer.chk hangs as well.

shaan1337 added the "Type: Defect" label Oct 28, 2021
@shaan1337
Contributor Author

The issue also happens with the latest ZFS version (zfs-2.1.1-1/zfs-kmod-2.1.1-1).

@shaan1337
Contributor Author

I've turned off WBT (writeback throttling) on the disks, as suggested by @rincebrain in #12662, and the deadlock still occurs.

@stale

stale bot commented Nov 22, 2022

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

stale bot added the "Status: Stale" label Nov 22, 2022
stale bot closed this as completed Mar 19, 2023
behlendorf added the "Bot: Not Stale" label and removed the "Status: Stale" label Mar 28, 2023
@behlendorf
Contributor

Reopening until we can verify this has been resolved with the provided reproducer.

behlendorf reopened this Mar 28, 2023