Segfault in encfs::FileNode::open(int) #214
Comments
I haven't been able to reproduce this issue, which makes me wonder whether it depends on a particular version of FUSE or the kernel. I committed an additional null check in this path just in case. |
Thank you. Could you please tell me what additional information I should provide to help diagnose this problem (in addition to a core dump)? |
It would be helpful if you can reproduce this without pam and document the steps you took to see the issue. Also, please provide the encfs configuration used. |
I will try to capture a core dump next time, both with and without pam (between Oct 1 and 7, when I will be on holiday). I am terribly sorry to have kept you waiting so long. |
I reinstalled the old version, and produced a core dump. |
This is a transcript of the disassembly:
; encfs::FileNode::open(int) const ()
push rbp
push rbx
mov rbx,rdi
mov ebp,esi
call pthread_mutex_lock@plt
mov rdi,QWORD PTR [rbx+0x38]
mov esi,ebp
mov rax,QWORD PTR [rdi] ; <== crash here, rdi=0x00000000
call QWORD PTR [rax+0x38]
mov rdi,rbx
mov ebp,eax
call pthread_mutex_unlock@plt
add rsp,0x8
mov eax,ebp
pop rbx
pop rbp
ret
mov rbp,rax
mov rdi,rbx
call pthread_mutex_unlock@plt
mov rdi,rbp
call _Unwind_Resume@plt
This is the traceback of the crashing thread:
encfs::FileNode::open(int) const (); // from libencfs.so.1.9
encfs::_do_flush(encfs::FileNode*) (); // from libencfs.so.1.9
?? () // from libencfs.so.1.9
?? () // from libencfs.so.1.9
encfs::encfs_flush(char const*, fuse_file_info*) (); // from libencfs.so.1.9
?? () // from libfuse.so.2
// some more from libfuse.so.2 |
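The disassembly above takes a mutex, loads a pointer from the object, and makes a virtual call through it; the crash happens because that pointer is zero. A minimal sketch of the kind of null check mentioned earlier in the thread follows. `FileIO`, `DummyIO`, and `Node` are hypothetical stand-ins, not the real encfs classes, and returning `-EIO` is an assumption.

```cpp
#include <cerrno>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical stand-in for the I/O backend a node delegates to.
struct FileIO {
  virtual ~FileIO() = default;
  virtual int open(int flags) const = 0;
};

// Trivial backend used only to exercise the sketch.
struct DummyIO : FileIO {
  int open(int) const override { return 3; }  // pretend file descriptor
};

// Simplified, hypothetical node; not the real encfs::FileNode.
class Node {
 public:
  explicit Node(std::shared_ptr<FileIO> io) : io_(std::move(io)) {}

  int open(int flags) const {
    std::lock_guard<std::mutex> lock(mutex_);  // released on every return path
    if (!io_) return -EIO;                     // guard instead of a null dereference
    return io_->open(flags);                   // virtual call through the member
  }

 private:
  mutable std::mutex mutex_;
  std::shared_ptr<FileIO> io_;
};
```

A guard like this only helps when the pointer is genuinely null. It cannot detect a node that has already been destroyed, in which case the member may hold any stale value.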
One clue is that I was starting some apps that write something into
I contacted the ArchLinux packager, who said that starting from 1.9, |
Rebuilt. Confirmed
But this time the traceback changed: the crashing function is now "read":
I examined
It seems that this is an invalid
if (fi != nullptr && fi->fh != 0)
  res = do_op(reinterpret_cast<FileNode *>(fi->fh));
else
  res = do_op(FSRoot->lookupNode(path, opName).get());
So this
std::shared_ptr<FileNode> fnode =
    FSRoot->openNode(path, "open", file->flags, &res);
...
file->fh =
    reinterpret_cast<uintptr_t>(ctx->putNode(path, std::move(fnode)));
This change was introduced in commit af64702, which was made in April this year, right before 1.9.1 was released. First, I doubt whether it is safe to use an rvalue-ref (
Since this concerns my data integrity, I don't want to lose anything, so I did not try modifying the code blindly to see if it works. I would like to hear your instructions first. I will later use |
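The excerpt above packs a raw `FileNode *` into `fh` while ownership stays with a `shared_ptr` held elsewhere. A rough sketch of why that is fragile: once the owner drops its reference, the integer handle still holds the old address. `Node`, `OpenTable`, and `handle_outlives_owner` are hypothetical names for illustration, not encfs code, and a `weak_ptr` is used to observe the lifetime without touching freed memory.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct Node { std::string path; };  // hypothetical stand-in for FileNode

// Hypothetical table that owns open nodes, keyed by path.
class OpenTable {
 public:
  // Takes ownership and returns the raw pointer that would be packed into fh.
  Node *put(const std::string &path, std::shared_ptr<Node> node) {
    Node *raw = node.get();
    nodes_[path] = std::move(node);  // replaces (and may destroy) a previous node
    return raw;
  }
  void erase(const std::string &path) { nodes_.erase(path); }

 private:
  std::map<std::string, std::shared_ptr<Node>> nodes_;
};

// Returns true if the node behind a handle is still alive after the table
// has dropped its reference.
bool handle_outlives_owner() {
  OpenTable table;
  auto node = std::make_shared<Node>(Node{"/a"});
  std::weak_ptr<Node> watch = node;  // observes lifetime without owning
  auto fh = reinterpret_cast<std::uintptr_t>(table.put("/a", std::move(node)));
  table.erase("/a");                 // last owner gone, node is destroyed
  (void)fh;                          // fh keeps the stale address; do not dereference
  return !watch.expired();
}
```

In this sketch `handle_outlives_owner()` returns `false`: the handle is just a number and does nothing to keep the node alive.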
It seems that valgrind does not work with suid programs (e.g. fuse).
I found it difficult to set up an isolated sandbox environment. It seems that high load (e.g. the startup of KDE) is required to trigger this bug. A virtual machine with a full KDE installation may be one way to run experiments. I will downgrade from |
I'm also using ArchLinux and pam_encfs and having this crash. encfs complains a lot about "getattr error: No such file or directory" before it crashes. |
Try the package
It's hard to tell yet whether it is a "pam_encfs" problem, because I cannot test without this module: that would break my display manager. |
I have been attempting to reproduce this segfault using pam_encfs. Using a test copy of my own KDE config I got a segfault in
Next I tried to get this to fail on an encfs instance outside of pam_encfs, which so far I have not been able to do. However, I did notice a hang waiting on the logging output to the terminal. While I cannot pinpoint any particular error, I believe there is a race, possibly in the logging system. |
I have the same problem on my MacBook Pro (2016) running macOS Sierra. The last line with "encfs -v" is:
2017-01-02 20:26:10,553 VER [encfs.cpp:128] op: flush :
The problem occurs if I enable Spotlight indexing in the encrypted folder. Crash report attached. |
I don't know if this is the same issue (it seems to be), but I am attaching core dump info from my computer. I am using Arch with:
Thank you for dealing with the issue. |
@t-dan |
#298 merged, issue should be solved 👍 |
@m13253 would you test the master branch and close this issue if it is solved? |
Tested. Not solved. Encfs crashed 2 minutes after I logged in to GNOME, when I tried to open the Chrome browser.
|
Thx for your test, sorry for its result... |
I couldn't get your point, what should I do now? |
We should try to make an encfs test version which does not include the Easylogging++ library ( |
Please test this version : |
Thank you for your patch. I'm sorry, but I just messed up my Linux installation and had to reinstall, so I need to find some time to rebuild my environment and reproduce this bug. Please give me a few days, then I'll test your patch. Thank you 👍 |
Good idea. However, I tried this, and libfuse prevents an open file from being unlinked (overwritten by the rename) by renaming it to ".fuse_hiddenXXX":
|
I added a canary value to FileNode and I can now reproduce the issue in seconds. Commit is: rfjakob@1021593 Running fsstress-encfs from the gocryptfs test suite I get in syslog:
Unless my canary is buggy, this should give us some leverage. |
@rfjakob Nice! Reproduction is key to solving this. Should you also set |
@DominikChmiel Can you test if rfjakob@1021593 fixes the crash you were seeing and/or the |
@rfjakob I have now run the build process that previously had a good chance of crashing encfs 5 times with your version. No crash so far, but the following in syslog:
|
Ok, excellent, we are on the right track. Thanks for testing! |
Adds a uint32 value to FileNode that is initialized to an arbitrary value (CANARY_OK) in the constructor, and reset when the reference is dropped (CANARY_RELEASED, CANARY_DESTROYED). The canary is checked on each withFileNode call. Makes it much easier to trigger the bug seen in #214 .
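The canary described in this commit message can be sketched roughly as follows. The constant names come from the message; their numeric values, the `Node` struct, `mark_released`, `with_node`, and the `-EIO` return are assumptions made for illustration and are not taken from the actual commit.

```cpp
#include <cerrno>
#include <cstdint>

// Arbitrary marker values, chosen for this sketch only.
constexpr std::uint32_t CANARY_OK = 0xC0FFEE01u;
constexpr std::uint32_t CANARY_RELEASED = 0xDEAD0001u;
constexpr std::uint32_t CANARY_DESTROYED = 0xDEAD0002u;

// Hypothetical stand-in for FileNode.
struct Node {
  std::uint32_t canary = CANARY_OK;       // set when the object is constructed
  int fd = -1;
  ~Node() { canary = CANARY_DESTROYED; }  // best effort; a compiler may drop this store
};

// Called when the owner gives up its reference to the node.
inline void mark_released(Node &n) { n.canary = CANARY_RELEASED; }

// Hypothetical wrapper in the spirit of withFileNode: refuse to run the
// operation unless the canary still has the expected value.
template <typename Op>
int with_node(Node *n, Op op) {
  if (n == nullptr || n->canary != CANARY_OK) return -EIO;
  return op(*n);
}
```

Reading the canary of an object that has already been destroyed is formally undefined behaviour, so a check like this is a debugging aid that turns many silent use-after-free accesses into a logged error. It does not make the access safe.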
@DominikChmiel I finished a proper fix yesterday, could you
and try again? The canary errors should be gone now, as well. |
Thanks for your work @rfjakob , will test it when I'm at my productive setup tomorrow. |
@rfjakob No Crash + no canary error messages, but the following:
Checked with commit rfjakob/encfs-next@bce3cee (branch issue214 from your repo) |
Ok, thanks, these messages were expected. I should probably downgrade them to warnings. The build completes successfully, right? |
Really strange to have |
Just installed that version again to recheck: As for the build of encfs: The build itself runs fine, however
After remounting with -o direct_io these I/O errors + entries in syslog persist. |
Run |
Seems to fail during a sanity-check of the resulting binary:
https://github.com/rfjakob/encfs/blob/issue214/tests/normal.t.pl#L312 |
Seems to me that this is due to the hardcoded path of ./build/encfs. I'm currently using the folder layout of the arch-repo PKGBUILD: https://git.archlinux.org/svntogit/community.git/tree/trunk/PKGBUILD?h=packages/encfs |
Okay, just checked by going through the build process manually as detailed in INSTALL.md. Tests pass fine now:
Was a folder layout error on my part. |
Thanks for testing, Dominik. The fact that your build system receives I/O errors means that fixing the crashes was only half of the story. I'll try reproducing with something like a Linux kernel build. |
Went back to rfjakob/encfs-next@c8ff1f9 (where you had just added the canary) because I wasn't sure about the build-result with that version. Can confirm the I/O Errors were already present there, my bad for missing that earlier. If there are any further tests I can do to help you just let me know. |
I can reproduce the I/O error using a Linux kernel build. I think I found the root cause: Concurrent opens of the same file can result in two FileNodes for the same path. Commit af64702 does not take that into account. Excerpt from a debug log:
Patch is in the works but not ready to publish yet. |
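To make the root cause above concrete: if the table of open files keeps exactly one node per path, a second concurrent open of the same path displaces the first node while its handle is still in use. One possible way to tolerate that, sketched here under assumptions, is to keep a list of nodes per path and release them individually. This is not the patch from the thread; `Node` and `OpenTable` are hypothetical.

```cpp
#include <cstddef>
#include <list>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

struct Node { int fd = -1; };  // hypothetical stand-in for FileNode

// Hypothetical open-file table that allows several nodes per path.
class OpenTable {
 public:
  // Takes ownership and returns the raw pointer used as the handle.
  Node *put(const std::string &path, std::shared_ptr<Node> node) {
    Node *raw = node.get();
    nodes_[path].push_back(std::move(node));  // earlier nodes stay alive
    return raw;
  }

  // Drops only the node that belongs to this particular handle.
  void release(const std::string &path, const Node *raw) {
    auto it = nodes_.find(path);
    if (it == nodes_.end()) return;
    it->second.remove_if(
        [raw](const std::shared_ptr<Node> &p) { return p.get() == raw; });
    if (it->second.empty()) nodes_.erase(it);
  }

  // Number of nodes currently open for a path.
  std::size_t count(const std::string &path) const {
    auto it = nodes_.find(path);
    return it == nodes_.end() ? 0 : it->second.size();
  }

 private:
  std::unordered_map<std::string, std::list<std::shared_ptr<Node>>> nodes_;
};
```

With this layout, two opens of the same path yield two independent handles, and releasing one leaves the other valid. A real implementation would also need locking around the table, which is omitted here.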
What about simply reverting this commit? |
Unfortunately, it does not revert cleanly. |
@DominikChmiel I have pushed v2 of the patch series:
Kernel build runs A-OK for me now. Let's hope you find the same. |
Ran 5 rebuilds, no I/O errors, no syslog messages, no crash, good build result. Looks like that fixed it to me. Thanks again @rfjakob! |
Fix released in EncFS v1.9.2. Thanks to everybody involved! |
Description:
After upgrading to encfs 1.9-2, using pam_encfs to automatically decrypt the home directory causes a core dump.
I cannot confirm whether it is a bug in encfs or a bug in pam_encfs.
Log attached in log.txt.
I reproduced it 3 times, but I am sorry that I did not manage to debug the program, because I cannot afford any data loss.
Downgrading to 1.8.1-7 solves the problem.
Additional info:
ArchLinux x86_64
encfs: 1.9-2
pam_encfs: 0.1.4.4-4
Cross post: https://bugs.archlinux.org/task/50789