Can't unmount and re-mount in same process #4

Open
hayley-leblanc opened this issue Apr 21, 2021 · 11 comments
Comments

@hayley-leblanc

Hi,

I am actually having this issue with Strata, but I think it is in Assise as well. I have a program that attempts to perform the following set of steps two times:

  1. Initialize a new instance of Strata (call mkfs on two emulated PM devices, run a command to set up kernfs, and call init_fs() to start the libfs)
  2. Create, fsync, and close a file in Strata
  3. Unmount Strata by calling shutdown_fs() and killing the kernfs process

In the first iteration, everything works as expected. The second time, when I get to step 2, the process is just killed. It appears to be killed while Strata is trying to open the file, because none of my error handling code after the open() call runs. Strata doesn't print any error messages.

I noticed that the LibFS only initializes the file system in init_fs() if a variable called initialized is 0. init_fs() sets this variable to 1, but shutdown_fs() doesn't set it back to 0. Is this intentional? I added a line in shutdown_fs() so that initialized is reset to 0 when the system is shut down, and things started working as expected.
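
The guard-flag behavior described above can be illustrated with a minimal sketch. This is not Strata's code: every `demo_*` name is a hypothetical stand-in, and the real init_fs() and shutdown_fs() do far more work. The sketch only shows why a second init in the same process becomes a no-op when shutdown leaves the flag set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for LibFS state; not Strata's real variables. */
static int demo_initialized = 0;      /* guard flag, like `initialized` in the report */
static bool demo_state_valid = false; /* true while the (pretend) file system is usable */

void demo_init_fs(void) {
    if (demo_initialized)
        return;                 /* guard: setup is skipped when the flag is still 1 */
    demo_state_valid = true;    /* stands in for the real setup work */
    demo_initialized = 1;
}

/* reset_flag = 0 models the reported behavior, 1 models the one-line fix. */
void demo_shutdown_fs(int reset_flag) {
    demo_state_valid = false;   /* stands in for the real teardown work */
    if (reset_flag)
        demo_initialized = 0;
}

/* Runs init, shutdown, init in one process.
 * Returns 1 if the state is usable after the second init, else 0. */
int demo_second_init_works(int reset_flag) {
    demo_initialized = 0;
    demo_state_valid = false;

    demo_init_fs();
    assert(demo_state_valid);   /* the first init always does the setup */
    demo_shutdown_fs(reset_flag);
    demo_init_fs();             /* silently skipped if the flag was not reset */
    return demo_state_valid ? 1 : 0;
}
```

With `reset_flag` set to 0 the second init leaves the state torn down, which is consistent with a failure on the next file operation; with 1 the second init runs the setup again.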

Also - is there a way to shut down kernfs cleanly from an external process? I see that it has a shutdown_fs() function but I don't immediately see a way to invoke it externally, and I'd like to be able to umount kernfs after running arbitrary workloads.
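
One generic way to let an external process request a clean shutdown is a SIGTERM handler, sketched below with standard POSIX calls. This is an assumption about how such a hook could look, not a description of what kernfs does: the `demo_*` names are hypothetical, and where kernfs would check the flag and call its shutdown_fs() is left open.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler, polled by the main loop. */
static volatile sig_atomic_t demo_stop_requested = 0;

/* Handler only sets a flag; the real teardown runs outside signal context. */
static void demo_on_sigterm(int signo) {
    (void)signo;
    demo_stop_requested = 1;
}

/* Installs the SIGTERM handler. Returns 0 on success, -1 on failure. */
int demo_install_term_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = demo_on_sigterm;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}

/* Returns 1 once SIGTERM has been received, else 0. */
int demo_should_stop(void) {
    return demo_stop_requested != 0;
}

/* Hypothetical use inside a server loop:
 *
 *     demo_install_term_handler();
 *     while (!demo_should_stop()) {
 *         ... serve requests ...
 *     }
 *     shutdown_fs();   // clean teardown, then exit
 */
```

An external process could then request the shutdown by sending SIGTERM to the server's PID, for example with `kill -TERM <pid>`.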

Thanks!

@simpeter

simpeter commented Apr 22, 2021 via email

@wreda
Contributor

wreda commented Apr 23, 2021

This is indeed a bug. Thanks for pointing it out. I'll fix this in an upcoming patch.

We didn't get around to implementing mount/umount, so Simon's suggestion is sensible here. If you manage to come up with an implementation for these commands, I'd be more than happy to integrate it.

@hayley-leblanc
Author

Thanks!

I have another question about Strata, if that's alright. I'm working on a tool that tests PM file systems for crash consistency, and we are extending it to Strata. I've encountered some unexpected behavior and I'd like to know whether it's correct. I have also been able to trigger it without the tool, using the following steps.

Here's what I'm doing:

  1. Set up Strata and run a program that calls init_fs() to start libfs, then creates a file /mlfs/foo. This program does not call shutdown_fs().
  2. Kill kernfs (without a TERM handler set up, so it doesn't run shutdown_fs() on the kernfs side either).
  3. Start kernfs up again.
  4. Run a program that starts libfs and attempts to stat /mlfs/foo.

By ending the first program without shutdown_fs() and by killing kernfs, I think this set of steps essentially injects a power-loss crash or similar after the creation of /mlfs/foo. In the second program, the stat call on /mlfs/foo fails and returns -2. errno is 0 after the stat call. If I try to open /mlfs/foo instead of calling stat, the same thing happens (returns -2, errno is 0 after the call) and Strata prints "incorrect fd -2: file /mlfs/foo".
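
A return value of -2 with errno left at 0 is unusual for a POSIX-style call, which normally returns -1 and sets errno. The value 2 is ENOENT on Linux, so -2 may be a negated error code returned directly; that reading is a guess and is not confirmed anywhere in this thread. Under that assumption, a caller could normalize both conventions with a small helper like the hypothetical `demo_error_code` below.

```c
#include <errno.h>

/* Hypothetical helper: maps a call result to a positive errno-style code.
 *   ret >= 0                 -> 0 (success)
 *   ret < 0, errno was set   -> the saved errno (POSIX convention)
 *   ret < 0, errno still 0   -> -ret (ASSUMED negated-code convention)
 */
int demo_error_code(int ret, int saved_errno) {
    if (ret >= 0)
        return 0;
    if (saved_errno != 0)
        return saved_errno;
    return -ret;
}

/* Hypothetical use around the call from the report:
 *
 *     struct stat st;
 *     errno = 0;                          // clear stale values first
 *     int ret = stat("/mlfs/foo", &st);
 *     int code = demo_error_code(ret, errno);
 *     if (code == ENOENT) { ... file is missing ... }
 */
```

Clearing errno before the call matters here, since a stale value from an earlier failure would otherwise be mistaken for the result of this one.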

Since Strata is synchronous, I would expect /mlfs/foo to be present in the second program even though libfs and kernfs don't shut down correctly. Is that correct?

@simpeter

simpeter commented Apr 24, 2021 via email

@hayley-leblanc
Author

Thanks for the update.

We'd like to try to test the parts of the crash recovery code that have been implemented - could you point us towards what those might be?

Thanks!

@simpeter

simpeter commented May 1, 2021 via email

@simpeter

simpeter commented May 2, 2021 via email

@hayley-leblanc
Author

Thanks! I think the scenario we're trying to set up is most similar to the OS failover experiment described in the paper; we're basically simulating power-loss crashes. I'd love some info on how you set up and ran that experiment. We are not looking at distributed file systems at the moment, so I have been working with Strata so far; does Strata have the same recovery mechanisms implemented as Assise, or should I switch to a local-only instance of Assise?

Thanks again for your help!

@simpeter

simpeter commented May 8, 2021 via email

@simpeter

simpeter commented May 10, 2021 via email

@hayley-leblanc
Author

Awesome, thank you! I'll try getting Assise set up and replicating that experiment. I'll reach out if I have any problems. I'll leave this issue open for now until the original bug I reported is patched. Thanks again for all your help!

3 participants