Can't unmount and re-mount in same process #4

hayley-leblanc · 2021-04-21T20:49:19Z

Hi,

I am actually having this issue with Strata, but I think it is in Assise as well. I have a program that attempts to perform the following set of steps two times:

1. Initialize a new instance of Strata (call mkfs on two emulated PM devices, run a command to set up kernfs, run init_fs to start the libfs)
2. Create, fsync, and close a file in Strata
3. Unmount Strata by calling shutdown_fs() and killing the kernfs process

In the first iteration, everything works as expected. The second time, when I get to step 2, the process is just killed. It appears to be killed while Strata is trying to open the file, because none of my error handling code after the open() call runs. Strata doesn't print any error messages.

I noticed that the LibFS only initializes the file system in init_fs() if a variable initialized is 0. init_fs() sets this variable to 1, but shutdown_fs() doesn't set it back to 0. Is this intentional? I added a line in shutdown_fs() so that initialized is set to 0 when the system is shut down, and things started working as expected.
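A minimal sketch of the guard-flag pattern described above, for illustration only. The names init_fs_sketch, shutdown_fs_sketch, setup_runs, and remount_twice are hypothetical stand-ins and not Strata's actual code; only the idea of resetting the flag on shutdown comes from this report.

```c
#include <assert.h>

/* Hypothetical stand-ins for LibFS state; this is not Strata's actual code. */
static int initialized = 0;  /* the guard flag described above */
static int setup_runs = 0;   /* counts how often real setup work ran */

/* Sketch of init_fs(): setup is skipped when the flag is already set. */
void init_fs_sketch(void)
{
    if (initialized)
        return;
    setup_runs++;            /* placeholder for the real mount work */
    initialized = 1;
}

/* Sketch of shutdown_fs() with the one-line fix applied. */
void shutdown_fs_sketch(void)
{
    if (!initialized)
        return;
    /* placeholder for the real teardown work */
    initialized = 0;         /* without this reset, the next init is a no-op */
}

/* Mounts and unmounts twice in one process; returns how often setup ran. */
int remount_twice(void)
{
    for (int i = 0; i < 2; i++) {
        init_fs_sketch();
        assert(initialized == 1);
        shutdown_fs_sketch();
        assert(initialized == 0);
    }
    return setup_runs;
}
```

If the reset line in shutdown_fs_sketch() is removed, the second init_fs_sketch() call returns early and the setup work runs only once, which matches the symptom of the second iteration operating on a file system that was never re-initialized.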

Also - is there a way to shut down kernfs cleanly from an external process? I see that it has a shutdown_fs() function but I don't immediately see a way to invoke it externally, and I'd like to be able to umount kernfs after running arbitrary workloads.

Thanks!


simpeter · 2021-04-22T21:43:16Z

(cc Waleed) I presume that shutdown_fs not setting initialized back to 0 is a bug. There is likely no way at present to cleanly shut down kernfs from an external process, but you should be able to add a TERM signal handler that does a clean shutdown. That should allow you to shut it down cleanly by sending SIGTERM.
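One way such a TERM handler could look, as a sketch under assumptions rather than kernfs code: the handler only sets a flag, and a hypothetical server loop (serve_once) notices the flag and calls a stand-in for shutdown_fs(). The names install_term_handler, serve_once, shutdown_fs_stub, and shutdown_count are invented for illustration; sigaction() and SIGTERM are standard POSIX.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set from the signal handler, read from the main loop. */
static volatile sig_atomic_t stop_requested = 0;

/* Counts calls to the shutdown stand-in so the effect is observable. */
static int shutdown_calls = 0;

/* The handler only sets a flag, which is async-signal-safe. */
static void on_sigterm(int signo)
{
    (void)signo;
    stop_requested = 1;
}

/* Installs the SIGTERM handler; returns 0 on success, -1 on failure. */
int install_term_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigterm;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}

/* Hypothetical stand-in for kernfs's shutdown_fs(). */
static void shutdown_fs_stub(void)
{
    shutdown_calls++;
}

int shutdown_count(void)
{
    return shutdown_calls;
}

/* One iteration of a hypothetical server loop.
 * Returns 1 when the loop should exit, 0 otherwise. */
int serve_once(void)
{
    if (stop_requested) {
        shutdown_fs_stub();  /* clean shutdown runs outside the handler */
        return 1;
    }
    /* a real server would handle one request here */
    return 0;
}
```

Doing the shutdown work in the main loop rather than inside the handler avoids calling functions that are not async-signal-safe from signal context. With this in place, an external process could request a clean shutdown with kill(pid, SIGTERM).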

-- Simon

wreda · 2021-04-23T14:46:30Z

This is indeed a bug. Thanks for pointing it out. I'll fix this in an upcoming patch.

We didn't get around to implementing mount/umount, so Simon's suggestion is sensible here. If you manage to come up with an implementation for these commands, I'd be more than happy to integrate it.

hayley-leblanc · 2021-04-23T19:30:29Z

Thanks!

I have another question about Strata, if that's alright. I'm working on a tool to test PM file systems for crash consistency, and we are currently extending it to Strata. I've encountered some unexpected behavior and I'd like to check whether it's correct. I have been able to trigger it without this tool too, with the following steps.

Here's what I'm doing:

1. Set up Strata and run a program that calls init_fs() to start libfs, then creates a file /mlfs/foo. This program does not call shutdown_fs().
2. Kill kernfs (without a TERM handler set up, so it doesn't run shutdown_fs() on the kernfs side either).
3. Start kernfs up again.
4. Run a program that starts libfs and attempts to stat /mlfs/foo.
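The "finish the workload, then die without any shutdown path" part of these steps can be sketched generically as below. This is an illustration and not part of the testing tool: crash_after_workload and the workload callback are hypothetical names, and the child stands in for any process (the first program or kernfs) that must be killed uncleanly after its work is done.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs workload() in a child, waits until it has finished, then kills the
 * child with SIGKILL so no shutdown code can run.
 * Returns the terminating signal number, or -1 on any failure. */
int crash_after_workload(void (*workload)(void))
{
    int ready[2];
    if (pipe(ready) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(ready[0]);
        close(ready[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: run the workload, report completion, never exit cleanly. */
        close(ready[0]);
        if (workload)
            workload();
        char done = 1;
        if (write(ready[1], &done, 1) != 1)
            _exit(1);
        for (;;)
            pause();
    }

    /* Parent: block until the child reports that the workload completed. */
    close(ready[1]);
    char buf;
    int completed = (read(ready[0], &buf, 1) == 1);
    close(ready[0]);

    /* SIGKILL cannot be caught, so no handler or exit hook runs. */
    kill(pid, SIGKILL);

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!completed || !WIFSIGNALED(status))
        return -1;
    return WTERMSIG(status);
}
```

The pipe is there so that the kill cannot race with the workload: the parent only sends SIGKILL after the child has signalled that the file creation (or whatever the workload does) has returned.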

By ending the first program without shutdown_fs() and by killing kernfs, I think this set of steps essentially injects a power-loss crash or similar after the creation of /mlfs/foo. In the second program, the stat call on /mlfs/foo fails and returns -2. errno is 0 after the stat call. If I try to open /mlfs/foo instead of calling stat, the same thing happens (returns -2, errno is 0 after the call) and Strata prints "incorrect fd -2: file /mlfs/foo".
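For reference, POSIX stat() returns 0 on success and -1 with errno set on failure, so a return of -2 with errno 0 fits neither case. A small probe along these lines can flag that automatically; it is a sketch, the names classify and probe_stat are made up here, and whether the stat() call actually reaches LibFS depends on how the program is linked against Strata, which this sketch does not assume.

```c
#include <errno.h>
#include <sys/stat.h>

enum probe_result {
    PROBE_OK,             /* returned 0 */
    PROBE_POSIX_ERROR,    /* returned -1 with errno set */
    PROBE_NONCONFORMING   /* anything else, e.g. -2 with errno 0 */
};

/* Classifies a (return value, errno) pair against the POSIX convention. */
enum probe_result classify(int ret, int err)
{
    if (ret == 0)
        return PROBE_OK;
    if (ret == -1 && err != 0)
        return PROBE_POSIX_ERROR;
    return PROBE_NONCONFORMING;
}

/* Calls stat() on path and classifies the outcome. */
enum probe_result probe_stat(const char *path)
{
    struct stat st;
    errno = 0;                 /* clear stale errno before the call */
    int ret = stat(path, &st);
    int err = errno;           /* save before anything else can change it */
    return classify(ret, err);
}
```

With the values reported above, classify(-2, 0) yields PROBE_NONCONFORMING, which separates "the file is missing" (a recovery problem) from "the error was reported in an unexpected way" (an interface problem).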

Since Strata is synchronous, I would expect /mlfs/foo to be present in the second program even though libfs and kernfs don't shut down correctly. Is that correct?

simpeter · 2021-04-24T01:50:42Z

Your assumption is correct. What I think should happen is that each libfs gets linked somewhere in persistent file system state (likely the superblock) and that, each time kernfs starts, kernfs first replays any log contents from the set of previously open libfs update logs, as identified by the superblock. My bet is that it's not fully implemented. Henry (cc'ed) did experiments that should involve these or similar steps. He might have some pointers for you as to how to get the proper behavior.
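A rough sketch of the design described above, assuming a superblock field that records which libfs logs are open. Everything here is hypothetical (struct superblock_sketch, MAX_LIBFS, register_log, unregister_log, replay_open_logs) and it ignores the flush and ordering work that real persistent memory would need; it is not how Strata or Assise lay out their superblock.

```c
#include <stddef.h>

#define MAX_LIBFS 8  /* hypothetical limit on concurrent libfs instances */

/* Hypothetical persistent superblock field: which libfs logs are open. */
struct superblock_sketch {
    unsigned char log_open[MAX_LIBFS];
};

/* Called when a libfs attaches: link its log into the superblock. */
void register_log(struct superblock_sketch *sb, size_t id)
{
    if (id < MAX_LIBFS)
        sb->log_open[id] = 1;
}

/* Called on a clean libfs shutdown: the log needs no replay later. */
void unregister_log(struct superblock_sketch *sb, size_t id)
{
    if (id < MAX_LIBFS)
        sb->log_open[id] = 0;
}

/* Called when kernfs starts: replay every log that is still marked open,
 * i.e. whose libfs never shut down cleanly. Returns the number replayed. */
size_t replay_open_logs(struct superblock_sketch *sb,
                        void (*replay)(size_t id))
{
    size_t replayed = 0;
    for (size_t id = 0; id < MAX_LIBFS; id++) {
        if (!sb->log_open[id])
            continue;
        if (replay)
            replay(id);        /* placeholder for the real log replay */
        sb->log_open[id] = 0;  /* log is now consistent with shared state */
        replayed++;
    }
    return replayed;
}
```

Under this scheme, a libfs that exits without shutdown_fs() leaves its entry set, so the next kernfs start knows which logs to replay before serving any new libfs.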


hayley-leblanc · 2021-04-30T14:05:21Z

Thanks for the update.

We'd like to try to test the parts of the crash recovery code that have been implemented - could you point us towards what those might be?

Thanks!

simpeter · 2021-05-01T01:03:57Z

cc Henry and Waleed, who should be able to tell you.


simpeter · 2021-05-02T17:26:28Z

Hi all - hopefully this email response will propagate through to GitHub...

The Assise artifact doesn’t include a generalized implementation of recovery/reconfiguration; we instead set up specific experiments to demonstrate the scenarios in the paper. If there is a specific scenario you’re interested in studying, I am happy to provide input on how to set it up.

Best,
Henry


hayley-leblanc · 2021-05-07T16:56:49Z

Thanks! I think the scenario we're trying to set up is most similar to the OS failover experiment described in the paper; we're basically simulating power-loss crashes. I'd love some info on how you set up and ran that experiment. We are not looking at distributed file systems at the moment, so I have been working with Strata so far; does Strata have the same recovery mechanisms implemented as Assise, or should I switch to a local-only instance of Assise?

Thanks again for your help!

simpeter · 2021-05-08T17:09:39Z

Assise will be the better choice. Henry or Waleed will be able to help you with setup.


simpeter · 2021-05-10T17:58:34Z

I agree that in general, Assise would be the best choice here, given the range of fixes since Strata’s release.

Just to be clear - the Strata release never supported replaying logs after a crash, and Assise’s support is limited to my testing for the OS failover experiment. Other workloads may be buggy. Unlike Strata, Assise’s distributed failure scenario, where another replica takes over the workload, requires throwing out any undigested logs during recovery instead of replaying the logs.

That said, here’s how I set up the single-node/OS failover experiment. SharedFS and the app/LibFS process are killed, then both restarted. When the processes restart, SharedFS digests any old log entries which weren’t digested before failure. To measure this log recovery time and subsequently start the app workload, I added a synchronous digest request to LibFS’s init_log (log.c). This request uses the (old, crashed) log’s start_digest and n_digest from the log superblock to digest any remaining log entries from the crashed process. This takes place before init_log() clears the log superblock for the new process.

Note that to support recovery with multiple processes, this logic should be moved from init_log() to SharedFS, which would check all LibFS logs, before allowing any LibFS to finish init_fs().
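The ordering described above (digest what the crashed process left behind, and only then reset the log superblock) can be sketched as follows. The field names start_digest and n_digest come from the description; the struct layout, their types, the fresh_start parameter, digest_stub, digested_so_far, and recover_then_reset are assumptions made for illustration and are not taken from log.c.

```c
#include <stdint.h>

/* Hypothetical view of the per-log superblock; the real layout may differ. */
struct log_sb_sketch {
    uint64_t start_digest;  /* where the undigested entries begin */
    uint32_t n_digest;      /* how many entries are still undigested */
};

/* Stand-in for a synchronous digest request; it only counts entries. */
static uint32_t digested_total = 0;

int digest_stub(uint64_t start, uint32_t n)
{
    (void)start;
    digested_total += n;
    return 0;
}

uint32_t digested_so_far(void)
{
    return digested_total;
}

/* Digests whatever the crashed process left behind, then resets the log
 * superblock for the new process. Returns 0 on success, -1 if the digest
 * failed, in which case the superblock is left untouched for a retry. */
int recover_then_reset(struct log_sb_sketch *lsb, uint64_t fresh_start,
                       int (*digest)(uint64_t start, uint32_t n))
{
    if (lsb->n_digest > 0 &&
        digest(lsb->start_digest, lsb->n_digest) != 0)
        return -1;

    /* Only after the old entries are digested is it safe to reset. */
    lsb->start_digest = fresh_start;
    lsb->n_digest = 0;
    return 0;
}
```

Resetting first would discard the only record of which entries were pending, so the order of the two steps is the point of the sketch. For the multi-process case mentioned above, the same function would be called by SharedFS once per LibFS log before any init_fs() is allowed to complete.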


hayley-leblanc · 2021-05-11T15:29:02Z

Awesome, thank you! I'll try getting Assise set up and replicating that experiment. I'll reach out if I have any problems. I'll leave this issue open for now until the original bug I reported is patched. Thanks again for all your help!

simpeter mentioned this issue Nov 18, 2021

Segmentation fault when leases are enabled #18

Open
