feat(core): protect MFS root in node with lock #8447
base: master
Conversation
Assigned to @petar for review as this is a newer area for all the current go-ipfs stewards.
core/commands/files.go
@@ -792,20 +795,25 @@ stat' on the file or any of its ancestors.
if err != nil {
    return err
}
filesRoot := nd.LockedFilesRoot.LockRoot()
@aschmahmann This design takes a high-level lock on all of MFS while running cli commands or the GC. On the one hand, this approach is a little error-prone because we need to make sure that this lock is used in all app-level code that touches MFS. On the other hand, after doing some thinking, I don't think it is easy to implement per-node locking in MFS. So, for now, it seems that the approach in this PR is fine for fixing the issue in the short term.
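For context, a minimal sketch of what such a coarse-grained wrapper could look like, assuming a single `sync.RWMutex` around the `*mfs.Root` (the `Lock`/`RLock` names follow this discussion, and the diff above also shows a `LockRoot` variant; the body and the `Unlock`/`RUnlock` helpers are illustrative, not the PR's actual diff):

```go
package node

import (
	"sync"

	mfs "github.com/ipfs/go-mfs"
)

// LockedFilesRoot guards the MFS root behind a single RW mutex:
// mutating files commands and the GC take the write lock, read-only
// commands take the read lock.
type LockedFilesRoot struct {
	mu   sync.RWMutex
	root *mfs.Root
}

// Lock takes the write lock and returns the root; the caller must call
// Unlock when done.
func (lfr *LockedFilesRoot) Lock() *mfs.Root {
	lfr.mu.Lock()
	return lfr.root
}

// Unlock releases the write lock.
func (lfr *LockedFilesRoot) Unlock() { lfr.mu.Unlock() }

// RLock takes the read lock and returns the root; the caller must call
// RUnlock when done.
func (lfr *LockedFilesRoot) RLock() *mfs.Root {
	lfr.mu.RLock()
	return lfr.root
}

// RUnlock releases the read lock.
func (lfr *LockedFilesRoot) RUnlock() { lfr.mu.RUnlock() }
```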
Taking a high level lock here is unfortunate, but isn't really any different than locking around Pin operations. It also seems fairly straightforward to implement.
In theory we could leverage the existence of the flush command here to allow accumulating data that's protected from GC, but given other work we'd like to do around MFS and GC (independently and together) it doesn't seem worth the effort at the moment.
@aschmahmann From your previous comment I take it you approve of the direction of this PR, but could you make it explicit here just to make sure I'm on the right track before proceeding, please?
@schomatis yes, having a global write lock on MFS seems like the best thing to do in this PR. For reads we might want to be more careful than a global lock, though. For example, not locking […]. If we find ourselves stuck with locking on some of the read commands I'd really like […]
Continuing with this.
Force-pushed from 86c53a0 to 3939682.
Confirming this is already the case since the […]
Force-pushed from 005f234 to f6e51ac.
Force-pushed from f6e51ac to d0e0595.
// given we would be exposing an unlocked version of the MFS root)
// * remove it completely (breaking interface) in favor of LockedFilesRoot;
//   this would seem the cleaner option
// * some in-between like renaming LockedFilesRoot to the original FilesRoot
I would go for this option, because this will help someone using the new version of the code figure out what is going on. It does not look like there is any danger of old incorrect code compiling quietly.
@@ -168,3 +169,54 @@ func Files(mctx helpers.MetricsCtx, lc fx.Lifecycle, repo repo.Repo, dag format.

    return root, err
}

// FIXME(BLOCKING): This should probably be merged into the Files provider above
// as no one should have access to the unlocked version of the mfs.Root.
Sounds good.
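For illustration, merging could mean constructing the `mfs.Root` inside the provider and handing out only the locked wrapper, so no caller ever sees the bare root. A sketch assuming the `LockedFilesRoot` type above (`mfs.NewRoot` is the real go-mfs constructor; the actual Files provider's fx wiring and root-loading logic are elided):

```go
import (
	"context"

	format "github.com/ipfs/go-ipld-format"
	merkledag "github.com/ipfs/go-merkledag"
	mfs "github.com/ipfs/go-mfs"
)

// NewLockedFilesRoot builds the mfs.Root internally and exposes only
// the lock-protected wrapper, never the bare root.
func NewLockedFilesRoot(ctx context.Context, ds format.DAGService, dir *merkledag.ProtoNode, pub mfs.PubFunc) (*LockedFilesRoot, error) {
	root, err := mfs.NewRoot(ctx, ds, dir, pub)
	if err != nil {
		return nil, err
	}
	return &LockedFilesRoot{root: root}, nil
}
```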
// distinction between reading and writing.
// FIXME(BLOCKING): Consider providing our own API functions like GetNode (and similar) that
// already take charge of lock-handling here instead of leaving that burden to the consumer.
// That way we could also make sure the lock is being enforced correctly.
I would leave things exactly as you have them right now. There are bigger issues with MFS locking that cannot be resolved by any sort of locking on the root alone. This PR is trying to just fix races between GC and competing command-line MFS operations.
I like the interface of LockedFilesRoot, because having only RLock and Lock methods forces whoever is touching this code to think/ask what is going on.
Yes, sorry, this is actually a non-blocking FIXME, just a note of a possible direction down the road; I'll clarify that.
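As a possible shape for that future direction, the lock handling could be wrapped in a callback-style API so consumers cannot forget to unlock. A hypothetical sketch, not part of this PR's diff:

```go
// WithRLock runs f with the read lock held and releases it on return,
// so the unlock cannot be forgotten by the caller.
func (lfr *LockedFilesRoot) WithRLock(f func(*mfs.Root) error) error {
	lfr.mu.RLock()
	defer lfr.mu.RUnlock()
	return f(lfr.root)
}

// WithLock is the write-side counterpart.
func (lfr *LockedFilesRoot) WithLock(f func(*mfs.Root) error) error {
	lfr.mu.Lock()
	defer lfr.mu.Unlock()
	return f(lfr.root)
}
```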
// doing a Lookup, (a) allows the Flush operation and (b) its File/Directory structures
// implementing it have their own locking mechanism to keep track of open file
// descriptors.
func (lfr *LockedFilesRoot) RLock() *mfs.Root {
That's the best we can do for now without a total redesign of MFS, I think.
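A read path under this scheme might then look as follows (`mfs.Lookup` and `FSNode.GetNode` are real go-mfs APIs; the wrapper and this helper itself are sketches, not this PR's code):

```go
import (
	format "github.com/ipfs/go-ipld-format"
	mfs "github.com/ipfs/go-mfs"
)

// lookupNode resolves an MFS path while holding only the read lock.
func lookupNode(lfr *LockedFilesRoot, path string) (format.Node, error) {
	root := lfr.RLock()
	defer lfr.RUnlock()

	fsn, err := mfs.Lookup(root, path)
	if err != nil {
		return nil, err
	}
	return fsn.GetNode()
}
```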
@@ -50,7 +50,11 @@ type ipfsPinMFSNode struct {
}

func (x *ipfsPinMFSNode) RootNode() (ipld.Node, error) {
    return x.node.FilesRoot.GetDirectory().GetNode()
// FIXME(BLOCKING): We're deploying this lock-taking mechanism as simply as
// possible for now, deferring the unlock, but we should be more precise.
See/Similar to comments below. That's the best we can do now. Even though this code has no effect at all (because the root cannot change inside LockedFilesRoot), it still seems useful to make someone staring at the code pause and investigate what this lock is all about.
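Put together, the locked `RootNode` would presumably take only the read lock, since pinning just needs a snapshot of the root directory node. A sketch (the `node`/`LockedFilesRoot` field names are taken from this PR's diff; the rest is illustrative):

```go
func (x *ipfsPinMFSNode) RootNode() (ipld.Node, error) {
	// A read lock suffices: we only snapshot the current root directory
	// node for pinning, we do not mutate MFS.
	root := x.node.LockedFilesRoot.RLock()
	defer x.node.LockedFilesRoot.RUnlock()
	return root.GetDirectory().GetNode()
}
```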
ipfs --version
ipfs files mkdir /test-lock/
ipfs files rm /test-lock/ -r
((echo "content" | ./cmd/ipfs/ipfs files write --create --parents --truncate --lock-time 3 /test-lock/file) && echo "ipfs write lock released" &)
I think the way you've implemented the timeout, the command `ipfs files write` does not exit until the timeout expires; consequently the lock is never present for the next command, `ipfs repo gc`.
> the command `ipfs files write` does not exit until the timeout expires

Correct. My bash is very poor, but the idea was to have the above command running in the background while `ipfs repo gc` attempts to take the lock and has to wait until `ipfs files write` releases it; I then see both commands' outputs roughly simultaneously.

But anyway, this is just a temporary ad-hoc test while in the draft stage; I still don't know how to actually test this in production to get this code ready to merge.
@aschmahmann Petar confirmed the current proposal, but I'm still stuck on how to test it other than some manual confirmation that the lock is being taken. How would you like to test this to wrap it up?
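One possible shape for such an internal-API test, exercising just the mutual exclusion of the wrapper sketched earlier (the test name and timings are assumptions, not the PR's code):

```go
import (
	"testing"
	"time"
)

// TestLockedFilesRootBlocksReaders holds the write lock and checks that
// a concurrent reader only gets through once it is released.
func TestLockedFilesRootBlocksReaders(t *testing.T) {
	lfr := &LockedFilesRoot{} // root left nil: only the lock is exercised

	_ = lfr.Lock()

	acquired := make(chan struct{})
	go func() {
		_ = lfr.RLock()
		defer lfr.RUnlock()
		close(acquired)
	}()

	select {
	case <-acquired:
		t.Fatal("reader acquired the lock while the writer held it")
	case <-time.After(100 * time.Millisecond):
		// Expected: the reader is still blocked.
	}

	lfr.Unlock()
	select {
	case <-acquired:
		// Expected: the reader proceeds once the writer releases.
	case <-time.After(time.Second):
		t.Fatal("reader never acquired the lock after release")
	}
}
```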
Depends on #8574. Please review and land that first to simplify this diff.
Replace the readily accessible MFS root in the node (`FilesRoot`) with a lock-protected version (`LockedFilesRoot`).

Testing: We currently only do a basic manual test (through the temporarily added script `test-mfs-lock.bash`), but we should look for a robust way to test the lock (preferably not through sharness/bash but through the internal Go API).

Note that this entire logic applies only when a daemon is running, where the different GC and files commands contend for the same MFS root. If the commands are run in standalone mode, each one contends for the entire repo's lock (and this mechanism isn't relevant there).
TODO:

- Resolve the `FIXME`s.
- `defer` always.

Closes #6113.