Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add 'disable_lockfile' configuration to disable locking globally #1830

Merged
merged 18 commits into from
Sep 29, 2022

Conversation

danpf
Copy link
Contributor

@danpf danpf commented Aug 2, 2022

The point of this PR is to add a flag to micromamba run that will disable the lockfile creation before running the command.

Reasoning

Dealing with unique filesystem problems is unfortunately all too common in my life and I have recently encountered issues with my parallel commands not running/erroring out due to my server's inability to create many lockfiles at a time. It appears that I am not the only one that has to deal with this #1446.

Looking at the code it seems to be a relatively simple change and is easily tested, but let me know if there's something I should change. I suppose an alternative to this would be #1448 but I think this still has its utility in cases when you know a lockfile is not necessary (like on HPCs.)

Also -- a genuine question -- I understand the reasoning behind using a lockfile when doing install/remove/etc, but I don't understand why a lock is necessary when using run (I assume that when using run the most common use-case is to run a binary in a separate environment) -- Was the command added to help with micromamba ps or were there other scenarios that were envisioned when locking was added to run?

Cheers,
~Danny

@wolfv
Copy link
Member

wolfv commented Aug 2, 2022

Hi @danpf thanks for the PR!

The lockfile is needed to store meta information about the running process in a folder on the computer, yeah.

How about we make a "disable_filelock: true/false" option in .mambarc? So that globally this can be enabled or disabled?

@aghr
Copy link

aghr commented Aug 2, 2022

Dear @danpf and @wolfv,

Such a global flag/option "disable_filelock: true/false" in .mambarc looks like a very good solution. By default it could be set to true. Mamba users could then switch it off on file systems that don't support the file-locking mechanism. Our HPC sys admin deems such a solution preferred over other/more complex solutions.

@wolfv
Copy link
Member

wolfv commented Aug 2, 2022

@danpf this looks almost perfect.
Instead of special casing in the run command, can we change the LockFile to check the value in Context::instance().disable_lockfile? That way, it would take effect everywhere.

@danpf
Copy link
Contributor Author

danpf commented Aug 2, 2022

@wolfv oh hey ya sorry you're too quick for me I was going to ask if that was the best way to handle this or if we should be special casing in places where locks are created. Juggling a few things but I'll do that soon.

@danpf
Copy link
Contributor Author

danpf commented Aug 4, 2022

Assuming tests pass this should be good.

I left the micromamba run .json "lockfiles" (ScopedProcFile ctor) in as they are just created with ofstream and I figure that is pretty unlikely to fail. but if you think they should be removed as well I can do that too (just 1 line change) (see run.cpp)

@danpf danpf changed the title Add --no-lock flag to micromamba run Add 'disable_lockfile' configuration to disable locking globally Aug 4, 2022
@danpf
Copy link
Contributor Author

danpf commented Aug 24, 2022

@wolfv Let me know if you think this PR needs anything else!

libmamba/src/api/configuration.cpp Outdated Show resolved Hide resolved
.long_description(unindent(R"(
Don't create a lockfile when running mamba commands. Use with caution!
-- Some filesystems don't support File Locking and locks don't always
make sense - like when on an HPC. Default is False (use a lockfile)")));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add an example of what problems may arise when not using locking? What does Mamba use locks for that will now be the responsibility of the user?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, what commands are affected by this? If it's only a small selection of commands then it might be worthwhile explicitly listing them here.

Copy link
Contributor

@jonashaag jonashaag Aug 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on how much you want to write on this topic you can also extend the new troubleshooting page (#1873) and link to that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I will update this once we decide on the name to use

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use lockfiles mainly to lock the package & repodata cache (bad things happen if two processes write to the same file...

We also use it in micromamba run to write some json files to store data about the process in ~/.mamba/proc/...

Copy link

@vicecea vicecea Sep 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also use it in micromamba run to write some json files to store data about the process in ~/.mamba/proc/...

Short question, I have noticed that I can't have more than 1 micromamba run ... invocation running at the same time, spawning multiple ones fail with critical libmamba Another process with name '...' is currently running. - Similarly if a micromamba run ... process get's SIGKILLed, any micromamba run ... afterwards fails with the same error until the zombie ~/.mamba/proc/*.json get's cleaned up.

Will this PR fix these issues or should I open an issue?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vicecea I think this might be confusing some issues. Running two run commands with the same label is disallowed. I don't think this PR will fix that. If you can open a new issue and describe exactly what is going on, that would be nice!

@@ -1133,7 +1137,7 @@ namespace mamba
try
{
// Don't even log if the file/directory isn't writable by someone or doesnt exists.
if (path::is_writable(path))
if (Context::instance().disable_lockfile || path::is_writable(path))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either I lack context or this is wrong, shouldn't it be !disable_lockfile && is_writable? If so, this lacks a test.

Copy link
Contributor Author

@danpf danpf Aug 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To revert this change you would need to modify this function:
https://github.com/danpf/mamba/blob/danpf-nolock/micromamba/src/run.cpp#L105 to not fail if nullptr is returned.

I also think it should not matter if path::is_writable or not if we are not making a lockfile anyway so I think this is correct.

EDIT:
you will also have to modify the assert in ScopedProcFile

If you would like me to modify those functions and remove this change, let me know - it is your choice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the logic is now confusing (although I think I get it). Maybe we can return earlier when the lockfile is disable? Or do we even need that check here?

libmamba/src/core/util.cpp Outdated Show resolved Hide resolved
@@ -197,6 +197,7 @@ namespace mamba
bool no_env = false;

std::size_t lock_timeout = 0;
bool disable_lockfile = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should negative this boolean (and option) like here

auto& compile_pyc = config.at("compile_pyc");
subcom->add_flag(
"--pyc,!--no-pyc", compile_pyc.get_cli_config<bool>(), compile_pyc.description());

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wolfv WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that is what you want I will change it. It's your choice.

@@ -161,6 +161,8 @@ namespace mamba
std::unique_ptr<LockFile> proc_dir_lock = lock_proc_dir())
: location{ proc_dir() / fmt::format("{}.json", getpid()) }
{
if (Context::instance().disable_lockfile)
Copy link
Contributor

@jonashaag jonashaag Aug 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code modified here does not do what you think. IIUC this creates a "pid file" that is used to give a list of running processes (see ~/.mamba/proc/ and micromamba ps)

If CI is green with your changed then this also looks like missing test coverage :(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree and I originally did not have this line, however on my older hpc I was still getting the error:

mamba run' failed to open/create file...

I cannot explain this, but when I disable the json pid fifles from being created - everything works great.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danpf is the problem here with the lockfile or generally with writing to the home directory?

We do in fact lock the PID directory while the process is writing out the json file, so it's possible that it's also a lockfile issue.

Would it make more sense to put these checks in the lock_proc_dir() function?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have looked a bit and I think this is somewhat unrelated to the lockfile problem. Probably the user cannot create a new directory / new files under the ~/.mamba/proc directory. Could that be the case? We should handle that more gracefully, but I think it should go in a seperate PR.

@wolfv
Copy link
Member

wolfv commented Sep 15, 2022

Hi @danpf sorry for taking so long to look at this PR. I'd be super happy to merge this. We also have some additional fixes coming for lockfiles (#1193) (there were some logic issues with the assumption that PIDs are globally unique).

With those two PRs in, the next mamba release should be much better in terms of file-locks! :)

@@ -751,6 +751,9 @@ namespace mamba
, m_timeout(timeout)
, m_locked(false)
{
if (!Context::instance().use_lockfiles)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mixes expected failures (because of use_lockfiles) with unexpected ones (exceptions).I think this check should be removed and use before trying to instantiate a LockFile.

return ptr;
}
auto ptr = std::make_unique<LockFile>(path);
return ptr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function mixes again unexpected failure with expected ones, requiring a duplicated check of use_lockfiles before calling it to disambiguate between eexpected failures and unexpected ones.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A way to fix this is to make the constructor private, and have static functions instantiating lockfiles, and checking for the context variable. If the lock fails for any other reason, they should propagate the expection. Or use cpp_expected, and provide the reason for the failure (this way we can mix expected and unexpected, but we don't need duplicated checks).

@@ -104,6 +104,9 @@ namespace mamba

std::unique_ptr<LockFile> lock_proc_dir()
{
if (!Context::instance().use_lockfiles)
return nullptr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check should not be required. The problem is that try_lock returns a nullptr if the lock fails, whatever the failure is expected or not.

@JohanMabille JohanMabille force-pushed the danpf-nolock branch 3 times, most recently from 889f2df to bbe0b26 Compare September 29, 2022 15:15
@wolfv wolfv merged commit 3a5cd59 into mamba-org:main Sep 29, 2022
@danpf
Copy link
Contributor Author

danpf commented Oct 4, 2022

Finally got a chance to check this out and I see that it's already done! Just wanted to say thank you to everyone involved for finishing this PR!

@danpf danpf deleted the danpf-nolock branch October 4, 2022 02:51
Hind-M pushed a commit to Hind-M/mamba that referenced this pull request Nov 8, 2022
…ly (mamba-org#1830)

Co-authored-by: Danny <[email protected]>
Co-authored-by: Jonas Haag <[email protected]>
Co-authored-by: Wolf Vollprecht <[email protected]>
Co-authored-by: Johan Mabille <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants