Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include the binary's name when calling from_args() #5

Closed
Minoru opened this issue Jul 19, 2021 · 12 comments
Closed

Include the binary's name when calling from_args() #5

Minoru opened this issue Jul 19, 2021 · 12 comments

Comments

@Minoru
Copy link

Minoru commented Jul 19, 2021

from_env() takes the executable's name from the argv, but from_args() expects an argv with the name removed. This discrepancy causes a bit of trouble:

  • this is slightly unexpected (although documented)
  • if one uses from_args(), then bin_name() becomes useless (always returns None)

I think it'd be simpler for everyone involved if from_args() took the same input as from_env() does, i.e. an argv with executable's name.

@blyxxyz
Copy link
Owner

blyxxyz commented Jul 19, 2021

It works this way for a few reasons:

  • For a lot of the time while I was writing it, bin_name() didn't exist at all and the first value from args_os() was just ignored. It didn't make sense to require a value from frorm_args() that would just be ignored. (This is not a good reason to keep it that way after I did reintroduce bin_name(), of course.)
  • I (wrongly) thought that clap and structopt worked that way (but they only do if you enable a certain option).
  • It's annoying to have to remember to write the command name when testing. (This really only matters for lexopt's own tests, other packages' tests don't usually have a reason to create a Parser directly, but I didn't realize that at the time.)
  • No other part of lexopt cares about bin_name.

These are not very strong reasons. You might be right.

But if we change it now, all code that uses from_args() breaks but still compiles. That's a worst case scenario. If we break backward compatibility people should at least get a very clear warning.

Looking back, I wish I hadn't included Parser::bin_name in the original release. It's hard to make good use of it, you e.g. can't give it to the caller when returning an error message, even though it's intended for showing in error messages. It needs a redesign to be useful.

I think that the best way forward is to deprecate bin_name, and in the next breaking release either remove it or overhaul it.

If it's overhauled, from_args(args) can perhaps be changed to something like from_args(bin_name: Option<impl Into<OsString>>, args). Then code that's broken would get a fair warning, and given an iterator you could call either from_args(iter.next(), iter) or from_args(None, iter), neither of which are very awkward.

@Minoru
Copy link
Author

Minoru commented Jul 19, 2021

Okay, this makes sense. I'm one of those few who do pass the name in their tests, and check that it's extracted as it should :) Although it wouldn't be hard for me to extract it manually, of course.

The suggested overhaul looks like a fine solution to me too.

Thanks for the thorough explanation!

@blyxxyz
Copy link
Owner

blyxxyz commented Oct 23, 2021

I'd like to do this today for the next release:

  • Add a from_iter method that expects to see the binary name
  • Change bin_name's return type to &str
  • Deprecate from_args with a message pointing to from_iter

If the name can't be found then "app" is used. If it's invalid unicode it's converted lossily.

This'll mean a little churn for newsboat/newsboat#1845 (lines 234-239 can become args.program_name = parser.bin_name().to_owned()). (cc @theIDinside)

How does it sound?

@theIDinside
Copy link

Sounds fine to me! Changes like that I can adjust to quickly so no problem for me at least.

@Minoru
Copy link
Author

Minoru commented Oct 23, 2021

Change bin_name's return type to &str
If the name can't be found then "app" is used.
If it's invalid unicode it's converted lossily.

I understand that these choices are driven by bin_name() being used in error messages, but it still feels wrong. lexopt does very little for the user, so I don't see why it should do so much with the first argument. Hard-coding a default name seems wrong too; in Newsboat we use an empty string, but I could imagine people using the app's name instead.

I'd make it Option<OsString>, and let the user decide how to handle the corner cases.

@blyxxyz
Copy link
Owner

blyxxyz commented Oct 23, 2021

I do feel conflicted about it. I mainly want to keep it like this because processing an Option<&OsStr> is a pain. You end up with code like parser.bin_name().map(|s| s.to_string_lossy().into_owned()).unwrap_or_else(|| "myapp".into()), which is a lot harder to follow than typical argument processing (value.parse()?, value.into_string()?, value.into()).

I also feel that preserving argv[0]'s encoding is way more niche than doing it for other arguments. I can't immediately think of any programs (in any language) that use it in a way that's incompatible with this. (It's not even a good way to find the current executable.)

But maybe keeping it an Option would be better. I think I won't deprecate from_args() after all, so having it missing wouldn't be as weird, and parser.bin_name().unwrap_or("myapp") is manageable.

@Minoru
Copy link
Author

Minoru commented Oct 23, 2021

It's not even a good way to find the current executable.

Oh! I didn't know. This would have been my only argument in favour of using OsStr/OsString rather than str/String. What's a better way?

parser.bin_name().unwrap_or("myapp") is manageable.

This looks good!

@blyxxyz
Copy link
Owner

blyxxyz commented Oct 23, 2021

What's a better way?

In Rust, std::env::current_exe. How it works depends on the platform, on Linux it inspects /proc/self/exe.

To get the path from argv[0] you may have to resolve it with $PATH or the current directory, and you have to trust the parent process: Unix lets you pass other things into exec, using the command name is basically a convention.

@Minoru
Copy link
Author

Minoru commented Oct 23, 2021

Thanks! I knew that exec*() lets me "fool" the child process, but I didn't know I can get the name from /proc. May I suggest that the doc for bin_name() should point to std::env::current_exe? I might not be the only one who didn't know of its existence!

@theIDinside
Copy link

@blyxxyz Btw, will you notify us when 0.2.0 is live so that I can make the changes as speedily as possible? would greatly appreciate it!

@blyxxyz
Copy link
Owner

blyxxyz commented Oct 23, 2021

I've pushed the changes I want to make to master. I still need to give it another lookover, update the table in the README, check the CI, etcetera, but I hope to make the release later this evening. (I'll leave another comment when I do.)

@blyxxyz
Copy link
Owner

blyxxyz commented Oct 23, 2021

Version 0.2.0 is live.

Thank you for using the crate and discussing all these things! It's been very educational. I think the library is in a better state now.

@blyxxyz blyxxyz closed this as completed Oct 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants