Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tail: Refactor paths::Input::from and Settings::inputs #4756

Merged
merged 3 commits into from
Jul 3, 2023

Conversation

Joining7943
Copy link
Contributor

This pr refactors mainly the Input::from method and changes inputs in Settings from a VecDeque to a Vec.

There is no need for Settings::inputs being a VecDeque anymore and Vec is a little bit more efficient. In Input::from, unpacking AsRef<OsString> with .as_ref() just once is also a bit more efficient. Same with the comparison of OsStr with another OsStr instead of Path. Parsing the input arguments to a String and then back to an OsStr is unnecessary, since we can get the inputs raw as OsStr from clap to pass them to Input::from.

Note this pr also includes a small fix. When file headers are printed in verbose mode, the output for stdin should be equal on all platforms: ==> standard input <==. Before this change the output for non unix systems was ==> - <==.

@Joining7943 Joining7943 changed the title 'tail': Refactor paths::Input::from and Settings::inputs` tail: Refactor paths::Input::from and Settings::inputs Apr 20, 2023
.map(|v| v.map(|string| Input::from(&string)).collect())
.unwrap_or_default();
settings.inputs = matches
.get_raw(options::ARG_FILES)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do this without get_raw? Maybe with an OsStr or Path value parser?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the disadvantage of get_raw?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure there's a real disadvantage, but it feels like a workaround. Something that should only be necessary if you have parsed values and want to access the raw value as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure it's meant as the direct way to access the parsed values as OsStr ;) imho value parsing to get an OsStr from clap is just a waste of resources.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be surprised if it actually did any parsing. It's just gonna expect an OsStr, which always succeeds and then can give us an OsStr as a value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do it the easy and direct way, please? From the docs of clap get_raw

Iterate over the original argument values.

With get_raw, there's nothing else between us and the OsStr, neither during compile time nor runtime and the code is clearly expressing that intent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That intent is exactly what an OsStr value parser expresses and it does it i n the clap builder, instead of at this location. It is in my opinion the most easy and direct way, because it uses claps normal parsing.

Besides: clap-rs/clap#3809

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That clap issue is open waiting for decision since last year, so I think we can safely use it. Can we settle this please? If you insist I'll change it somehow, but I really don't think it's worth it.

Copy link
Member

@tertsdiepraam tertsdiepraam Apr 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine. Let's hope I can make a better API than clap's if I ever find time to work on my own parser again :)

@Joining7943
Copy link
Contributor Author

Joining7943 commented Apr 21, 2023

I wasn't very happy with the comparison with a new OsStr in Input::from every time this function is called, so I'm going to change it asap

UPDATE: It's changed now

@Joining7943
Copy link
Contributor Author

@tertsdiepraam Just a quick question by the way. In the text module every variable is static. What is the advantage of using static over const in tail. Having such variables accessible in const context would be nice sometimes.

@tertsdiepraam
Copy link
Member

I usually prefer const too. I think it's mostly legacy reasons that there's a lot of static for global constants in the codebase.

@Joining7943
Copy link
Contributor Author

I usually prefer const too. I think it's mostly legacy reasons that there's a lot of static for global constants in the codebase.

Ok, then let's change them to const.

@uutils uutils deleted a comment from github-actions bot Apr 22, 2023
#[cfg(unix)]
impl From<&OsStr> for InputKind {
fn from(value: &OsStr) -> Self {
const DASH: [u8; 1] = [b'-'];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, but I think this can be a b"-"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but it's just a single character. It doesn't play a big role here but the result is an array [u8; 1] instead of a reference to an array &[u8; 1].

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you don't think

const DASH: &[u8] = b"-";

if value.as_bytes() == DASH { }

is clearer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No not really, to me both look very similar and equally clear. However, the correct type of a byte string literal is &[u8; n] or better &'static [u8; n]. I first used b"-" but then changed it to [b'-'] to spare me the reference here and I'm sure that DASH is stored on the stack. We're talking here about a single character. If you have a longer string literal and just need it as a (reference to a) byte array than you might spare yourself a lot of typing by using a byte string literal.

Copy link
Member

@tertsdiepraam tertsdiepraam Apr 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think I'd actually prefer this:

let DASH = OsStr::new("-");
if value == DASH { }

Then we don't even need to think about byte strings.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're worried about performance, they all generate the same assembly: https://godbolt.org/z/M66sjEqGe

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OsStr::new can't be const. Why can't we just keep that little optimization the way it is and noone needs to look at any assembly code and rely on compiler magic?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, my priority here is clear code. I'd still push for the change even if it was technically slower, because it's not worth optimizing. The current code harder to read without any advantage.

Copy link

@jdonszelmann jdonszelmann Apr 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outside observer looking through some PRs: I think OsStr is the nicer option here, it's clearer what's going on. [b'-'] is hard to read. And the performance seems to be the same anyway even though if the performance were different, it would not matter.

@Joining7943
Copy link
Contributor Author

@tertsdiepraam Sorry, these kind of code reviews lately are boring me. Take this pr or leave it. If you decide for the first you should notice some improvements in code quality.

@tertsdiepraam
Copy link
Member

Sorry, these kind of code reviews lately are boring me.

I'm sorry to hear that. I'll take a look at the PR and thanks for your work on this in any case!

@Joining7943 Joining7943 mentioned this pull request Jun 21, 2023
@tertsdiepraam
Copy link
Member

tertsdiepraam commented Jul 3, 2023

I've added a small commit with my last nitpicky comments :) @sylvestre or @cakebaker could you review it?

@sylvestre
Copy link
Contributor

LGTM, thanks

@sylvestre sylvestre merged commit 7a7842b into uutils:main Jul 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants