-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rustc: Avoid caching by tracking lint crates and env vars #211
Conversation
49bec54
to
a028856
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't notice the "spaghetti" that you was referring to
marker_adapter/src/loader.rs
Outdated
let mut list = vec![]; | ||
for entry_str in split_os_str(&env_str, b';') { | ||
if entry_str.is_empty() { | ||
return Err(AdapterError::LintCratesEnvMalformed); | ||
} | ||
|
||
// FIXME: Create issue for lifetimes and fix dropping and pointer decl stuff | ||
list.push(LintCrateInfo { | ||
path: PathBuf::from(entry_str), | ||
}); | ||
} | ||
|
||
Ok(pass) | ||
Ok(list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
split_os_str(&env_str, b';')
.map(|entry| {
if entry.is_empty() {
return Err(AdapterError::LintCratesEnvMalformed);
}
Ok(LintCrateInfo {
path: entry.into(),
})
})
// turbofish is not needed, and can be deduced from
// the surrounding function return type
.collect::<Result<Vec<_>>()
Also maybe this should use the https://doc.rust-lang.org/stable/std/env/fn.split_paths.html format?
The advantage of using iterators and collect
is that collect
pre-allocates enough heap memory using the size_hint()
from the iterator, although in this case there isn't any known size beforehand, but I generally don't think about this and prefer iterators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, these for loops often just arise, when I draft code and refactor it over and over. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also maybe this should use the doc.rust-lang.org/stable/std/env/fn.split_paths.html format?
Yep, just never knew about that one. I expect that this format will change soon, to also include the lint crate name for #201, but for now we can use that function 👍 And then it would probably be smart to use Serde to serialize and deserialize these entries.
cc: @NathanFrasier since this references #201
// The name is not a valid UTF-8 String. This is not ideal but at the | ||
// same time not problematic enough to throw an error. It just means | ||
// that the lint crate can't be tracked and the driver might reuse | ||
// a cache if only that crate was changed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend not to caring about supporting non-utf8 paths. There are a bunch of reasons to do so. First of all, it simplifies the code, and second a lot of tools assume they work with UTF8 paths, for example cargo-metadata crate does so. You may see that it uses Utf8Path[Buf]
for storing all the paths parsed from cargo metadata
.
My first impression when I saw this in cargo-metadata
was "why do you limit the users to UTF8", but then I read the reasoning from camino
crate that implements the Utf8Path[Buf]
, and was convinced that this is the right thing to do. I never in my life had to work with non-UTF8 paths. If one ever happens to work with something like that they are already fucked, and cargo-marker won't be the only tool that doesn't work for them. Can you imagine someone invoking cargo-marker
on a project that uses invalid UTF8 in its code structure? I can't, and thus I propose adopting camino
and releaving this mental complexity from the codebase. My question is wether you would accept a PR with such kind of change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I second the notion here, if cargo metadata already assumes utf-8 paths, we should lean on that for simplicity. I think this is the right design decision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with moving to UTF-8 based paths. If we document it, it should be a fine restriction to make. Though I would not do this in this PR
// can just throw it in the `files` set. | ||
for lint_crate in lint_crates { | ||
if let Some(name) = lint_crate.path.to_str() { | ||
files.insert(Symbol::intern(name)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it tracks the directory recursively. IIRC there was some bug feature in cargo that if your use rerun-if-changed
on a directory in a build script, then only the dir metadata or just the first level of entries will be tracked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've recently talked to someone from a project that always uses the rustc/clippy driver directly. For Marker they would do the same. In such a case, a lint crate might be specified that is not inside the lints directory of the project.
Doing it this way doesn't put any restrictions on the user for little to no cost to us :)
// make "marker-driver --rustc" work like a subcommand that passes further args to "rustc" | ||
// for example `marker-driver --rustc --version` will print the rustc version that marker-driver | ||
// uses | ||
// make "marker_rustc_driver --rustc" work like a subcommand that passes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that previously dashes were used to separate package names. I don't understand why the choice was made to use underscores. To my opinion, cargo package/crates and binary names should have kebab case. This is a common convention. For example, cargo plugins are enforced to use kebab-case, tools like rust-gdb
, rust-gdbui
, rust-lldb
, typos-cli
. It's also confusing that one has to install cargo_marker
, but then it is insalled as cargo-marker
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I can't tell you how many times I've typed the kebab case version when I need the underscore version, this seems relatively simple to change if we want to. @xFrednet might have a good reason for changing it though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😅 Yeah, so, the background of this is that clippy uses underscores in their crates (clippy_lints
& clippy_utils
) for library crates. For library crates, this also makes sense IMO, since the use
statement also uses underscores. I also looked at some other crates and there seemed to be no real consensus, which should be used bevy for example also always uses underscores.
Now for the binaries... when I decided on the name, I just reserved the driver
and cargo_marker
name without contemplating that these are binaries. Having underscores everywhere just seemed consistent. That was until I found out that Cargo expects the binary to use a dash instead of an underscore.
So, this basically comes down to, little experience when it comes to actually releasing crates/binaries and only learning about my mistakes after it was too late. I even emailed crates.io about maybe changing the name, but they said that this is not possible.
We could rename the binary in the same way that the cargo_marker
crate is renamed, I'm fine with that. But the crate names on crates.io are sadly stuck now. (The only other option, I see, would be to rename the project, which I don't want to)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, that's unfortunate, but I don't see a better solution than renaming the project then 🤕
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is one solution for the binaries: Make Marker successful enough that it's actually distributed with rustup instead ^^
I feel like the control flow of the main function/module and the data flow in the function is a bit convoluted. There are several conditions, that can cause the flow to take another route or just terminate early. I'm just to very linear and deterministic code, so this might feel more normal to you :) |
Co-authored-by: Veetaha <[email protected]>
8719953
to
cedc4b6
Compare
Thank you for the review and comments! It's really a big help and I appreciate it! |
I think it's safe to r+ this bors r=Veetaha |
Build succeeded! The publicly hosted instance of bors-ng is deprecated and will go away soon. If you want to self-host your own instance, instructions are here. If you want to switch to GitHub's built-in merge queue, visit their help page. |
This was a fight, it required some refactoring in
marker_adapter
to get the information, which lint crates should actually be tracked.The process of invoking the compiler programmatically is also confusing, IMO. I don't know how anyone should be able to figure this out, without the existence of Clippy as a reference. Even with the reference and my tries to leave helpful comments, it took some time to get back into it.
If anyone is interested in reviewing this, I suggest just taking a look at the
register_tracked_files
function and themarker_adapter
changes. The rest is first class spaghetti code which will hopefully work flawlessly:tm:Testing this is also... a challenge, in the end I had to manually install the driver, open a different project and add a path lint crate. Then I changed the lint crate to confirm that it avoids the caching. Not really recommendable, oh well :)
Closes: #98
Closes: #190