Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rustc: Avoid caching by tracking lint crates and env vars #211

Merged
merged 4 commits into from
Aug 6, 2023

Conversation

xFrednet
Copy link
Member

@xFrednet xFrednet commented Aug 4, 2023

This was a fight, it required some refactoring in marker_adapter to get the information, which lint crates should actually be tracked.

The process of invoking the compiler programmatically is also confusing, IMO. I don't know how anyone should be able to figure this out, without the existence of Clippy as a reference. Even with the reference and my tries to leave helpful comments, it took some time to get back into it.

If anyone is interested in reviewing this, I suggest just taking a look at the register_tracked_files function and the marker_adapter changes. The rest is first class spaghetti code which will hopefully work flawlessly:tm:


Testing this is also... a challenge, in the end I had to manually install the driver, open a different project and add a path lint crate. Then I changed the lint crate to confirm that it avoids the caching. Not really recommendable, oh well :)


Closes: #98
Closes: #190

@xFrednet xFrednet added A-marker-adapter Area: Adapter D-rustc-driver Driver: Rustc Driver S-waiting-on-review Status: Awaiting review A-driver Area: Driver or something related to the internal working of a driver. labels Aug 4, 2023
@xFrednet xFrednet added this to the v0.2.0 milestone Aug 4, 2023
Copy link
Contributor

@Veetaha Veetaha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't notice the "spaghetti" that you was referring to

marker_adapter/src/loader.rs Outdated Show resolved Hide resolved
Comment on lines 66 to 77
let mut list = vec![];
for entry_str in split_os_str(&env_str, b';') {
if entry_str.is_empty() {
return Err(AdapterError::LintCratesEnvMalformed);
}

// FIXME: Create issue for lifetimes and fix dropping and pointer decl stuff
list.push(LintCrateInfo {
path: PathBuf::from(entry_str),
});
}

Ok(pass)
Ok(list)
Copy link
Contributor

@Veetaha Veetaha Aug 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

split_os_str(&env_str, b';')
    .map(|entry| {
        if entry.is_empty() {
            return Err(AdapterError::LintCratesEnvMalformed);
        }
        
        Ok(LintCrateInfo {
            path: entry.into(),
        })
    })
    // turbofish is not needed, and can be deduced from
    // the surrounding function return type
    .collect::<Result<Vec<_>>()

Also maybe this should use the https://doc.rust-lang.org/stable/std/env/fn.split_paths.html format?

The advantage of using iterators and collect is that collect pre-allocates enough heap memory using the size_hint() from the iterator, although in this case there isn't any known size beforehand, but I generally don't think about this and prefer iterators.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, these for loops often just arise, when I draft code and refactor it over and over. 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also maybe this should use the doc.rust-lang.org/stable/std/env/fn.split_paths.html format?

Yep, just never knew about that one. I expect that this format will change soon, to also include the lint crate name for #201, but for now we can use that function 👍 And then it would probably be smart to use Serde to serialize and deserialize these entries.

cc: @NathanFrasier since this references #201

marker_adapter/src/loader.rs Outdated Show resolved Hide resolved
Comment on lines +125 to +128
// The name is not a valid UTF-8 String. This is not ideal but at the
// same time not problematic enough to throw an error. It just means
// that the lint crate can't be tracked and the driver might reuse
// a cache if only that crate was changed.
Copy link
Contributor

@Veetaha Veetaha Aug 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend not to caring about supporting non-utf8 paths. There are a bunch of reasons to do so. First of all, it simplifies the code, and second a lot of tools assume they work with UTF8 paths, for example cargo-metadata crate does so. You may see that it uses Utf8Path[Buf] for storing all the paths parsed from cargo metadata.

My first impression when I saw this in cargo-metadata was "why do you limit the users to UTF8", but then I read the reasoning from camino crate that implements the Utf8Path[Buf], and was convinced that this is the right thing to do. I never in my life had to work with non-UTF8 paths. If one ever happens to work with something like that they are already fucked, and cargo-marker won't be the only tool that doesn't work for them. Can you imagine someone invoking cargo-marker on a project that uses invalid UTF8 in its code structure? I can't, and thus I propose adopting camino and releaving this mental complexity from the codebase. My question is wether you would accept a PR with such kind of change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I second the notion here, if cargo metadata already assumes utf-8 paths, we should lean on that for simplicity. I think this is the right design decision.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with moving to UTF-8 based paths. If we document it, it should be a fine restriction to make. Though I would not do this in this PR

// can just throw it in the `files` set.
for lint_crate in lint_crates {
if let Some(name) = lint_crate.path.to_str() {
files.insert(Symbol::intern(name));
Copy link
Contributor

@Veetaha Veetaha Aug 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it tracks the directory recursively. IIRC there was some bug feature in cargo that if your use rerun-if-changed on a directory in a build script, then only the dir metadata or just the first level of entries will be tracked.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've recently talked to someone from a project that always uses the rustc/clippy driver directly. For Marker they would do the same. In such a case, a lint crate might be specified that is not inside the lints directory of the project.

Doing it this way doesn't put any restrictions on the user for little to no cost to us :)

marker_rustc_driver/src/main.rs Outdated Show resolved Hide resolved
marker_rustc_driver/src/main.rs Outdated Show resolved Hide resolved
// make "marker-driver --rustc" work like a subcommand that passes further args to "rustc"
// for example `marker-driver --rustc --version` will print the rustc version that marker-driver
// uses
// make "marker_rustc_driver --rustc" work like a subcommand that passes
Copy link
Contributor

@Veetaha Veetaha Aug 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that previously dashes were used to separate package names. I don't understand why the choice was made to use underscores. To my opinion, cargo package/crates and binary names should have kebab case. This is a common convention. For example, cargo plugins are enforced to use kebab-case, tools like rust-gdb, rust-gdbui, rust-lldb, typos-cli. It's also confusing that one has to install cargo_marker, but then it is insalled as cargo-marker.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I can't tell you how many times I've typed the kebab case version when I need the underscore version, this seems relatively simple to change if we want to. @xFrednet might have a good reason for changing it though.

Copy link
Member Author

@xFrednet xFrednet Aug 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😅 Yeah, so, the background of this is that clippy uses underscores in their crates (clippy_lints & clippy_utils) for library crates. For library crates, this also makes sense IMO, since the use statement also uses underscores. I also looked at some other crates and there seemed to be no real consensus, which should be used bevy for example also always uses underscores.

Now for the binaries... when I decided on the name, I just reserved the driver and cargo_marker name without contemplating that these are binaries. Having underscores everywhere just seemed consistent. That was until I found out that Cargo expects the binary to use a dash instead of an underscore.

So, this basically comes down to, little experience when it comes to actually releasing crates/binaries and only learning about my mistakes after it was too late. I even emailed crates.io about maybe changing the name, but they said that this is not possible.

We could rename the binary in the same way that the cargo_marker crate is renamed, I'm fine with that. But the crate names on crates.io are sadly stuck now. (The only other option, I see, would be to rename the project, which I don't want to)

Copy link
Contributor

@Veetaha Veetaha Aug 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, that's unfortunate, but I don't see a better solution than renaming the project then 🤕

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one solution for the binaries: Make Marker successful enough that it's actually distributed with rustup instead ^^

marker_rustc_driver/src/lint_pass.rs Outdated Show resolved Hide resolved
@xFrednet
Copy link
Member Author

xFrednet commented Aug 5, 2023

I didn't notice the "spaghetti" that you was referring to

I feel like the control flow of the main function/module and the data flow in the function is a bit convoluted. There are several conditions, that can cause the flow to take another route or just terminate early. I'm just to very linear and deterministic code, so this might feel more normal to you :)

@xFrednet
Copy link
Member Author

xFrednet commented Aug 5, 2023

Thank you for the review and comments! It's really a big help and I appreciate it!

@xFrednet
Copy link
Member Author

xFrednet commented Aug 6, 2023

I think it's safe to r+ this

bors r=Veetaha

@bors
Copy link
Contributor

bors bot commented Aug 6, 2023

Build succeeded!

The publicly hosted instance of bors-ng is deprecated and will go away soon.

If you want to self-host your own instance, instructions are here.
For more help, visit the forum.

If you want to switch to GitHub's built-in merge queue, visit their help page.

@bors bors bot merged commit 4105783 into rust-marker:master Aug 6, 2023
@xFrednet xFrednet deleted the 190-cancel-cargo-cache branch August 18, 2023 09:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-driver Area: Driver or something related to the internal working of a driver. A-marker-adapter Area: Adapter D-rustc-driver Driver: Rustc Driver S-waiting-on-review Status: Awaiting review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Avoid Cargo's caching if any lint crates changed Avoid cargo caching, if relevant files have been changed.
3 participants