Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[experiment] patch with patch files #13779

Closed
wants to merge 12 commits into from
Closed

Conversation

weihanglo
Copy link
Member

@weihanglo weihanglo commented Apr 18, 2024

What does this PR try to resolve?

An unstable -Zpatch-files flag is added to support patching your dependencies with patch files, with the patchtool you configure in [patchtool] config table.

This is one experiment based on rust-lang/rfcs#3177 that the team maybe be interested in.
See also #4648.

How should we test and review this PR?

Apparently commit by commit. You could start from test to understand the current user experience of it.

Some design decisions in this implementation:

  • Every patch is settled and applied before the dependency resolution. This implies
    • No re-resolve.
    • Cargo need to fetch a certain version of package beforehand.
      This is currently done by requiring an exact version requirement =,
      so Cargo knows what to download.
    • Due to the nature of patches applied before dependency resolution,
      we could patch whatever we want including dependencies and package.version.
    • The Summary registered to PacakgeRegistry is the source after patching.
  • { patches = [<path>] } field is only available for dependencies under [patch] table,
    and only allows patching registry dependencies for now.
  • A new source PatchedSource that fetches and applies patches to the original source code during block_until_ready.
    • Introduce a new protocol patched+, which compose the SourceId to patched.
    • Tracked what was patched in URL query string, for example patching [email protected] is like:
      patched+registry+https://github.com/rust-lang/crates.io-index?name=serde&version=1.0.0&patch=/path/to/my-bugfix.patch
      
      This SourceId representation also appears in Cargo.lock and Package ID Spec.
  • [patchtool] table in Config to control.
    I imagine it would be something like git mergetool for registering tools,
    though at this moment it only accepts one via path.
  • Cache of patched source resides in $CARGO_HOME/patched-src/<hash-of-original-source>/<pkg>-<version>/<cksum-of-patches>.
    Patched source rebuilds whenever content of patch files or the underlying source change.

You can find some more documentation under src/cargo/sources/patched.rs.

Additional information

I don't think patching before dependency resolution is ergonomic.
Your patched source code would be stuck in time and you need to eagerly update and rebuild them.
Some patches may be safely applied across major versions.
However with the current design you need to specify the same patch twice.

We may like to revisit patching during the resolution,
and reconsider the work of re-resolving dependencies.
That is a more user-friendly way for patching with patch files.

@rustbot
Copy link
Collaborator

rustbot commented Apr 18, 2024

r? @ehuss

rustbot has assigned @ehuss.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added A-configuration Area: cargo config files and env vars A-manifest Area: Cargo.toml issues A-testing-cargo-itself Area: cargo's tests A-unstable Area: nightly unstable support A-workspaces Area: workspaces S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 18, 2024
@weihanglo weihanglo force-pushed the unidiff branch 2 times, most recently from b8bb3e5 to 74c61c6 Compare April 19, 2024 01:08
/// A file indicates that if present, the patched source is ready to use.
const READY_LOCK: &str = ".cargo-ok";

/// `PatchedSource` is a source that, when fetching, it patches a paticular
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See this doc comment for some design decisions.

@epage
Copy link
Contributor

epage commented Apr 19, 2024

@weihanglo how would you like this reviewer?

As this is unstable and we previously agreed to the idea of experiments, in theory the bar is fairly low.

I will admit some hesitance on this experiment. In adding a new source kind, I'm assuming this can be a bit invasive (to be fair, I've not yet looked at the code). I'm also a bit surprised at the design direction described in the PR description.

So how much design discussion should we work out vs get in, try it out, and then iterate on alternative design ideas as we see needed?

@weihanglo weihanglo force-pushed the unidiff branch 2 times, most recently from a46ced8 to ef73e26 Compare April 19, 2024 02:01
@weihanglo
Copy link
Member Author

@epage, This is not intended to merge as-is. It's more like a proof of concept that this approach does works but with a not okay user experience.
To be honest, I lean toward a resolve-then-patch solution but this one is not. I am tempted to write another one that might introduce re-resolve.

About reviewing the pull request, I would recommend starting from tests to get an idea of how it looks like. I would suggest not getting into details until we feel better with the UI.

We could probably open a Zulip topic on t-cargo for design discussions.

@bors
Copy link
Contributor

bors commented Apr 19, 2024

☔ The latest upstream changes (presumably #13778) made this pull request unmergeable. Please resolve the merge conflicts.

@bors
Copy link
Contributor

bors commented Apr 22, 2024

☔ The latest upstream changes (presumably #13783) made this pull request unmergeable. Please resolve the merge conflicts.

@Eh2406
Copy link
Contributor

Eh2406 commented Apr 26, 2024

Having look this over it seems pretty reasonable. If you would prefer to experiment with a "resolution loop" I'd be happy to compare cost-benefits, but I suspect this is simpler.

Cargo need to fetch a certain version of package beforehand. This is currently done by requiring an exact version requirement =, so Cargo knows what to download.

I don't understand why the equals requirement is needed. Say the patch was for >2, why couldn't we ask of the underlying source for a list of versions and then patch each one as they turn out to be needed? Well, I guess will we might then have to patch them all to see if the version numbers change as a result of the patching process. But that could be avoided by banning changing version numbers.

I'm not remembering the API at the moment, it may be that we need to return them all up front instead of "as they turn out to be needed"? If that's the problem, we could think about making a lazier API. (PubGrub would be happy with the lazy API, I'm not sure how much churn would be involved in moving the current resolver.)

I don't think patching before dependency resolution is ergonomic.
Your patched source code would be stuck in time and you need to eagerly update and rebuild them.

I don't think I follow, what ends up stuck in time?

Some patches may be safely applied across major versions.
However with the current design you need to specify the same patch twice.

This seems to be an important point. Especially as it's across "different versions" with the current = limitation.

@epage
Copy link
Contributor

epage commented Apr 26, 2024

Oh, I should capture some office hour thoughts

In my mind, no matter the type of patch, it should all act the same. It sounds like this patches before resolution, rather than during, and allows your patches to affect resolution unlike regular patches (iiuc). If this is the case for both kinds of patches, then that feels like a barrier to move forward.

In my mind, I see this PR introducing two things:

  • Registry [patch] entries
  • patches key for any kind of [patch] entry

I think we should decouple these conversations so we can make sure we are getting the right semantics for each or, at minimum, defer non-patches registry [patch] entries without backing ourselves into a corner on its design.

A thought of mine that didn't come up during office hours is me worrying that pluggable patch tools might be generalizing too much. I'd want to see a better explanation of why that extra complexity is needed and what the alternatives are.

@Eh2406
Copy link
Contributor

Eh2406 commented Apr 26, 2024

unlike regular patches (iiuc).

Regular patches can also affect resolution. It is replace that can not.

@weihanglo
Copy link
Member Author

weihanglo commented Apr 26, 2024

Reply to #13779 (comment):

I don't understand why the equals requirement is needed.
...it may be that we need to return them all up front instead of "as they turn out to be needed"? If that's the problem, we could think about making a lazier API.

= requirement is needed so that we know what to patch before resolution. Otherwise yeah we need to patch every version in that version range.

We could have lazy mechanism, but I would like to avoid patching “during resolution” at this moment. Patching before or after resolution is less complicated I believe.

I don't think patching before dependency resolution is ergonomic.
Your patched source code would be stuck in time and you need to eagerly update and rebuild them.

I don't think I follow, what ends up stuck in time?

The exact requirement issue.


Reply to #13779 (comment):

In my mind, I see this PR introducing two things:

  • Registry [patch] entries
  • patches key for any kind of [patch] entry

Actually this PR intentionally touches nothing in ops::resolve or core::resolver. The core idea is having a new source and prepare it for resolution. You can consider it as a new kind of VCS source. Hence, it is not really that controversial.

The one major thing to explore is the semantic of the new patched source kind when being in [patch].


Just talked to Jacob earlier. There are at least three timing for patching:

  • Registry [patch] before resolution
    • This is how today's [patch] work, as well as this PR.
    • Need to know every summary upfront.
  • Register [patch] on demand during resolution
    • This might be hard. [patch] can be added or removed depending on application of other [patch], so more complicated backtraces might be needed. (I didn't look into this route deep though)
  • Register [patch] after resolution, and re-resolve if needed
    • Re-resolve happens if any summary has changed due to patch, and is currently in use in resolve graph
    • If we go this route, we could first ban any change to package.{name,version}. By doing so [patch] should have the same semantic with the current patch kinds (name@version must match).
    • This is the next thing I would like to experiment on.

(By registering [patch] I mean making [patch] entry available for dependency resolver to select/query.)

@bors
Copy link
Contributor

bors commented Apr 30, 2024

☔ The latest upstream changes (presumably #13813) made this pull request unmergeable. Please resolve the merge conflicts.

@NeuralModder
Copy link

Thank you for starting work on this, I eagerly await this feature.


{ patches = [<path>] } field is only available for dependencies under [patch] table,
and only allows patching registry dependencies for now.

This isn't too hard to change, right? Specifying a git = <url> field in the entry under the [patch] table and patching that would be pretty useful, as it would make applying patches from git repositories much easier, or possible in the first place in some cases.


[patchtool] table in Config to control.
I imagine it would be something like git mergetool for registering tools,
though at this moment it only accepts one via path.

I mean, the only formats people really use are the unified format and git's additions to that. Maybe it'd be better to just pull in a rust parser for patches? There's the patch crate, though that has issues with git patches, so maybe fork it into gitpatch which adds proper support for all of that and autodetects if it needs to strip a path prefix or not.


Reply to #13779 (comment):
* Register [patch] after resolution, and re-resolve if needed
* Re-resolve happens if any summary has changed due to patch, and is currently in use in resolve graph
* If we go this route, we could first ban any change to package.{name,version}. By doing so [patch] should have the same semantic with the current patch kinds (name@version must match).
* This is the next thing I would like to experiment on.

Currently, the [patch] section can also fix a dependency resolution that fails without it. Consider, for example, this issue. The difference between the listed git branch and the available crates.io version for wgpu is massive (for some reason only 0.18.0 is available there), but the change that makes the dependency resolution work is a single added character in wgpu/Cargo.toml. Registering the patch after resolution instead of before resolution would mean that you couldn't use a patch file to do this.


Alex Crichton raised a concern in the RFC thread that I'd like to bring up:

Reply to comment:
Another question about how to actually generated patch files is where is the source coming from to actually generate a diff against? There seems to be two primary sources to patch, one being git and one being crates.io. For crates.io crates there's not really an official method to actually check out and view the source of any particular version. [...]

Reply to comment:
[...] There is no supported way really to actually get the source of a crates.io crate. You can't use git repositories since Cargo or cargo package may move files around or drop some files. [...]

Another person gave an answer to the issue of getting the source for crates.io crates, what I want to mention is that it's probably acceptable to tell people to patch git dependencies if they modify Cargo.toml or if the package is in a workspace, and they want to patch with a git diff, or especially git format-patch. You could conceivably move Cargo.toml.orig to Cargo.toml for crates not in a workspace to make patches to that file easier, but for crates in a workspace that would break. I think it's probably better to just present the registry crate source as-is and let the user patch it in that state, pointing to methods for grabbing the registry crate source in the docs.

@weihanglo
Copy link
Member Author

weihanglo commented Jul 1, 2024

Thanks for being enthusiastic about this!


{ patches = [<path>] } field is only available for dependencies under [patch] table,
and only allows patching registry dependencies for now.

This isn't too hard to change, right? Specifying a git = <url> field in the entry under the [patch] table and patching that would be pretty useful, as it would make applying patches from git repositories much easier, or possible in the first place in some cases.

Personally, I will hold off this. If one is able to use git dependencies, it is assumed that they are also able to fork Git repoistories. To me, supporting Git dependencies is not the main purpose.

[patchtool] table in Config to control.
I imagine it would be something like git mergetool for registering tools,
though at this moment it only accepts one via path.

I mean, the only formats people really use are the unified format and git's additions to that. Maybe it'd be better to just pull in a rust parser for patches? There's the patch crate, though that has issues with git patches, so maybe fork it into gitpatch which adds proper support for all of that and autodetects if it needs to strip a path prefix or not.

The ideas behind this are:

  • By providing [patchtool] config, users are free from locked in in a VCS. Some projects and companies use their own VCS. Some version control systemes have completely different diff algorithms (e.g. pijul).
  • It is unclear the level of compatibility these crates provide with unified diff format (or what generated from GNU patch).
  • I don't have time investigating each of these options, and I don't have spare time maintaining an additional crate for patching. Actually, Cargo's built-in support of Git and other VCS has some maintenance costs and to me I would like to avoid it

Currently, the [patch] section can also fix a dependency resolution that fails without it. Consider, for example, this issue. The difference between the listed git branch and the available crates.io version for wgpu is massive (for some reason only 0.18.0 is available there), but the change that makes the dependency resolution work is a single added character in wgpu/Cargo.toml. Registering the patch after resolution instead of before resolution would mean that you couldn't use a patch file to do this.

Have you tried #14055? I haven't read cargo-patch source code so don't really know what's going on there. The main difference of #13779 comparing to #14055 is usability, which in reality you seldom "pin" a version. By not pinning a version, you can also apply patch to a wider range versions without enumerating all applicable versions. This pattern can also make a project or a company easier to maintain own patch repositories. They only need to specific version range, not each version.

Alex Crichton raised a concern in the RFC thread that I'd like to bring up:
...
Another person gave an answer to the issue of getting the source for crates.io crates, what I want to mention is that it's probably acceptable to tell people to patch git dependencies if they modify Cargo.toml or if the package is in a workspace, and they want to patch with a git diff, or especially git format-patch. You could conceivably move Cargo.toml.orig to Cargo.toml for crates not in a workspace to make patches to that file easier, but for crates in a workspace that would break. I think it's probably better to just present the registry crate source as-is and let the user patch it in that state, pointing to methods for grabbing the registry crate source in the docs.

Yeah you're right. Agree with you. Workspace crates may be broken. A published crate has run through an amount of normalization. It is not only Cargo.toml that was edited, but also some other files got copied cover. And yes we should provide tools helping that. I pretty much lean toward a 3rd party plugin, which is more flexible and isn't tied to Rust strict compatibility guarantee.

@FancyQian
Copy link

I really appreciate if we can have this issue, from the big project perspective, we even need to workaround some 3rd-party crate..

@bors
Copy link
Contributor

bors commented Jul 3, 2024

☔ The latest upstream changes (presumably #14026) made this pull request unmergeable. Please resolve the merge conflicts.

@NeuralModder
Copy link

NeuralModder commented Jul 5, 2024

Personally, I will hold off this. If one is able to use git dependencies, it is assumed that they are also able to fork Git repoistories. To me, supporting Git dependencies is not the main purpose.

Yes, but personally I'd much prefer having a git format-patch-style patchset for git dependencies in my repository instead of maintaining a fork out of tree.

In a project I'm a part of, we currently have around 8 or so forked git repositories belonging to community members. Some of these forks might only have 1, maybe a handful of commits on top of the upstream repository. In the process of updating some of our fundamental dependencies, I've found myself needing to make some additions to some of these forks. This means forking the fork again, pushing my changes, making a PR on the original fork and pinging the owner to merge. We could probably handle this better, but I think it'd be nicer to be able to maintain a patchset on a git repository than to fiddle with forks (though my fellow project members may disagree).

And if it might be useful, why not?


The ideas behind this are:
* By providing [patchtool] config, users are free from locked in in a VCS. Some projects and companies use their own VCS. Some version control systemes have completely different diff algorithms (e.g. pijul).
* It is unclear the level of compatibility these crates provide with unified diff format (or what generated from GNU patch).
* I don't have time investigating each of these options, and I don't have spare time maintaining an additional crate for patching. Actually, Cargo's built-in support of Git and other VCS has some maintenance costs and to me I would like to avoid it

That's true. I'm interested in pijul myself, though I haven't looked at it close enough to see how the patches are actually stored, or if you can convert them to the unified format.

I wouldn't mind getting my hands dirty with a crate for patching, but I understand if having some random person write/maintain the crate (and the risk of it being abandoned) doesn't meet project standards.

I'd probably agree with trying to avoid maintenance costs associated with VCS support.


Have you tried #14055? I haven't read cargo-patch source code so don't really know what's going on there. The main difference of #13779 comparing to #14055 is usability, which in reality you seldom "pin" a version. By not pinning a version, you can also apply patch to a wider range versions without enumerating all applicable versions. This pattern can also make a project or a company easier to maintain own patch repositories. They only need to specific version range, not each version.

Thanks for the link, didn't know that existed. I've recreated the issue with the cargo feature on that branch, and will explain it to hopefully make it more clear.

On a new project, I have a Cargo.toml with the following contents:

cargo-features = ["patch-files"]

[package]
name = "test-patch"
version = "0.1.0"
edition = "2021"

[dependencies]
wgpu = "0.18"
egui_wgpu_backend = "0.28"

[patch.crates-io]
wgpu = { version = "0.18", patches = [
    "./0001-fix-wgpu.diff",
] }

The contents of 0001-fix-wgpu.diff are as follows:

diff --git a/wgpu/Cargo.toml b/wgpu/Cargo.toml
index 343ed2b05..a82c99f58 100644
--- a/wgpu/Cargo.toml
+++ b/wgpu/Cargo.toml
@@ -159,7 +159,7 @@ web-sys = { workspace = true, features = [
     "GpuCompilationMessageType",
     "GpuComputePassDescriptor",
     "GpuComputePassEncoder",
-    "GpuComputePassTimestampWrite",
+    "GpuComputePassTimestampWrites",
     "GpuComputePipeline",
     "GpuComputePipelineDescriptor",
     "GpuCullMode",

If I apply the diff to a local clone of the wgpu repository checked out at tag v0.18.0 and change the [patch.crates-io] section to use that repository in the path variable (and comment out the first line of the Cargo.toml), it builds (with the vanilla 1.78.0 toolchain).

But simply running it with the file contents as above using your cargo branch from #14055 spits out this error:

error: failed to select a version for `web-sys`.
    ... required by package `wgpu v0.18.0`
    ... which satisfies dependency `wgpu = "^0.18"` of package `test-patch v0.1.0 (/home/youser/src/test-patch)`
versions that meet the requirements `^0.3.64` are: 0.3.69, 0.3.68, 0.3.67, 0.3.66, 0.3.65, 0.3.64

the package `wgpu` depends on `web-sys`, with features: `GpuComputePassTimestampWrite` but `web-sys` does not have these features.


all possible versions conflict with previously selected packages.

  previously selected package `web-sys v0.3.67`
    ... which satisfies dependency `web-sys = "^0.3.64"` of package `wgpu v0.18.0`
    ... which satisfies dependency `wgpu = "^0.18"` of package `test-patch v0.1.0 (/home/youser/src/test-patch)`

failed to select a version for `web-sys` which could resolve this conflict

And cargo then terminates. This happens because, while wgpu 0.18.0 depends on web-sys ^0.3.64, egui_wgpu_backend 0.28 depends on wgpu 0.19, which depends on web-sys ^0.3.67, and the listed cargo feature got renamed from GpuComputePassTimestampWrite to GpuComputePassTimestampWrites in web-sys 0.3.65.

Mind you, this example is kind of contrived, as you wouldn't really want to pull in two different versions of wgpu at once. But the problem itself can still happen in the real world; in the project I'm a part of, we currently patch wgpu to include some changes from newer versions, and if the patch section is commented out cargo produces the same error.

Also, this patch wouldn't apply if it were applied to the actual registry source, but you could pretty easily make one that does if you went and grabbed that. The error wouldn't change.

This isn't necessarily to say that your approach is untenable, though. You could just tighten restrictions for what can be done with Cargo.toml in patches, for example. But this needs to be taken into account. I think it could generate quite a bit of confusion if someone tries migrating their previous git repository forks in the patch section to this feature, only to discover that things don't appear to be working in the same way.

(I'm also wondering now if this is a bug in dependency resolution that could actually just be fixed...)

Edit: I also forgot to add a [patchtool] section in .cargo/config.toml, but that also doesn't make a difference in this case.

@bors
Copy link
Contributor

bors commented Jul 26, 2024

☔ The latest upstream changes (presumably #13947) made this pull request unmergeable. Please resolve the merge conflicts.

@weihanglo
Copy link
Member Author

@NeuralModder thanks for your sharing.

(I'm also wondering now if this is a bug in dependency resolution that could actually just be fixed...)

I think that is possibly a bug in my implementation. #14055 has a better user experience in my opinion, though I don't think both of the pull requests are in a good state in terms of code complexity and usability.

Also, myself currently have no time in pursuing this feature, so going to close this. Don't worry the PR diff is always here so if somebody or myself want to pick it up later, we can always do that. Thank you all.

@weihanglo weihanglo closed this Aug 9, 2024
@weihanglo weihanglo deleted the unidiff branch August 9, 2024 14:54
@epage
Copy link
Contributor

epage commented Aug 27, 2024

Something we discussed in the Cargo team meeting was whether we could expedite this by preventing patches that touched Cargo.toml files. While a lot of people want to patch dependencies, this could help in other cases like patching time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-configuration Area: cargo config files and env vars A-manifest Area: Cargo.toml issues A-testing-cargo-itself Area: cargo's tests A-unstable Area: nightly unstable support A-workspaces Area: workspaces S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants