Minor: Document the rationale for the lack of Cargo.lock #14071

Merged: 2 commits, Jan 15, 2025

24 changes: 24 additions & 0 deletions README.md
@@ -146,3 +146,27 @@ stable API, we also improve the API over time. As a result, we typically
deprecate methods before removing them, according to the [deprecation guidelines].

[deprecation guidelines]: https://datafusion.apache.org/library-user-guide/api-health.html

## Dependencies and a `Cargo.lock`
Contributor Author

@mbrobbel has a good point here #14069 (comment)

I think one concern originally was how we would keep a Cargo.lock file up to date with the latest version of the dependencies

I can't remember if Dependabot was available at that point -- maybe Dependabot is good enough that we could check in Cargo.lock and have it handle the updates

Contributor Author

From my perspective it is very important that CI covers testing with the latest versions of all dependencies (as that is what many/most downstream crates will use as well)

Contributor

I'd like to point to https://blog.rust-lang.org/2023/08/29/committing-lockfiles.html for considerations and suggestions.

Contributor Author

TL;DR: it sounds like the Rust team now suggests always committing Cargo.lock and letting Dependabot handle updates. That seems like a good idea to me

@gatesn, Jan 10, 2025

Just my two cents, but I have found Renovate to be much more configurable. Here's an example of a lock file maintenance PR: spiraldb/vortex#1818

@alamb (Contributor Author), Jan 11, 2025

One thing we have to be aware of in DataFusion is that as part of the Apache security posture, only certain third-party actions are allowed -- we would have to double check Renovate

I think the next step is probably to file an issue to explicitly discuss checking in a Cargo.lock file. I'll try and find time over the next few days if no one beats me to it

Contributor Author

I filed a ticket to discuss next steps:


`datafusion` is intended for use as a library and thus purposely does not have a
`Cargo.lock` file checked in. You can read more about the distinction in the
[Cargo book].

CI tests always run against the latest compatible versions of all dependencies
(the equivalent of doing `cargo update`), as suggested in the [Cargo CI guide],
and we rely on Dependabot for other upgrades. This strategy has two problems
that occasionally arise:

1. CI failures when a dependency releases an upgrade that is incompatible in
   some way
2. Local development builds that fail when DataFusion inadvertently relies on
   a feature in a newer version of a dependency than declared in `Cargo.toml`
   (e.g. a new method is added to a trait that we use).

However, we think the current strategy is the best tradeoff between maintenance
overhead and user experience and ensures DataFusion always works with the latest
compatible versions of all dependencies. If you encounter either of these
problems, please open an issue or PR.
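
To illustrate what "latest compatible versions" means above, here is a minimal
sketch in Rust of the default (caret) version requirement used in `Cargo.toml`.
It is a simplified model and not Cargo's actual resolver: the `Version` type
and the `is_compatible` and `latest_compatible` functions are hypothetical
names introduced for this example, and pre-release and build metadata are
ignored.

```rust
/// A bare-bones semantic version (hypothetical type for this sketch;
/// pre-release and build metadata are ignored).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

impl Version {
    const fn new(major: u64, minor: u64, patch: u64) -> Self {
        Version { major, minor, patch }
    }
}

/// Returns true if `candidate` satisfies the caret requirement `^declared`,
/// which is what a plain version string in `Cargo.toml` means.
fn is_compatible(declared: Version, candidate: Version) -> bool {
    // A candidate older than the declared minimum never matches.
    if candidate < declared {
        return false;
    }
    if declared.major > 0 {
        // ^1.2.3 allows any 1.x.y at or above 1.2.3.
        candidate.major == declared.major
    } else if declared.minor > 0 {
        // ^0.2.3 allows only 0.2.y at or above 0.2.3.
        candidate.major == 0 && candidate.minor == declared.minor
    } else {
        // ^0.0.3 allows only 0.0.3 itself.
        candidate == declared
    }
}

/// Picks the newest published version that satisfies the requirement,
/// mimicking what a fresh resolution without a lock file would choose.
fn latest_compatible(declared: Version, published: &[Version]) -> Option<Version> {
    published
        .iter()
        .copied()
        .filter(|v| is_compatible(declared, *v))
        .max()
}

fn main() {
    let declared = Version::new(1, 2, 0);
    let published = [
        Version::new(1, 2, 0),
        Version::new(1, 2, 5),
        Version::new(1, 4, 1),
        Version::new(2, 0, 0),
    ];
    // Picks 1.4.1: the newest release that is still within the 1.x range.
    if let Some(v) = latest_compatible(declared, &published) {
        println!("resolved to {}.{}.{}", v.major, v.minor, v.patch);
    }
}
```

With a declared requirement of `1.2.0`, a fresh resolution picks `1.4.1`, so
code that accidentally uses an API introduced in `1.4` still compiles in CI
even though `1.2.0` is the declared minimum. That is the second problem listed
above.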

[cargo book]: https://doc.rust-lang.org/cargo/guide/cargo-toml-vs-cargo-lock.html
[cargo ci guide]: https://doc.rust-lang.org/cargo/guide/continuous-integration.html#verifying-latest-dependencies
4 changes: 3 additions & 1 deletion datafusion-cli/README.md
@@ -41,6 +41,8 @@ The reason `datafusion-cli` is not part of the main workspace in
checked in `Cargo.lock` file to ensure reproducible builds.

However, the `datafusion` and sub crates are intended for use as libraries and
-thus do not have a `Cargo.lock` file checked in.
+thus do not have a `Cargo.lock` file checked in, as described in the [main
+README] file.

[`datafusion cargo.toml`]: https://github.com/apache/datafusion/blob/main/Cargo.toml
+[main readme]: ../README.md