Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

git-lfs support #10153

Open
wants to merge 47 commits into
base: master
Choose a base branch
from
Open

git-lfs support #10153

wants to merge 47 commits into from

Conversation

b-camacho
Copy link

@b-camacho b-camacho commented Mar 4, 2024

Motivation

nix fetches git repos using libgit2, which does not run filters by default. This means LFS-enabled repos can be fetched, but LFS pointer files are not smudged.

This change adds a lfs attribute to fetcher URLs. With lfs=1, when fetching LFS-enabled repos, nix will smudge all the files.

Context

See #10079.
Git Large File Storage lets you track large files directly in git, using git filters. A clean filter runs on your LFS-enrolled files before push, replacing large files with small "pointer files". Upon checkout, a "smudge" filter replaces pointer files with full file contents. When this works correctly, it is not visible to users, which is nice.

Changes

  • builtins.fetchGit has new bool lfs attr
  • when lfs=true, GitSourceAccessor will smudge any pointer files with the lfs filter attribute
  • as verified by new test in tests/nixos/fetchgit (this is why lfs is now enabled on the test gitea instance)

Priorities and Process

Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

@github-actions github-actions bot added the fetching Networking with the outside (non-Nix) world, input locking label Mar 4, 2024
@b-camacho
Copy link
Author

Small complication:
it seems that nix flake lock calls fetch which in turn calls Input::fetch -> InputScheme::fetch -> fetchToStore.

In other words, I don't currently see a way to:

  • materialize LFS files when fetching -source store paths, but
  • don't materialize LFS files during nix flake lock.

bew added a commit to bew/nixed-os-configs that referenced this pull request Mar 8, 2024
Using git-lfs, when the flake copies the repo to the store (for purity)
the 'virtual file' stored in git is copied (with oid/size info of the
object in LFS) instead of the actual (large) file :/
ref: NixOS/nix#10153

I think it was working before because the file was in git temporarily at
some point, then I moved it to LFS, but after the system was built..
(or something like that 🤷)
bew added a commit to bew/nixed-os-configs that referenced this pull request Mar 8, 2024
Using git-lfs, when the flake copies the repo to the store (for purity)
the 'virtual file' stored in git is copied (with oid/size info of the
object in LFS) instead of the actual (large) file :/
ref: NixOS/nix#10153

I think it was working before because the file was in git temporarily at
some point, then I moved it to LFS, but after the system was built..
(or something like that 🤷)
bew added a commit to bew/nixed-os-configs that referenced this pull request Mar 8, 2024
Using git-lfs, when the flake copies the repo to the store (for purity)
the 'virtual file' stored in git is copied (with oid/size info of the
object in LFS) instead of the actual (large) file :/
ref: NixOS/nix#10153

I think it was working before because the file was in git temporarily at
some point, then I moved it to LFS, but after the system was built..
(or something like that 🤷)
@L-as
Copy link
Member

L-as commented Mar 9, 2024

What use case do you have in mind? Isn't LFS typically for large files, that wouldn't usually affect evaluation anyway?

@b-camacho
Copy link
Author

What use case do you have in mind? Isn't LFS typically for large files, that wouldn't usually affect evaluation anyway?

builtins.fetchGit populates the nix store with a <hash>-store path. This path is used as the source when building a derivation. Currently, the builder will see the unsmudged LFS pointer files, but I'd like the builder to optionally see smudged files. I agree that smudged files are usually not needed at eval time, but I don't see a good alternative of making them available at build time, besides a fixed-output derivation (but fixed-output derivations have their own problems)

@L-as
Copy link
Member

L-as commented Mar 14, 2024

A FOD seems optimal here, in general you shouldn't use builtins.fetchGit if you're only going to use it at build time.

@b-camacho
Copy link
Author

In general I agree, but (afaik) other fetchers can't use git credentials.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2024-03-11-nix-team-meeting-132/42960/1

@khoitd1997
Copy link

@roberth have you had a chance to take a look at this issue? We have been staying at older versions of Nix as a workaround but newer versions now have fixes for critical issues so sticking to old ones would no longer be optimal.

@roberth
Copy link
Member

roberth commented Jun 22, 2024

Hi @b-camacho, thanks for the ping and sorry for the delay. This PR was assigned to me, but I hadn't prioritized it because it was a draft. Wrong assumption on my end, because I do think this is valuable, and I have some things to say :)

when fetching LFS-enabled repos, nix will smudge all the files.

That's a good start, but we need to make sure that the smudging happens in a controlled manner; otherwise we risk adding impurities.

Specifically, we should parse the attribute to check that they're supposed to be unsmudged by lfs; if not, ignore the smudge rule. It seems you were already investigating how this could be implemented.

Furthermore, we should validate the sha256 so that we don't increase the potential for silent errors by a whole external program. The hash should be easy to parse from the pointer file, and while reading other programs' inputs is a little ad hoc, I don't expect any serious issues from this, as we won't cause users to accidentally rely on a bug this way.


the LFS-tracked files are materialized during nix flake lock - this is bad

This won't happen unnecessarily either of these are implemented

If we need to backtrack on the removal of narHashes (#6530), we can also avoid re-locking transitive inputs whose lock has already been computed by the dependency's lock.

So yes, this isn't efficient yet, but it will be.

A FOD seems optimal here, in general you shouldn't use builtins.fetchGit if you're only going to use it at build time.

A fixed output derivation works best when all you're using it for is as an input to another derivation (and it's publicly available, as mentioned).
However, if the "fetched" source is a flake (e.g. you have a flake.nix in a repo with LFS files), then you also need to evaluate files from the fetched source, which would constitute import from derivation, which is not optimal. Furthermore you'd need to produce fixed-output hashes for your local repo files, which is such horrible UX we don't need to consider it as a solution.


To summarize, this is worth implementing, I see no blocking issues, design or otherwise, and the following needs to be done:

  • lfs attribute with default false, LGTM
  • figure out which files are LFS
  • invoke the Git LFS filter from $PATH; no need for a rigid dependency or makeWrapper
  • check the sha256
  • add a test, perhaps extending tests/nixos/fetch-git
  • documentation for the lfs attribute (currently under fetchGit's entry in doc/manual/src/language/builtins.md); mention the runtime dependency on the LFS package.

@kip93
Copy link

kip93 commented Jul 23, 2024

What's the state on this PR? Seems to unfortunately be a bit stale given the delayed review. This issue has been plaguing us for a while, so I'm willing to pick up the torch here and try to get this out the door (was actually starting to see how to fix this myself back in March when I saw this PR and decided to see what came out of this).

@roberth
Copy link
Member

roberth commented Jul 23, 2024

@kip93 I think your question was directed towards @b-camacho, but I'd like to add that we would welcome and support anyone who'd like to work on this.

Feel free to ask questions here or in the meetings if you can make them. We generally have some agenda, but we also like to make time for contributors during or after, when we often hang out while we get some things done. Link to the video conference is in the scratchpad linked there. We also have a matrix room, although personally I'm guilty of neglecting that one sometimes.

@b-camacho
Copy link
Author

Thanks for the thorough writeup @roberth !
I owe you all an update. To avoid shelling out and implement some features we need to merge, I need a subset of a git-lft C/C++ client. I reimplemented one from Python into C++ here https://github.com/b-camacho/git-lfs-fetch-cpp.

Once I add some tests and integrate git-lfs-fetch-cpp here, we should be ready for another review!

I'm still on vacation with not-great internet, but back in 6 days and will update you all on 7/31 regardless.

Thanks for the feedback and sorry for the wait!

@roberth
Copy link
Member

roberth commented Jul 24, 2024

Oh, I don't think shelling out was such a big deal because we can verify the correctness of the result, kind of like how fixed output derivations are allowed to do "grossly impure" things because we can verify the output.

I guess a library implementation of it is still nice for a consistent UX with a small closure size though.

@L-as

This comment was marked as off-topic.

@roberth

This comment was marked as off-topic.

@L-as

This comment was marked as off-topic.

@kip93
Copy link

kip93 commented Oct 10, 2024

Hey! It's me again! I just want to ask if there's anything I can help with here. Maybe I can try and doing some testing, or do a smaller version of this that uses the git-lfs CLI tools while the full implementation gets done?

We have a lot of repos with LFS files that would greatly benefit from this, so I'm willing to do whatever work is needed, but also don't want to add extra work for others where it's not wanted.

@b-camacho
Copy link
Author

I was right about the size callback, the nixos test passes again!
Remerged master, would love to get this in before we fall too far behind again.
Last thing I can think of adding is fixes for fetching over ssh, but I agree with you it may be better to handle those in a followup.

@kip93
Copy link

kip93 commented Dec 2, 2024

Nice! I would do some testing on wednesday and if http works fine then I'll call this PR mergeable from my side (still needs a review from someone else of course). I do want to get the ssh part working but won't block this PR just for that.

@kip93
Copy link

kip93 commented Dec 4, 2024

I did some quick tests, validated that I can pull via HTTPS from both github and gitlab, and since our tests also run on gitea, that's 3 different services working as expected. SSH is not working, but now that the rest looks like it works, I'll give up the tiny hope that this would get resolved in this PR.

@b-camacho
Copy link
Author

Hm, just re-checked I can pull from GitHub and GitLab over SSH from my machine, this gist has the commands and outputs I ran. Current implementation intentionally uses the system ssh binary.

Do you have any more details on the ssh failures? You probably took this into account, but asking for completeness: even publicly available repos do not allow ssh access without auth, so you have to try this on a repo where you have owner/collaborator access on a privkey in your keychain.

Alas, while I don't exactly enjoy chasing subtle git{ea,hub,lab} impl differences, I'll try to set up an automated test for ssh fetching while we wait for a review.

@kip93
Copy link

kip93 commented Dec 6, 2024

Not sure what happened before (I was a bit burnt out so it might just have been a silly mistake) but now it's working fine, both public and private repos on github and gitlab work ok 🎉

Sorry for the false alarm 🙈

@jsqu4re
Copy link

jsqu4re commented Dec 9, 2024

Thank you so much for your efforts! I tried this PR and it works well for gitlab.com. Is there a chance to also allow for local remote repos using url = "file:///path/to/repo/.git"? Currently there is this error:

ssh: Could not resolve hostname file: Name or service not known
error: git-lfs-authenticate: no output (cmd: ssh file git-lfs-authenticate ///path/to/repo/.git download)

@kip93
Copy link

kip93 commented Dec 9, 2024

I'd say this is a nice to have but probably not worth delaying this PR

A follow up PR would be more appropriate.

Copy link
Member

@roberth roberth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed a bit. It seems that the gitattributes support is incomplete, but could perhaps be reused from libgit2.

Git consults [...] the same directory as the path in question, and its parent directories up to the toplevel of the work tree

  • see comments

Comment on lines 679 to 681
if (lfsFetch && !lookup(CanonPath(".gitattributes"))) {
warn("Requested to fetch lfs files, but no .gitattributes file was found, ignoring");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible for these to be in subdirectories.
I'd be ok with not checking this, because checking the whole tree may be slow.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think cloning with LFS is quite slow as it is, adding a few extra seconds at most to recurse the entire tree is probably not very noticeable

src/libfetchers/git-lfs-fetch.hh Outdated Show resolved Hide resolved
std::istringstream iss(content_str);
std::string line;

while (std::getline(iss, line)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't libgit2 handle this for us?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't know if libgit2 has lfs support, but the syntax was simple enough that we just wrote a quick parser. I'll take another look at libgit2's provided tools next week to see if I missed anything

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

git attributes exist at the git layer (as opposed to lfs), you can read them with libgit2's git_attr_get (the ExportIgnore fetcher does this), but I could never get git_attr_get to work, but I'll give it another shot tonight.

src/libfetchers/git-utils.cc Outdated Show resolved Hide resolved

# check that file was smudged
file_size_lfs = client.succeed(f"stat -c %s {fetched_lfs}/beeg").strip()
assert int(file_size_lfs) >= expected_max_size_lfs_pointer, \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should know the exact size based on the dd command; would be better to check with ==.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a conceptual test, since a pointer cannot be >= 1024 by definition, can definitely also check for the full 1MiB size. Will rewrite this with the other test changes when I have the time

Comment on lines 14 to 15
# Request lfs fetch without any .gitattributes file
############################################################################
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be the first test case; a base case that doesn't actually use the lfs feature.
The subtest feature would be nice for this. It doesn't do much, but other NixOS tests use it for this purpose, and I think it's nice to mark structure with indentation as well.

Suggested change
# Request lfs fetch without any .gitattributes file
############################################################################
with subtest("Request lfs fetch without any .gitattributes file"):
< ... indented code ... >

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't know there was a subtest feature, will rewrite these as soon as I have time (probably next week :/ )

tests/nixos/fetch-git/test-cases/lfs/default.nix Outdated Show resolved Hide resolved
Comment on lines 180 to 200
# Use as flake input
############################################################################
with TemporaryDirectory() as tempdir:
client.succeed(f"mkdir -p {tempdir}")
client.succeed(f"""
printf '{{
inputs = {{
foo = {{
url = "git+{repo.remote}?ref=main&rev={lfs_file_rev}&lfs=1";
flake = false;
}};
}};
outputs = {{ foo, self }}: {{ inherit (foo) outPath; }};
}}' >{tempdir}/flake.nix
""")
fetched_flake = client.succeed(f"""
nix eval --raw {tempdir}#.outPath
""")

assert fetched_lfs == fetched_flake, \
f"fetching as flake input (store path {fetched_flake}) yielded a different result than using fetchGit (store path {fetched_lfs})"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case seems redundant to me.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was added because the flake fetch does some things on top of what fetchGit does, and it was failing at some point. So while it may look redundant, it apparently does have a subtle difference.

}

bool Fetch::hasAttribute(const std::string & path, const std::string & attrName, const std::string & attrValue) const
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a compelling reason not to use git_attr_get_ext?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't implement this part, @b-camacho may have more context, but I could try my hand at using this instead to see if it still works.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a satisfying explanation but I could not get it to work without materializing the files on disk, but I'll try again

@b-camacho
Copy link
Author

b-camacho commented Dec 9, 2024

Thank you so much for your efforts! I tried this PR and it works well for gitlab.com. Is there a chance to also allow for local remote repos using url = "file:///path/to/repo/.git"? Currently there is this error:

ssh: Could not resolve hostname file: Name or service not known
error: git-lfs-authenticate: no output (cmd: ssh file git-lfs-authenticate ///path/to/repo/.git download)

I'd be happy to add this, but I agree this PR is already pretty big so we should save it for a followup.

@jsqu4re
Copy link

jsqu4re commented Dec 9, 2024

I'd be happy to add this, but I agree this PR is already pretty big so we should save it for a followup.

Makes total sense, and thanks for tackling this task!

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2024-12-09-nix-team-meeting-minutes-201/57280/1

@CHN-beta
Copy link

Thank you so much for your efforts! I tried this PR and it works well for gitlab.com. Is there a chance to also allow for local remote repos using url = "file:///path/to/repo/.git"? Currently there is this error:

ssh: Could not resolve hostname file: Name or service not known
error: git-lfs-authenticate: no output (cmd: ssh file git-lfs-authenticate ///path/to/repo/.git download)

I'd be happy to add this, but I agree this PR is already pretty big so we should save it for a followup.

Seems git LFS is only supported over https now. Enabling other transfers (e.g. ssh) needing NixOS/nixpkgs#350130. Please let me know if I am wrong.

@kip93
Copy link

kip93 commented Dec 16, 2024

Today I'll try to rewrite all tests, but wanted to let you know that the guys from libgit2 actually replied to my issue, and in there I've found out that this function we're using (git_pathspec_matches_path) is not the right one (and there's currently no right function for this job) since a pathspec is not the same as a path pattern. So I'll try to use the git_attr_get_ext again and see if that works for our purposes. This should also fix issues with negated patterns that I've noticed today.

@b-camacho
Copy link
Author

Oh I had no idea they were different, that explains a lot!

@kip93
Copy link

kip93 commented Dec 18, 2024

Ok I've reintroduced the git_attr_get_ext and have it working. I'm currently testing that it does not import files into the store too early like it used to be the case the first time around.

Also rewrote the tests to use subtests as suggested by the code review.

@kip93
Copy link

kip93 commented Dec 19, 2024

With my quick-ish tests (with several large lfs repos which take a long time to clone) it all looks ok? I think it's ready for another review.

Given that holidays are around the corner, I won't have any more time to work on this 'till after new years (of course, I'm also not expecting you to skip xmas to review this, I'm fine if the review is delayed as well). If anything needs fixing I can take this back up again probably 6-ish of january.

try {
_lfsFetch.fetch(blob.get(), pathStr, s, [&s](uint64_t size){ s.s.reserve(size); });
} catch (Error &e) {
e.addTrace({}, "while smudging git-lfs file '%s' (std::string interface)", pathStr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
e.addTrace({}, "while smudging git-lfs file '%s' (std::string interface)", pathStr);
e.addTrace({}, "while smudging git-lfs file '%s'", path);

const CanonPath & path,
Sink & sink,
std::function<void(uint64_t)> sizeCallback = [](uint64_t size){}) override {
auto blob = getBlob(path, false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need nearly identical readBlob() and readFile() functions? SourceAccessor has a default implementation for the string version of readFile() that wraps the streaming version.

void readFile(
const CanonPath & path,
Sink & sink,
std::function<void(uint64_t)> sizeCallback = [](uint64_t size){}) override {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::function<void(uint64_t)> sizeCallback = [](uint64_t size){}) override {
std::function<void(uint64_t)> sizeCallback = [](uint64_t size){}) override
{

Comment on lines 695 to 698
if (_lfsFetch.shouldFetch(pathStr)) {
StringSink s;
try {
_lfsFetch.fetch(blob.get(), pathStr, s, [&s](uint64_t size){ s.s.reserve(size); });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why _lfsFetch is needed? Can't you do

Suggested change
if (_lfsFetch.shouldFetch(pathStr)) {
StringSink s;
try {
_lfsFetch.fetch(blob.get(), pathStr, s, [&s](uint64_t size){ s.s.reserve(size); });
if (lfsFetch->shouldFetch(pathStr)) {
StringSink s;
try {
lfsFetch->fetch(blob.get(), pathStr, s, [&s](uint64_t size){ s.s.reserve(size); });

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was mimicking if let Some(v) = v {}, but in C++ pointer-like syntax is more idiomatic for optionals


// if authHeader is "", downloadToSink assumes no auth is expected
void downloadToSink(
const std::string & url, const std::string & authHeader, Sink & sink, std::string_view sha256Expected)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use FileTransfer::download() (the sink variant). That way libfetcher doesn't need to reimplement curl support, and we get stuff like connection reuse.

size_t size; // in bytes
};

struct GitUrl
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class looks very similar to ParsedUrl, maybe you can use that instead (in conjuction with fixGitURL() to handle the scp syntax)?

return jArray;
}

std::vector<nlohmann::json> Fetch::fetchUrls(const std::vector<Md> & metadatas) const
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idem, this should use FileTransfer::download().

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is an API call, I implemented this with FileTransfer::upload and added a "post" flag like there was a "head" flag to make it work

}

void Fetch::fetch(
const git_blob * pointerBlob,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function could be simplified by having the caller read the blob and pass it to this function, i.e.

Suggested change
const git_blob * pointerBlob,
std::string contents,

since the caller needs to be able to do that anyway if LFS is not enabled. So the caller becomes something like:

auto contents = std::string((const char *) git_blob_rawcontent(blob.get()), git_blob_rawsize(blob.get()));
lfsFetch->fetch(contents, ..., sink, ...);

That way all the blob-reading code here could be removed.

}
const uint64_t size = obj.at("size");
sizeCallback(size);
downloadToSink(ourl, authHeader, sink, oid); // oid is also the sha256
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these downloads be cached? If I understand this correctly, Nix will re-download the same LFS objects on every invocation unless the entire tree has been copied into the store (which generally won't be the case with lazy trees).

We have downloadFile() in tarball.cc that can cache files in the store to avoid repeated downloads.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this does not have the store to be able to put things there and it is much more closely related to git I ended up putting it in the user's cache dir instead

@kip93
Copy link

kip93 commented Jan 10, 2025

Hey! Been back at it for a couple of days now, trying to implement the code reviews. Currently, only the reworks of Fetch::fetchUrls is left, but I might or might not be down a rabbit hole since the FileTransfer::download (nor the FileTransfer::upload) quite do what I need it to. Will let you guys know when it's all done.

kip93 added 3 commits January 10, 2025 16:11
Plus, switched CURLOPT_PROGRESSFUNCTION to CURLOPT_XFERINFOFUNCTION since docs say it's deprecated
@kip93 kip93 requested a review from Ericson2314 as a code owner January 10, 2025 18:19
@kip93
Copy link

kip93 commented Jan 13, 2025

Been testing it, looks like it's working for me. I'd say this is ready for another review.

Comment on lines +17 to +18
// git-lfs metadata about a file
struct Md
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// git-lfs metadata about a file
struct Md
/**
* git-lfs pointer
* @see https://github.com/git-lfs/git-lfs/blob/2ef4108/docs/spec.md
*/
struct Pointer

struct Fetch
{
// Reference to the repository
git_repository const * repo;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
git_repository const * repo;
ref<git_repository> repo;

The lifetime is probably fine, but I'd prefer to be certain.

// git-lfs metadata about a file
struct Md
{
std::string path; // fs path relative to repo root, no ./ prefix
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::string path; // fs path relative to repo root, no ./ prefix

path appears to be unused, which makes sense for a content address.
If you do need it somewhere, you could still pass std::pair, or rely on addTrace to add it to any error messages.

Comment on lines +232 to +234
// example resp here:
// {"objects":[{"oid":"f5e02aa71e67f41d79023a128ca35bad86cf7b6656967bfe0884b3a3c4325eaf","size":10000000,"actions":{"download":{"href":"https://gitlab.com/b-camacho/test-lfs.git/gitlab-lfs/objects/f5e02aa71e67f41d79023a128ca35bad86cf7b6656967bfe0884b3a3c4325eaf","header":{"Authorization":"Basic
// Yi1jYW1hY2hvOmV5SjBlWEFpT2lKS1YxUWlMQ0poYkdjaU9pSklVekkxTmlKOS5leUprWVhSaElqcDdJbUZqZEc5eUlqb2lZaTFqWVcxaFkyaHZJbjBzSW1wMGFTSTZJbUptTURZNFpXVTFMVEprWmpVdE5HWm1ZUzFpWWpRMExUSXpNVEV3WVRReU1qWmtaaUlzSW1saGRDSTZNVGN4TkRZeE16ZzBOU3dpYm1KbUlqb3hOekUwTmpFek9EUXdMQ0psZUhBaU9qRTNNVFEyTWpFd05EVjkuZk9yMDNkYjBWSTFXQzFZaTBKRmJUNnJTTHJPZlBwVW9lYllkT0NQZlJ4QQ=="}}},"authenticated":true}]}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example may be suitable for a unit test.

} catch (const nlohmann::json::parse_error & e) {
std::stringstream ss;
ss << "response did not parse as json: " << responseString;
throw std::runtime_error(ss.str());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
throw std::runtime_error(ss.str());
throw Error(ss.str());

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact with Error you probably don't need the std::stringstream, i.e.

throw Error("response did not parse as json: %s", responseString);

return objects;
} catch (const nlohmann::json::parse_error & e) {
std::stringstream ss;
ss << "response did not parse as json: " << responseString;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a very long string. I've seen reverse proxies dump huge streams of HTML into error responses.
If it's larger than a certain size, perhaps it should only printed in full if we're verbose. You could do this with a separate printMsg(lvlTalkative, "Full response: '%1%'");

const auto md = parseLfsMetadata(std::string(content), std::string(pointerFilePath));
if (md == std::nullopt) {
debug("Skip git-lfs, invalid pointer file");
warn("Encountered a file that should have been a pointer, but wasn't: %s", pointerFilePath);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer for this and the above error to be an error, so that a potential "outdated" version of Nix won't succeed with the wrong result.

} catch (const nlohmann::json::out_of_range & e) {
std::stringstream ss;
ss << "bad json from /info/lfs/objects/batch: " << obj << " " << e.what();
throw std::runtime_error(ss.str());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change all of these to Error

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fetching Networking with the outside (non-Nix) world, input locking
Projects
Status: 🏁 Review
Development

Successfully merging this pull request may close these issues.

10 participants