Make it easier to keep a license file in sync with a partial-repository mirror #125

Open
mattmccutchen opened this issue Oct 4, 2024 · 4 comments

Comments

@mattmccutchen
Collaborator

Suppose I want to vendor a subdirectory of an upstream repository, and the upstream license requires me to include a copy of the license in my repository (as is typical for open-source licenses), but the license in the upstream repository is outside the subdirectory I vendored. Then I have to add the license separately. This would be easy to do with a separate Braid mirror. Here's an example of a .braids.json file where I took this approach for my mirrors of the https://github.com/meteor/react-packages repository.

But now I'm responsible for ensuring that if the upstream license ever changes, my copy of the license stays in sync with the version of the code in my repository. Normally, that would mean keeping the content and license mirrors at the same upstream revision. (I should probably read the new license too and ensure it's still acceptable to me, but it's unclear whether there's any reasonable way Braid can help with that.) If I run braid update on all mirrors, that should keep the two mirrors in sync, barring a race condition in which the upstream repository is updated while braid update is running. However, if I update the content mirror individually, it would be easy to forget to update the license mirror.
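In the meantime, a small check outside Braid could at least flag the "forgot to update the license mirror" case. The following is only a sketch, not part of Braid: it assumes that .braids.json has a top-level "mirrors" object keyed by local path, and that each mirror entry carries "url" and "revision" fields. The function name and the sample revisions are made up for illustration.

```python
from collections import defaultdict


def find_revision_skew(config):
    """Group mirrors by upstream URL and report URLs whose mirrors
    are recorded at different revisions.

    Assumes (not verified against Braid itself) that `config` has the shape
    {"mirrors": {local_path: {"url": ..., "revision": ...}}}.
    Returns {url: {local_path: revision}} for each skewed URL.
    """
    by_url = defaultdict(dict)
    for local_path, mirror in config.get("mirrors", {}).items():
        by_url[mirror["url"]][local_path] = mirror.get("revision")
    return {
        url: revisions
        for url, revisions in by_url.items()
        if len(set(revisions.values())) > 1
    }


# Illustrative data only: the revisions below are placeholders.
example_config = {
    "mirrors": {
        "packages/react-meteor-data": {
            "url": "https://github.com/meteor/react-packages",
            "path": "packages/react-meteor-data",
            "revision": "aaaa111",
        },
        "LICENSES/LICENSE.meteor-react-packages": {
            "url": "https://github.com/meteor/react-packages",
            "path": "LICENSE.txt",
            "revision": "bbbb222",
        },
    }
}

for url, revisions in find_revision_skew(example_config).items():
    print(f"Revision skew for {url}:")
    for local_path, revision in sorted(revisions.items()):
        print(f"  {local_path}: {revision}")
```

In practice one would load the real file with json.load and run this in CI or a pre-commit hook. Note that it treats any two mirrors of the same URL at different revisions as skew, which would be a false positive for someone who deliberately pins two mirrors of one repository to different revisions.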

Ideally, Braid would have functionality that could be applied to solve this problem in a robust way. Granted, the problem is probably very rare and upstreams are likely to be forgiving of this kind of mistake, but I guess I'm a stickler.

The obvious design would be to have a mirror that has a single upstream repository and revision but can map multiple upstream paths to different downstream paths. I'll tentatively call this a "multi-component mirror", and it might have uses beyond license compliance. While the idea is simple, nailing down all the details of implementation and command-line usage would add a lot of complexity that we'd have to maintain indefinitely, so I don't want to do it unless there's a lot more evidence of demand from users. (The usual refrain.)

If you want a solution to the license sync problem or you have ideas, please add your thumbs-up or comment. For now, I'm just filing an issue to document the problem, and users can cope with the status quo.

@realityforge
Collaborator

While I have never needed this for license compliance, very occasionally I have wanted to braid in multiple directories from a single repository (often this is just to exclude some heavy unneeded directories from upstream). I have also wanted to include multiple single files from within a directory. This may be a pretty obscure requirement and I am not sure others would necessarily share similar requirements. (One of the reasons I want to do this is that many of these files are 50M+ in size.)

I.e., the conceptual thing I want to do is include dirs such as "Config" and "Source" but exclude "Resource", "Content" or "Examples". I guess in the most general form I want to braid in files based on file patterns similar to those that appear in .gitignore files.

Actually, thinking about it, what I want is almost exactly Perforce views for mapping another repository in. See https://www.perforce.com/manuals/cmdref/Content/CmdRef/views.html ... but that is complex ;-)

I don't know if these requirements are even remotely worth the effort to implement ;-)

@mattmccutchen
Collaborator Author

> I guess in the most general form I want to braid in files based on file patterns similar to those that appear in .gitignore files.

We have #120 for that style of filtering, where the upstream repository (or a subdirectory thereof) maps to a single downstream subdirectory but some files are excluded.

The next level of complexity, which I'm tentatively calling a "multi-component mirror", is vendoring a fixed list of upstream subdirectories or files (no wildcards) but allowing them to be rearranged compared to upstream. To illustrate what I mean, in the license compliance example, I had:

  • upstream/packages/react-meteor-data -> downstream/packages/react-meteor-data
  • upstream/LICENSE.txt -> downstream/LICENSES/LICENSE.meteor-react-packages

I used separate Braid mirrors; a multi-component mirror would have ruled out license skew. I was using Meteor, which expects the packages/react-meteor-data directory to be in that exact location. So I couldn't just vendor the whole upstream repository into a subdirectory unless I found some way to override where Meteor looks for the package, and even if I did, the resulting directory layout would be a bit uglier for developers of my project. How common do we expect this kind of situation to be, and how strong an argument is it for offering a better solution than the status quo? I don't know.
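To make the idea concrete, here is a purely hypothetical sketch of what a multi-component mirror could look like as a data structure: one upstream URL and revision, plus a fixed map from upstream paths to downstream paths. Nothing here exists in Braid. The class name, field names, and placeholder revision are invented for illustration; only the two path mappings come from the example above.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Dict, Optional


@dataclass(frozen=True)
class MultiComponentMirror:
    """Hypothetical model: a single upstream URL and revision shared by
    several upstream-path -> downstream-path components (no wildcards)."""

    url: str
    revision: str
    components: Dict[str, str]

    def downstream_path(self, upstream_file: str) -> Optional[str]:
        """Map an upstream file to its downstream location, or return None
        if the file is not covered by any component."""
        upstream = PurePosixPath(upstream_file)
        for src, dst in self.components.items():
            src_path = PurePosixPath(src)
            if upstream == src_path or src_path in upstream.parents:
                return str(PurePosixPath(dst) / upstream.relative_to(src_path))
        return None


mirror = MultiComponentMirror(
    url="https://github.com/meteor/react-packages",
    revision="0123abc",  # placeholder, not a real upstream revision
    components={
        "packages/react-meteor-data": "packages/react-meteor-data",
        "LICENSE.txt": "LICENSES/LICENSE.meteor-react-packages",
    },
)

print(mirror.downstream_path("LICENSE.txt"))
print(mirror.downstream_path("packages/react-meteor-data/package.js"))
print(mirror.downstream_path("README.md"))
```

Because there is only one revision field, the content and the license cannot drift apart by construction. The sketch says nothing about the hard parts (command-line usage, diffing, pushing changes back upstream).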

And then Perforce views support various extra things beyond multi-component mirrors.

Peter, do you actually have a use case at this time for anything beyond filtering, or was it just a "that would be cool" sentiment?

For the licensing documentation, the minimum I needed was an issue I could cite that stated the problem without assuming a particular solution. If you'd like to start accumulating use cases for multi-component mirrors, we can make a separate issue for that, or if you're confident that multi-component mirrors are the right solution for the licensing problem (I'm not at this point), we can repurpose this issue.

@realityforge
Collaborator

If multi-component mirrors were present I would have used them. Do I think my use cases would have been described as "good practices"? Probably not. (I have been using Perforce a bit of late and tend to replicate some workflows from there.)

In most cases I own or control the braided repository as well as the local repository, so I can always rearrange the braided repository to suit the downstream repository. If #120 were present then it would be perfect for a majority of cases.

Also: I am using git these days for a lot of large assets and tend to use a combination of sparse checkouts, partial clones and occasionally LFS to manage this. I have avoided braiding files from these repositories and instead just copy the select files I want into the downstream repository manually and then periodically re-copy them across. I guess my brain leaped to: "If the braided repository was a sparse checkout with partial clone then I could braid them in"

So in short: I don't have a significant motivation for it. If anything, I have a motivating use case for #120 and potentially a motivating use case for converting pull/update/add to sparse/filtered clones.

@mattmccutchen
Collaborator Author

FWIW, I thought some more about the original problem of license compliance for Braid mirrors, and I'm coming to think the right solution is to teach popular license compliance tools to analyze Braid mirrors, not to add any licensing-specific functionality to Braid. For example, ScanCode already supports analyzing many kinds of package manifest files. It might be reasonably straightforward to add support for identifying partial-repository mirrors in .braids.json and scanning their upstream repositories for license files, using existing infrastructure in ScanCode. This approach would naturally handle edge cases like detecting if a license file was renamed in the upstream repository between two versions; there's no way I imagine us adding that kind of heuristic to Braid.
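As a rough illustration of the first step such a tool would need, the sketch below lists the partial-repository mirrors in a parsed .braids.json. It does not use any ScanCode API. It assumes that a mirror of an upstream subdirectory or file is recorded with a non-empty "path" field next to "url" and "revision", and the function name and sample entries are hypothetical.

```python
def partial_repository_mirrors(config):
    """Return one record per mirror that vendors only part of its upstream.

    Assumes (not verified against Braid itself) that such mirrors have a
    non-empty "path" field in their .braids.json entry. A license scanner
    could then inspect the rest of the upstream repository at `revision`
    for license files that live outside `upstream_path`.
    """
    records = []
    for local_path, mirror in sorted(config.get("mirrors", {}).items()):
        upstream_path = mirror.get("path")
        if upstream_path:
            records.append(
                {
                    "local_path": local_path,
                    "url": mirror["url"],
                    "revision": mirror.get("revision"),
                    "upstream_path": upstream_path,
                }
            )
    return records


# Illustrative data only: URLs and revisions below are placeholders.
example_config = {
    "mirrors": {
        "packages/react-meteor-data": {
            "url": "https://github.com/meteor/react-packages",
            "path": "packages/react-meteor-data",
            "revision": "aaaa111",
        },
        "vendor/whole-repo": {
            "url": "https://example.com/whole-repo.git",
            "revision": "cccc333",
        },
    }
}

for record in partial_repository_mirrors(example_config):
    print(record["local_path"], "<-", record["url"], "@", record["revision"])
```

Whole-repository mirrors are skipped because any upstream license file at the repository root is already inside the vendored tree.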

That's the hope, anyway. I figured I'd experiment with ScanCode a bit to better understand what integration with Braid might look like, but it appears that getting it set up to actually retrieve licenses of dependencies is more work than I want to do on a whim. So we can let "enhance existing license compliance tools" be the plan of record for users who want high confidence of getting it right, and in the meantime, users can get by with ad-hoc management of extra mirrors for the license files.
