Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs/design: add submodule guiding principles #1611

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

chooglen
Copy link
Contributor

I'd appreciate it if reviewers focus on whether these principles are comprehensible and whether they are good enough - Are they feasible? Do they make things simple enough for users?

We can grow this list if needed.

Checklist

If applicable:

  • I have updated CHANGELOG.md
  • I have updated the documentation (README.md, docs/, demos/)
  • I have updated the config schema (src/config-schema.json)
  • I have added tests to cover my changes

@ilyagr
Copy link
Contributor

ilyagr commented May 18, 2023

I haven't grokked this well enough to confidently review, but it sounds reasonable to me.

One thought I had is that we could make it easy for a submodule to point to a remote which is a local normal (possibly colocated) jj repository. I have at least one usecase of using subprojects in a dotfiles repo to keep track of projects that are unrelated to each other and that I might want to occasionally hack on.

Another observation: one difficulty we might want to consider (unless you already did) is that the submodule can define different remotes than the parent project in its config. Re the re the "Intro to Git Submodules" section, the config file in <superproject-git-directory>/modules/<submodule-name> is a crucial part of the config (the URLs there are usually more important than the one in .gitmodules). It's possible there's some obvious solution that makes this a non-issue.

One possible approach to this difficulty is to require some mapping between the subproject remotes and superproject remotes, though I'd prefer still allowing one extra "local" remote as an exception.

With all of these thoughts, I'm not at all claiming they should have a high priority.

Copy link
Member

@martinvonz martinvonz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This was very useful

docs/design/git-submodules.md Outdated Show resolved Hide resolved
docs/design/git-submodules.md Outdated Show resolved Hide resolved

For this to make sense to users, we must make it easy to manage the global
submodules and to reconcile them with the working copy submodules. Examples of
this include: prompting the user to update submodules when the working copy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not update submodules automatically when the working copy changes? Actually, which direction do you mean here? Is this about updating the submodule's directory in the working copy or about creating a submodule commit when files have changed in the working copy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not update submodules automatically when the working copy changes?

Ah, I considered this but I couldn't find a good way to weave in "Alternatives considered" in a section so short.

When I say "update submodules", I primarily mean "changing the globally configured set of submodules", not "fetching new commits in already-configured submodules". (Seems like I need to make this clearer.) Automatically updating the current set of submodules to match the working copy makes intuitive sense (and we may want to offer this as optional behavior), but there are a lot of corner cases where it doesn't work well:

  • If branches A and B have a disjoint set of submodules, switching between them can delete submodules, causing submodules to need to be cloned over and over. Maybe that's okay if we make this operation append-only, but...
  • Git Submodules have global names that identify them across commits (e.g. to track renames). These names can "conflict" in ways that aren't easily resolved.
    • Updating a URL because the submodule's repo changed hosts. Without automatic updates, this is a non-issue because you'd have cloned the submodule from a more recent commit, which should give you the old commits too. With automatic updating, you'd have to know to ignore the old URL somehow.
    • Two completely different submodules sharing the same "name" at different points in history, e.g. using the name ui to refer to the UI framework and then changing UI frameworks at some point. (That's probably a misconfiguration, but it's not nice to punish the user for the mistake forever.) It's not clear to me how automatic updates could solve this, especially if we don't want to throw away submodules. Maybe this will be tractable if we can teach the submodules how to handle conflicts (like we do with branches), but that doesn't seem necessary in the short term.

docs/design/git-submodules.md Show resolved Hide resolved
@chooglen
Copy link
Contributor Author

One thought I had is that we could make it easy for a submodule to point to a remote which is a local normal (possibly colocated) jj repository. I have at least one usecase of using subprojects in a dotfiles repo to keep track of projects that are unrelated to each other and that I might want to occasionally hack on.

Makes sense, that's a fairly normal use case for Git submodules too - .gitmodules can specify a local path. Users could use that mechanism to point to the Git repo in the jj repo initially, then once we start figuring out what we want "jj-native" submodules to look like, we can provide niceties around pointing to a local jj repo.

Something that we should watch out for (but probably don't need to act upon) is that Git has some security quirks with local submodules. git clone prefers to hardlink local files where possible, so specifying a local path in .gitmodules can copy files in unexpected ways (see an exploit with symlinks).

Another observation: one difficulty we might want to consider (unless you already did) is that the submodule can define different remotes than the parent project in its config. Re the re the "Intro to Git Submodules" section, the config file in /modules/ is a crucial part of the config (the URLs there are usually more important than the one in .gitmodules). It's possible there's some obvious solution that makes this a non-issue.

Yes, Git treats the actual config in the submodule as more important than the one in .gitmodules. I think Git got this mostly right: the URL in .gitmodules is mostly a hint for the superproject/submodule config, and the user should have the ability to fix configuration without having that configuration live in the working copy. Where Git doesn't do a great job is that it can fall back to .gitmodules if the config is missing (mixing config and the working copy is quite fraught, e.g. for security reasons).

Additional remotes besides the default remote are not a problem, I think. The issue I see is where the submodule's default remote is different from the superproject's .gitmodules.

One possible approach to this difficulty is to require some mapping between the subproject remotes and superproject remotes, though I'd prefer still allowing one extra "local" remote as an exception.

Fortunately, the superproject can only suggest one remote for the submodule (submodule.<name>.url in .gitmodules), so I don't think a mapping will be necessary. Some other things I can think of are:

  • Warning if the .gitmodules URL is different from the submodule default remote URL. We'd need this in the corner case of a repo that uses a single submodule name for two completely different repos, though it'll be a bit annoying in the intended case.
  • Completely ignore .gitmodules after initialization, make that behavior very clear, and make it easy for users to only deal with the real submodule remotes. If users don't even consider .gitmodules, then maybe there's no confusion. As with most "shaping how the user sees things" efforts, this is easier said than done, though.

@PhilipMetzger
Copy link
Contributor

I don't have any concerns with the principles posed here.

Although I think that we should alter the prose to be more Jujutsu centered, as it's currently we should do Y because Git does X which isn't good, which is a style I dislike. I'd rather rephrase it as JJ should/will do Y to avoid pitfalls like Git has.

But this stylistic Nit, shouldn't block progress and can be done in a followup if necessary.

@chooglen
Copy link
Contributor Author

Thanks, I like that suggestion. It's a fine line to tread, but I agree that our position should be that this is our own implementation of submodules, and not just a 'better' reimplementation of Git submodules.

This aims to create a consistent mental model about submodules that lets
us reason about and extend them consistently. These are based off
learned lessons from Git, and aims provides an easier way of working
with submodules.
@chooglen chooglen force-pushed the push-nqkstwryvmrn branch from ba772a0 to 1dc68e7 Compare June 12, 2023 20:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants