docs/design: add submodule guiding principles #1611

chooglen · 2023-05-18T00:08:04Z

I'd appreciate it if reviewers focus on whether these principles are comprehensible and whether they are good enough - Are they feasible? Do they make things simple enough for users?

We can grow this list if needed.

Checklist

If applicable:

I have updated CHANGELOG.md
I have updated the documentation (README.md, docs/, demos/)
I have updated the config schema (src/config-schema.json)
I have added tests to cover my changes

ilyagr · 2023-05-18T01:08:39Z

I haven't grokked this well enough to confidently review, but it sounds reasonable to me.

One thought I had is that we could make it easy for a submodule to point to a remote which is a local normal (possibly colocated) jj repository. I have at least one usecase of using subprojects in a dotfiles repo to keep track of projects that are unrelated to each other and that I might want to occasionally hack on.

Another observation: one difficulty we might want to consider (unless you already did) is that the submodule can define different remotes than the parent project in its config. Re the re the "Intro to Git Submodules" section, the config file in <superproject-git-directory>/modules/<submodule-name> is a crucial part of the config (the URLs there are usually more important than the one in .gitmodules). It's possible there's some obvious solution that makes this a non-issue.

One possible approach to this difficulty is to require some mapping between the subproject remotes and superproject remotes, though I'd prefer still allowing one extra "local" remote as an exception.

With all of these thoughts, I'm not at all claiming they should have a high priority.

martinvonz

Thanks! This was very useful

docs/design/git-submodules.md

martinvonz · 2023-05-18T15:31:38Z

docs/design/git-submodules.md

+
+For this to make sense to users, we must make it easy to manage the global
+submodules and to reconcile them with the working copy submodules. Examples of
+this include: prompting the user to update submodules when the working copy


Why not update submodules automatically when the working copy changes? Actually, which direction do you mean here? Is this about updating the submodule's directory in the working copy or about creating a submodule commit when files have changed in the working copy?

Why not update submodules automatically when the working copy changes?

Ah, I considered this but I couldn't find a good way to weave in "Alternatives considered" in a section so short.

When I say "update submodules", I primarily mean "changing the globally configured set of submodules", not "fetching new commits in already-configured submodules". (Seems like I need to make this clearer.) Automatically updating the current set of submodules to match the working copy makes intuitive sense (and we may want to offer this as optional behavior), but there are a lot of corner cases where it doesn't work well:

If branches A and B have a disjoint set of submodules, switching between them can delete submodules, causing submodules to need to be cloned over and over. Maybe that's okay if we make this operation append-only, but...

Git Submodules have global names that identify them across commits (e.g. to track renames). These names can "conflict" in ways that aren't easily resolved.

Updating a URL because the submodule's repo changed hosts. Without automatic updates, this is a non-issue because you'd have cloned the submodule from a more recent commit, which should give you the old commits too. With automatic updating, you'd have to know to ignore the old URL somehow.

Two completely different submodules sharing the same "name" at different points in history, e.g. using the name ui to refer to the UI framework and then changing UI frameworks at some point. (That's probably a misconfiguration, but it's not nice to punish the user for the mistake forever.) It's not clear to me how automatic updates could solve this, especially if we don't want to throw away submodules. Maybe this will be tractable if we can teach the submodules how to handle conflicts (like we do with branches), but that doesn't seem necessary in the short term.

docs/design/git-submodules.md

chooglen · 2023-05-19T16:58:03Z

One thought I had is that we could make it easy for a submodule to point to a remote which is a local normal (possibly colocated) jj repository. I have at least one usecase of using subprojects in a dotfiles repo to keep track of projects that are unrelated to each other and that I might want to occasionally hack on.

Makes sense, that's a fairly normal use case for Git submodules too - .gitmodules can specify a local path. Users could use that mechanism to point to the Git repo in the jj repo initially, then once we start figuring out what we want "jj-native" submodules to look like, we can provide niceties around pointing to a local jj repo.

Something that we should watch out for (but probably don't need to act upon) is that Git has some security quirks with local submodules. git clone prefers to hardlink local files where possible, so specifying a local path in .gitmodules can copy files in unexpected ways (see an exploit with symlinks).

Another observation: one difficulty we might want to consider (unless you already did) is that the submodule can define different remotes than the parent project in its config. Re the re the "Intro to Git Submodules" section, the config file in /modules/ is a crucial part of the config (the URLs there are usually more important than the one in .gitmodules). It's possible there's some obvious solution that makes this a non-issue.

Yes, Git treats the actual config in the submodule as more important than the one in .gitmodules. I think Git got this mostly right: the URL in .gitmodules is mostly a hint for the superproject/submodule config, and the user should have the ability to fix configuration without having that configuration live in the working copy. Where Git doesn't do a great job is that it can fall back to .gitmodules if the config is missing (mixing config and the working copy is quite fraught, e.g. for security reasons).

Additional remotes besides the default remote are not a problem, I think. The issue I see is where the submodule's default remote is different from the superproject's .gitmodules.

One possible approach to this difficulty is to require some mapping between the subproject remotes and superproject remotes, though I'd prefer still allowing one extra "local" remote as an exception.

Fortunately, the superproject can only suggest one remote for the submodule (submodule.<name>.url in .gitmodules), so I don't think a mapping will be necessary. Some other things I can think of are:

Warning if the .gitmodules URL is different from the submodule default remote URL. We'd need this in the corner case of a repo that uses a single submodule name for two completely different repos, though it'll be a bit annoying in the intended case.
Completely ignore .gitmodules after initialization, make that behavior very clear, and make it easy for users to only deal with the real submodule remotes. If users don't even consider .gitmodules, then maybe there's no confusion. As with most "shaping how the user sees things" efforts, this is easier said than done, though.

PhilipMetzger · 2023-05-29T18:24:42Z

I don't have any concerns with the principles posed here.

Although I think that we should alter the prose to be more Jujutsu centered, as it's currently we should do Y because Git does X which isn't good, which is a style I dislike. I'd rather rephrase it as JJ should/will do Y to avoid pitfalls like Git has.

But this stylistic Nit, shouldn't block progress and can be done in a followup if necessary.

chooglen · 2023-05-30T17:14:38Z

Thanks, I like that suggestion. It's a fine line to tread, but I agree that our position should be that this is our own implementation of submodules, and not just a 'better' reimplementation of Git submodules.

This aims to create a consistent mental model about submodules that lets us reason about and extend them consistently. These are based off learned lessons from Git, and aims provides an easier way of working with submodules.

martinvonz reviewed May 18, 2023

View reviewed changes

docs/design: add submodule guiding principles

1dc68e7

This aims to create a consistent mental model about submodules that lets us reason about and extend them consistently. These are based off learned lessons from Git, and aims provides an easier way of working with submodules.

chooglen force-pushed the push-nqkstwryvmrn branch from ba772a0 to 1dc68e7 Compare June 12, 2023 20:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs/design: add submodule guiding principles #1611

docs/design: add submodule guiding principles #1611

chooglen commented May 18, 2023

ilyagr commented May 18, 2023

martinvonz left a comment

martinvonz May 18, 2023

chooglen May 18, 2023

chooglen commented May 19, 2023

PhilipMetzger commented May 29, 2023

chooglen commented May 30, 2023

docs/design: add submodule guiding principles #1611

Are you sure you want to change the base?

docs/design: add submodule guiding principles #1611

Conversation

chooglen commented May 18, 2023

Checklist

ilyagr commented May 18, 2023

martinvonz left a comment

Choose a reason for hiding this comment

martinvonz May 18, 2023

Choose a reason for hiding this comment

chooglen May 18, 2023

Choose a reason for hiding this comment

chooglen commented May 19, 2023

PhilipMetzger commented May 29, 2023

chooglen commented May 30, 2023