-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update CI, in line with gitdb #53
Conversation
This updates smmap's CI configuration in ways that are in line with recent updates to gitdb's. In most cases there is no difference in the changes, and the reason for the updates is more to avoid confusing differences than from the value of the changes themselves. In one case, there is a major difference (fetch-depth). - gitpython-developers/gitdb#89 (same) - gitpython-developers/gitdb#90 (same) It's just the project, not dependencies, but otherwise the same. - gitpython-developers/gitdb#92 (opposite) This is the major difference. We don't need more than the tip of the branch in these tests. Keeping the default fetch-depth of 1 by not setting it explicitly avoids giving the impression that the tests here are doing something they are not (and also serves as a speed optimization). - gitpython-developers/gitdb#93 (same)
Thanks a lot, it's much appreciated. I hope that while working with the code in these project, an idea might form on how to merge them into GitPython to avoid having this duplication in the first place. |
Unfortunately, not really, unless I have misunderstood what you're looking to do. I think you're looking for a way to include just the parts of them that GitPython really needs. This is something I might have insights about when I have greater knowledge of how GitPython works, but not at this time. However, if you just want to include them in full in GitPython, of course that could be done. I suspect it would not even be a breaking change to GitPython. (If people are using GitPython and also separately using gitdb or smmap while relying on getting them as indirect dependencies through GitPython, I'd say that is already a bug in how their dependencies are declared.) I'm not advocating this, though. It seems to me that it would not be worthwhile unless it brings further improvements, reductions in code, etc. If this were to be done, then I think much of the discussion of it in gitpython-developers/GitPython#511 remains applicable. The best (or: least bad) way to do it would, I think, be to make To reiterate, I do not advocate this in this form. I think something like this is only worthwhile if it carries other benefits. Right now, there is considerable code duplication across the two CI test workflows in GitPython, which strikes me as a more serious issue because, with GitPython being much more active, its workflows are, and should be, read more often than here, and probably changed more often too. (That can itself be solved, and maybe should, but I'm inclined to put that off until native Windows CI jobs are added, because otherwise the way the deduplication is done might turn out to be totally unsuitable.) If the goal is just to stop having three GitHub repositories, then the GitPython repository could be reorganized into a monorepo that hosts all three packages while having them remain separate on PyPI. It seems to me that this is not really worthwhile by itself either, though, given that all these repositories already exist and have been in use for a while. |
Thanks for the analysis!
This is correct - the idea is to only have to deal with a single repository, the one of GitPython, and retire all other repositories and packages. Given how much time it would take to trim down GitDB (which would probably allow to remove From the options provided, maybe there is one that incurs the smallest cost? Maybe tests could also be kept separate enough to mostly work as is without trying to deduplicate too much of the testing boilerplate? These are stable, so don't cost anything by now. |
From context, I think you mean that what was correct was that I had misunderstood you before. The rest of this comment assumes that. Please let me know if I have misunderstood (again?).
Oh! :) I think (a) splicing I recently found out that a number of GitPython's tests rely on having those Git submodules in the repository. Also, it is not limited to tests of submodule handling. It's probably more than the 10 tests shown there; that CI output is misleading (if one doesn't notice this line) because of the So (e) modifying the GitPython test suite so tests no longer expect those to exist as submodules in the GitPython repository would probably be the bulk of the work. Fortunately, that could be done first, separately, and I think should, since no longer having them as Git submodules is its own benefit, and since this might help in avoiding too much at once (on the same feature branch).
The gitdb readme still suggests using gitdb-speedups. Is that still relevant in any way (to this or otherwise)?
I want to say that, unless the ability to declare only |
For context, if I'd layout a project like GitPython today, I'd definitely keep everything related to it in a single repository, while using subpackages.
I don't even know if these are tested and built as part of CI, and I'd be hesitant to endorse any C-code written by me. Further, the speedup it promises is ridiculous in comparison to how slow all of it is. GitDB really shouldn't be used, neither the python version nor, and particularly so, any C-extensions of it. ❤️! The above reaction is due to the fact that you are saying that (a), (b), (c) and (d) would actually be simple, because it's my belief that (e) can even be postponed indefinitely if needed. After all, there is no issue in keeping submodules to archive repositories. (e) also touches on a much larger issue, representing a possible first step: the lack of isolation of the test suite. Many tests require the parent repository to be available and in a certain state, including the availability of a reflog, from which a lot of the issues in running the tests come from. However, I think it's a major amount of work to fix this and maybe it's not worth it unless it's rewarding for the person doing it in other ways. So in a away, I'd avoid the bulk of the work as I don't see a benefit in removing these submodules unless one is willing to tackle the isolation problem of the entire test-suite, or at least, see it as part of that. It's entirely possible to alter only the submodule tests to work in isolation, and maybe from there start tackling the isolation problem of the test-suite as a whole. But that, to me, is entirely optional (despite being valued, and valuable), in favor of (a), (b), (c) and (d) to reduce the maintenance burden. |
Or perhaps consolidating the repositories as you are proposing would enable someone to come along later and do that project more easily. If/when Also, this is a premature question that you definitely don't have to answer at this time, but do copyright notices from
In contrast, the copyright notices in
Intuitively it feels like this is simply a matter of asking you if you are okay with the
Alternatively, there could be a separate file for such other notices, as many large projects have. Related, though probably independent of license requirements and thus not in any way a substitute of figuring the above out: All the contributors and their contributions will be shown by Git if, instead of (0) doing the reasonable thing and copying the code from
Yes, I think so. Well, or at least not too hard. :)
Yes, the archived status would not be an issue. That's arguably better than having Git submodules to non-archived repositories, because the archived status would make the situation clearer. I do think there is potential or confusion with multiple copies of From time to time I find myself embarrassingly editing the copy of a file in When I was recently looking at the source code of I wonder if a weakened version of (e) might be possible, where |
I think they should be, in order to make it most similar to what is already there.
To me it seems it would be easiest if the sub-packages, which I presume will live in their own directory, keep their own license files. That way one can't go wrong as one would not modify an existing license file. The feasibility of this certainly depends on the structural needs of python submodules though. Also CC @empty.
That's a great point! I'd definitely prefer to take the 'merge-into-their-own-subtree' route. This keeps the git history of both projects pristine (so no rewrites should be done). For this to work, they would be merged into their own subtree. From there, I think it's fine to move them once more, into place, if this is needed to turn them into submodules. Of course, I'd love it if they could just stay in place so people using blame don't see "moved into place" as their only commit message - it's so hard to poke through that wall with
A very valid point, I didn't see that. Maybe that will be a great incentive to at least isolate the submodule tests to not require its parent repository anymore, which should allow to remove the submodules in a future step.
I think that's it! This is easier to accomplish than isolating the submodule tests, and is thus a more reasonable first cleanup step. Nothing would prevent one from taking what I said earlier as a step after that though - I guess I like the idea as it would lead towards learning how to achieve test-isolation. |
I'm actually not sure if you mean they should be conceptually public or that they should be conceptually private, based on this. As you have pointed out, making GitPython no longer have them as external dependencies would be considered a breaking change, due, if I understand your reasoning correctly, to a long-standing expectation that installing GitPython installs those packages and makes them available for use. (Furthermore, it occurs to me that this is an opportunity to make other breaking changes: having only what people can reasonably use included in each module's
If they are to be accessed as |
I thought they should be conceptually public, but with the intent that the status quo doesn't change. Probably I have a wrong understanding of the status quo though.
I thought it's possible to do
Yes, we are definitely aligned here. However, I'd refrain from making it a breaking change because I dare to say that most people won't use
I see, python is dependent on the actual directory structure, and adding Does this make any sense? |
In that case, I would regard it as a new feature, one that is not inherently breaking but affects the public API: GitPython including gitdb and smmap, and thus not having to have them as external dependencies. So if it is not considered a breaking change then the minor version number could be bumped, bringing GitPython to 3.2.0 (unless some other update bumps this minor version number first).
If you mean the original However, it seems to me that there is little reason to do it that way, especially in view of other changes that would be being made anyway. When moving the directories into place to make them Python subpackages of
Most of these changes would, I believe, be fairly fast and easy. (Integrating the documentation is actually the one I feel most apprehensive about.) But with all these kinds of integration changes being done, I think there is no reason to try to force all the old Furthermore, if it is considered valuable to preserve the files other than Python packages/modules and documentation (besides in the repository history), then that can still be done: the repository could contain separate top-level subdirectories |
Yes, a valid point and I can go along with that. I understand that due to all changes needed to properly integrate tests and documentation, one should prefer to not use a 'virtual' root directory as initially proposed by me. The reason I proposed it in the first place was that it could solve the licensing question (i.e. just leave it as is). From the considerations made in the above comment, I also see now that there seems to be no standard or established way of handling submodules as their own, independent projects with all the files that come with it, at least not without further complications. With all that said, it seems that a viable course of action could be the following:
This should make it possible to trace back all files and changes to their original history later on, probably not so much with Maybe there are other ways to achieve that same - for me the only point of importance is the initial state that should be history preserving (i.e. start out with a subtree-merge), everything else really is in your most capable hands. |
gitdb and smmap's current license files could still be put in the subpackage subdirectories (even in addition to elsewhere), i.e., one level lower than they started but still directly associated with the code they came with, if that's something that would help. On the other hand, if the goal is to make gitdb and smmap really a part of GitPython completely, then isolating them in separate directories for license-related reasons would not be the best approach. With The licenses are compatible--more than compatible, the same but just with different copyright lines. I think using a single license file with both notices is enough (or more than enough--intuitively one could combine them into a single line, I just don't want to assume that's really okay because it's arguably not preserving the notice). But if for some reason even that is not enough, then the full text of both could be listed, one after the other, in a single file, or otherwise duplicated in a prominent way at or near the top level of the repository.
Well, should they be independent? If they should be kept as separate projects but just all hosted in the GitPython repository, that's another option, and an alternative to making them Python subpackages. Actually this covers two alternatives:
This is actually something that, once done, I think you could review: when it comes time to do this, I could open a draft PR with just that part of the change, and then do everything else (the integration and so forth) on the PR branch afterwards. It would stick around as a draft PR for as long as those changes took, but that seems okay to me. The PR could then be marked "ready for review" when it's ready for the rest to be reviewed. I emphasize that this is not something I would be ready to do immediately, though, so even if you like this idea, please don't expect such a draft PR around the corner. At minimum and even aside from the issue that I cannot make assurances or commitments about my own availability, I think native Windows CI, and possibly other test cleanup, so we know what tests are currently expected to fail and on what platforms, should come first. (I also want to either try to fix or, at minimum, open an issue for gitpython-developers/GitPython#1650 (comment) / gitpython-developers/GitPython#1650 (comment) so it's not forgotten about, and to present the downsides I'm aware of in all the possible approaches--they all have downsides, though some would be better than the current race-condition situation.) But this is actually good rather than bad, I think, because there is no need to rush this while the plan is still forming and possible alternatives are still being considered. |
Thank you for refreshing my memory - for some reason I completely discarded this initial idea in favour of keeping the files verbatim, but I also see how just combining both will have great advantages when considering the amount of refactoring the code is likely to undergo.
This reveals another assumption I made implicitly: Somehow I assumed that this would be a multi-step process, with the first one bringing in This effectively skips over the This would allow people who depend on
I agree, having a longer-running draft that can receive intermediate feedback is a good way of handling this work.
I understand - if nothing else this thread can serve as reference if this gets tackled in the future. There are many lower-hanging but just as valuable fruit left to be picked :). |
For code outside GitPython that were modified to depend on GitPython instead of gitdb and/or smmap, there would be another impediment to just using I think it's possible that code using the GitPython library is depending on this behavior, such as by wrapping |
That sounds like no change is needed. Despite being a breaking change unless we keep the So I think it's fair to actually remove the |
In that case, would it be okay for |
You are right, it makes perfect sense to start them out as non-public then, let's do that. |
Some bad news: Now that I'm revisiting this topic with fresh eyes, a greater awareness from GitPython #1656 #1659 & gitdb #98 of how the The issue is that the GitPython Even advancing the major version number in GitPython would arguably be insufficient to mitigate this, because a significant amount of code would probably break silently. (I say this because not all projects have robust tests, that is especially so for legacy code that is especially likely to use GitPython, and because an exception that fails to be caught by an intended Possible solutions:
Sorry about the bad news! 😿 |
Thanks so much for the analysis! I consider this good news, as it allows to make more educated decisions and prevent major accidental breakage. Without wanting to push or force a decision, my intuition here lies with option number 3. That way, I seems that For a vision of what would constitute a worthy breaking change that automatically does away with |
This updates
smmap
's CI configuration in ways that are in line with recent updates togitdb
's. In most cases there is no difference in the changes, and the reason for the updates is more to avoid confusing differences than from the value of the changes themselves. In one case, there is a major difference (fetch-depth
), and I think the change there is clarifying, at least when the repositories are compared.gitdb
PRs these changes correspond to:It's just the project, not dependencies, but otherwise the same.
This is the major difference. We don't need more than the tip of the branch in these tests. Keeping the default
fetch-depth
of 1 by not setting it explicitly avoids giving the impression that the tests here are doing something they are not (and also serves as a speed optimization).As noted in gitpython-developers/gitdb#91 (comment), I don't intend to spend a lot of time proposing new features for
gitdb
andsmmap
. Having just done #52, which corresponds to gitpython-developers/gitdb#94, it was bugging me a bit that I had arbitrarily proposed the other changes ingitdb
but not here.