Fixing the workflow: reuniting JuliaImages, Colors development #898

Open
timholy opened this issue Jun 22, 2020 · 10 comments
@timholy
Member

timholy commented Jun 22, 2020

Recently the Pkg devs have decoupled packages from git repositories: it is now possible to have a single git repository that hosts multiple packages. A demo is available in the master branch of https://github.com/timholy/SnoopCompile.jl, which will soon be registered. The former content of SnoopCompile got split out into 3 sub-packages (SnoopCompileCore, SnoopCompileAnalysis, and SnoopCompileBot) and SnoopCompile itself is now a meta-package that loads all 3. Crucially, a user can say using SnoopCompileCore and get only that one package; nothing else is loaded. You can inspect the directories and files there to see how all this is organized.

I believe that this new capability could significantly diminish some of the annoyances of developing JuliaImages. Currently, if we make a breaking change in, say, FixedPointNumbers then these changes have to propagate FixedPointNumbers->ColorTypes->(Colors,ColorVectorSpace)->ImageCore->(ImageAxes, ImageMeta, ImageFiltering, ImageContrastAdjustment...)->Images->(ImageSegmentation, ImageFeatures, docs). That's a long dependency chain, and with the latency of running CI, merging (which triggers another CI run), registering (and waiting for the registration to merge), it's really easy to blow an entire weekend just shepherding a single change all the way up the hierarchy. Perhaps no one besides @johnnychen94 has felt this as acutely as I have, but I don't want to even think about how much time I've lost to this workflow problem. Even in the best of cases, where everything is ready locally and it's really just a question of submission, the fact that CI limits your number of jobs, and that each merge-to-master triggers more CI, means that the CI queue ends up getting hugely clogged. By the middle of the weekend you're choosing between leaving the README badges with a "broken" tag because you've canceled a bunch of redundant CI jobs, or waiting a couple of hours to see whether your next 3-line PR passes tests or not.
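To make the length of that chain concrete, here is a minimal sketch that groups the packages into "waves" that must each be released, registered, and merged before the next can start. The edges are read directly off the chain quoted above and are an illustrative subset (docs is omitted, and the real dependency graph has more packages and edges); `release_waves` is a hypothetical helper name, not part of any Julia tooling.

```python
from graphlib import TopologicalSorter

# Each package maps to the packages it depends on directly.
# Assumption: edges are taken from the chain in the text, not from real Project.toml files.
deps = {
    "FixedPointNumbers": set(),
    "ColorTypes": {"FixedPointNumbers"},
    "Colors": {"ColorTypes"},
    "ColorVectorSpace": {"ColorTypes"},
    "ImageCore": {"Colors", "ColorVectorSpace"},
    "ImageAxes": {"ImageCore"},
    "ImageMeta": {"ImageCore"},
    "ImageFiltering": {"ImageCore"},
    "ImageContrastAdjustment": {"ImageCore"},
    "Images": {"ImageAxes", "ImageMeta", "ImageFiltering", "ImageContrastAdjustment"},
    "ImageSegmentation": {"Images"},
    "ImageFeatures": {"Images"},
}


def release_waves(graph):
    """Group packages into waves; a wave can start only after the previous one is registered."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are all done
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = release_waves(deps)
for number, wave in enumerate(waves, start=1):
    print(number, ", ".join(wave))
```

Each wave in the multi-repository workflow costs at least one CI run, one merge, and one registration; with everything in one or two repositories the same change could in principle be prepared and submitted together.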

In contrast, if you have all packages in just one or two git repositories, you prepare the entire set of changes locally, get everything working, and submit it all in one or two PRs. Once you hit merge, the goal would be a single, simultaneous registration event that brings the whole lot to users' desktops. This process is not yet entirely smooth: Pkg errors due to [compat] requirements that have not yet been registered, because they're appearing in the same PR (see JuliaLang/Pkg.jl#1874). But this issue seems fixable, and part of the purpose in posting this proposal now is to gauge whether there's enough interest in this to make it worth my while to spend some time fixing it.

Another advantage is that we would automatically run all of JuliaImages tests for each change, being therefore a bit more confident that we're not inadvertently breaking something higher in the hierarchy. This aspect, however, has a corresponding pair of disadvantages: (1) CI times will be long even for trivial changes, and (2) even occasional failures will be problematic because they'd affect the entire run, so we'd have to take a hard look at our tests and make them deterministic.

A final issue (which I think is mostly an advantage) is that it would end any debate about whether documentation should be hosted in each individual package repository or collectively at JuliaImages. In this case they would effectively be the same.

To me the advantages seem to significantly outweigh the disadvantages. I therefore propose that much of JuliaImages move back to a single git repository, Images.jl, but maintain the current package structure. Likewise, I'd propose that ColorTypes and ColorVectorSpace merge into the Colors repository. That way we'd have just two repos, Colors and Images, plus several dependencies in JuliaArrays and at least one (FixedPointNumbers) in JuliaMath. So in practice it would still be a fair number of repos, but far fewer than we have now.

If we decide to do this, then a first priority would be to merge any outstanding PRs. This will undoubtedly take a while, so I don't expect this change to happen immediately, but I think it's time now to have the discussion about whether this is something we want to do. I'd appreciate your thoughts!

CC @kimikage since he works on the colors world but doesn't follow this package.

@kimikage

"In general", I prefer the loose coupling and modularization.

However, I don't have many packages that I develop in parallel, so currently I don't feel much inconvenience. (I'm more concerned about the long time needed for consensus building than about the CI, but that is off-topic.) Therefore, I respect @timholy's opinions.

@johnnychen94
Member

johnnychen94 commented Jun 22, 2020

Although the whole Julia community is built on mutual trust, permission control is still a potential and important issue for one giant repository.

The dependency chains Colors -> ImageCore and FixedPointNumbers -> ImageCore wouldn't be combined even if we took this path, so you'd still not get a very smooth experience. As for the sub-packages under JuliaImages, most of them are at the same level in the dependency chain, so developing them in parallel won't cause much trouble (unless we move code around).

I was planning to explore a dev workflow based on git's submodule functionality and write the necessary dev tools for it. I'm not sure which one is the better solution until then. Since Images.jl is quite a large package to experiment with, I propose to work only on Colors first and see how things go.

@Tokazama

For what it's worth, the distribution of image packages makes it really hard to understand just how breaking a proposed change is. I'm working on a fork of ImageCore and I'm still trying to figure out how much I'll be breaking.

@timholy
Member Author

timholy commented Jun 25, 2020

Even when things are at the same level of the hierarchy, these "move foo from PkgA to PkgB" are tough to handle correctly unless they're all part of a bigger thing.

For those of you who may not be following, JuliaLang/Pkg.jl#1874 has ongoing vigorous discussion. The current trend seems to be in favor of implementing support for "traditional" single packages but adding a new capability allowing you to load specific submodules. E.g. using Images@ImageCore rather than using ImageCore where ImageCore is a standalone package.

@johnnychen94
Member

For the record, unless JuliaLang/Pkg.jl#2005 is supported, simply moving the code around doesn't improve the user experience much, so with fewer maintainers than we currently need, this issue is not a high priority.

@timholy
Member Author

timholy commented Jul 25, 2021

Yes, agreed. I've been trying it out in SnoopCompile, and while it fixes some issues it causes problems for others. The releases get more complicated: timholy/SnoopCompile.jl@1abf0a2. Shall we close this?

@johnnychen94
Member

I haven't used this centralized workflow in any codebase that I watch, so I could be wrong here. There are three advantages of this workflow that I can foresee:

  • it increases the code discoverability if all JuliaImages packages are organized into one big repo
  • no need to choose between the central documentation (juliaimages.org) and the scattered ones (ImageCore, ImageFiltering...)
  • bumping a breaking version becomes smoother: you don't need to open a bunch of CompatHelper PRs and fix compatibility issues here and there because you simply use the dev version of upstream dependencies.

So, apart from the missing release tooling, it can resolve quite a few undecided issues.

There can also be drawbacks:

  • it's easier to build a closed ecosystem because by doing this, we basically say that "Okay ImageCore is used internally in Images.jl big repo, so as long as tests in other parts of my Images.jl big repo pass, it is good to go." and because we're always using the dev version of ImageCore to test, it can become harder to realize the breakage until we release them all.
  • the version lock due to incompatibility could become more frequent: because we can dev and make changes to our dependencies, they are no longer foreign, and compatibility assumptions can become less restrictive than before.
  • it slightly goes against the "... immediately after merging the PR, tag a release." continuous delivery workflow that most of the Julia community uses.

The above are all from maintainers' and contributors' perspectives. I'm more interested in the opt-in (sub)package mechanism, which improves the users' experience. Thus I can still live with our current workflow, as we're quite familiar with it.


Maybe @findmyway has some insights into this, as he already uses this workflow in ReinforcementLearning.

@findmyway

findmyway commented Jul 26, 2021

  • it's easier to build a closed ecosystem because by doing this, we basically say that "Okay ImageCore is used internally in Images.jl big repo, so as long as tests in other parts of my Images.jl big repo pass, it is good to go." and because we're always using the dev version of ImageCore to test, it can become harder to realize the breakage until we release them all.

You can set up several different CI jobs.

  • the version lock due to incompatibility could become more frequent: because we can dev and make changes to our dependencies, they are no longer foreign, and compatibility assumptions can become less restrictive than before.

I'm not sure that holds. Each subpackage should have a clear boundary. We still need to obey SemVer when making modifications and releasing new versions.

Similar to my answer in the above one, this should be the same as before.

One drawback is like what Tim mentioned above. But I think it only happens when tagging new major/minor versions. This is not that frequent so I think it is somewhat acceptable.

Another big issue for me is the version of the documentation. Let's say we put all the documentation into one place. Then readers can only select one version of the main package:

image

And they can't select an old version of some subpackages (at least I'm not sure how to implement it). Obviously, this is not an issue in separated packages.

@johnnychen94
Member

I think we should go this way. Before we proceed, there are some decisions to make (and associated issues):

@johnnychen94
Member

johnnychen94 commented May 21, 2022

As a first step, I've managed to build the new Images monorepo using https://gist.github.com/johnnychen94/3f260750b74e46b52ed0c5a62534140e and a preview is available in https://github.com/johnnychen94/Images.jl

The current folder structure is

.
├── README.md
└── packages
    ├── HistogramThresholding
    ├── ImageAxes
    ├── ImageBase
    ├── ...
    └── QRCode

I've included almost all working repositories in JuliaImages, except for:

If you don't want particular packages included in the monorepo, please just let me know.

We might want to make Colors and ImageIO monorepos in the future if this proves to work well.


Steps to migrate:

  • migrate the entire git history to subdirectory packages/NAME
    • I'll add a prefix for each commit to indicate where it comes from.
    • images, .DS_Store are excluded from the history -- this compresses the entire repository from 100 MB to 23 MB. I'll add them back when building the documentation
  • set up unit test CI
  • migrate documentation and benchmark
  • push to monorepo branch in this repo and play with it for a while
  • reset the entire master history in this repo with the monorepo branch
  • transfer all issues back to this repo and archive old repos

When the migration finishes, developers will need to reclone the Images repository because the history will be completely rewritten using some magical git tricks (git filter-repo).

cc everyone I know who is still actively developing JuliaImages: @zygmuntszpak @adrhill @juliohm @timholy
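For readers unfamiliar with git filter-repo, the first migration step might look roughly like the sketch below, which only builds the command lines and does not run them. It is not the script from the gist linked above. The glob patterns are placeholders for whatever "images, .DS_Store" means in practice, the `[NAME] ` prefix format is a guess, and `migration_commands` is a hypothetical helper. The flags used (`--invert-paths`, `--path-glob`, `--to-subdirectory-filter`, `--message-callback`) are documented git filter-repo options; the work is split into two passes to avoid relying on how path filtering and renaming interact within a single run.

```python
import shlex


def migration_commands(name, drop_globs=("*.png", "*.DS_Store")):
    """Return two git filter-repo invocations for one package repository.

    Pass 1 drops unwanted files from the whole history.
    Pass 2 moves everything under packages/<name> and prefixes each commit message.
    Assumption: both are run inside a fresh clone of the package's own repository.
    """
    strip = ["git", "filter-repo", "--invert-paths"]
    for pattern in drop_globs:  # hypothetical patterns; adjust to the real exclusions
        strip += ["--path-glob", pattern]

    prefix = f"[{name}] ".encode()
    move = [
        "git", "filter-repo",
        "--to-subdirectory-filter", f"packages/{name}",
        # The callback body is Python; `message` is the commit message as bytes.
        "--message-callback", f"return {prefix!r} + message",
    ]
    return [strip, move]


commands_for_imagecore = migration_commands("ImageCore")
for argv in commands_for_imagecore:
    print(shlex.join(argv))
```

Because every commit is rewritten, the resulting commit hashes differ from the originals, which is why existing clones cannot simply pull the new history.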
