Absorb in constructor #294

Open
jaimergp opened this issue Nov 2, 2023 · 8 comments

Comments

@jaimergp
Collaborator

jaimergp commented Nov 2, 2023

Idea

Merge conda-pack with constructor in a way that constructor can output conda-pack files as one of the possible output artifacts.

Why

  • conda-pack shares many similarities with constructor, the main one being that both can take an environment and generate an artifact that allows its redistribution.
  • constructor also supports solving a new environment on the spot without having to create it first.
  • Maintenance-wise, it would be easier to have the common logic live in a single place. The environment classes in conda-pack could help a lot in constructor, whose logic is a bit more naive.

Expected outcome

  1. Add support for conda-pack outputs in constructor, with conda-pack being imported as a regular 3rd party library.
  2. Absorb conda-pack codebase in constructor, adjusting things as necessary. Parts of constructor might be replaced with conda-pack's, and some conda-pack stuff will be removed if not needed.
  3. Migrate documentation as necessary over to constructor.
  4. Release a new conda-pack version with a warning saying that it's the last one, and future development will happen in constructor.
  5. Migrate issues as necessary to constructor and archive this repository and the corresponding feedstocks.

Alternatives

  • Do not merge projects, but create a new one that supersedes all of them with rewrite-from-scratch strategy.
  • Consider conda-docker outputs as part of the migration too.

cc @xhochy @conda/constructor

@mcg1969
Contributor

mcg1969 commented Nov 2, 2023

As an admittedly non-active contributor to both projects I'm not sure how I feel about this. They were constructed for different purposes, and output different types of environment assets. It's not clear to me how much savings you're really going to achieve, whether measured in lines of code or human hours.

But as someone who wants to see both projects fully supported, if it is clear that this move would make it easier to achieve that, I'd get behind it.

My final concern is this: "some conda-pack stuff will be removed if not needed." I just want to make sure that is referring only to redundant code, and not a suggestion to remove features of conda-pack. conda-pack tends to be more frequently integrated into unattended workflows, and it may not be fully appreciated what dependencies on this or that feature of the tool are out there.

@jaimergp
Collaborator Author

jaimergp commented Nov 2, 2023

Thanks for the comments @mcg1969. Right now I just want to see how people feel about it; this is not going to be rushed or anything. My main idea is that since both tools allow redistribution of environments in one way or another, it would make sense to join forces instead of reimplementing some core logic (e.g. what if we want to conda-pack a lockfile?). If both constructor and conda-pack had to be rewritten, I think most people would come up with a reasonable common infrastructure needed for both (e.g. the concept of a solved environment at the level of records and files).

I just want to make sure that is referring only to redundant code,

Definitely this. I don't intend to remove features just because I don't use them 😬

@dholth

dholth commented Nov 2, 2023

I like the idea.

@xhochy
Collaborator

xhochy commented Dec 23, 2023

This sounds sensible. Marketing-wise, conda-pack is the better name IMO.

@jezdez
Member

jezdez commented Jan 4, 2024

I like it as well, and would suggest sticking with constructor given the naming overlap with HashiCorp's Packer and Nomad Pack.

@dbast
Member

dbast commented Jan 5, 2024

I used conda-pack on a daily basis a while ago with pyspark, even calling it multiple times a day to distribute my current data science conda environment to all the Spark workers on ephemeral clusters.

For me, constructor and conda-pack are two completely different things:

  • users: constructor is used by experts to create self-extracting installers, conda-pack is used by end-users (Data Engineers / Scientists) to distribute environments on clusters or archive them for later usage
  • frequency: constructor is used once in a while to create an artifact, conda-pack is used on a daily basis embedded into normal work
  • qa: the artifacts/installers created by constructor are normally QA-tested before they are delivered, so it does not matter if constructor has some rough edges, since what matters is the quality of the produced artifact, not of the tool itself; conda-pack, by contrast, is used more frequently, with different needs regarding UX quality.
  • artifact: constructor produces self extracting artifacts and thus has to deal with eula, target folder selection, permissions, conda init, menu entries, while conda-pack produces archives like tarballs, zip files or mountable squashfs files (no need to deal with eula, target folder selection, permissions, conda init)
  • artifact internals: constructor internally puts conda packages (.conda or .tar.bz2 files) + conda-standalone + scripts into the artifact, while conda-pack is just compressing an entire environment (including e.g. pip installed packages) into an archive (none of the mentioned things for constructor are added here)
  • abstraction: constructor is an abstraction on top of nsis, macos pkgtool, and .sh script logic, providing the possibility via one .yaml definition to produce an artifact for different platforms (windows, macos, linux), while conda-pack just compresses an existing environment (+some prefix handling) with no need for all that kind of platform-dependent abstraction
  • configuration: constructor is configured via yaml file (supporting platform selectors), conda-pack is a command with args
  • numbers: constructor currently has 472,738, while conda-pack has 2,364,011 downloads on conda-forge

These differences in the scope, goals, and user base of the two tools go so far that I wonder why the @conda/constructor team was added as CODEOWNERS to the conda-pack codebase https://github.com/conda/conda-pack/blob/main/.github/CODEOWNERS (I wasn't fast enough to intervene at the time that happened :/ ).

tl;dr: I would not do that merge/absorb, even if there are some similarities in the codebases, as the two tools serve different purposes and user bases.

@jaimergp
Collaborator Author

jaimergp commented Jan 8, 2024

Thanks for the detailed comparison, @dbast! I agree on some points and it is useful to have these side by side. I'll answer below, one by one:

users: constructor is used by experts to create self-extracting installers, conda-pack is used by end-users (Data Engineers / Scientists) to distribute environments on clusters or archive them for later usage

I used constructor shell scripts for the same HPC-related purpose because it was the tool I knew. Not saying it's the best for that, but it works. Maybe conda-pack has more specific options, but that's not a blocker.

frequency: constructor is used once in a while to create an artifact, conda-pack is used on a daily basis embedded into normal work

constructor installers are used very often in CI pipelines. Some teams produce nightly installers of their applications.

qa: the artifacts/installers created by constructor are normally QA-tested before they are delivered, so it does not matter if constructor has some rough edges, since what matters is the quality of the produced artifact, not of the tool itself; conda-pack, by contrast, is used more frequently, with different needs regarding UX quality.

I hope we can achieve better UX in constructor by working together in a single tool instead of separate codebases.

artifact: constructor produces self extracting artifacts and thus has to deal with eula, target folder selection, permissions, conda init, menu entries, while conda-pack produces archives like tarballs, zip files or mountable squashfs files (no need to deal with eula, target folder selection, permissions, conda init)

Yes, in a way constructor has a superset of functionalities. EULA is one of them, and it's not required. However, I wouldn't say that conda-pack artifacts are necessarily exempt from EULA requirements (not to be confused with license redistribution).

artifact internals: constructor internally puts conda packages (.conda or .tar.bz2 files) + conda-standalone + scripts into the artifact, while conda-pack is just compressing an entire environment (including e.g. pip installed packages) into an archive (none of the mentioned things for constructor are added here)
abstraction: constructor is an abstraction on top of nsis, macos pkgtool, and .sh script logic, providing the possibility via one .yaml definition to produce an artifact for different platforms (windows, macos, linux), while conda-pack just compresses an existing environment (+some prefix handling) with no need for all that kind of platform-dependent abstraction

Correct. It's just a different output format.

numbers: constructor currently has 472,738, while conda-pack has 2,364,011 downloads on conda-forge

That's a very fair point I hadn't considered.


I'm not pushing hard for this, btw. I am just saying that there's an overlap of useful abstractions in the codebase that would benefit several projects (conda-docker is another one). My ideal project would be a conda bundle subcommand that deals with all the complexities.

Maybe a different compromise here is to expose better conda environment abstractions in a common library or something. Or just do nothing about the codebase but have better documentation and tutorials about the intended purpose of these similar but different tools.
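
To make the idea of shared environment abstractions a bit more concrete, here is a minimal Python sketch of a "solved environment at the level of records and files". Every name in it (`PackageRecord`, `SolvedEnvironment`, `all_files`) is hypothetical, and the package versions and build strings are made-up sample data; neither constructor nor conda-pack is known to expose these classes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PackageRecord:
    """One solved package (hypothetical, loosely modelled on a conda record)."""
    name: str
    version: str
    build: str
    files: tuple = ()  # paths relative to the environment prefix


@dataclass
class SolvedEnvironment:
    """A solved environment described by its records and their files."""
    records: list = field(default_factory=list)

    def all_files(self):
        """Return every file owned by a package, sorted and de-duplicated."""
        return sorted({f for record in self.records for f in record.files})


# Illustrative sample data only.
env = SolvedEnvironment([
    PackageRecord("python", "3.11.7", "h0", ("bin/python",)),
    PackageRecord("pip", "23.3", "py_0", ("bin/pip",)),
])
print(env.all_files())
```

Under this assumption, an installer back end and an archive back end could both consume the same `SolvedEnvironment` object and differ only in the output format they write.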

@dbast
Member

dbast commented Jan 8, 2024

While you can distribute an environment on an HPC via constructor by running a command/script on all machines, the pyspark case is different: you need a YARN-compatible .tar.gz or zip file of the environment (not a self-extracting script) that you can hand to YARN as an archive to pre-deploy when the Spark session / workers are set up (YARN does the extraction of the archive).
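
For readers unfamiliar with that workflow, the following Python sketch builds the kind of `spark-submit` command line that ships a packed environment through YARN. The helper name `spark_submit_command` and the file names are placeholders, and the flags are the commonly documented ones, so they should be checked against the Spark version in use.

```python
import shlex


def spark_submit_command(archive, script, alias="environment"):
    """Build a spark-submit argv that ships a packed environment via YARN.

    Hypothetical helper; `archive`, `script` and `alias` are placeholders.
    """
    python = f"./{alias}/bin/python"
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        # YARN extracts the archive on each node into a directory
        # named after the alias that follows '#'.
        "--archives", f"{archive}#{alias}",
        # Point the application master at the Python inside the archive.
        "--conf", f"spark.yarn.appMasterEnv.PYSPARK_PYTHON={python}",
        script,
    ]


print(shlex.join(spark_submit_command("environment.tar.gz", "script.py")))
```

In practice the `PYSPARK_PYTHON` environment variable is usually set to the same relative path before calling `spark-submit`, so that the workers also pick up the interpreter from the extracted archive.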

So conda-pack compresses a pre-existing environment, almost like tar -czf $archive.tar.gz ~/miniconda3/envs/$ENVNAME, while constructor puts something together into an installer that later creates an environment. One starts with an environment while the other ends with one: different use case, different approach, different tool.
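
As a rough illustration of the `tar -czf` analogy, here is a minimal Python sketch that uses only the standard library. It is not conda-pack's implementation: the function name `archive_environment` is hypothetical, and the prefix handling mentioned earlier in the thread is deliberately left out.

```python
import tarfile
from pathlib import Path


def archive_environment(env_dir, output):
    """Compress an existing environment directory into a .tar.gz archive.

    Hypothetical helper, roughly `tar -czf output -C env_dir .`; it does
    no prefix rewriting, so it only approximates what conda-pack produces.
    """
    env_dir = Path(env_dir)
    if not env_dir.is_dir():
        raise NotADirectoryError(env_dir)
    # List the contents first so the archive never tries to include itself.
    paths = sorted(env_dir.rglob("*"))
    with tarfile.open(output, "w:gz") as archive:
        for path in paths:
            # Store paths relative to the environment root (bin/python, ...).
            arcname = path.relative_to(env_dir).as_posix()
            archive.add(path, arcname=arcname, recursive=False)
    return Path(output)
```

A call such as `archive_environment("/path/to/envs/myenv", "myenv.tar.gz")` (paths are placeholders) would produce an archive whose entries sit at the top level, which is the layout an extracting consumer like YARN expects.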
