
[WIP] Add podman support alongside docker for dev #1135

Merged: 1 commit merged into compdemocracy:podman-compatibility on Nov 4, 2021

Conversation

willcohen
Contributor

No description provided.

@willcohen
Contributor Author

Note that this gets polis running, but as mentioned in my follow-up to #1060, it doesn't totally solve the dev workflow questions I have with the REPL.

Additionally, this still uses docker-compose and just switches out docker for podman internally -- it turns out that podman is now compatible with docker-compose, which is nice, so Kubernetes YAML is not immediately necessary for a basic development workflow without Docker Desktop!
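Roughly, podman exposes a Docker-compatible API socket that docker-compose can point at. A minimal sketch of the rootless variant on Linux (the commands later in this thread use the rootful socket under sudo instead):

$ systemctl --user enable --now podman.socket
$ export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/podman/podman.sock
$ docker-compose build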

@willcohen willcohen changed the title Add podman support alongside docker for dev [WIP] Add podman support alongside docker for dev Sep 2, 2021
@willcohen willcohen closed this Sep 2, 2021
@willcohen willcohen reopened this Sep 2, 2021
@willcohen
Contributor Author

Sorry for the churn. The old instructions required a bunch of VirtualBox setup, but as of two days ago podman machine uses QEMU under the hood, which dramatically simplifies things. Eventually all the socket business should get streamlined under the hood too, but for now this works.
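For reference, the QEMU-backed flow on macOS boils down to something like this (podman machine provisions and boots the Linux VM; the socket wiring is the part that still needs streamlining):

$ podman machine init
$ podman machine start
$ podman system connection list   # shows the connection/socket the VM exposes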

@willcohen
Contributor Author

Note that the revised, simplified instructions that sidestep VirtualBox successfully build the images, but the use of docker-compose runs into a new problem, noted upstream: containers/podman#11413

@willcohen
Contributor Author

willcohen commented Oct 14, 2021

Monthly-ish update (@metasoarous cc'd too; I figure this is a better place for the conversation than deep in #1060):

Podman may eventually be the preferred macOS outcome here, but it requires upstream changes to QEMU. After a deeper dive: all containerization solutions on macOS (including Docker) operate by running a Linux VM, which gets a nice performance boost via macOS's Hypervisor framework et al., but to @metasoarous's point it's still not quite as clean as a normal chroot. Eventually QEMU will probably need to support better file I/O on macOS for this to work well (I bit the bullet and am trying to push that forward at https://lists.nongnu.org/archive/html/qemu-devel/2021-10/msg03006.html), but for now the best working solution on Mac, with admittedly non-ideal file I/O performance, is lima, which includes the docker-compose-compatible interface nerdctl.

Long story short: for now, a functional Mac development workflow that (generally) seems to work on my 2017 MacBook Pro is:

  1. Install lima ($ brew install lima) and start it ($ limactl start).
  2. Use nerdctl or other tools with the existing docker-compose files ($ lima nerdctl compose build, etc.); see the sketch below.
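Putting those steps together, a minimal sketch of the full sequence, using the compose files already in the repo (exact compose-file flags omitted):

$ brew install lima
$ limactl start
$ lima nerdctl compose build
$ lima nerdctl compose up -d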

Once QEMU sorts its stuff out and Podman picks a path forward, I'll update this PR to see what remains.

The only pending issue I currently see ultimately needing to sit in this PR might be a decision around ports. Podman and Lima, unlike Docker, tend to encourage users to run containers rootless, which generally seems like a reasonable security measure, but it means that things like maildev on port 25 run into permission errors. Once I get a little further into this I may ask about some options (perhaps a second compose yml file) that keep ports above 1024.
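Purely as an illustration of that idea, a hypothetical override file could remap the privileged port to something a rootless runtime can bind (the file name and port numbers here are placeholders, not anything in the repo yet):

# docker-compose.rootless.yml (hypothetical)
version: "3"
services:
  maildev:
    ports:
      - "1025:25"   # host port above 1024 so a rootless container can bind it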

@metasoarous
Member

This is great @willcohen! Thanks so much for continuing to push on this.

Using nerdctl and lima seems like a fine solution for now. I don't think file I/O performance is too big of a deal, but please correct me if I'm missing something.

My only question with the PR as it stands is why all of the image names have been modified to point to docker.io/*. Is this necessary? What does it accomplish, and will it affect folks trying to build locally vs. use pre-built images we push up to Docker Hub? Regarding Docker Hub, we still need to get our CI/build system going for that; will this be a prerequisite?

Thanks again!

@willcohen
Contributor Author

It's a change that's no longer relevant; I just haven't updated this PR to undo it yet while I get a fully working environment. Pre-v3.4, podman insisted on fully qualified image names (registry included) to avoid spoofed registries, but backed off of that in the wider release since it would invalidate so many Dockerfiles.
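For context, the distinction looks like this in a Dockerfile; the short form leaves the registry ambiguous, which is what pre-3.4 podman objected to:

# unqualified: the registry is implied
FROM node:16.9.0-alpine
# fully qualified: the registry is explicit
FROM docker.io/library/node:16.9.0-alpine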

@metasoarous
Member

Got it; Thanks for the clarification.

For now, let's go ahead and revert those changes, but I'll add an issue to consider whether we want to fully qualify in the future. I'll need to spend some time thinking through (and researching) all the implications there before pulling the trigger, but it may end up being a good idea even if it isn't strictly necessary.

Thanks again!

@willcohen
Contributor Author

I actually stand partially corrected -- I wanted to double-check what I said, and it turns out I may have spoken too soon. Fedora (which is podman-centric, even though podman is supposed to be pretty docker-compliant) still chokes on unqualified image names, and nerdctl (which does run containerd, so it's not that far off from vanilla docker) gets tripped up on file-server's lines that include FROM compdem/polis-client-admin..., since nothing actually exists in any registry under that namespace and image. I'm trying various permutations of localhost and docker.io etc. to see if something works across all of them, but I need to hold off on the plan stated above until I figure it out. Apologies again.

@metasoarous
Member

No apologies necessary; Thanks again for your persistence!

@metasoarous
Member

Regarding the maildev port: I don't know how strict the port requirements are there. It may not be a big deal to move it, since maildev is only used in the dev environment anyway; whatever gets that piece working is fine.

Regarding the file-server business, does this get fixed if you make sure the images are built with the unqualified name before trying to run? Please let me know if I'm misunderstanding what nerdctl is actually doing.

Since there's still friction here, I thought I'd mention that I've seen some talk of Rancher being used to replace Docker Desktop, though it may be more limited to building images and not as appropriate as a dev runtime, which is what we're looking for here.

Thanks again

@willcohen
Contributor Author

willcohen commented Oct 14, 2021

I saw that too! Rancher Desktop uses lima underneath, so solving it for lima should solve it for Rancher.

Podman (via Fedora, so no Mac in the picture) is able to run everything via docker-compose as long as everything is fully qualified with docker.io. (There are supposedly ways to bypass the qualification requirement, but I've still been hitting interactive prompts, which leads to docker-compose errors since it's not running interactively.):

$ sudo DOCKER_HOST=unix:///run/podman/podman.sock docker-compose build
...
$ sudo DOCKER_HOST=unix:///run/podman/podman.sock docker-compose -f docker-compose.yml -f docker-compose.dev.yml up --detach
WARNING: The GIT_HASH variable is not set. Defaulting to a blank string.
Creating network "polis_default" with the default driver
Creating network "polis_polis-net" with the default driver
Creating polis-client-participation ... done
Creating polis_maildev_1            ... done
Creating polis-client-report        ... done
Creating polis-postgres             ... done
Creating polis-client-admin         ... done
Creating polis-file-server          ... done
Creating polis-math                 ... done
Creating polis-server               ... done
Creating polis-nginx-proxy          ... done

For nerdctl, if docker.io is used as the prefix across the board, this error happens when building:

$ lima nerdctl compose build
...
unpacking docker.io/compdem/polis-client-report:dev (sha256:6736e3ffeeabe9c2b08efbc195fd4e08606b1433d3d5a632b192c070ffbf4d0f)...done
INFO[0931] Building image docker.io/compdem/polis-file-server:dev 
[+] Building 0.6s (6/6) FINISHED                                                                                                                                                             
 => [internal] load build definition from Dockerfile                                                                                                                                    0.1s
 => => transferring dockerfile: 652B                                                                                                                                                    0.0s
 => [internal] load .dockerignore                                                                                                                                                       0.0s
 => => transferring context: 2B                                                                                                                                                         0.0s
 => CANCELED [internal] load metadata for docker.io/library/node:16.9.0-alpine                                                                                                          0.4s
 => ERROR [internal] load metadata for docker.io/compdem/polis-client-participation:dev                                                                                                 0.4s
 => CANCELED [internal] load metadata for docker.io/compdem/polis-client-report:dev                                                                                                     0.4s
 => ERROR [internal] load metadata for docker.io/compdem/polis-client-admin:dev                                                                                                         0.4s
------
 > [internal] load metadata for docker.io/compdem/polis-client-participation:dev:
------
------
 > [internal] load metadata for docker.io/compdem/polis-client-admin:dev:
------
Dockerfile:3
--------------------
   1 |     ARG TAG=dev
   2 |     
   3 | >>> FROM docker.io/compdem/polis-client-admin:${TAG}          as admin
   4 |     FROM docker.io/compdem/polis-client-participation:${TAG}  as participation
   5 |     FROM docker.io/compdem/polis-client-report:${TAG}         as report
--------------------
error: failed to solve: docker.io/compdem/polis-client-admin:dev: pull access denied, repository does not exist or may require authorization: authorization status: 401: authorization failed
FATA[0001] unrecognized image format                    
FATA[0932] error while building image docker.io/compdem/polis-file-server:dev: exit status 1 

If unqualified image names are used for nerdctl in the compose YAML and the file-server Dockerfile for the three failing images, the error changes: docker.io seems to get prepended in some places and not others.

$ lima nerdctl compose build
...
unpacking docker.io/compdem/polis-client-report:dev (sha256:6736e3ffeeabe9c2b08efbc195fd4e08606b1433d3d5a632b192c070ffbf4d0f)...done
INFO[0127] Building image docker.io/compdem/polis-file-server:dev 
[+] Building 0.7s (6/6) FINISHED                                                                                                                                                             
 => [internal] load build definition from Dockerfile                                                                                                                                    0.1s
 => => transferring dockerfile: 622B                                                                                                                                                    0.0s
 => [internal] load .dockerignore                                                                                                                                                       0.1s
 => => transferring context: 2B                                                                                                                                                         0.0s
 => CANCELED [internal] load metadata for docker.io/library/node:16.9.0-alpine                                                                                                          0.5s
 => ERROR [internal] load metadata for docker.io/compdem/polis-client-participation:dev                                                                                                 0.4s
 => ERROR [internal] load metadata for docker.io/compdem/polis-client-admin:dev                                                                                                         0.4s
 => ERROR [internal] load metadata for docker.io/compdem/polis-client-report:dev                                                                                                        0.4s
------
 > [internal] load metadata for docker.io/compdem/polis-client-participation:dev:
------
------
 > [internal] load metadata for docker.io/compdem/polis-client-admin:dev:
------
------
 > [internal] load metadata for docker.io/compdem/polis-client-report:dev:
------
Dockerfile:3
--------------------
   1 |     ARG TAG=dev
   2 |     
   3 | >>> FROM compdem/polis-client-admin:${TAG}          as admin
   4 |     FROM compdem/polis-client-participation:${TAG}  as participation
   5 |     FROM compdem/polis-client-report:${TAG}         as report
--------------------
error: failed to solve: compdem/polis-client-admin:dev: pull access denied, repository does not exist or may require authorization: authorization status: 401: authorization failed
FATA[0001] unrecognized image format                    
FATA[0128] error while building image docker.io/compdem/polis-file-server:dev: exit status 1 

What images exist on Docker Hub? It seems like docker-compose pull is getting SOMETHING for participation, admin, and report, though I suppose I don't know whether "done" just means there was nothing to pull:

$ sudo DOCKER_HOST=unix:///run/podman/podman.sock docker-compose pull
WARNING: The GIT_HASH variable is not set. Defaulting to a blank string.
Pulling postgres             ... done
Pulling math                 ... done
Pulling client-participation ... done
Pulling client-admin         ... done
Pulling client-report        ... done
Pulling file-server          ... done
Pulling server               ... done
Pulling nginx-proxy          ... done
Pulling maildev              ... done

@willcohen
Contributor Author

Issue submitted with nerdctl, since it appears specific to that implementation of compose: containerd/nerdctl#434

Fully qualify domain names in Dockerfiles and docker-compose. Move the
build stages for client-report, client-participation, and client-admin
into the file-server Dockerfile, to avoid issues with non-Docker build
daemons being unable to access local image stores. This may be an issue
with coming Docker versions as well.

This commit does not fully work, however, with issues still pending
about serving the newly built files from the right location.
@willcohen
Contributor Author

@metasoarous pushed with a partial fix. Per the issue submitted with nerdctl, it looks like the FROM compdem/polis-client-admin:${TAG} as admin approach is partially unreliable, and may even break in Docker with an upcoming version when it switches build clients.

A not-totally-ideal fix for this is to move the build steps for client-admin, client-participation, and client-report into the file-server Dockerfile, so there's no longer any misalignment between the various local images. One slight silver lining is that the three stages can now build concurrently, for whatever that's worth.
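For illustration, a minimal sketch of the consolidated multi-stage structure being described; the stage names mirror the existing images, but the build commands, source paths, and output directories below are placeholders rather than the exact contents of the branch:

# hypothetical sketch of a consolidated file-server Dockerfile
FROM docker.io/node:11.15.0-alpine AS admin
WORKDIR /app
COPY client-admin/ .
RUN npm install && npm run build   # placeholder build command

FROM docker.io/node:11.15.0-alpine AS participation
WORKDIR /app
COPY client-participation/ .
RUN npm install && npm run build   # placeholder build command

FROM docker.io/node:11.15.0-alpine AS report
WORKDIR /app
COPY client-report/ .
RUN npm install && npm run build   # placeholder build command

# final stage: file-server collects the built artifacts from the three stages above
FROM docker.io/node:16.9.0-alpine
WORKDIR /app
COPY file-server/ .
COPY --from=admin         /app/build ./build/admin          # placeholder output paths
COPY --from=participation /app/build ./build/participation
COPY --from=report        /app/build ./build/report
RUN npm install
CMD ["npm", "start"]   # placeholder serve command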

I updated the branch to fully qualify all image names and modify file-server's Dockerfile. My unfamiliarity with the exact Gulp setup involved in each of these steps means that I did break something, so the branch isn't a perfect solution just yet. All the containers build, and based on the intermediate output I know that each step is generally completing correctly, but I'm doing something wrong with the final copy of the artifacts, since localhost:5000 can't find index_admin.html anymore. Any advice on where my misstep is? That issue aside, I have a hunch that this overall approach will get things fully working on Podman and Lima/Rancher.

For now the branch still includes the three existing Dockerfiles in client-admin, client-participation, and client-report, since they may still be useful on their own. If this approach is acceptable, they should either be removed or have a note placed in them (as well as in file-server's) so that changes in one are mirrored in the others for consistency's sake.

@willcohen
Contributor Author

@ballPointPenguin @patcon I am also noticing that similar issues around changing the Dockerfile contexts came up here with another docker PR: #553 (comment)

# # Gulp v3 stops us from upgrading beyond Node v11
FROM docker.io/node:11.15.0-alpine

WORKDIR ../../client-participation/app
Member


This should probably be just /client-participation/app. Or even /app, since you end up getting a new image for each call to FROM, IIUC.
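A minimal sketch of that suggestion, assuming the participation stage (the COPY source path is a placeholder):

FROM docker.io/node:11.15.0-alpine
# each FROM starts a fresh stage, so an absolute path inside the image is clearer than a relative one
WORKDIR /app
COPY client-participation/ .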

@metasoarous
Member

Hey @willcohen. Thanks again for pushing this forward!

... may even break in Docker with an upcoming version when it switches build clients.

I don't think this is the case, based on later responses from the thread you linked to, but I (and the comment author there) could be mistaken.

Regardless, I'm feeling a bit mixed about merging all of the client build processes this way. We've talked about moving all of the client code into a single subdirectory, with a unified build process, so there's maybe a case to be made that this helps move us in that direction. It may also help me with the task I'm currently working on (unifying the static asset deployment process with the rest of our Heroku deployment infrastructure), though there are potentially some other directions I could go there. I suppose a good dose of my objection is simply aesthetic, because I don't see any concrete technical problems with this approach. I may have to sleep on it a bit.

My unfamiliarity with the exact Gulp setup involved in each of these steps means that I did break something...

FWIW, gulp is only involved in the participation client build (or should be at this point). We need to get off of it, as it's holding us up from upgrading, but that's another topic... Actually, I stand corrected; it looks like we are still using gulp for the admin client. I thought we had ripped that out already.

All the containers build, and based on the intermediate output I know that each step is generally completing correctly, but I'm doing something wrong with the final copy of the artifacts, since localhost:5000 can't find index_admin.html anymore

That's a bit surprising. It looks like the build steps are the same. Did you check whether the other targets are being created properly there? I was initially a bit suspicious about the COPY --from... paths at the end of the Dockerfile, but they look right given what you modified the WORKDIR clauses to.

Please let me know if you're able to figure out what's going on here, and if not I'll try to take a look later this week.

Thanks again!

@patcon
Contributor

patcon commented Oct 21, 2021

Really appreciate all this work @willcohen! 🎉 I'll try to review over the weekend for a third opinion, if helpful.

Disclaimer: I still need to really digest all that's written here in this issue. If I'm understanding Chris correctly, I'm also feeling a little uneasy about moving the build context up a level. This means that all the files are sent into each container, which feels tangled. I recall there was loose consensus that this felt like a "code smell" when we had a couple of people in the convo pooling brainpower on docker conventions.

But out of respect for all your work, I'm trying not to have a strong opinion :) I'm just hoping to understand whether there's another way. As I said, I'll re-read the thread later.

Potential alternative: you mention that this came about because podman now handles docker-compose files rather than requiring Kubernetes. Would it be helpful to instead use kompose to auto-generate Kubernetes config from the docker-compose config as-is? We could do that and commit it as an artifact in the repo, and even lint to ensure it's always in sync (e.g., run the generation command in CI and check whether the git staging area is dirty/changed); see the sketch below. If this doesn't sound right, no need to spend your time explaining specifics; I'm just trying to naively provoke a thought on other approaches :)
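A rough sketch of the drift check being described, assuming the generated manifests live in a committed k8s/ directory (the directory name is hypothetical):

$ kompose convert -f docker-compose.yml -f docker-compose.dev.yml -o k8s/
$ git diff --exit-code k8s/   # CI fails if the generated manifests have drifted from what's committed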

EDIT: A further thought, as things come back to me. This would seem to make for a proliferation of docker images, since any change regenerates the whole cascade of multi-GB images, as they're now cache misses at each layer. Also, less important, but for those using docker to develop: I think using this single container breaks incremental building, where one just rebuilds the container being worked on rather than the whole thing. This is mostly what makes development in docker feel tolerable, since it's pretty quick when the other containers are cached and layers with npm install come first.

@metasoarous metasoarous changed the base branch from dev to podman-compatibility November 4, 2021 19:27
@metasoarous
Member

Thanks for sharing your thoughts @patcon.

This means that all the files are sent into each container, which feels tangled.

I think it only means that all of the files get pushed in for the client build process, which doesn't seem as bad. We could also get around this by putting all of the clients under a single clients directory, so there's one context for all the clients.
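A hypothetical compose fragment for that layout, assuming a clients/ directory that doesn't exist yet:

services:
  file-server:
    build:
      context: ./clients
      dockerfile: Dockerfile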

This would seem to make for a proliferation of docker images, since any change regenerates the whole cascade of multi-GB images, as they're now cache misses at each layer.

Is this still true if we remove the individual images and just have one single ClientsDockerfile? It seems like the same number of steps (which we can try to pare down a bit with some cleanup).

I think using this single container breaks incremental building, where one just rebuilds the container being worked on rather than the whole thing.

@colinmegill has been working on a server branch with live code reloading, and there's already something like this for the Clojure/math component (a live REPL connection you can use for re-evaluating code, so a little different, but with a similar outcome). The right way to improve developer flow for the clients seems to be adding live code reloading for them as well, since that solves the problem.

We've talked about moving all of the client code into a single subdirectory, with a unified build process, so there's maybe a case to be made that this helps move us in that direction. -- me

To me this is really the crux of the issue. Is it our goal to eventually consolidate the individual clients into a single build system with multiple targets? This would remove a lot of boilerplate and duplication, but it's also a fair bit of work (as mentioned above) because of how old the tech in the participation client is. What are your thoughts on this @colinmegill?

In conclusion...

Right now this PR is potentially useful to us, as I'm trying to consolidate the client build and deploy infrastructure in a way that allows Heroku to run these tasks, so that we have a consistent, comprehensive deploy workflow for our production instance. This appears to be possible only if we coalesce all of the client build steps into a single Dockerfile (IIUC, you can only have a single release image/phase on Heroku, which gets run after build and before finalizing the release; utility images that get created and then used by the release image do not seem to be supported, unfortunately).
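For reference, a minimal heroku.yml sketch of that constraint as I understand it; the Dockerfile path and release command are placeholders, not our actual configuration:

build:
  docker:
    web: file-server/Dockerfile
release:
  image: web
  command:
    - ./deploy-static-assets.sh   # placeholder for whatever the release task ends up being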

To test whether this works, I'm going to merge into a dedicated branch, and from there into the heroku-deploy-static-assets branch, to see if we can use this to move our entire deploy process to Heroku. If it works and solves the podman development challenges, then I think it's worth moving in this direction, provided no significant concerns arise.

Let's continue discussing here to preserve context; If we decide to merge into dev, we can start a new PR for that.

Thanks!

@metasoarous metasoarous merged commit ba4b289 into compdemocracy:podman-compatibility Nov 4, 2021
@willcohen
Contributor Author

Apologies for the delay in responding. I think everyone's thoughts here accurately reflect the fact that all of the options seem to carry upsides and downsides. Let me know how I can best help next.

Separately, apologies for the lack of bandwidth; I'm still in the rabbit hole of trying to get QEMU patched so podman works better on Mac. Based on their release cycle, with any luck that will get wrapped up before the end of this year, QEMU will handle volumes better by the first major point release of 2022, and podman will pick that up shortly after. Either way, vanilla Podman should then work correctly with volume mounts on Mac, which ultimately provides one performant Mac dev workflow alternative to Docker Desktop. Some of the Dockerfile-organization details still unresolved here will simply help expedite the additional option of Lima/Rancher on Mac, too.

@willcohen
Contributor Author

@metasoarous quick update here: with the 7.0 release of QEMU (soon to be backported to 6.2), podman will be able to mount volumes on Mac, which will enable the dev workflow to work with docker-compose.
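A minimal sketch of what that should enable on a Mac, assuming the machine-level volume flag that lands alongside this QEMU work (the flag and paths are assumptions on my part):

$ podman machine init -v $HOME:$HOME   # mount the host home directory into the VM
$ podman machine start
# then point docker-compose at the API socket that podman machine start reports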

@willcohen willcohen deleted the podman branch April 4, 2022 17:46