
Refactor build process into single Docker build #196

Merged
merged 2 commits into nokia:master on Mar 19, 2020

Conversation

carstenkoester
Contributor

Refactor "build_danm.sh" script, Dockerfile and scm/build/build.sh
such that:

  • The build process happens entirely inside a container. There
    is no longer a requirement to have "go" installed on the host
    machine at all.

  • There is no longer a dependency on the build process to have
    any particular directory structure or volumes mounted from the
    host.

  • There is a single "one-stop" build script. There is no longer
    a requirement to build a build container, then run that build
    container, then copy binaries into another (integration/)
    directory and start another container build.

In addition, this change also packages up the CNI binaries
(danm & fakeipam) as well as the 3rd party macvlan, into a
container that then runs as part of a DaemonSet, moving the
binaries into place.

Last (and admittedly unrelated, but at very little extra cost),
this change also builds macvlan and deploys it along with the
DANM CNI. The source was already part of the build container,
so there is no extra cost there, and the additional time to
compile (and size added to the build container) are minimal.
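
For illustration only, a minimal sketch of the multi-stage pattern this describes; the stage names and source path are assumptions, not the exact contents of the new Dockerfile:

FROM golang:alpine AS builder
WORKDIR /go/src/github.com/nokia/danm
COPY . .
# everything compiles inside the container; no go toolchain needed on the host
RUN go install ./...

FROM alpine:latest AS cni
# only the finished binaries are copied into the small runtime image
COPY --from=builder /go/bin/danm /go/bin/fakeipam /go/bin/macvlan /cni/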

What type of PR is this?
design

What does this PR give to us:
A cleaner build process (especially for users, not developers) that:

  • Has no dependency on golang outside of the build container. (It seems that
    even before this change, there was actually no compiling or running of go binaries
    [other than the statically compiled plugins] happening on the host; golang
    was being used as a kind of drop-in replacement for "git clone". BUT the previous
    Dockerfile mounted volumes relative to $GOPATH, and so exposed
    a technically unnecessary dependency on golang on the host.)

  • Has no dependency on host network or host volume. Can be run on pretty much
    any machine that has docker (or buildah) installed and is able to execute a shell script.
    All data remains contained within the git workspace, and there is no requirement to
    export files back from the container into the host filesystem.

  • SHOULD (admittedly, untested) work with buildah again. The previous build was broken
    for buildah, because commit 6870313 changed the mount directory from /go to /usr/local/go
    but did so only for the docker build. This diff merges build_docker and build_buildah into
    the same script, lowering the risk that one of the two build platforms gets missed
    in future changes.

  • Is considerably easier to use. Rather than running a build script, copying three resulting binaries into
    three integration directories, and running three more "docker build" commands,
    this process merges all of the above into a single build step.

  • Easier deployment of the CNI binaries. They, too, get packaged into a container image,
    and that image gets deployed as a DaemonSet, along with a shell script that copies these
    binaries into /opt/cni/bin (see the sketch after this list). This eliminates another step
    of manually moving files around.

    That DaemonSet approach would also allow for the kubectl and 00-danm.conf files to be deployed
    by the DaemonSet - I'm happy to look into that in a subsequent change.

  • Ships with an updated deployment-guide reflecting these changes.
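
As a rough sketch of that DaemonSet's install step (assuming the binaries sit under /cni/ in the image, as in the COPY lines reviewed below, and that /opt/cni/bin is mounted in from the host):

#!/bin/sh
set -e
# copy the packaged CNI binaries from the image onto the host
for bin in danm fakeipam macvlan; do
  cp -f "/cni/${bin}" /opt/cni/bin/
done
# keep the pod alive so the DaemonSet stays Running
while true; do sleep 3600; done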

Which issue(s) this PR fixes (in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:
Will ping you in the Slack channel too.

While functional, there are still a few things to consider:

  • The code as-is, while not changing any golang code, will likely break the unit
    tests (or require the tests to pull the binaries out of the container images and place them
    into their golang directory for unit testing). I have a few thoughts on how to address
    this, but it might need some discussion about what the current go developer workflow
    looks like, to see which approach works best.

  • Also, not sure if this breaks your CI integration in any way?

  • Finally, this will conflict with PR #190. Happy to look at how the two can be merged, after
    initial review and commenting are done :-)

Does this PR introduce a user-facing change?:

Build and deployment process changed. See updated deployment-guide.md.


- fakeipam
- `netwatcher`: This image will be used by the `netwatcher` DaemonSet
Collaborator

What we did in our product is that we ultimately created a "hyperdanm" container image with all three binaries.
I think it would be the easiest way to go.

Contributor Author

Could do, but how strongly do you feel about that? It would reduce the size of the Dockerfile, and it would marginally reduce the total consumed space for container images, but I have to admit I like the idea of having three separate images for the three separate pods. It keeps things tidy somehow; each pod contains only the binary that it actually needs.

For the user, it should not make a difference anyway -- their kubectl apply invocations all look the same :)

Collaborator

How about supporting both and avoid tying the users' hands?

Contributor Author
@carstenkoester Mar 18, 2020

Are you suggesting to build both, and let the user choose which to install?

Or to offer two versions of the build script (or one script with conditionals), and two paths of documentation?

Is this something that could potentially be done in a future enhancement?

Collaborator

I suggest building both the single-binary-per-image and the multi-binary-per-image variants, because as @Levovar mentioned, in our product we prefer the multi-binary image method, but I see other users (including you too? 🌝) would prefer single-binary images.

Is this something that could potentially be done in a future enhancement?

Yes, of course. If you are not interested in the multi binary image method, just leave it to me, I will add it later after your PR is merged.

Contributor Author

I'm happy to look at it over the next few days - but if you guys can live with the current version, I'd probably prefer to merge this as-is, and then work on it as an incremental change that adds the option to build a second form factor.

Collaborator

yeah, just an idea; as you can see, I feel so strongly about it that I also haven't brought it over to upstream in the last 6 months :)

deployment-guide.md (review thread outdated, resolved)
scm/build/build.sh (review thread outdated, resolved)
@Levovar
Collaborator

Levovar commented Mar 17, 2020

That DaemonSet approach would also allow for the kubectl and 00-danm.conf files to be deployed
by the DaemonSet - I'm happy to look into that in a subsequent change.

Would very much appreciate it. That would effectively close our longest outstanding issue, #35.

The code as-is, while not changing any golang code, will likely break the unit
tests (or require the tests to pull the binaries out of the container images and place them
into their golang directory for unit testing). I have a few thoughts on how to address
this, but it might need some discussion about what the current go developer workflow
looks like, to see which approach works best.

have you tried it, or is that just an assumption? I'm not sure the UTs depend on precompiled binaries in the workspace. IMO go test simply compiles and executes its own test binary from source;
at least I frequently get compile errors for the production code when I execute newly written TCs and mess up some APIs

Also, not sure if this breaks your CI integration in any way?

Yes :)
https://travis-ci.org/github/nokia/danm/builds/663336486
but only the UTs, and only because of the missing GOPATH.
If we merge the modularization first, that might solve it by itself.


RUN apk add --no-cache --virtual .tools curl libcap iputils \
&& adduser -u 147 -D -H -s /sbin/nologin danm \
&& chown root:danm /usr/local/bin/netwatcher \
Collaborator
@eMGabriel Mar 17, 2020

I suggest doing the chown in COPY, like COPY --from=builder --chown=root:danm ...
Also, you then have to relocate RUN adduser ... danm before the COPY.
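
A minimal sketch of that ordering, with the adduser line and paths taken from the diff above:

FROM alpine:latest
# the danm user must exist in this stage before COPY --chown can reference it by name
RUN adduser -u 147 -D -H -s /sbin/nologin danm
COPY --from=builder --chown=root:danm /go/bin/netwatcher /usr/local/bin/netwatcher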

Contributor Author

Quite honestly, I pretty much just moved what was in the previous component Dockerfiles (e.g. integration/dockerfiles/netwatcher/Dockerfile in this example) and used it 1:1. But I do agree that this might be an opportunity to review these tasks. Will modify as you suggested...

Contributor Author

Ok, cleaned up the Docker stage builds a bit... probably best to re-review the entire file as a whole :)

In short,

  • moved all the chown/chmod into the build stage;
  • created an intermediate stage that is just the Alpine image plus the added user, and then
    acts as a parent stage for netwatcher, svcwatcher, and webhook (sketched after this list);
  • removed arping;
  • removed a whole bunch of apk add/apk del calls that no longer seemed to serve any purpose.
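
For illustration, the parent-stage pattern might look like this (the stage name "base" is my placeholder):

FROM alpine:latest AS base
# shared parent stage: plain Alpine plus the danm user
RUN adduser -u 147 -D -H -s /sbin/nologin danm

FROM base AS netwatcher
COPY --from=builder --chown=root:danm /go/bin/netwatcher /usr/local/bin/netwatcher

FROM base AS svcwatcher
COPY --from=builder --chown=root:danm /go/bin/svcwatcher /usr/local/bin/svcwatcher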

RUN apk add --no-cache --virtual .tools curl libcap iputils \
&& adduser -u 147 -D -H -s /sbin/nologin danm \
&& chown root:danm /usr/local/bin/netwatcher \
&& chmod 750 /usr/local/bin/netwatcher \
Collaborator

I suggest doing the chmod in the builder container for all binaries; then there will be no need to do it in every component stage.
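
That is, something along these lines in the builder stage; unlike the xattr-based capabilities discussed below, plain file modes do survive a multi-stage COPY:

# set the mode once in the builder; COPY preserves it in every component stage
RUN chmod 750 /go/bin/*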

Contributor Author

Done.

&& chown root:danm /usr/local/bin/netwatcher \
&& chmod 750 /usr/local/bin/netwatcher \
&& setcap cap_sys_ptrace,cap_sys_admin,cap_net_admin=eip /usr/local/bin/netwatcher \
&& setcap cap_net_raw=eip /usr/sbin/arping \
Collaborator
@eMGabriel Mar 17, 2020

It was long ago that danm relied on the arping binary; it now uses a Go implementation of arping. Is there any other component that depends on it? Can it be removed?
How about doing setcap for the needed binaries in the builder container? Would it still be set on the copied file?

Contributor Author
@carstenkoester Mar 17, 2020

Will remove the arping.

Unfortunately, no: xattrs (including what's set by setcap) are not preserved in a multi-stage build, it seems.

Quick test:

% cat Dockerfile
FROM alpine:latest as stage1
RUN apk add --no-cache libcap attr \
 && touch test && echo "S1 BEFORE SET:" && getcap -v test && attr -l test \
 && setcap cap_net_admin=eip test  && echo "S1 AFTER SET:" && getcap -v test && attr -l test

FROM alpine:latest as stage2
COPY --from=stage1 test test
RUN apk add --no-cache libcap attr \
 && echo "STAGE2:" && getcap -v test && attr -l test

output (abbreviated):

S1 BEFORE SET:
test
S1 AFTER SET:
test = cap_net_admin+eip
Attribute "capability" has a 20 byte value for test

STAGE2:
test

Also see moby/moby#35699. A pity though; it would have been nice and clean to set this during the build stage.
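
So the workaround implied here is to keep setcap in each final stage, after the COPY, as the quoted netwatcher stage already does; roughly (stage names assumed):

FROM base AS netwatcher
COPY --from=builder /go/bin/netwatcher /usr/local/bin/netwatcher
# capabilities are stored as xattrs and are lost by COPY, so they must be
# (re)applied here in the final stage, not in the builder
RUN apk add --no-cache libcap \
 && setcap cap_sys_ptrace,cap_sys_admin,cap_net_admin=eip /usr/local/bin/netwatcher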

Collaborator

Oh dear! That means my spider-sense was warning me in the past. I did not know I was running across this issue while testing PR #190 (Migrate to go modules) on a live system. It's a pity. Thanks for investigating it. 👍


COPY --from=builder /go/bin/danm /cni/danm
COPY --from=builder /go/bin/fakeipam /cni/fakeipam
COPY --from=builder /go/bin/macvlan /cni/macvlan
Collaborator

These can be collapsed into one line like this:
COPY --from=builder --chown=user:group /go/bin/danm /go/bin/fakeipam /go/bin/macvlan /cni/

Contributor Author

Done.

glide install --strip-vendor
go get -d github.com/vishvananda/netlink
go get github.com/containernetworking/plugins/pkg/ns
go get github.com/golang/groupcache/lru
Collaborator

This file is not needed after #190,
because glide was removed and the dependencies were added to go.mod.
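
With modules, the dependency handling in the build container reduces to something like the sketch below; the image tag and paths are assumptions about the post-#190 layout:

FROM golang:1.13-alpine AS builder
WORKDIR /danm
# go.mod/go.sum fully describe the dependencies that glide used to manage
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN go install ./...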

Contributor Author

Agreed. I think there will generally be some overlap with #190, and I actually wonder if it makes sense to merge #190 first, and then I'll update this change based on the resulting state?

Collaborator

If I were in your shoes, I would do the following:

  1. master-master sync between your fork and nokia/danm
  2. rebase on your synced fork master and resolve every commit conflict step by step
  3. clean up and remove content that is no longer needed
  4. git commit --amend (this removes the unneeded content/files from the PR)
  5. git push --force, because --amend in step 4 rewrote the commit

For a healthy git history ✌️
But maybe you already know what to do. Trust your instincts!😄
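
In commands, the recipe above is roughly the following; the remote names upstream/origin are assumptions:

# 1-2: sync the fork's master with nokia/danm, then rebase the branch onto it
git fetch upstream
git checkout master && git merge --ff-only upstream/master
git checkout refactor-container-build && git rebase master
# 3-4: clean up, then fold the cleanup into the existing commit
git add -A && git commit --amend
# 5: the commit was rewritten, so the push must be forced
git push --force origin refactor-container-build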

Contributor Author
@carstenkoester Mar 17, 2020

Yes, thanks. I guess the one thing I hadn't seen was that #190 did get merged an hour before my comment, and I somehow assumed it was still under review. With it being merged now, I'll go ahead and update this.

Contributor Author

Done, and collapsed build-deps/build into a single stage again.

carstenkoester force-pushed the refactor-container-build branch 2 times, most recently from d01792f to 8b06712 on March 18, 2020 at 03:31
Refactor "build_danm.sh" script, Dockerfile and scm/build/build.sh
such that:

 - The build process happens entirely inside a container. There
   is no longer a requirement to have "go" installed on the host
   machine at all.

 - There is no longer a dependency on the build process to have
   any particular directory structure or volumes mounted from the
   host.

 - There is a single "one-stop" build script. There is no longer
   a requirement to build a build container, then run that build
   container, then copy binaries into another (integration/)
   directory and start another container build.

In addition, this change also packages up the CNI binaries
(danm & fakeipam) into a container that then runs as part of
a DaemonSet, moving the binaries into place on the host.
carstenkoester force-pushed the refactor-container-build branch from 8b06712 to cb74e2a on March 18, 2020 at 04:16
@carstenkoester
Contributor Author

With the go upgrade and the change to go modules, I'm no longer including macvlan in the image here after all. It would now take extra downloads and extra effort, so I decided to keep that separate.

Mostly cosmetic cleanup to the deployment guide. Line lengths
in the Markdown source, consistent indentation, blank lines for
readability, that sort of thing.
Levovar merged commit 1b25b4a into nokia:master on Mar 19, 2020