Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a docker-compose file and dockerfiles #249

Merged
merged 4 commits into from
Sep 21, 2022

Conversation

avantgardnerio
Copy link
Contributor

Which issue does this PR close?

Closes #248 .

Rationale for this change

Described in the issue.

What changes are included in this PR?

  • Add dockerfiles
  • Add a docker-compose file

Are there any user-facing changes?

They should find dockerfiles and be happy.

@avantgardnerio
Copy link
Contributor Author

@andygrove for your review queue

@@ -5,7 +5,9 @@ dev/dist
dev/release
python
**/docs
**/target
target/
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proper way to declare a folder

**/CHANGELOG.md
**/tests
**/data
!target/release/ballista-scheduler
!target/release/ballista-executor
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't exclude the executables if we want them to go into the docker image.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change still needed now that the executables are build in Docker?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind ... after reviewing the rest of the PR I understand why we have this

@@ -96,3 +96,6 @@ venv/*

# apache release artifacts
dev/dist

# logs
logs/
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running Ballista in any way seems to put logs here that aren't gitignored, which is a pain for developers to not accidentally get them into PRs.


EXPOSE 50050

CMD ./ballista-scheduler
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the goal of these dockerfiles versus the existing ones in dev/docker? These are for faster dev testing as opposed to building releases?

We need to start the web ui as well, which we have with the existing dockerfiles under dev/docker

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the goal of these dockerfiles

Oh, as stated in the issue, it wasn't discoverable due to

find . -name *.Dockerfile

Returning no results. I see now that it returns ./dev/docker/ballista-arm64.Dockerfile because it is capitalised according to the standard.

I'll amend this PR to use those instead, and capitalize them as most devops engineers would expect.

@avantgardnerio
Copy link
Contributor Author

For future reference, the command was

find . -name "*.dockerfile" | sed -rn 's/(.*)\.dockerfile/\1.dockerfile \1.Dockerfile/p' | xargs -r -I {} bash -c "git mv {}"

@@ -17,4 +17,4 @@
# specific language governing permissions and limitations
# under the License.

export BALLISTA_VERSION=$(awk -F'[ ="]+' '$1 == "version" { print $2 }' ballista/rust/core/Cargo.toml)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andygrove When I changed the format of the toml file before to not redundantly specify the version, and you said you thought it broke the build somehow, is this why?

I think if we let cargo metadata do the toml parsing, then any correct toml syntax will work. awk was just pattern matching.

.read_csv(path, options)
.await
.map_err(|e| {
DataFusionError::Context(format!("Can't read CSV: {}", path), Box::new(e))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Say file location when we can't load it

@@ -0,0 +1,4 @@
# THIS IS AN AUTOGENERATED FILE. DO NOT EDIT THIS FILE DIRECTLY.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Best practice to check this in, just like cargo.lock


# Generate data into the ./data directory if it does not already exist
FILE=./data/supplier.tbl
if test -f "$FILE"; then
echo "$FILE exists."
else
mkdir data 2>/dev/null
docker run -v `pwd`/data:/data -it --rm ballista-tpchgen:$BALLISTA_VERSION
docker run -v `pwd`/data:/data -it --rm ghcr.io/databloom-ai/tpch-docker:main -vf -s 1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use off-the-shelf data generator to go faster. No need to re-invent this wheel.

@@ -19,7 +19,7 @@

if [ -z "${DOCKER_REPO}" ]; then
echo "DOCKER_REPO env var must be set"
exit -1
exit 1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Negative error codes aren't actually valid?

@@ -19,9 +19,13 @@

set -e

docker build -t ballista-builder --build-arg EXT_UID="$(id -u)" -f dev/docker/ballista-builder.Dockerfile .
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make a build container so that WSL & mac users are happy

@@ -19,9 +19,13 @@

set -e

docker build -t ballista-builder --build-arg EXT_UID="$(id -u)" -f dev/docker/ballista-builder.Dockerfile .

docker run -v $(pwd):/home/builder/workspace ballista-builder
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build in the build container, but don't use ephemeral storage and wipe everything so we have faster build & test cycles, but CI is still pure.


docker run -v $(pwd):/home/builder/workspace ballista-builder

docker-compose build
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docker-compose is our new build script

docker tag apache/arrow-ballista-executor "apache/arrow-ballista-executor:$BALLISTA_VERSION"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only need the BALLISTA_VERSION when tagging, not buiding.

@@ -17,4 +17,6 @@
# specific language governing permissions and limitations
# under the License.

export BALLISTA_VERSION=$(awk -F'[ ="]+' '$1 == "version" { print $2 }' ballista/rust/core/Cargo.toml)
cd ballista/rust/core/
export BALLISTA_VERSION=$(cargo pkgid | cut '-d@' -f2)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The simplest version of this (that is correct) I could find.

@@ -15,14 +15,21 @@
# specific language governing permissions and limitations
# under the License.

ARG VERSION
FROM apache/arrow-ballista:$VERSION
FROM ubuntu:22.04
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use an OS that developers can install themselves for dev/prod parity.

COPY target/$RELEASE_FLAG/ballista-scheduler /root/ballista-scheduler
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy in binaries to separate code build from docker build.


RUN apt-get update && \
apt-get install -y git build-essential
apt-get -y install libssl-dev openssl zlib1g zlib1g-dev libpq-dev cmake protobuf-compiler netcat curl unzip \
nodejs npm && \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want a newer node at some point, but I ran into a lot of trouble with every non-apt option.

@andygrove
Copy link
Member

You can exclude yarn.lock from the RAT checks by adding it to dev/release/rat_exclude_files.txt

@andygrove
Copy link
Member

andygrove commented Sep 20, 2022

I just tried this out on Ubuntu 20.04 after running docker system prune -a to remove any previous images and I ran into a couple issues that I was able to fix with the following patch:

diff --git a/dev/build-ballista-docker.sh b/dev/build-ballista-docker.sh
index d8af64b5..5efbb152 100755
--- a/dev/build-ballista-docker.sh
+++ b/dev/build-ballista-docker.sh
@@ -23,9 +23,9 @@ docker build -t ballista-builder --build-arg EXT_UID="$(id -u)" -f dev/docker/ba
 
 docker run -v $(pwd):/home/builder/workspace ballista-builder
 
-docker-compose build
+docker compose build
 
 . ./dev/build-set-env.sh
-docker tag apache/arrow-ballista-executor "apache/arrow-ballista-executor:$BALLISTA_VERSION"
-docker tag apache/arrow-ballista-scheduler "apache/arrow-ballista-scheduler:$BALLISTA_VERSION"
-docker tag apache/arrow-ballista-benchmarks "apache/arrow-ballista-benchmarks:$BALLISTA_VERSION"
+docker tag ballista-executor "apache/arrow-ballista-executor:$BALLISTA_VERSION"
+docker tag ballista-scheduler "apache/arrow-ballista-scheduler:$BALLISTA_VERSION"
+docker tag ballista-benchmarks "apache/arrow-ballista-benchmarks:$BALLISTA_VERSION"
diff --git a/docker-compose.yml b/docker-compose.yml
index cbfb400c..1865db69 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -48,7 +48,7 @@ services:
     depends_on:
       - ballista-scheduler
   ballista-client:
-    image: ballista-client
+    image: ballista-benchmarks
     build:
       dockerfile: dev/docker/ballista-benchmarks.Dockerfile
       context: .

This got me further - the images were built and tagged - but the integration test failed with:

Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/run.sh": stat /run.sh: no such file or directory: unknown

@avantgardnerio
Copy link
Contributor Author

+docker compose build

I don't understand how this works for you. I tested in docker in Ubuntu 20.04:

root@6b1248d5c2bd:/# docker --help | grep compose

root@6b1248d5c2bd:/# cat /etc/*-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
...

root@6b1248d5c2bd:/# docker compose

docker: 'compose' is not a docker command.
See 'docker --help'

root@6b1248d5c2bd:/# docker-compose
Define and run multi-container applications with Docker.

Usage:

The command is docker-compose unless I'm going crazy.

@avantgardnerio
Copy link
Contributor Author

bgardner@bg-tensorbook:~$ docker --version
Docker version 20.10.12, build 20.10.12-0ubuntu4

and in docker ubuntu 20.04:

root@46a85dc6e8a1:/# docker --version
Docker version 20.10.12, build 20.10.12-0ubuntu2~20.04.1

Brent Gardner added 2 commits September 20, 2022 11:24
@avantgardnerio
Copy link
Contributor Author

@andygrove I take it back. I think this would be good to merge as soon as you feel the issues you listed have been addressed. Unfortunately I do not know how to reproduce, therefor fix, the one you are seeing with docker-compose not working.

@avantgardnerio
Copy link
Contributor Author

Everything works now on a system with docker-compose including the integration-tests.sh

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @avantgardnerio! This looks great. I tested locally after removing all Ballista images again and it worked perfectly.

It would be good if @yahoNanJing and @thinkharderdev could also test this out in their environments.

@andygrove
Copy link
Member

@yahoNanJing @thinkharderdev ok if I merge this, or do you want to test first?

Copy link
Contributor

@thinkharderdev thinkharderdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks awesome! Thanks @avantgardnerio

@andygrove
Copy link
Member

Thanks for the review @thinkharderdev. I will go ahead and merge this now.

@andygrove andygrove merged commit d76a8eb into apache:master Sep 21, 2022
@avantgardnerio avantgardnerio deleted the bg_fix_docker branch September 22, 2022 18:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add docker build files
3 participants