-
Notifications
You must be signed in to change notification settings - Fork 198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a docker-compose file and dockerfiles #249
Conversation
@andygrove for your review queue |
@@ -5,7 +5,9 @@ dev/dist | |||
dev/release | |||
python | |||
**/docs | |||
**/target | |||
target/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The proper way to declare a folder
**/CHANGELOG.md | ||
**/tests | ||
**/data | ||
!target/release/ballista-scheduler | ||
!target/release/ballista-executor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't exclude the executables if we want them to go into the docker image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change still needed now that the executables are build in Docker?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never mind ... after reviewing the rest of the PR I understand why we have this
@@ -96,3 +96,6 @@ venv/* | |||
|
|||
# apache release artifacts | |||
dev/dist | |||
|
|||
# logs | |||
logs/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running Ballista in any way seems to put logs here that aren't gitignored, which is a pain for developers to not accidentally get them into PRs.
scheduler.Dockerfile
Outdated
|
||
EXPOSE 50050 | ||
|
||
CMD ./ballista-scheduler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the goal of these dockerfiles versus the existing ones in dev/docker
? These are for faster dev testing as opposed to building releases?
We need to start the web ui as well, which we have with the existing dockerfiles under dev/docker
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the goal of these dockerfiles
Oh, as stated in the issue, it wasn't discoverable due to
find . -name *.Dockerfile
Returning no results. I see now that it returns ./dev/docker/ballista-arm64.Dockerfile
because it is capitalised according to the standard.
I'll amend this PR to use those instead, and capitalize them as most devops engineers would expect.
For future reference, the command was
|
@@ -17,4 +17,4 @@ | |||
# specific language governing permissions and limitations | |||
# under the License. | |||
|
|||
export BALLISTA_VERSION=$(awk -F'[ ="]+' '$1 == "version" { print $2 }' ballista/rust/core/Cargo.toml) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andygrove When I changed the format of the toml
file before to not redundantly specify the version, and you said you thought it broke the build somehow, is this why?
I think if we let cargo metadata
do the toml parsing, then any correct toml syntax will work. awk
was just pattern matching.
.read_csv(path, options) | ||
.await | ||
.map_err(|e| { | ||
DataFusionError::Context(format!("Can't read CSV: {}", path), Box::new(e)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Say file location when we can't load it
@@ -0,0 +1,4 @@ | |||
# THIS IS AN AUTOGENERATED FILE. DO NOT EDIT THIS FILE DIRECTLY. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Best practice to check this in, just like cargo.lock
|
||
# Generate data into the ./data directory if it does not already exist | ||
FILE=./data/supplier.tbl | ||
if test -f "$FILE"; then | ||
echo "$FILE exists." | ||
else | ||
mkdir data 2>/dev/null | ||
docker run -v `pwd`/data:/data -it --rm ballista-tpchgen:$BALLISTA_VERSION | ||
docker run -v `pwd`/data:/data -it --rm ghcr.io/databloom-ai/tpch-docker:main -vf -s 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use off-the-shelf data generator to go faster. No need to re-invent this wheel.
@@ -19,7 +19,7 @@ | |||
|
|||
if [ -z "${DOCKER_REPO}" ]; then | |||
echo "DOCKER_REPO env var must be set" | |||
exit -1 | |||
exit 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Negative error codes aren't actually valid?
@@ -19,9 +19,13 @@ | |||
|
|||
set -e | |||
|
|||
docker build -t ballista-builder --build-arg EXT_UID="$(id -u)" -f dev/docker/ballista-builder.Dockerfile . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make a build container so that WSL & mac users are happy
@@ -19,9 +19,13 @@ | |||
|
|||
set -e | |||
|
|||
docker build -t ballista-builder --build-arg EXT_UID="$(id -u)" -f dev/docker/ballista-builder.Dockerfile . | |||
|
|||
docker run -v $(pwd):/home/builder/workspace ballista-builder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Build in the build container, but don't use ephemeral storage and wipe everything so we have faster build & test cycles, but CI is still pure.
|
||
docker run -v $(pwd):/home/builder/workspace ballista-builder | ||
|
||
docker-compose build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docker-compose is our new build script
dev/build-ballista-docker.sh
Outdated
docker tag apache/arrow-ballista-executor "apache/arrow-ballista-executor:$BALLISTA_VERSION" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only need the BALLISTA_VERSION when tagging, not buiding.
@@ -17,4 +17,6 @@ | |||
# specific language governing permissions and limitations | |||
# under the License. | |||
|
|||
export BALLISTA_VERSION=$(awk -F'[ ="]+' '$1 == "version" { print $2 }' ballista/rust/core/Cargo.toml) | |||
cd ballista/rust/core/ | |||
export BALLISTA_VERSION=$(cargo pkgid | cut '-d@' -f2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The simplest version of this (that is correct) I could find.
@@ -15,14 +15,21 @@ | |||
# specific language governing permissions and limitations | |||
# under the License. | |||
|
|||
ARG VERSION | |||
FROM apache/arrow-ballista:$VERSION | |||
FROM ubuntu:22.04 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use an OS that developers can install themselves for dev/prod parity.
COPY target/$RELEASE_FLAG/ballista-scheduler /root/ballista-scheduler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy in binaries to separate code build from docker build.
|
||
RUN apt-get update && \ | ||
apt-get install -y git build-essential | ||
apt-get -y install libssl-dev openssl zlib1g zlib1g-dev libpq-dev cmake protobuf-compiler netcat curl unzip \ | ||
nodejs npm && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably want a newer node at some point, but I ran into a lot of trouble with every non-apt option.
You can exclude |
I just tried this out on Ubuntu 20.04 after running
This got me further - the images were built and tagged - but the integration test failed with:
|
I don't understand how this works for you. I tested in docker in Ubuntu 20.04:
The command is |
and in docker ubuntu 20.04:
|
e45183e
to
8eb30be
Compare
@andygrove I take it back. I think this would be good to merge as soon as you feel the issues you listed have been addressed. Unfortunately I do not know how to reproduce, therefor fix, the one you are seeing with |
Everything works now on a system with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @avantgardnerio! This looks great. I tested locally after removing all Ballista images again and it worked perfectly.
It would be good if @yahoNanJing and @thinkharderdev could also test this out in their environments.
@yahoNanJing @thinkharderdev ok if I merge this, or do you want to test first? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks awesome! Thanks @avantgardnerio
Thanks for the review @thinkharderdev. I will go ahead and merge this now. |
Which issue does this PR close?
Closes #248 .
Rationale for this change
Described in the issue.
What changes are included in this PR?
Are there any user-facing changes?
They should find dockerfiles and be happy.