Downgrade docker-compose.yaml to version 3.3 so that we can support Ubuntu 20.04.4 LTS #329
Conversation
@@ -54,8 +54,7 @@ services:
    volumes:
      - ./benchmarks/data:/data
    depends_on:
      ballista-scheduler:
        condition: service_healthy
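For context (an assumption based on Compose file-format history, not stated explicitly in this thread): the long-form `depends_on` with `condition: service_healthy` is not accepted by the 3.x file format as parsed by older `docker-compose` releases such as the one shipped with Ubuntu 20.04, so after downgrading to version 3.3 only the short form remains available:

```yaml
# Sketch of the short-form depends_on accepted by compose file format 3.3.
# Note it only orders container startup; it does not wait for the scheduler
# to pass a healthcheck, which is why the executor must tolerate connection
# failures on its own.
version: "3.3"
services:
  ballista-executor:
    depends_on:
      - ballista-scheduler
```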
We should be able to start executors before the scheduler runs, but I haven't tested this in a long time. If we require the scheduler to be running first, then we should file a bug for that.
I tested this and the executor fails if it cannot connect to the scheduler. In theory, this should be fine in an environment such as k8s because the pod will just keep restarting until it succeeds.
This is docker-compose, not k8s - so maybe we need to mess with restart policy? @iajoiner might have more to say.
The `restart` config option could be added to the compose file (https://docs.docker.com/compose/compose-file/#restart):

restart: always
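As a sketch, the option would sit on the executor service (service name assumed from the log output later in this thread):

```yaml
# Hypothetical sketch: restart policy on the executor service.
# "on-failure" is an alternative that avoids restarting after clean exits.
services:
  ballista-executor:
    restart: always
```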
A restart policy only takes effect after a container starts successfully. In this case, starting successfully means that the container is up for at least 10 seconds and Docker has started monitoring it. This prevents a container which does not start at all from going into a restart loop.
😢
I am tempted to add a `--retry-timeout--seconds 10` option to the executor, which would fix this. Ballista is targeted at enterprise users, who are often not on the latest distro, so it would be nice for this to work out of the box for as many people as possible.
I was thinking of the same option, given that the application does not seem to have a hard constraint of being available within milliseconds of startup. I also second the goal of having it work out of the box for as many users as possible.
@andygrove - I am not too familiar with the codebase yet, but I saw that if the scheduler is killed while the executor is running, the executor tries to reconnect forever. Once the scheduler is restarted, the executor is fine. Probably a silly question, but couldn't this behaviour apply from the start as well? Why does the executor first need to connect to an actual running instance of the scheduler, instead of retrying the default/configured address until it succeeds?
`TaskRunnerPool` loops forever, polling schedulers to fetch tasks to run. This code opens a new connection to the scheduler each time, which is inefficient but does mean that it eventually recovers from scheduler downtime.
I pushed a commit to add the new config. This fixes it for me.
ballista-executor_1 | 2022-10-15T15:44:09.490225Z WARN main ThreadId(01) ballista_executor: Failed to connect to scheduler at http://ballista-scheduler:50050 (Could not connect to scheduler); retrying ...
ballista-executor_1 | 2022-10-15T15:44:09.991817Z INFO main ThreadId(01) ballista_executor: Connected to scheduler at http://ballista-scheduler:50050
> `TaskRunnerPool` loops forever, polling schedulers to fetch tasks to run. This code opens a new connection to the scheduler each time, which is inefficient but does mean that it eventually recovers from scheduler downtime.
I see, I thought it might be the case. Thanks for taking the time to explain!
LGTM
Thanks for the reviews @avantgardnerio and @onthebridgetonowhere
Which issue does this PR close?
N/A
Rationale for this change
I could not build Docker images on Ubuntu 20.04.4 LTS
What changes are included in this PR?
Specify version 3.3 in docker-compose.yaml
Are there any user-facing changes?