Run example fails via PushStaged mode #214

Closed
r4ntix opened this issue Sep 15, 2022 · 2 comments · Fixed by #217
Labels
bug Something isn't working

Comments

r4ntix commented Sep 15, 2022

Describe the bug
I started the scheduler and executor services on localhost:

./target/debug/ballista-scheduler -s push-staged --log-level-setting INFO,ballista_scheduler=DEBUG

./target/debug/ballista-executor -c 4 -s push-staged --log-level-setting INFO,ballista_executor=DEBUG

Then I ran examples/src/bin/sql.rs, and the executor logged this error:

2022-09-14T07:18:35.502374Z ERROR tokio-runtime-worker ThreadId(08) ballista_executor::executor_server: Fail to connect to scheduler scheduler_ballista_localhost_50050 due to TonicError(tonic::transport::Error(Transport, hyper::Error(Connect, ConnectError("dns error", Custom { kind: Uncategorized, error: "failed to lookup address information: nodename nor servname provided, or not known" }))))

The executor cannot connect to the scheduler because it uses scheduler_ballista_localhost_50050 as the scheduler address.

To Reproduce

When launching a task, the scheduler sends scheduler_id to the executor in LaunchTaskParams:
https://github.com/apache/arrow-ballista/blob/2e1f5d619760d3b7acce225a166a9507f9efe9a1/ballista/rust/scheduler/src/state/task_manager.rs#L415-L430

The scheduler generates scheduler_id when the service starts, with the value format!("scheduler_{}_{}_{}", namespace, external_host, port):
https://github.com/apache/arrow-ballista/blob/2e1f5d619760d3b7acce225a166a9507f9efe9a1/ballista/rust/scheduler/src/main.rs#L171

When the executor reports task status, it calls get_scheduler_client and passes that scheduler_id:
https://github.com/apache/arrow-ballista/blob/2e1f5d619760d3b7acce225a166a9507f9efe9a1/ballista/rust/executor/src/executor_server.rs#L507-L519

Because the scheduler_id value is format!("scheduler_{}_{}_{}", namespace, external_host, port), it is not a resolvable host name, so the DNS lookup fails when the executor uses it as the scheduler address:
https://github.com/apache/arrow-ballista/blob/2e1f5d619760d3b7acce225a166a9507f9efe9a1/ballista/rust/executor/src/executor_server.rs#L222-L237

So the executor raises an error and the task fails.
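The failure mode above can be illustrated with a small standalone sketch. It does not use Ballista code: scheduler_id mirrors the format string quoted in this issue, while split_host_port is a hypothetical helper that only approximates how a client would read a "host:port" address.

```rust
/// Mirrors the scheduler_id format quoted in this issue.
fn scheduler_id(namespace: &str, external_host: &str, port: u16) -> String {
    format!("scheduler_{}_{}_{}", namespace, external_host, port)
}

/// Hypothetical helper (not part of Ballista): splits an address into a host
/// and an optional port, the way a "host:port" string is usually read.
fn split_host_port(address: &str) -> (&str, Option<u16>) {
    match address.rsplit_once(':') {
        Some((host, port)) => match port.parse::<u16>() {
            Ok(p) => (host, Some(p)),
            // Text after the last ':' is not a port, so treat it all as host.
            Err(_) => (address, None),
        },
        None => (address, None),
    }
}

fn main() {
    // The ID the executor receives in push-staged mode.
    let id = scheduler_id("ballista", "localhost", 50050);
    let (host, port) = split_host_port(&id);
    println!("id used as address: host={:?} port={:?}", host, port);

    // An address in the form the executor can actually resolve.
    let (host, port) = split_host_port("localhost:50050");
    println!("host:port address:  host={:?} port={:?}", host, port);
}
```

With the ID used as an address, the whole string scheduler_ballista_localhost_50050 ends up as the host name and there is no port, which is consistent with the "failed to lookup address information" error in the log.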

Describe the solution you'd like
Change scheduler_name to format!("{}:{}", opt.external_host, opt.bind_port), which defaults to localhost:50050.
The log file name prefix stays format!("scheduler_{}_{}_{}", namespace, external_host, port).
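A minimal sketch of this proposal, assuming a simplified options struct. SchedulerOpts and both function names are hypothetical stand-ins, not the actual Ballista types; only the two format strings come from this issue.

```rust
/// Hypothetical stand-in for the scheduler's parsed command-line options.
struct SchedulerOpts {
    namespace: String,
    external_host: String,
    bind_port: u16,
}

/// Proposed scheduler name: a "host:port" pair the executor can connect to.
fn scheduler_name(opt: &SchedulerOpts) -> String {
    format!("{}:{}", opt.external_host, opt.bind_port)
}

/// Log file prefix keeps the existing descriptive format.
fn log_file_prefix(opt: &SchedulerOpts) -> String {
    format!(
        "scheduler_{}_{}_{}",
        opt.namespace, opt.external_host, opt.bind_port
    )
}

fn main() {
    // Values matching the defaults mentioned in this issue.
    let opt = SchedulerOpts {
        namespace: "ballista".to_string(),
        external_host: "localhost".to_string(),
        bind_port: 50050,
    };
    println!("scheduler name:  {}", scheduler_name(&opt));
    println!("log file prefix: {}", log_file_prefix(&opt));
}
```

Keeping the two values separate means the name sent to executors stays usable as a network address, while log files keep their current naming.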

r4ntix commented Sep 15, 2022

@yahoNanJing

yahoNanJing commented

Thanks @r4ntix for finding this issue. Could you open a PR for the fix?
