Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add optional flag which advertises host for Arrow Flight SQL #418 #442

Merged
merged 1 commit into from
Oct 31, 2022

Conversation

DaltonModlin
Copy link
Contributor

@DaltonModlin DaltonModlin commented Oct 24, 2022

  • Update scheduler_config_spec.toml to include new flag 'advertise_host'
  • Add advertise_host member variable to SchedulerServer
  • Add advertise_host argument to new, new_with_policy, new_with_builder, and new_with_state in order to propagate flag
  • Add None argument to relevant method calls

Add optional flag which advertises host for Arrow Flight SQL #418

  • Update logic in job_to_fetch_part to use advertise-host flag when it exists
  • Remove default from advertise_host in scheduler_config_spec.toml
  • Wrap scheduler_server advertise_host variable in Option
  • Update scheduler's main.rs to reflect advertise_host being wrapped in Option

Utilize executor IP for routing to flight results in job_to_fetch_part even when advertise-host flag is set.

Add missing variable and ownership stuff

Remove unnecessary output from do_get_fallback

Which issue does this PR close?

Closes #418

Rationale for this change

What changes are included in this PR?

Adds member variable to SchedulerServer, adds argument to relevant constructors as necessary.

Are there any user-facing changes?

Adds advertise-host flag for proxying via scheduler as described in original issue.

@DaltonModlin DaltonModlin marked this pull request as ready for review October 24, 2022 19:48
@DaltonModlin
Copy link
Contributor Author

DaltonModlin commented Oct 24, 2022

I've validated my changes have the expected outcome by writing a test in the flight portion of the arrow repo.

@Test
public void testBallistaFlag() throws Exception {
    String url = "jdbc:arrow-flight://127.0.0.0:50050";
    java.util.Properties props = new java.util.Properties();
    props.setProperty("useEncryption", "false");
    props.setProperty("user", "admin");
    props.setProperty("password", "password");
    Connection con = DriverManager.getConnection(url, props);
    java.sql.Statement stmt = con.createStatement();
    String sql = "create external table customer STORED AS CSV WITH HEADER ROW LOCATION 'arrow-datafusion/datafusion/core/tests/tpch-csv/customer.csv';\n";
    assertTrue(stmt.execute(sql));
}

Running this test in conjunction with a local scheduler (having passing it --advertise-host="127.0.0.2:50050") and a local executor shows the request for flight results is proxied via the Scheduler rather than going directly to the executor IP.

@avantgardnerio
Copy link
Contributor

flight results is proxied

I'm actually a little surprised that query returns a flight since it is DML.

@andygrove I think we'll want to merge this PR because we presently rely on a bug in the FlightSQL JDBC driver where Ballista only works because it always re-uses the connection to the scheduler. This PR should make it work even after that issue is fixed.

Copy link
Contributor

@avantgardnerio avantgardnerio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say the unwraps and unsafe array indexing need resolution. The others are just nits.

Nice job! 🥳

.server
.advertise_host
.as_ref()
.unwrap()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to avoid unwrap()s as a rule, but especially in functions that already return Results...

.unwrap()
.split(":")
.collect();
(advertise_host_flag[0].to_string(), advertise_host_flag[1].parse().unwrap())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same... I think there is a safe way to index into the array: https://adventures.michaelfbryan.com/posts/daily/slice-patterns/

ballista/scheduler/scheduler_config_spec.toml Outdated Show resolved Hide resolved
Copy link
Contributor

@avantgardnerio avantgardnerio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, accidentally marked it approve. I'd like to see the unwraps fixed.

@andygrove
Copy link
Member

Approach LGTM. Agree on the unwraps. I would like to test this out as well before approving. I'll try and get that to that today.

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @DaltonModlin!

Some(endpoint) => {
let advertise_endpoint_vec: Vec<&str> = endpoint.split(":").collect();
match advertise_endpoint_vec.as_slice() {
[host_ip, port] => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@andygrove
Copy link
Member

@DaltonModlin There is a clippy failure .. I will merge once that is resolved

@DaltonModlin
Copy link
Contributor Author

It looks like clippy is worried because there are 8 argument to the function new_with_policy. It might be worth using a Builder pattern to construct the SchedulerServer rather than the current pattern of many slightly differing constructors.

@andygrove
Copy link
Member

@DaltonModlin Looks like there is conflict?

- Update scheduler_config_spec.toml to include new flag 'advertise_host'
- Add advertise_host member variable to SchedulerServer
- Add advertise_host argument to new, new_with_policy, new_with_builder, and new_with_state in order to propagate flag
- Add None argument to relevant method calls

Add optional flag which advertises host for Arrow Flight SQL apache#418

- Update logic in job_to_fetch_part to use advertise-host flag when it exists
- Remove default from advertise_host in scheduler_config_spec.toml
- Wrap scheduler_server advertise_host variable in Option
- Update scheduler's main.rs to reflect advertise_host being wrapped in Option

Utilize executor IP for routing to flight results in job_to_fetch_part even when advertise-host flag is set.

Add missing variable and ownership stuff

Remove unnecessary output from do_get_fallback

Responding to PR feedback
- Rename advertise-host to advertise-endpoint
- Replace unwrap calls with expect where possible
- Add missing error handling when parsing advertise-endpoint flag

PR feedback

Co-authored-by: Andy Grove <[email protected]>

PR Feedback
- Using slice rather than array indexing for parsing advertise-endpoint

PR Feedback
- Fix clippy issue

Fix pipeline failure
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add optional flag which advertises host for Arrow Flight SQL
3 participants