Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-17407: [Doc][FlightRPC] Flight/gRPC best practices #13873

Merged
merged 15 commits into from
Sep 15, 2022

Conversation

rok
Copy link
Member

@rok rok commented Aug 14, 2022

We want to provide best practices and debugging section in:

cpp flight docs
python flight docs
java flight docs
R flight docs

@github-actions
Copy link

@zeroshade
Copy link
Member

@rok Can we also add best practices and debugging sections in the Go docs too?

This can be done by either a README.md in the go/arrow/flight directory or by having a "package comment" in the directory (a comment directly above the package flight line in one of the .go files (convention is usually a file named doc.go).

Would you prefer for me to add it to this branch?

@lidavidm lidavidm self-requested a review August 15, 2022 11:33
@rok
Copy link
Member Author

rok commented Aug 15, 2022

@rok Can we also add best practices and debugging sections in the Go docs too?

Thanks for chiming in @zeroshade! I was going to ask you where to put the Go specific content :).
I'll add it and ping you back.

@zeroshade
Copy link
Member

@rok Sounds great! Have a look at go/arrow/doc.go or go/parquet/doc.go for some examples.

https://tip.golang.org/doc/comment also gives a good rundown of how the format of the comment gets translated into the formatting of the rendered docs on pkg.go.dev (TL;DR it's similar to just using markdown)

@rok
Copy link
Member Author

rok commented Aug 15, 2022

@zeroshade I've added the Go doc. Please check if it makes sense, it's my first time touching Go docs.

@rok
Copy link
Member Author

rok commented Aug 16, 2022

@lidavidm I'm now linking to c++ from python/java/R docs which is not ideal, but duplicating text is not either. Any thoughts?

@rok rok marked this pull request as ready for review August 16, 2022 00:02
@drin
Copy link
Contributor

drin commented Aug 16, 2022

I'm now linking to c++ from python/java/R docs which is not ideal, but duplicating text is not either. Any thoughts?

IMO, I would think there should be a common area for the text since that is common across languages. Then, the c++ docs can include code snippets as appropriate which don't seem too relevant to the other languages.

The "Arrow Flight RPC" section under "Specifications and Protocols" seems like a good place to me, since those descriptions seem to be more related to best practices of using the protocol overall.

@drin
Copy link
Contributor

drin commented Aug 16, 2022

The code snippet could potentially be moved to a cookbook type thing too?

@lidavidm
Copy link
Member

What's wrong with linking to the C++ docs?

The format/ page for Flight RPC is meant more for things all implementations agree on, I don't think these tips really go there

@lidavidm
Copy link
Member

Also, linking to the C++ docs isn't applicable in all situations, they don't apply to Java for instance.

@lidavidm
Copy link
Member

The same goes for Go.

If we want to centralize things, we should think about where to do that. And then all the language docs can link there. (I think Sphinx has a 'tabs' extension we could use for the language-specific snippets too.)

@drin
Copy link
Contributor

drin commented Aug 16, 2022

What's wrong with linking to the C++ docs?

I don't think there's anything wrong with it. I was just thinking through alternatives since Rok didn't seem to like it.

Also, linking to the C++ docs isn't applicable in all situations, they don't apply to Java for instance.

Maybe I misread that portion of the PR, but I think it references it at the moment? I don't know the details, so not sure if some of the descriptions need disentangling

@lidavidm
Copy link
Member

The Java docs link to the C++ docs - but the C++ docs don't apply to Java. The bit about allocations doesn't apply to Java at all, for instance.

@drin
Copy link
Contributor

drin commented Aug 16, 2022

options that make the most sense to me are:

  1. make a new page, perhaps under the "development" section?
  2. leave the structure as is, and let a PR for ARROW-17378 and/or ARROW-6390 re-structure?

@lidavidm
Copy link
Member

Maybe we should consider a separate tutorial section then? Even if they're language-specific? (Well, 'tutorials and guides' or something…) The Flight tutorials could probably be made to work with the same 'tab' structure so they can cover multiple languages at once (though that should be used sparingly)

@drin
Copy link
Contributor

drin commented Aug 16, 2022

Assuming this tab structure:
I like the idea of a tab structure in cases where code snippets are small (or whatever text will go in the tabs), which seems to apply to what I see in this documentation. I also think that for something like flight, the focus is more on the protocol and communication so it's a case where there is a lot of conceptual overlap so tabs may work well here.

Otherwise, I don't have strong opinions about "best practices" being in, or separate from, a tutorial section.

@lidavidm
Copy link
Member

I didn't get to this today, should hopefully get to look at it this week

Copy link
Member

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for starting this @rok! I've added a few suggestions.

docs/source/cpp/flight.rst Outdated Show resolved Hide resolved
docs/source/cpp/flight.rst Outdated Show resolved Hide resolved
docs/source/cpp/flight.rst Show resolved Hide resolved
Comment on lines 183 to 184
When using default gRPC transport options can be passed to it via
:member:`arrow::flight::FlightClientOptions::generic_options`. For example:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is less a best practice, more of a "howto" so I think this should go in the cookbook

docs/source/cpp/flight.rst Outdated Show resolved Hide resolved
docs/source/cpp/flight.rst Outdated Show resolved Hide resolved
docs/source/cpp/flight.rst Outdated Show resolved Hide resolved
docs/source/cpp/flight.rst Outdated Show resolved Hide resolved
docs/source/cpp/flight.rst Outdated Show resolved Hide resolved
docs/source/cpp/flight.rst Outdated Show resolved Hide resolved
This is what you are seeing now.
bounded thread pool -> Reject connections / requests when under load, and have
clients retry with backoff. This also gives an opportunity to retry with a
different node. Not everyone gets serviced but quality of service stays consistent.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should add code to configure these modes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean like setting GRPC_ARG_MIN_RECONNECT_BACKOFF_MS to 200ms for bounded and 0ms for unbounded? Perhaps another setting I'm not aware of?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added GRPC_ARG_MIN_RECONNECT_BACKOFF_MS example.


1. A stale connection can be closed using
:member:`arrow::flight::FlightClientOptions::stop_token`. This requires recording the
stop token at connection establishment time.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs code samples

(it would be great if all code samples linked to the cookbook)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could add samples for now and they would be moved out as cookbooks become available?
Also I don't see an issue if we duplicate atomic pieces of the cookbooks here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would much rather they be cookbook samples so that they actually get run in CI. It's probably good to still have a short snippet here, but I worry about those kinds of things bit-rotting without them actually getting exercised like the Cookbook does.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, agreed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can maybe merge this then add issues for moving each of these examples to the cookbook.

docs/source/cpp/flight.rst Show resolved Hide resolved
go/arrow/flight/doc.go Outdated Show resolved Hide resolved
Copy link
Member

@zeroshade zeroshade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good to me on the Go side though some of the comments aren't accurate or relevant to the Go library, i've made some suggestions.

go/arrow/flight/doc.go Outdated Show resolved Hide resolved
Comment on lines 61 to 62
// https://issues.apache.org/jira/browse/ARROW-15764 proposes a caching
// optimisation for server side, but it was not yet implemented.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably should mention that the caching optimization is specifically for C++ in that issue.

Comment on lines 69 to 73
// gRPC will spawn an unbounded number of threads for concurrent clients. Those
// threads are not necessarily cleaned up (cached thread pool in java parlance).
// glibc malloc clears some per thread state and the default tuning never clears
// caches in some workloads. But you can explicitly tell malloc to dump caches.
// See https://issues.apache.org/jira/browse/ARROW-16697 as an example.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would actually be incorrect for the Go implementation since it uses goroutines which do not have a 1-to-1 relationship with OS threads. It wouldn't fit the "thread pool" description. We can probably remove this entire paragraph for the Go docs here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

go/arrow/flight/doc.go Outdated Show resolved Hide resolved
Comment on lines 87 to 75
// Closing unresponsive connections
//
// * A stale connection can be closed using CallOptions.stop_token. This requires
// recording the stop token at connection establishment time.
// * Use client timeout
// * here is a long standing ticket for a per-write/per-read timeout instead of a per
// call timeout (https://issues.apache.org/jira/browse/ARROW-6062), but this is not
// (easily) possible to implement with the blocking gRPC API. For now one can also do
// something like set up a background thread that calls cancel() on a timer and have
// the main thread reset the timer every time a write operation completes successfully
// (that means one needs to use to_batches() + write_batch and not write_table).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add some comment here about the ability to use the context.Context that is passed to requests to implement a timeout via https://pkg.go.dev/context#WithTimeout or the ability to cancel via https://pkg.go.dev/context#WithCancel

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added something, but I'm not sure I understand enough for it to make sense :).

Copy link
Member Author

@rok rok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review everyone! I've gone through your comments and hopefully used them productively.

docs/source/cpp/flight.rst Show resolved Hide resolved
docs/source/cpp/flight.rst Outdated Show resolved Hide resolved
This is what you are seeing now.
bounded thread pool -> Reject connections / requests when under load, and have
clients retry with backoff. This also gives an opportunity to retry with a
different node. Not everyone gets serviced but quality of service stays consistent.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean like setting GRPC_ARG_MIN_RECONNECT_BACKOFF_MS to 200ms for bounded and 0ms for unbounded? Perhaps another setting I'm not aware of?


1. A stale connection can be closed using
:member:`arrow::flight::FlightClientOptions::stop_token`. This requires recording the
stop token at connection establishment time.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could add samples for now and they would be moved out as cookbooks become available?
Also I don't see an issue if we duplicate atomic pieces of the cookbooks here.

@rok
Copy link
Member Author

rok commented Sep 2, 2022

I've addressed the feedback (hopefully all) and opened a cookbook issue (apache/arrow-cookbook#251).
Would appreciate another look! :)

@github-actions
Copy link

github-actions bot commented Sep 2, 2022

Revision: f769248

Submitted crossbow builds: ursacomputing/crossbow @ actions-f012880c10

Task Status
preview-docs Github Actions

docs/source/cpp/flight.rst Outdated Show resolved Hide resolved
docs/source/cpp/flight.rst Outdated Show resolved Hide resolved
docs/source/cpp/flight.rst Outdated Show resolved Hide resolved
gRPC's default behavior allows one server to accept many connections from many
different clients, but if requests do a lot of work (as they may under Flight),
the server may not be able to keep up. Configuring the server to limit the number
of clients and reject requests more proactively, and configuring clients to retry
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you configure the server to do this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(IIRC, it's not possible without modifying Flight…)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is referring to gRPC settings.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say either remove this text or have a code sample of how to set the server-side limit

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs/source/cpp/flight.rst Outdated Show resolved Hide resolved
docs/source/cpp/flight.rst Outdated Show resolved Hide resolved
--------------------------------

1. A stale connection can be closed using
:member:`arrow::flight::FlightClientOptions::stop_token`. This requires recording the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is FlightCallOptions not ClientOptions

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is already FlightCallOptions just the diff is off.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still see ClientOptions here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Saw it here now too. Fixed.

@rok
Copy link
Member Author

rok commented Sep 9, 2022

Thanks for the review @lidavidm ! I think I addressed everything and sorry for not being faster.

@rok
Copy link
Member Author

rok commented Sep 9, 2022

@github-actions crossbow submit preview-docs

@rok rok closed this Sep 9, 2022
@rok rok reopened this Sep 9, 2022
@github-actions
Copy link

github-actions bot commented Sep 9, 2022

Revision: 0fba44b

Submitted crossbow builds: ursacomputing/crossbow @ actions-20a398e242

Task Status
preview-docs Github Actions

@rok
Copy link
Member Author

rok commented Sep 13, 2022

@lidavidm
Copy link
Member

sorry, I'm fairly backed up, I'll try to get to this sometime this week

@rok
Copy link
Member Author

rok commented Sep 13, 2022

No hurry on my end @lidavidm! Sorry for the slow iterations.

docs/source/cpp/flight.rst Outdated Show resolved Hide resolved
docs/source/cpp/flight.rst Outdated Show resolved Hide resolved
gRPC's default behavior allows one server to accept many connections from many
different clients, but if requests do a lot of work (as they may under Flight),
the server may not be able to keep up. Configuring the server to limit the number
of clients and reject requests more proactively, and configuring clients to retry
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say either remove this text or have a code sample of how to set the server-side limit

--------------------------------

1. A stale connection can be closed using
:member:`arrow::flight::FlightClientOptions::stop_token`. This requires recording the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still see ClientOptions here

docs/source/cpp/flight.rst Outdated Show resolved Hide resolved
docs/source/cpp/flight.rst Outdated Show resolved Hide resolved
docs/source/cpp/flight.rst Outdated Show resolved Hide resolved
@rok
Copy link
Member Author

rok commented Sep 15, 2022

@lidavidm thanks for the review! I've merged you change proposals, let me know if I missed something.

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@lidavidm
Copy link
Member

I think in terms of followups:

  • Can we configure a server to shed load more proactively?
  • Porting examples to the cookbook
  • Providing a recipe for a per-read timeout

@rok
Copy link
Member Author

rok commented Sep 15, 2022

@lidavidm shall I open cookbook github issues for those?

@lidavidm
Copy link
Member

I think you did already but if not, yeah

@rok
Copy link
Member Author

rok commented Sep 15, 2022

Indeed: apache/arrow-cookbook#251

@rok rok merged commit 2e72e0a into apache:master Sep 15, 2022
@rok rok deleted the ARROW-17407 branch September 15, 2022 23:01
@ursabot
Copy link

ursabot commented Sep 16, 2022

Benchmark runs are scheduled for baseline = e63a13a and contender = 2e72e0a. 2e72e0a is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.17% ⬆️0.0%] test-mac-arm
[Failed ⬇️0.28% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.39% ⬆️0.04%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 2e72e0a8 ec2-t3-xlarge-us-east-2
[Finished] 2e72e0a8 test-mac-arm
[Failed] 2e72e0a8 ursa-i9-9960x
[Finished] 2e72e0a8 ursa-thinkcentre-m75q
[Finished] e63a13aa ec2-t3-xlarge-us-east-2
[Failed] e63a13aa test-mac-arm
[Failed] e63a13aa ursa-i9-9960x
[Finished] e63a13aa ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

zagto pushed a commit to zagto/arrow that referenced this pull request Oct 7, 2022
We want to provide best practices and debugging section in:

[cpp flight docs](https://arrow.apache.org/docs/cpp/flight.html)
[python flight docs](https://arrow.apache.org/docs/python/flight.html)
[java flight docs](https://arrow.apache.org/docs/java/flight.html)
[R flight docs](https://arrow.apache.org/docs/r/articles/flight.html)

Lead-authored-by: Rok <[email protected]>
Co-authored-by: Rok Mihevc <[email protected]>
Co-authored-by: David Li <[email protected]>
Signed-off-by: Rok Mihevc <[email protected]>
fatemehp pushed a commit to fatemehp/arrow that referenced this pull request Oct 17, 2022
We want to provide best practices and debugging section in:

[cpp flight docs](https://arrow.apache.org/docs/cpp/flight.html)
[python flight docs](https://arrow.apache.org/docs/python/flight.html)
[java flight docs](https://arrow.apache.org/docs/java/flight.html)
[R flight docs](https://arrow.apache.org/docs/r/articles/flight.html)

Lead-authored-by: Rok <[email protected]>
Co-authored-by: Rok Mihevc <[email protected]>
Co-authored-by: David Li <[email protected]>
Signed-off-by: Rok Mihevc <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants