Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add grpc reflection functionality to grpc client #1987

Closed
wants to merge 28 commits into from

Conversation

joshcarp
Copy link
Contributor

@joshcarp joshcarp commented Apr 24, 2021

  • Add gRPC reflection via the grpc.Client().reflect function.
  • Use the reflection API to retrieve method descriptors from service
  • Factor out convertToMethodInfo to convert file descriptors into []MethodInfo and protoreflect.MethodDescriptor. This is used in both Load and Reflect
  • Add samples/grpc_reflection.js example of how to use this new function

Help needed:

  • I don't know how to test this properly, any advice would be appreciated

This PR closes: #1930

@joshcarp joshcarp changed the title add grpc reflection functionality to grpc client Add grpc reflection functionality to grpc client Apr 24, 2021
@joshcarp joshcarp marked this pull request as ready for review April 25, 2021 06:19
@na-- na-- added this to the v0.33.0 milestone Apr 26, 2021
Copy link
Member

@na-- na-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making this PR and sorry for the delay in reviewing!

I am not sure if having a client.reflect() method in the init context is the right approach here. Generally, we try to avoid network calls in the init context... On the other hand, this is somewhat similar to client.load(), so I am not sure if we can't make an exception this time 🤷‍♂️

Other than this API design issue, I skimmed the rest of the code and it looks good. As I mentioned on slack, we've already closed the k6 v0.32.0 release window, so this can only be merged after we release it around May 10th. So we'll discuss the API design internally (or in the issue) and get back to you around then.

@codecov-commenter
Copy link

Codecov Report

Merging #1987 (9d9b896) into master (a9fd50a) will decrease coverage by 0.11%.
The diff coverage is 2.22%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1987      +/-   ##
==========================================
- Coverage   71.75%   71.63%   -0.12%     
==========================================
  Files         182      183       +1     
  Lines       14259    14286      +27     
==========================================
+ Hits        10231    10234       +3     
- Misses       3387     3410      +23     
- Partials      641      642       +1     
Flag Coverage Δ
ubuntu 71.57% <2.22%> (-0.13%) ⬇️
windows 71.26% <2.22%> (-0.15%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
js/modules/k6/grpc/reflect.go 0.00% <0.00%> (ø)
js/modules/k6/grpc/client.go 69.58% <3.22%> (-8.07%) ⬇️
js/modules/k6/data/data.go 83.33% <0.00%> (-0.88%) ⬇️
core/engine.go 84.41% <0.00%> (-0.61%) ⬇️
cmd/outputs.go 0.00% <0.00%> (ø)
js/runner.go 81.76% <0.00%> (+0.56%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a9fd50a...9d9b896. Read the comment docs.

@joshcarp
Copy link
Contributor Author

I am not sure if having a client.reflect() method in the init context is the right approach here. Generally, we try to avoid network calls in the init context...

Yeah I was trying to avoid that but the only other option I could come up with was to do it in the VU loop, but then that'll cause extra load on the API side because the reflection api will be getting called every loop.

Copy link
Contributor

@imiric imiric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for your contribution @joshcarp! Are you still available to work on this PR? We'd like to try to get it merged for the upcoming v0.33.0 release (planned for end of June), but after taking a deeper look I'm not sure we'd have time to fix the leftover things here.

As for tests, we should probably add reflection support to the test server and test against that.

js/modules/k6/grpc/client.go Outdated Show resolved Hide resolved
js/modules/k6/grpc/client.go Outdated Show resolved Hide resolved
net.Dialer{
Timeout: 2 * time.Second,
KeepAlive: 10 * time.Second,
}, netext.NewResolver(net.LookupIP, 0, types.DNSfirst, types.DNSpreferIPv4)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hhmm shouldn't all of this inherit from the global settings (state.Options)? The UserAgent for sure should be consistent with the one set for HTTP and eventually WS 😅, and the Dialer/TLS/DNS settings also seem out of place here, since these should be configurable and not static.

Hopefully this wouldn't be a big problem and we can fix it before #883.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it's in the VU context and this has been removed

import { check } from "k6";

let client = new grpc.Client();
client.reflect("127.0.0.1:10000", { plaintext: true })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @na-- that having this in the init context is problematic. Keep in mind that the init context is evaluated at least 4 times (for getting the options, for setup(), teardown() and handleSummary() even if these aren't defined in the script), and then for every created VU. So it's wasteful to have the gRPC connection/reflection happen here multiple times, and it could impact VU initialization time.

To me it would make sense to do this in setup(), but since we can't easily share data from there it wouldn't be feasible for distributed execution. Maybe we should prioritize having a per-VU init function (#785), though even in that case it would be wasteful for each VU to do the reflection, and the service definitions (whether loaded statically or reflected) should be shared among VUs anyway. So it seems like we need some kind of SharedArray and per-VU init to make this happen, which is not trivial to add.

And from an API perspective, I think reflection should be an implementation detail, so also agree with @na-- that it would make sense for this to be an optional parameter to the Client constructor, instead of a separate method that scripts have to call.

Could it be added as an argument to client.connect() instead, in which case we could cache the definitions internally and avoid the init context problems? 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

client.load('local/path/to.proto') needed to be in the init context, simply because it if wasn't, then we couldn't bundle the .proto files for k6 cloud or k6 archive. But for reflection, I'm currently on the fence, but leaning very heavily towards requiring it to be in the VU context...

I don't have a strong opinion either way if that should be done as a parameter to the constructor that will make connect() get the service definitions on the first connection, or if it should be done somewhat like this:

let client = new grpc.Client();

export default function () {
    if (__ITER == 0) {
        console.log(`Initializing ${__VU}...`)
        client.connect("127.0.0.1:10000", { plaintext: true })
        client.getProtoDefinitions(); // or whatever, using the already established connection
    }
    // ...
}

In either case, the iteration_duration of that first iteration will be a bit skewed, but there's nothing we can do about that until we have #785 🤷‍♂️

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could work as well, but I'd prefer if we avoided the use of __ITER for what should be built-in functionality, so I'm for handling that internally in connect() if possible. There's no reason to have scripts manually call getProtoDefinitions() unless we plan to expose them in JS for some reason.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw since it seems like the constructor doesn't have any parameters currently, while connect() does, and since it seems like the very rough consensus is that this should happen during connect(), should the option be a connect() parameter?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not very versed in gRPC, but from what I can see:

  1. in order to have the reflection we need to connect to the server so it is only natural that it should happen with the connect method or after it
  2. if there is a way to detect that the server can give us the proto files I think k6 should just get the, And then have a way to tell it not to with an argument on the connect method. If it isn't possible I have no opinion on whether it should be an argument or a function call after the connect 🤷
  3. We can definitely cache across-VUs as well as per-vu with the same/similar method used in SharedArray or the xk6-segment extension ... I do consider this trivial as well ;) . Although probably there should be a way to tell if you don't want to 🤷 maybe in a separate PR. This should remove the whole __ITER==0 thing and we can probably use the server address as the key (from what I understand)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cross-VU sharing of protobuf definitions is definitely something out of scope for this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed it so that the client.reflect function is called at the end of the connect function only if {reflect: true} is passed in.

I have used a sync.Once to ensure that it is only called once per client

@joshcarp
Copy link
Contributor Author

Hey,
Yeah I'll be getting around to this in the next week or so

@na-- na-- modified the milestones: v0.33.0, v0.34.0 Jun 16, 2021
@joshcarp joshcarp force-pushed the feature/grpc-reflection branch from 21f4c03 to d3f19f8 Compare July 17, 2021 13:12
@codecov-commenter
Copy link

codecov-commenter commented Jul 29, 2021

Codecov Report

❗ No coverage uploaded for pull request base (master@a460e69). Click here to learn what that means.
The diff coverage is n/a.

❗ Current head 22c5ace differs from pull request most recent head f29f207. Consider uploading reports for the commit f29f207 to get more accurate results
Impacted file tree graph

@@            Coverage Diff            @@
##             master    #1987   +/-   ##
=========================================
  Coverage          ?   71.98%           
=========================================
  Files             ?      178           
  Lines             ?    14309           
  Branches          ?        0           
=========================================
  Hits              ?    10301           
  Misses            ?     3376           
  Partials          ?      632           
Flag Coverage Δ
ubuntu 71.98% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.


Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a460e69...f29f207. Read the comment docs.

Copy link
Contributor

@imiric imiric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @joshcarp, thanks for picking this back up. The latest changes LGTM. We'd like to get this merged for the upcoming v0.34.0 release, slated for end of August.

A few points besides the minor comments:

  • The performance is surprisingly poor when testing against a local samples/grpc_server with samples/grpc_reflection.js:

    grpc_req_duration....: avg=538.05ms min=75.52ms med=440ms    max=959.33ms p(90)=907.96ms p(95)=933.64ms
    iteration_duration...: avg=549.21ms min=76.89ms med=474.45ms max=960.83ms p(90)=914.34ms p(95)=937.59ms
    

    It's not much better with samples/grpc.js, even without console.log()-ing the message. So the server might be unoptimized, but I'm wondering if you know the reason for this.

    I suppose it's related to how we connect and disconnect on each iteration. Ideally we could keep the connection open across iterations, but that might be a much bigger change. And this is the way WebSockets work now, so it's probably acceptable as is.

  • I see you registered the reflection on the httpmultibin server, but I don't see any tests for it. Do you think some could be added alongside the existing ones in client_test.go?

  • There are some linter errors that should be fixed, and we'll look into the xk6 test failures, which are probably related to the recent k6io -> grafana org change.

js/modules/k6/grpc/client.go Outdated Show resolved Hide resolved
js/modules/k6/grpc/reflect.go Outdated Show resolved Hide resolved
samples/grpc_server/main.go Outdated Show resolved Hide resolved
@joshcarp
Copy link
Contributor Author

Hey

I see you registered the reflection on the httpmultibin server, but I don't se any tests for it. Do you think some could be added alongside the existing ones in client_test.go?

Yeah, I started testing it although ran into some problems because the grpc test cases seem to all use the same state which means it's difficult to test both load and reflect

There are some linter errors that should be fixed, and we'll look into the xk6 test failures, which are probably related to the recent k6io -> grafana org change.

Apologies for that

The performance is surprisingly poor when testing against a local samples/grpc_server with samples/grpc_reflection.js:

investigating

@joshcarp
Copy link
Contributor Author

joshcarp commented Jul 30, 2021

Hey @imiric
I've added changes, one thing that was required was to refactor how the grpc tests work because the current method uses the same state between tests, so it was difficult to test both load and reflect in the same test harness. I've now put it in table tests that run in a completely fresh context for every test. I have tested:

  • Reflection on an api that doesn't have reflection enabled: ReflectUnregistered
  • Reflect on a server that does Reflect
  • Reflect with a non bool parameter ReflectBadParam
  • Reflect and invoke on a method that doesn't exist ReflectInvokeNoExist
  • Reflect and invoke on a method that does exist ReflectInvoke
    Lemme know if i missed anything

@joshcarp joshcarp force-pushed the feature/grpc-reflection branch from b58d35a to de0b076 Compare July 30, 2021 03:06
@joshcarp
Copy link
Contributor Author

Performance issues seem to be an issue in k6 already: #1846
I haven't dug too deep into it, I'll try to do some benchmarks between reflect and loading directly from source files

Copy link
Contributor

@imiric imiric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick changes @joshcarp!

Nice work on the tests too. I'd prefer smaller tests for individual methods (TestClientLoad, TestClientConnect, etc.), but this is great as well.

This LGTM for merging, but let's wait for someone else from the k6 team to also review it, possibly sometime next week.

Performance issues seem to be an issue in k6 already: #1846

Yeah, I remember that issue, but the performance in this test is orders of magnitude worse. In any case, it's also that way on master with samples/grpc.js, so it's not related to these changes, but probably to the test server itself.

js/modules/k6/grpc/client_test.go Outdated Show resolved Hide resolved
js/modules/k6/grpc/client_test.go Outdated Show resolved Hide resolved
js/modules/k6/grpc/client_test.go Outdated Show resolved Hide resolved
Copy link
Contributor

@mstoykov mstoykov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, great job.

Please do not delete license files and rebase on top of master which will also fix the currently failing CI for you. (you can also squash everything, although we will do that either way in the end 🤷 )

As I mentioned in one of the comments I am not certain caching (or more accurately not re-reflecting on each connection) is a good idea, especially given all the things that this break currently. On the other hand the API currently isn't all that useful and as I mention I doubt we won't redo it as a whole at some point in the future so ... maybe have caching for now ... and hope people won't have problems with the fact it does break some workflows that reflection enables 🤷

@imiric the abysmal performance you are noticing is because the current grpc test server has a bunch of sleeps in it to be more "realistic" which probably should be dropped. My numbers below are from after I've dropped them.

Comment on lines 545 to 547
if tags == nil {
tags = make(map[string]string)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this because it get's called on the reflect call and it has no tags?

Because maybe we should tag it with something in that case

var ok bool
reflect, ok = v.(bool)
if !ok {
return false, fmt.Errorf(`invalid reflect value %#v need bool`, v)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return false, fmt.Errorf(`invalid reflect value %#v need bool`, v)
return false, fmt.Errorf(`invalid value for 'reflect': '%#v', it needs to be boolean`, v)

I hope the added quotes will help users understand that it is the value of 'reflect' instead of some kind of reflect value ... also in general good idea to print random values you get in quotes in order to be easier to understand where they start and end.

@@ -69,6 +71,7 @@ var (
type Client struct {
mds map[string]protoreflect.MethodDescriptor
conn *grpc.ClientConn
sync sync.Once
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe rename this to reflectOnce ?

Comment on lines 303 to 309
var err error
c.sync.Do(func() {
err = c.reflect(ctxPtr)
})
if err != nil {
return false, fmt.Errorf("error invoking reflect API: %w", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The err will only be populated on the first run which means that if you run multiple iterations and the first one fails reflect the API you will get

ERRO[0000] error invoking reflect API: rpc error: code = Unimplemented desc = unknown service grpc.reflection.v1alpha.ServerReflection
        at go.k6.io/k6/js/common.Bind.func1 (native)
        at file:///home/mstoykov/work/k6io/k6/samples/grpc_reflection.js:7:64(9)  executor=shared-iterations scenario=default source=stacktrace
ERRO[0000] method "/main.RouteGuide/GetFeature" not found in file descriptors
        at go.k6.io/k6/js/common.Bind.func1 (native)
        at file:///home/mstoykov/work/k6io/k6/samples/grpc_reflection.js:9:13(21)  executor=shared-iterations scenario=default source=stacktrace
ERRO[0000] method "/main.RouteGuide/GetFeature" not found in file descriptors
        at go.k6.io/k6/js/common.Bind.func1 (native)
        at file:///home/mstoykov/work/k6io/k6/samples/grpc_reflection.js:9:13(21)  executor=shared-iterations scenario=default source=stacktrace
ERRO[0000] method "/main.RouteGuide/GetFeature" not found in file descriptors
        at go.k6.io/k6/js/common.Bind.func1 (native)
        at file:///home/mstoykov/work/k6io/k6/samples/grpc_reflection.js:9:13(21)  executor=shared-iterations scenario=default source=stacktrace
ERRO[0000] method "/main.RouteGuide/GetFeature" not found in file descriptors
        at go.k6.io/k6/js/common.Bind.func1 (native)
        at file:///home/mstoykov/work/k6io/k6/samples/grpc_reflection.js:9:13(21)  executor=shared-iterations scenario=default source=stacktrace

as you can see the much more prevalent message is that k6 couldn't find file descriptors.

This also means that if the connection fails for any reason on the first request you practically bricked this connection.

Additional problems:

  • if the server reloads to change the definition and the user decided to use reflection so that ... works, it won't
  • The connection can technically connect to multiple servers over many iterations, but this will prevent it to load different definitions for them.

I currently am of the opinion that it might be better off to not cache the definition and keep reloading them on each iteration just so that we don't tie our hands now. On the other hand, IMO this whole grpc API will be thrown out at some point when we start implementing everything it should support, so ... might as well have caching now 🤷 .

From my experiments. 100 VUs with caching it does 2.3k without caching it does a bit above 1k iter/s and with the __ITER == 0 hack around 10k.

We can obviously keep adding flags, but again IMO the whole grpc module will need a rewrite and redoing of the API at some point so I am not certain how much adding flags for corner cases as the mentioned above is ... reasonable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the server reloads to change the definition and the user decided to use reflection so that ... works, it won't

is this a problem? grpc reflection isn't really a dynamic thing that is expected to change definitions ever, at least in all the time that I've used it

Copy link
Contributor

@mstoykov mstoykov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some more feedback. Don't feel like you should just start working on it as we are likely going to have to discuss it some more :).

Also thanks for the PR again 🙇

Comment on lines +302 to +308
var err error
c.reflectOnce.Do(func() {
err = c.reflect(ctxPtr)
})
if err != nil {
return false, fmt.Errorf("error invoking reflect API: %w", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code still makes it so that in the case of the reflect not working out, on next iterations, you won't even get the error again. In the case of a real script with multiple VUs this will mean that you will just get the error from the Invoke not working (or working sometimes depending I guess on how it fails in the case that for example some calls get parsed and work while the rest error out - I don't know if that is possible).

Additionally the current grpc API let's people connect to different servers through multiple connect calls which means that in that case this feature will just not work which seems ... bad to me.

I don't really have a really good fix for both cases (and I am not certain the second one is important 🤷), but the proposal is to have something like:

client.Connect(....)
if (!client.HasReflected()) {
	
  client.Reflect() // this always does the reflection on the current 
}

But preferably with a better interface :).

Don't just do this change, we will have to discuss it and this change is already unlikely to land in v0.34.0 :(.

}
resp, err := methodClient.Recv()
if err != nil {
return err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if some of those errors are wrapped with fmt.Errorf and some additional context as this to me seems like something that will likely get ... not very useful error messages.

Comment on lines +316 to +320
client := reflectpb.NewServerReflectionClient(c.conn)
methodClient, err := client.ServerReflectionInfo(*ctxPtr)
if err != nil {
return err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This call emits a metric(which is why you want to return non-nil tags in the GetTags) and after some internal discussion we kind of don't want to emit it at all if possible and the simplest solution so far is to make a new context with cancel and cancel it on exit from here.

For some reason the metric gets emitted on the connection close but I couldn't find to prevent this, but that is another problem it seems like something lives way too much. This also means that it's easier to reproduce it with multiple iterations as otherwise sometimes the metric doesn't get emitted fast enough.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I guess it might be better to just set an empty map as the tags here instead of changing the getTags method

@na-- na-- modified the milestones: v0.34.0, v0.35.0 Aug 27, 2021
@joshcarp
Copy link
Contributor Author

Hey,
I think this PR was a good exercise on how to implement grpc server reflection but considering that there are massive performance issues with it It might be best to not merge this until the grpc module has been refactored to support such a feature.

I think that merging it might give people the (possibly) incorrect idea that the grpc functionality is reliable and ready to use, which it is not.

If I get time I'll run more benchmarks against a non-grpc server and possibly using a non k6 tool to triage grpc issues.
Stuff that needs investigation:

  • gRPC performing at nearly 50% of HTTP equivalent #1846
    • is there an issue with the k6 implementation of grpc in k6? or is grpc slower in this case. If so;
    • is there an issue with the example grpc server?
    • what is the impact of grpc server reflection on this baseline performance?

@na--
Copy link
Member

na-- commented Aug 31, 2021

what is the impact of grpc server reflection on this baseline performance?

Hmm it shouldn't be a big problem if the initial connect() method is slower when reflection is enabled, that is somewhat expected... 😕 So I don't think it's a blocker for this PR.

Also, in general, we shouldn't compare k6 gRPC performance to k6 HTTP performance, that would be comparing apples to oranges. If we're worried about k6's gRPC implementation being slow for some reason, we should compare it to other gRPC implementations. But, again, that shouldn't be a big concern here, since the performance of the connect() method is generally not critical, and since we expect it to be slower with reflection enabled.

@inancgumus inancgumus self-assigned this Oct 5, 2021
@inancgumus inancgumus closed this Oct 5, 2021
@inancgumus inancgumus mentioned this pull request Oct 5, 2021
14 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support to use gRPC Server Reflection
6 participants