WIP: A new way to gRPC #3947
Conversation
Be warned: here come the thoughts of someone who doesn't have a clue about RPC in general and who certainly doesn't know how gRPC has been done in Mumble before :P
What is the codegen being used for anyway? I assume we have some sort of interface and the codegen part generates the respective header files that I can implement in my program if I want to use gRPC?
To me this seems like a big source of deadlocks due to people forgetting to explicitly assign nullptr.
As mentioned on IRC already, I think this is currently a no-go. From what I know, we want to support the oldest Ubuntu LTS, which currently is 16.04. That comes with GCC 5.3, which only has experimental support for C++11. Of course one could install a more recent compiler, and once the code is compiled with it, it should run on 16.04 just fine, so I'm not quite sure whether this argument holds.
Talking about TMP, I have to say that I don't have good experience with it. It makes code hard to read and hard to follow, and even worse: once something goes wrong you get a huge wall of text spat into your face from which you have to figure out what went wrong (and where, and why). I'd definitely favor an object-oriented approach over TMP for better readability and error handling. I guess your main reason for using TMP was avoiding dynamic dispatch.
Where's a list of supported architectures? If only exotic ones aren't supported, I don't think this will be a huge deal...
I think this is mainly a question of how well you document how it is to be debugged. As long as there's good documentation on this, I think it should be okay (not great, obviously, but it should be okay).
Since it can be disabled via the preprocessor, one could run memory sanitizers on the rest of Mumble without gRPC and then test the RPC code in a different way. Besides, I don't think we're currently using any sanitizers (though we definitely should).
Could you figure out what the main contribution to this is? (Also, what's the size of a murmur build without gRPC? I can't check it myself right now.) And finally a small request: could you give an example of how the gRPC interface would be used (for a very simple task), so I have something in mind of where this is going? :)
Well, there are two parts to it. One is that it actually makes the gRPC server -- this part I haven't touched at all. The second is that it created a bunch of objects that inherit from a few RPC templates with some helper functions. The code is virtually 100% boilerplate, with the idea that all you need to do to write a call is to write an impl method that actually does the work, which works pretty well on simple calls. All you really need from it, however, is which calls you have, which objects they use to read/write, and the function to hook into the completion queue. Which is all they have now; they are essentially stateless functors with typedef'd info on what you need to make them work, one static method that gives you the call you need to hook into the completion queue, and a few functions that you need to implement to actually deal with being called.
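To make that a little more concrete, here is a hedged sketch of the "stateless functor" shape being described; all names and message types below are invented for illustration and are not the actual generated or PR code:

```cpp
#include <string>

// Hypothetical sketch of a per-call functor. The real calls are generated
// against the MurmurRPC protobuf messages; these stand-ins are made up.
struct UserQuery  { std::string name; };          // stand-in request message
struct UserRecord { int id; std::string name; };  // stand-in response message

struct GetUserCall {
    // typedef'd info the wrapper template needs to drive the call
    using RequestType  = UserQuery;
    using ResponseType = UserRecord;

    // in the real code a static method would also hand back the generated
    // Request...() member used to hook this call into the completion queue

    // the only part an implementer actually writes: do the work
    ResponseType impl(const RequestType &request) {
        return ResponseType{1, request.name};
    }
};
```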
Of course it releases it when it goes out of scope. It is the 'deleter' for the pointer. But you can also assign nullptr and release it early without putting in artificial code blocks to make it go out of scope early.
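As a minimal illustration of that release-on-nullptr behaviour (a generic sketch, not the PR's container code):

```cpp
#include <memory>
#include <mutex>

// A unique_ptr whose "deleter" releases a mutex instead of freeing memory.
// Assigning nullptr -- or letting it go out of scope -- runs the deleter.
struct IndexReleaser {
    std::mutex *m;
    void operator()(int * /*index*/) const { m->unlock(); }
};

int main() {
    std::mutex containerMutex;
    int index = 42; // stand-in for the non-const container index

    containerMutex.lock();
    std::unique_ptr<int, IndexReleaser> locked(&index, IndexReleaser{&containerMutex});

    // ... mutate the container through *locked ...

    locked = nullptr; // release the lock early, no artificial scope needed
    return 0;
}
```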
Right here. And it does support most things you'd actually be running it on. I'm running out of time as I have to go to work, but one last thing on the C++ version: all I really need is to be able to initialize move-only objects into lambdas. The rest of it was me being lazy unpacking tuples and not wanting to write ::type everywhere. I think with bind I may be able to get it to work with just C++11, but it'd be a bit ugly. As for all the TMP, I'll save that for when I have some more time -- but you are right. You mess it up and you get slammed with a wall of text that is nearly unreadable.
Finally had some time to work on this. I've taken out everything that breaks unless building in C++14 mode. The most surprising of which was that template instantiation works a little differently, so I had to add a helper to the containers. Also made some improvements that clang-tidy suggested. As to why all the TMP? I figured the design goal was to
The current design tries to accomplish objective 1 by having the codegen generate a bunch of boilerplate to set up objects. I figured that if all you want is to write a function or two to implement something, what you are really writing is just a functor. The current design tries to accomplish objective 2 by running the event loop without notifiers while it waits when you need to read/write something, and by attempting to reference count the objects. I abuse shared_ptr to do the reference counting for me, and futures from the fibers package let you read/write through what appears to be a synchronous interface while never having to worry about blocking the main event loop. I also changed the containers so that an RPC call can delete itself when it knows it's finished (this is the main reason for the multi_index containers). The current design tries to accomplish objective 3 by allocating tons of functions on the heap, then using those to make an event (with an embedded function that will be called by the event handler) and injecting those events into the main event loop.

This is where I differ a lot in the design. A lot of the work in templates went into making the containers very difficult to use in a way that causes data races. If you want to iterate, you can just get a 'locked' range that holds a copy of all the data you need; by converting weak_ptr into shared_ptr at that point, you never need to check whether your object has expired. Also, I just use the fiber scheduler to run anything that needs to run in the main event loop, by tossing the fiber it lives in into that thread and letting the fiber scheduler call it when it yield()s after doing any events. I also wanted to avoid so much heap allocation; since a fiber can wait until the callback runs, I can now allocate all the function calls on the stack. This means that you should never end up calling a deleted object, and you should never have to worry about memory leaks.

Next up is trying to remove all the C++14 templates I've used. I think most of them can be replaced by Boost's type_traits. Then I need to figure out the moving-objects-into-lambdas problem.
Thanks for writing this down :) First of all, let me say that the 3 objectives you listed sound very good to me and I think we should definitely try to stick to them 👍 Regarding TMP, I'm not really convinced (yet) that there is a real need for it. But maybe I am missing something.
I'd assume the code generation would make this happen, be it by generating some templates or by generating actual classes/objects/plain functions.
This sounds very complicated and I can't say that I understood it in its entirety. Thus the simple question here: Are you relying on TMP for this and/or could this also be done without it?
These functions that take over the event injection could in principle be plain old functions that don't require TMP, couldn't they? From what you wrote, nothing appeared to be TMP-dependent to me...
I'd consider these kinds of things to be possible with objects as well. I guess it'd be best if we got on the same page regarding TMP before you put even more work into that direction. What are the advantages you see in TMP, and which features that you are using can only be realized via TMP (or would be rather hard to do without it)?
Now that it builds in C++11 mode, I have some more time to comment on this. I think that, in order to use the C++ gRPC library, you're going to end up writing a lot of template code. For example, the main service object used is really a typedef:
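(The typedef itself was not preserved in this thread; for a feel of its shape, the following is roughly what the gRPC C++ codegen emits for the upstream two-method hello-world example, not Mumble's actual service.)

```cpp
// Approximate shape of gRPC's generated C++ code for a two-method "Greeter"
// service: the async service object is a typedef stacking one template
// layer per method.
typedef WithAsyncMethod_SayHello<
            WithAsyncMethod_SayHelloAgain<
                Service> >
        AsyncService;
```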
Nearly all the objects you actually interact with to use gRPC are templates. So if the library naturally wants everything to have an interface based on a template, it makes sense that the implementation code is also going to be heavily templated. And the great thing is that once it's working, it just works, and since your interface is a template you can't have any derived classes that start depending on your implementation details if you want to change how things work entirely.

I think the only real drawback to the TMP I'm using right now is that you somewhat have to remember the order of the things in your tuples. I could base the containers on structs, and it would make working with the data a bit easier, but it would mean I would have to write each one out by hand, as there isn't really any introspection in C++, or write a script to generate the code for them, which I'm not a fan of either. I know there are 'tagged' tuple libraries out there, and I'm still trying to think of a way to tag them in a lightweight way so you don't have to rely on knowing that the 0th element is the server id.

On the plus side, it's basically feature complete now; I know there are tons of feature requests for better RPC features. It would be an interesting test case to see what it would take to add one of them, and how much new code is actually needed. On the size issue, I just did a static release build without any RPC. As to performance? Unfortunately, I don't exactly use Mumble at all... haven't since I played WoW years ago, so I still have to figure out a way to load test and make sure that the overhead from injecting fibers into the event loop isn't a burden. I would also like to not allocate my fiber properties on the heap for each fiber made; I think Boost has a nice memory pool that would make that a bit more efficient. I also need to test to make sure authentication is working properly -- it's one of the few 'stream-stream' types, but I don't have client code for it. I know that text message filters, even with an artificial delay of several seconds for a response, don't seem to have any negative impact on things, except that your text messages are very much delayed. I'm hoping that the very few context switches I have to do by relying on the fiber interface will mean there aren't performance issues under heavy RPC load.
Far from being a full review, but I skimmed over the sources.
The thing that I noticed was documentation. Or rather the absence thereof.
There's no way I'll be able to grasp what you are doing if there is no documentation stating what each function is supposed to do. This will probably also affect potential maintainers of this code in the future.
Therefore I'd ask you to add Doxygen comments for the functions and variables that you have added/changed. I think it is especially important to add this to all the TMP stuff, so that when someone not that familiar with TMP (like me) is reading through the source code, there is some text explaining what kind of magic is happening here :)
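For instance, the kind of Doxygen comment being asked for might look like this (a made-up declaration, not actual Mumble code):

```cpp
/// \brief Registers a listener for server events on the given virtual server.
///
/// \tparam RPCType  The wrapper call type this overload is enabled for.
/// \param  server   The id of the virtual server the listener is attached to.
/// \return          true if the listener was registered successfully.
template<typename RPCType>
bool registerListener(int server);
```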
If someone needs it (like I did) to actually understand what gRPC is: https://grpc.io/docs/guides/ and afterwards https://grpc.io/docs/tutorials/basic/cpp/ 😇
Lol that typedef alone scares me and basically proves my argument against TMP xDD
Uhm... The c++ examples on the linked website don't contain templates and what I identified as the gRPC API header-file doesn't contain templates either (https://grpc.github.io/grpc/cpp/grpc_8h_source.html). What am I missing?
Well I guess that argument could be made in both directions. Personally I'd consider it way more convenient to change behaviour by extending some base-class instead of overloading a template...
Plus all that has been mentioned before... And I think the tuple stuff could potentially make it even (a lot) worse. I can imagine people confusing the order and thus completely messing up what their code does even though it compiles fine, or getting it wrong and having their screen explode with TMP error messages that probably won't even help them understand what they did wrong.
I think the size is fine as it is. 3MB extra won't hurt anyone, I guess :)
Given that atm the gRPC feature is more or less "experimental" anyway, I think it wouldn't be too dramatic if it turned out that there were performance issues (as long as they are being fixed at some point, of course).
I would direct your attention to https://grpc.github.io/grpc/cpp/impl_2codegen_2async__stream__impl_8h_source.html for the actual objects used in the async interface to do communication. It's all template-land.

As to the tuples, I'm still thinking about a design fix for that. As an end user, you'll figure out pretty quickly if you've got them wrong, without getting a bunch of template messages: the instant you use the wrong value in something that expects a concrete type, strong typing will let you know that you've passed the wrong thing. I think the answer might be a struct that just has all the members hooked into the tuple, or a specialization of get in the Container namespace that you can use with meaningful tags. I think boost::hana could already do that for me, but it relies on C++14.

As far as extending this? I think it's not that hard to specialize templates once you get slightly familiar with it. I hadn't been using C++ since the '00s when I started working on this. Once you get down the correct idioms for using enable_if it becomes a lot easier. I've tried my best to make sure end users don't get horrible template error messages, with static_asserts in the places that would see end-user code. Concepts, of course, would make this way simpler, but that's a C++20 feature.

RE: documentation. Yup. It needs it. Most of the important parts about how the thing works are in the big block comment in MurmurGRPCImpl.cpp. My experience with reading Doxygen-produced documentation has been less than stellar, and my writing experience is nil. However, I'd be willing to put in the work to document it all. In addition, the code style is inconsistent. clang-tidy basically wants me to always use trailing return types and prefer using over typedef. I think I may reformat all the header code to use the "new" style, and leave MurmurGRPCImpl.cpp using the older style.
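For what it's worth, one lightweight way to "tag" tuple elements (a sketch of the general idea, not what the PR currently does) is to wrap each field in its own struct and use C++14's get-by-type:

```cpp
#include <cstdint>
#include <string>
#include <tuple>

// Each field gets a distinct wrapper type, so std::get<T> (by type, a C++14
// feature) can be used with a meaningful name instead of remembering that
// the 0th element is the server id. Names here are illustrative only.
struct ServerId  { std::uint32_t value; };
struct SessionId { std::uint32_t value; };

using ListenerEntry = std::tuple<ServerId, SessionId, std::string>;

int main() {
    ListenerEntry e{ServerId{1}, SessionId{7}, "context"};
    auto server = std::get<ServerId>(e).value; // access by tag, not position
    return static_cast<int>(server);
}
```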
Alright then I think I have to really look through this once the documentation is there :)
You're gonna find out that grammar and spelling are my dump stats. WIP can be found at: https://gist.github.com/McKayJT/619bc92f00318cf3604e1e5788284e57 Let me know if that's the type of documentation that you're looking for or if I'm totally off-base. Still haven't figured out a good way to document the interface that is expected of 'Derived' objects from the codegen. I think I may have to just set up a custom documentation page with a few concrete examples, and then link it into the header description for RPCCall. Still thinking of a way to make the tuples more accessible.
That's exactly what I was looking for! :) Thus
should be
EDIT: Oh, and the documentation of the template parameter might be nice as well (using Doxygen's \tparam).
Force-pushed from 4189af3 to d38cffb
Rebased against master -- I think this is functionally complete (except that maybe the build environment should tell it what version of C++ to use? Or I can hack around it). Adding -std=c++11 forces Boost to turn off support for lambda init-capture, which is much nicer to use than the std::bind() workaround. I have also added support for LTO and tcmalloc for g++. tcmalloc helps with boost::fibers, as it constantly creates small stacks and the other option would be a pool interface, which would be a pain. LTO is just for efficiency. tcmalloc requires LTO -- static builds without it will not work properly with musl. TLDR: last call for objections against the design before I start making API-breaking changes to the interface. proto2 is not good for gRPC, and there are lots of missing things, like being able to gracefully close a streaming filter. I also think I need to change some of the signals so that they don't pass raw pointers -- after network time, there is no way to know if they are valid or not. Also of note:
No DBus or ICE, but much smaller, even with tcmalloc linked in.
Force-pushed from d7a1ca2 to d75ff45
@McKayJT is this PR still WIP or is it done by now?
Force-pushed from 8919e02 to 0b63605
To everyone that wants to build this: Note that you'll have to have the following components installed:
@McKayJT it would be great if you/we could add checks to the build env that detect if these components are missing and give a proper error message instead of letting the compilation fail with some more or less cryptic error messages.
src/Utils.h
Outdated
#ifdef USE_GRPC
# include <QDebug>
# include <string>
# include <grpcpp/support/string_ref.h>
On my system this has to be
# include <grpc++/support/string_ref.h>
instead. Did this name change with some version of grpc?
I am not able to get this to compile.
First I had to get some dependencies that the build-system didn't check for (so I had to deduce from cryptic error messages what might be missing).
Then I had to build and install Boost myself as the version packaged with my Distro was too old (and incomplete for that matter). This wasn't checked by the build-system either.
As it is, I am unable to build and therefore unable to review. For now I'd ask you to try and add some checks for the required libraries and their versions during initial setup.
@@ -0,0 +1,1248 @@
// Copyright 2005-2019 The Mumble Developers. All rights reserved. |
Filename is missing a C
/// implementation reads in the message, checks validity, then adds a `std::weak_ptr` along
/// with other details to a container of listeners.
///
template<typename X = RPCType, boost::enable_if_t<std::is_same<X, Unary_t>::value>* = nullptr>
boost::enable_if_t is a somewhat recent addition to Boost -- recent enough not to be part of the version packaged in the Ubuntu repos (at least not all of them). Thus you should check which version of Boost, or in this case which version of the type_traits module, is installed and issue an error if it is not sufficient.
src/murmur/MurmurGRPCImpl.h
Outdated
}
using MurmurRPC::Wrapper::RPCCall;
namespace mwc = MurmurRPC::Wrapper::Container;
namespace mw = MurmurRPC::Wrapper;

class MurmurRPCAuthenticator : public ::grpc_impl::AuthMetadataProcessor {
On my machine grpc_impl:: does not exist. I only have the namespace grpc::
Needs a lot of cleanup, more of a proof of concept
Sorry about the delay in response; I got busy as well. I am going to install a container with Ubuntu bionic so I can test against the old versions of gRPC and Boost. As to package checks, is it worth doing them in the current build system? It seems like once the CMake build system is working it would be far easier to check that the correct versions are in place. I've pushed some fixes for older versions of gRPC, Boost, and Qt. I've included mp11 and callable_traits as submodules (they work standalone without the rest of Boost). I rely heavily on mp11 and callable_traits and there is no good workaround except just including them as submodules. You also need to grab the libboost-fiber-dev and libboost-context-dev packages for the fiber libraries. There are some nasty visibility warnings when I build this in an Ubuntu bionic container, but it does build.
Old versions of boost do not have these, but it should work as a standalone module.
also remove container_hash
for older boost compat
QtRandomGenerator not in old version of Qt
didn't include this when I changed trees
add boost::enable_if_t if not included, based on std::enable_if
define BOOST_ATTRIBUTE_NODISCARD to be empty if undefined
use QObject::connect instead of the callOnTimeout convenience method introduced in Qt 5.10
builds without pch fail with this not defined
No problem - I had enough stuff to do in the meantime to not get bored :D
Yeah this is probably true. Especially since the PR for the cmake port should be opened in the coming days now... Thanks for the fixup - I'll have to try and compile it again in the coming days ☝️
Okay I just tried compiling and it worked flawlessly 👍 What would you say is the easiest way of testing the gRPC interface in action? Is there some existing tool or something that I can use to (indirectly) verify stuff seems to be working properly? :)
Is the API still the same? There is murmur-cli and a fork of it
Yeah, I use murmur-cli for testing. It doesn't have support for the AuthenticatorStream endpoint yet, so that feature remains somewhat untested. I use TextMessageFilter for stream-stream testing, combined with
Ha. Once I actually enabled gRPC in the ini file, I get a segmentation fault when starting up murmur:
This is my ini file:
Any news on this?
No - it seems that @McKayJT has not had the time to continue working on this PR yet ☝️
Closing this as discontinued for now |
The gRPC implementation never left the experimental state and never reached a properly stable state to the point where we would feel good about enabling it by default. In addition to that, there have been no further attempts at finding and fixing the encountered issues in the implementation (except mumble-voip#3947, but that was discontinued). As such we had an essentially unmaintained piece of code in our server implementation that was known to be buggy and that nobody wanted to fix. In addition to that, the implementation itself could not be considered very clean or elegant and therefore only represented a few smelly corners in our code base.

For this reason, we decided to remove the gRPC support entirely from Mumble (for now). What we hope to gain by that is:
- Prevent people from building unstable server versions and then coming to us complaining that it crashed/misbehaved
- Removing (essentially) dead code
- Reducing the RPC implementation complexity

That last piece is crucial: by removing gRPC support we reduce the number of supported RPC frameworks to only one (ignoring DBus for now). Our future plans include a refactoring of how RPC is being handled and implemented, and only having to worry about maintaining compatibility with one RPC system is much easier than having to worry about two (with (slightly) different APIs). Once the RPC implementation has been rewritten, more RPC backends may be reintroduced, and in that process we might investigate adding a proper gRPC implementation to the code (that then hopefully is more stable than the current one).

Fixes mumble-voip#4567
Fixes mumble-voip#4197
Fixes mumble-voip#3496
Fixes mumble-voip#3429
Fixes mumble-voip#3265
The old gRPC system left a lot to be desired. Heap-allocated objects flying around. The RPC class trying to be boost::intrusive_ptr. And things as simple as cancelling a text message filter could cause the server to crash.
This is a new gRPC, based loosely on the old one. While it has many advantages, there are also many disadvantages that I would like feedback on. In the new system, the gRPC codegen generates very little; nothing more than a few typedefs and some function declarations. You then create an RPC object that uses this information to create a template whose interface somewhat depends on the object you gave it.
One advantage of this is that there is almost no dynamic dispatch, though the calls don't really share a base class anymore. However, basically no code cares about working with generic RPC objects; callers know which ones they are after.
I have also changed most of the containers used. The primary reason is that I wanted to be able to look up entries by RPCId (a random number, because weak_ptr doesn't really have any good support for being used as a key in an ordered container). It also allows me to control the way the container is accessed, because deletes could happen at any time.
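A rough sketch of what such a container could look like with Boost.MultiIndex (illustrative only; the Entry/Listener names are made up and this is not the PR's actual definition):

```cpp
#include <cstdint>
#include <memory>

#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/member.hpp>

// Entries keyed by a random RPCId so they can live in an ordered index,
// while the listener itself is only held through a weak_ptr.
struct Listener {};

struct Entry {
    std::uint64_t           rpcId;    // random id used as the ordered key
    std::weak_ptr<Listener> listener; // expires when the call goes away
};

namespace bmi = boost::multi_index;

using ListenerMap = bmi::multi_index_container<
    Entry,
    bmi::indexed_by<
        bmi::ordered_unique<
            bmi::member<Entry, std::uint64_t, &Entry::rpcId>
        >
    >
>;

int main() {
    ListenerMap listeners;
    listeners.insert(Entry{42, {}});
    return listeners.count(42) == 1 ? 0 : 1; // look up by RPCId
}
```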
The containers support a few basic access patterns. One is that you can send in a lambda that returns a pair of iterators. This will then be filtered to remove all expired weak_ptrs (converting the rest into shared_ptrs). Because this range is a copy, it cannot be invalidated while iterating over it, and holding the shared pointers is a promise that the objects in your range stay valid. Another access method is to request an index. You can either get a const index (for quick find operations), or a unique pointer to an index that is non-const. The 'deleter' of the unique pointer holds a mutex which locks access to the container while you are using it. When you are done, assign nullptr to the index pointer and the mutex is released.
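A simplified sketch of the "locked range" idea (again illustrative, not the actual implementation): the snapshot is built under the lock, expired weak_ptrs are dropped, and the caller iterates over shared_ptrs, so the objects cannot vanish mid-loop and later deletions cannot invalidate the range.

```cpp
#include <memory>
#include <mutex>
#include <vector>

struct Listener {};

class ListenerContainer {
public:
    void add(const std::shared_ptr<Listener> &l) {
        std::lock_guard<std::mutex> guard(m_mutex);
        m_listeners.push_back(l);
    }

    // returns a copy: locking only happens while the snapshot is built
    std::vector<std::shared_ptr<Listener>> lockedRange() const {
        std::lock_guard<std::mutex> guard(m_mutex);
        std::vector<std::shared_ptr<Listener>> alive;
        for (const auto &weak : m_listeners) {
            if (auto strong = weak.lock()) {
                alive.push_back(std::move(strong));
            }
        }
        return alive;
    }

private:
    mutable std::mutex m_mutex;
    std::vector<std::weak_ptr<Listener>> m_listeners;
};

int main() {
    ListenerContainer container;
    auto keepAlive = std::make_shared<Listener>();
    container.add(keepAlive);
    for (const auto &l : container.lockedRange()) {
        (void) l; // safe: l is a shared_ptr, so the listener cannot expire here
    }
    return 0;
}
```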
The other big change is the way callbacks work. RPCExecEvent is gone. When you create a new fiber, you can tell it whether you want it to execute in the thread that started it, or request that it be executed in the event-loop thread. It also means that the completion queue doesn't need heap-allocated objects; most of the time the sender is waiting in another fiber for the response. The CQ no longer deletes the items it pulls out of the queue -- they are all on the stack.
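For readers unfamiliar with Boost.Fiber, this is roughly the synchronous-looking wait that makes stack allocation possible. It is a generic sketch (none of it is the PR's actual code) assuming the completion event is delivered from another fiber: the waiting side parks a stack-allocated promise/future pair, the other fiber completes it, and the waiter resumes without blocking the underlying thread.

```cpp
#include <iostream>

#include <boost/fiber/all.hpp>

int main() {
    boost::fibers::promise<int> done;          // lives on this fiber's stack
    boost::fibers::future<int> result = done.get_future();

    boost::fibers::fiber worker([&done] {
        // stand-in for the completion-queue event firing
        done.set_value(42);
    });

    // waiting suspends only this fiber; other fibers keep running
    std::cout << "completion tag delivered: " << result.get() << "\n";

    worker.join();
    return 0;
}
```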
There is also some reorganization (letting meta delete the servers so that the CQ is still running on shutdown, and using recursive mutexes, as one thread could try to obtain the same mutex several times if the gRPC connection has high latency).
The biggest cons: I have used C++14 pretty heavily. In fact, this change set even has some C++17 stuff in the form of structured bindings. The biggest thing I need from it is being able to bind move-only objects into lambdas. I think there is another way to make this work using only C++11, but it's painful. Also, with all the template metaprogramming that has been done in the RPC code, some of the convenience functions really do help make the syntax a bit more terse.
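For context, this is the move-only capture problem in isolation (a generic example, not taken from the PR): C++14 init-capture moves the object straight into the lambda, while plain C++11 needs a std::bind workaround.

```cpp
#include <functional>
#include <memory>
#include <utility>

int main() {
    auto response = std::unique_ptr<int>(new int(7));

    // C++14: init-capture, clean and direct
    auto cpp14 = [resp = std::move(response)]() { return *resp; };

    auto response2 = std::unique_ptr<int>(new int(7));

    // C++11: bind the moved object to an extra parameter instead
    auto cpp11 = std::bind(
        [](const std::unique_ptr<int> &resp) { return *resp; },
        std::move(response2));

    return cpp14() + cpp11() == 14 ? 0 : 1;
}
```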
The other big con is Boost Fibers. It's not header-only. It doesn't support all architectures. Because it does context switching by messing with the stack, memory sanitizers have a hard time with it. It can also make debugging challenging: if you wait until std::abort is called, the stack looks nothing like it did during execution, and it's hard to get a decent backtrace.
A lesser problem is size: a fully stripped binary, statically linked against musl, with ICE, DBus, and Bonjour disabled -- basically just murmur and gRPC -- is ~15 MB. That's pretty large.
So, while this is full of extraneous debugging junk and has no real unified style (it was put together over many sleepless nights), I hope I can get some feedback on whether this general design is a feasible way to move forward. I think this approach makes it much easier to add new things to the interface, and at the same time makes it difficult to misuse.
All feedback is appreciated!
Fixes #4197