
WIP: A new way to gRPC #3947

Closed
wants to merge 30 commits
Conversation

McKayJT
Contributor

@McKayJT McKayJT commented Jan 20, 2020

The old gRPC system left a lot to be desired. Heap-allocated objects flying around. The RPC class was trying to be boost::intrusive_ptr. And things as simple as cancelling a text message filter could cause the server to crash.

This is a new gRPC implementation, based loosely on the old one. While it has many advantages, there are also many disadvantages that I would like feedback on. In the new system, the gRPC codegen generates very little: nothing more than a few typedefs and some function declarations. You then create an RPC object that uses this information to instantiate a template whose interface depends somewhat on the object you gave it.

One advantage of this is that there is almost no dynamic dispatch; the trade-off is that the call objects don't really share a base class anymore. However, basically no code cares about working with generic RPC objects -- callers know which ones they are after.

I have also changed most of the containers used. The primary reason is that I wanted to be able to look up entries by RPCId (a random number, because weak_ptr doesn't really have any good support for living in an ordered container). It also lets me control how the container is accessed, because deletes could happen at any time.

The containers support a few basic access patterns. One is that you can pass in a lambda that returns a pair of iterators. The result is then filtered to remove all expired weak_ptrs (converting the survivors into shared_ptrs). Because this range is a copy, it cannot be invalidated while you iterate over it, and holding the shared pointers guarantees that the objects in your range stay valid. The other access method is to request an index. You can either get a const index (for quick find operations) or a unique pointer to a non-const index. The 'deleter' of that unique pointer holds a mutex which locks access to the container while you are using it. When you are done, assign nullptr to the index pointer and the mutex is released.

The other big change is the way callbacks work. RPCExecEvent is gone. When you create a new fiber, you can tell it to execute in the thread that started it, or request that it be executed in the event-loop thread. This also means that the completion queue doesn't need heap-allocated objects; most of the time the sender is waiting in another fiber for the response. The CQ no longer deletes the items it pulls out of the queue -- they all live on the stack.

There is also some reorganization (letting Meta delete the servers so that the CQ is still running during shutdown, and using recursive mutexes, since one thread could try to obtain the same mutex several times if the gRPC connection has high latency).

The biggest cons: I have used C++14 pretty heavily. In fact, this change set even has some C++17 stuff in the form of structured bindings. The biggest thing I need from it is the ability to bind move-only objects into lambdas. I think there is another way to make this work using C++11 only, but it's painful. Also, with all the template meta-programming in RPC, some of the convenience functions really do help make the syntax a bit more terse.

The other big con is Boost Fibers. It's not header-only. It doesn't support all architectures. Because it does context switching by messing with the stack, memory sanitizers have a hard time with it. It can also make debugging challenging: if you wait until std::abort is called, the stack looks nothing like it did during execution, and it's hard to get a decent backtrace.

A lesser problem is size: a fully stripped binary, statically linked against musl, with ICE, DBus, and Bonjour disabled -- basically just murmur and gRPC -- is ~15 MB. That's pretty large.

So, while this is full of extraneous debugging junk and has no real unified style (it was put together over many sleepless nights), I hope I can get some feedback on whether this general design is a feasible way to move forward. I think this method makes it much easier to add new stuff to the interface, and at the same time makes it difficult to misuse.

All feedback is appreciated!

Fixes #4197

@Krzmbrzl Krzmbrzl changed the title A new way to gRPC [not ready to merge yet, for comment purposes] WIP: A new way to gRPC Jan 21, 2020
@Krzmbrzl
Member

Be warned: here come the thoughts of someone who doesn't have a clue about RPC in general and who certainly doesn't know how gRPC was done in Mumble before :P
Also, I haven't looked at the actual code yet -- these are just some comments on what you have written.

In the new system, the gRPC codegen generates very little; nothing more than a few typedefs and some function declarations.

What is the codegen being used for anyway? I assume we have some sort of interface and the codegen part generates the respective header files that I can implement in my program if I want to use gRPC?

When you are done, assign nullptr to the index ptr and the mutex is released.

To me this seems like a big source of deadlocks due to people forgetting to explicitly assign nullptr. Can't this be automated once the pointer goes out of scope?

I have used c++14 pretty heavily. In fact, this change set even has some c++17 stuff in the form of structured bindings.

As mentioned on IRC already, I think that this is currently a no-go. From what I know, we want to support the oldest Ubuntu LTS, which currently is 16.04. This ships with gcc 5.3, which only has experimental support for C++11. Of course one could install a more recent version, and once it is compiled with the newer compiler it should run on 16.04 just fine, so I'm not quite sure whether this argument holds.
This is something @davidebeatrici has to decide on. From my point of view it would certainly be desirable to use plain C++11, but as long as you only rely on C++14 syntax and not the C++14 standard library, I don't think it would be that much of a burden.
C++17 is a bit recent though, and I think I'd definitely prefer not to need it for compilation (I'm not even sure whether my gcc version fully supports it yet xD)

Also with all the template meta-programming that has been done in RPC some of the convenience functions really do help make the syntax a bit more terse.

Talking about TMP, I have to say that I don't have good experience using it. It makes code hard to read and hard to follow, and even worse: once something goes wrong you just get a huge wall of text spat into your face, from which you have to figure out what went wrong (and where and why).

I'd definitely favor an object-oriented approach instead of TMP for better readability and error handling. I guess your main reason for using TMP was avoiding dynamic dispatch.
Without having much experience comparing the two, or having done any profiling of the code at hand, I'd assume that dynamic dispatch is not really a bottleneck here. From my understanding, gRPC doesn't perform tasks so time-critical that this dispatch would pose a problem (correct me if I'm wrong). Thus I'd argue that we don't really need that kind of optimization if it comes at the cost of increased maintenance work.

The other big con is Boost Fibers. It's not header only. It doesn't support all architectures.

Where's a list of supported architectures? If only exotic ones aren't supported, I don't think this will be a huge deal...

It also can make debugging challenging because if you wait till std::abort is called, the stack looks nothing like it did during execution and it's challenging to get a decent backtrace.

This, I think, is mainly a question of how well you document how it is to be debugged. As long as there's good documentation on this, I think it should be okay (not great obviously, but should be okay).

Because it does context switching by messing with the stack memory sanitizers have a hard time with it

Since it can be disabled via the preprocessor, one could run memory sanitizers on the rest of Mumble without gRPC and then try fiddling with the RPC code in a different way. Besides, I don't even think we're currently using any (though we definitely should).

A lesser problem is size, a fully stripped binary, statically linked against musl, with ICE, DBUS, bonjour, disabled--just basically murmur and gRPC is ~15mb. That's pretty large.

Could you figure out what the main contribution to this is? (Also what's the size of a murmur-build without gRPC? - I can't check it myself right now)

And finally a small request: could you illustrate an example of how the gRPC would be used (for a very simple task), so I have something in mind where this is going? :)

@McKayJT
Contributor Author

McKayJT commented Jan 21, 2020

What is the codegen being used for anyways? I assume we have some sort of interface and the codegen-part generates the respective header files that I can implement in my program, if I want to use gRPC?

Well, there are two parts to it. One is that it actually makes the gRPC server -- that part I haven't touched at all. The second is that it created a bunch of objects that inherited from a few RPC templates with some helper functions. That code was virtually 100% boilerplate, with the idea that all you need to do to write calls is write an impl method that actually does the work, which works pretty well for simple calls.

All you really need from it, however, is what calls you have, what objects they use to read/write, and the function to hook into the completion queue. That is all they have now: they are essentially stateless functors with typedef'd information on what you need to make them work, one static method that gives you the call you need to hook into the completion queue, and a few functions you implement to actually deal with being called.

To me this seems like a big source of dead-locks due to people forgetting to explicitly assign nullptr. Can't this be automated once the pointer goes out-of-scope?

Of course it releases it when it goes out of scope -- it is the 'deleter' for the pointer. But you can also assign nullptr and release it early, without putting in artificial code blocks just to force it out of scope sooner.

Where's a list of supported architectures? If only exotic ones aren't supported, I don't think this will be a huge deal...

Right here. And it does support most things you'd actually be running it on.

I'm running out of time as I have to go to work, but one last thing on the C++ version: all I really need is the ability to initialize move-only objects into lambdas. The rest of it was me being lazy unpacking tuples and not wanting to write ::type everywhere. I think with bind I may be able to get it to work with just C++11, but it'd be a bit ugly.

As for all the TMP, I'll save that for when I have some more time--but you are right. You mess it up and you get slammed with a wall of text that is nearly unreadable.

@McKayJT
Contributor Author

McKayJT commented Jan 23, 2020

Finally had some time to work on this. I've taken out everything that breaks when building in C++14 mode. The most surprising part was that template instantiation works a little differently, so I had to add a helper to the containers. I also made some improvements that clang-tidy suggested.

As to why all the TMP? I figured the design goal was to

  1. make it easy to implement new RPC features with a minimal amount of coding
  2. make the async gRPC api appear synchronous
  3. try to make it very difficult to cause data races

The current design tries to accomplish objective 1 by having the codegen make a bunch of boilerplate to set up objects. I figured if all you wanted was to write a function or two to implement something, what you are really writing is just some functor.

The current design tries to do objective 2 by just running the event loop without notifiers while it waits when you need to read/write something, and by attempting to reference-count the objects. I abuse shared_ptr to do the reference counting for me, and futures from the fibers package let you read/write through what appears to be a synchronous interface without ever having to worry about blocking the main event loop. I also changed the containers so that an RPC call can delete itself when it knows it's finished (this is the main reason for the multi_index containers).

The current design tries to do objective 3 by allocating tons of functions on the heap, then using those to make an event -- which has an embedded function that will be called by the event handler -- and injecting those events into the main event loop.

This is where I differ a lot in the design. A lot of the template work went into making the containers very difficult to misuse in ways that cause data races. If you want to iterate, you can just get a 'locked' range which holds a copy of all the data you need, and because each weak_ptr is converted into a shared_ptr at that point, you never need to check whether your object has expired.

Also, I just use the fiber scheduler to switch anything that needs to run in the main event loop: I toss the fiber it lives in into that thread and let the fiber scheduler call it when it yield()s after handling any events. I also wanted to avoid so much heap allocation; since a fiber can wait until the callback runs, I can now allocate all the function calls on the stack. This means you should never end up calling a deleted object, and you should never have to worry about memory leaks.

Next up is trying to remove all the C++14 templates I've used. I think most of them can be replaced by Boost's type_traits. Then I need to figure out the problem of moving objects into lambdas.

@Krzmbrzl
Member

Thanks for writing this down :)

First of all, let me say that the 3 objectives you listed sound very good to me, and I think we should definitely try to stick to them 👍

In regards to TMP I'm not really convinced (yet) that there really is a need for it. But maybe I am missing something.

make it easy to implement new RPC features with a minimal amount of coding

I'd assume the code-generation to make this happen. Be it via generating some templates or by generating actual classes/objects/plain functions.

make the async gRPC api appear synchronous

This sounds very complicated and I can't say I understood it in its entirety. Thus the simple question: are you relying on TMP for this, and/or could this also be done without it?

try to make it very difficult to cause data races

These functions that take over the event injection could in principle be plain old functions that don't require TMP, couldn't they? From what you wrote, nothing appeared TMP-dependent to me...

A lot of the work in templates was creating the containers to be very difficult to use without data races.

I'd consider these kind of things to be possible with Objects as well.

I guess it'd be best if we were able to get on the same page regarding TMP before you put even more work into that direction. What are the advantages you see in TMP, and what are the features you are using that can only be realized via TMP (or would be rather hard to do without it)?

@McKayJT
Contributor Author

McKayJT commented Jan 25, 2020

Now that it builds in c++11 mode, I have some more time to comment on this.

I think that, in order to use the c++ gRPC library, you're going to end up writing a lot of template code. For example, the main service object used is really a typedef:

MurmurRPC::V1::AsyncService' {aka 'MurmurRPC::V1::WithAsyncMethod_GetUptime<MurmurRPC::V1::WithAsyncMethod_GetVersion<MurmurRPC::V1::WithAsyncMethod_Events<MurmurRPC::V1::WithAsyncMethod_ServerCreate<MurmurRPC::V1::WithAsyncMethod_ServerQuery<MurmurRPC::V1::WithAsyncMethod_ServerGet<MurmurRPC::V1::WithAsyncMethod_ServerStart<MurmurRPC::V1::WithAsyncMethod_ServerStop<MurmurRPC::V1::WithAsyncMethod_ServerRemove<MurmurRPC::V1::WithAsyncMethod_ServerEvents<MurmurRPC::V1::WithAsyncMethod_ContextActionAdd<MurmurRPC::V1::WithAsyncMethod_ContextActionRemove<MurmurRPC::V1::WithAsyncMethod_ContextActionEvents<MurmurRPC::V1::WithAsyncMethod_TextMessageSend<MurmurRPC::V1::WithAsyncMethod_TextMessageFilter<MurmurRPC::V1::WithAsyncMethod_LogQuery<MurmurRPC::V1::WithAsyncMethod_ConfigGet<MurmurRPC::V1::WithAsyncMethod_ConfigGetField<MurmurRPC::V1::WithAsyncMethod_ConfigSetField<MurmurRPC::V1::WithAsyncMethod_ConfigGetDefault<MurmurRPC::V1::WithAsyncMethod_ChannelQuery<MurmurRPC::V1::WithAsyncMethod_ChannelGet<MurmurRPC::V1::WithAsyncMethod_ChannelAdd<MurmurRPC::V1::WithAsyncMethod_ChannelRemove<MurmurRPC::V1::WithAsyncMethod_ChannelUpdate<MurmurRPC::V1::WithAsyncMethod_UserQuery<MurmurRPC::V1::WithAsyncMethod_UserGet<MurmurRPC::V1::WithAsyncMethod_UserUpdate<MurmurRPC::V1::WithAsyncMethod_UserKick<MurmurRPC::V1::WithAsyncMethod_TreeQuery<MurmurRPC::V1::WithAsyncMethod_BansGet<MurmurRPC::V1::WithAsyncMethod_BansSet<MurmurRPC::V1::WithAsyncMethod_ACLGet<MurmurRPC::V1::WithAsyncMethod_ACLSet<MurmurRPC::V1::WithAsyncMethod_ACLGetEffectivePermissions<MurmurRPC::V1::WithAsyncMethod_ACLAddTemporaryGroup<MurmurRPC::V1::WithAsyncMethod_ACLRemoveTemporaryGroup<MurmurRPC::V1::WithAsyncMethod_AuthenticatorStream<MurmurRPC::V1::WithAsyncMethod_DatabaseUserQuery<MurmurRPC::V1::WithAsyncMethod_DatabaseUserGet<MurmurRPC::V1::WithAsyncMethod_DatabaseUserUpdate<MurmurRPC::V1::WithAsyncMethod_DatabaseUserRegister<MurmurRPC::V1::WithAsyncMethod_DatabaseUserDeregister<MurmurRPC::V1::WithAsyncMethod_DatabaseUserVerify<MurmurRPC::V1::WithAsyncMethod_RedirectWhisperGroupAdd<MurmurRPC::V1::WithAsyncMethod_RedirectWhisperGroupRemove<MurmurRPC::V1::Service> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >'}

Nearly all the objects you actually interact with to use gRPC are templates. So if the library naturally wants everything to have a template-based interface, it makes sense that the implementation code is also going to be heavily templated. And the great thing is that once it's working, it just works; and since your interface is a template, you can't have any derived classes that start depending on your implementation details if you want to change how things work entirely.

I think the only real drawback to the TMP I'm using right now is that you somewhat have to remember the order of the fields in your tuples. I could base the containers on structs, which would make working with the data a bit easier, but it would mean writing each one out by hand (there isn't really any introspection in C++) or writing a script to generate the code for them, which I'm not a fan of either. I know 'tagged' tuple libraries exist, and I'm still trying to think of a lightweight way to tag them so you don't have to rely on knowing that the 0th element is the server id.

On the plus side, it's basically feature-complete now. I know there are tons of feature requests for better RPC features; it would be an interesting test case to take one of them and see how much new code is actually needed.

On the size issue, I just did a static release build without any RPC, CONFIG_OPTS="c++17 release no-client no-ice no-bonjour no-grpc no-dbus"; the stripped install package is 11.12 MB. The C++11 release build I have installed on my testing server with gRPC is 14.17 MB total. (I also have some custom build flags to get it this size: -fvisibility=hidden is not your friend when your compiler doesn't support LTO plugins; better to build with -ffat-lto-objects, otherwise it links in the entire stub gRPC server that's 100% unused.) I would consider that a pretty significant increase in size. I think I can get the size down a bit, however, by doing some more optimization of the work queues. Swapping boost::function for std::function may help too; there isn't really a point in having two type-erasing function libraries when gRPC itself relies on std::function.

As to performance? Unfortunately, I don't exactly use Mumble at all... haven't since I played WoW years ago, so I still have to figure out a way to load-test and make sure that the overhead from injecting fibers into the event loop isn't a burden. I would also like to not allocate my fiber properties on the heap for each fiber made; I think Boost has a nice memory pool that would make that a bit more efficient. I also need to test that authentication works properly -- it's one of the few 'stream-stream' types, but I don't have client code for it. I know that text message filters, even with an artificial delay of several seconds before responding, don't seem to have any negative impact on things, except that your text messages are very much delayed. I'm hoping that the very few context switches I have to do by relying on the fiber interface will mean there aren't performance issues under heavy RPC load.

Member

@Krzmbrzl Krzmbrzl left a comment


Far from being a full review, but I skimmed over the sources.

The thing that I noticed was documentation. Or rather the absence thereof.

There's no way I'll be able to grasp what you are doing if there is no documentation stating what each function is supposed to do. This will probably also affect potential maintainers of this code in the future.

Therefore I'd ask you to add Doxygen comments for the functions and variables that you have added/changed. I think it is especially important for all the TMP stuff, so that when someone not that familiar with TMP (like me) is reading through the source code, there is some text explaining what kind of magic is happening here :)

src/murmur/Messages.cpp (outdated)
src/murmur/Messages.cpp (outdated)
@Krzmbrzl
Member

If someone needs it (like I did) to actually understand what gRPC is: https://grpc.io/docs/guides/ and afterwards https://grpc.io/docs/tutorials/basic/cpp/ 😇

For example, the main service object used is really a typedef

Lol that typedef alone scares me and basically proves my argument against TMP xDD

Nearly all the objects you actually interact with to use gRPC are templates.

Uhm... The C++ examples on the linked website don't contain templates, and what I identified as the gRPC API header file doesn't contain templates either (https://grpc.github.io/grpc/cpp/grpc_8h_source.html). What am I missing?

since your interface is a template you can't have any derived classes that start depending on your implementation details if you want to change how things work entirely.

Well I guess that argument could be made in both directions. Personally I'd consider it way more convenient to change behaviour by extending some base-class instead of overloading a template...

I think the only real drawback to the TMP I'm using right now is that you somewhat have to remember what the order of the stuff is in your tuples

Plus all that has been mentioned before... And I think the tuple stuff could potentially make it even (a lot) worse. I can imagine people confusing the order and thus completely messing up what their code does even though it compiles fine, or getting it wrong and having their screen explode with TMP error messages that probably won't even help them understand what they did wrong.

I think I can get the size down a bit

I think the size is fine as it is. 3MB extra won't hurt anyone, I guess :)

I'm hoping that the very few context switches I have to do by relying on the fiber interface will make it so that there aren't performance issues with heavy RPC load.

Given that atm the gRPC feature is more or less "experimental" anyways, I think it wouldn't be too dramatic if it turned out that there were performance issues (as long as they are being fixed at some point of course).

@McKayJT
Contributor Author

McKayJT commented Jan 26, 2020

I would direct your attention to https://grpc.github.io/grpc/cpp/impl_2codegen_2async__stream__impl_8h_source.html for the actual objects used in the async interface to do communication. It's all template-land.

As to the tuples, I'm still thinking about a design fix for that. As an end user, you'll figure out pretty quickly if you've got them wrong, without getting a bunch of template messages: the instant you use the wrong value in something that expects a concrete type, strong typing will let you know that you've passed the wrong thing. I think the answer might be a struct whose members are all hooked into the tuple, or a specialization of get in the Container namespace that you can use with meaningful tags. I think boost::hana could already do that for me, but it relies on C++14.

As far as extending this? I think it's not that hard to specialize templates once you get slightly familiar with it. I hadn't used C++ since the '00s when I started working on this. Once you get down the correct idioms for enable_if, it becomes a lot easier.

I've tried my best to keep end users from getting horrible template error messages, using static_asserts in the places that would contain end-user code. Concepts, of course, would make this way simpler, but that's a C++20 feature.

RE: documentation. Yup, it needs it. Most of the important parts about how the thing works are in the big block comment in MurmurGRPCImpl.cpp. My experience reading Doxygen-produced documentation has been less than stellar, and my experience writing it is nil. However, I'd be willing to put in the work to document it all. In addition, the code style is inconsistent: clang-tidy basically wants me to always use trailing return types and prefer using over typedef. I think I may reformat all the header code to use the "new" style and leave MurmurGRPCImpl.cpp in the older style.

@Krzmbrzl
Member

Alright then I think I have to really look through this once the documentation is there :)

@McKayJT
Contributor Author

McKayJT commented Jan 28, 2020

You're gonna find out that grammar and spelling are my dump stats.

WIP can be found at: https://gist.github.com/McKayJT/619bc92f00318cf3604e1e5788284e57

Let me know if that's the type of documentation that you're looking for or if I'm totally off-base.

Still haven't figured out a good way to document the interface that is expected of 'Derived' objects from the codegen. I think I may have to just set up a custom documentation page with a few concrete examples, and then link it into the header description for RPCCall. Still thinking of a way to make the tuples more accessible.

@Krzmbrzl
Member

Krzmbrzl commented Jan 28, 2020

Let me know if that's the type of documentation that you're looking for or if I'm totally off-base.

That's exactly what I was looking for! :)
Only critique is that in the rest of Mumble's codebase we use /// to mark Doxygen comments and not the JavaDoc style. I also prefer the latter, but for the sake of consistency I think it'd be better if you could also switch to using ///.

Thus

/**
 * Custom deleter for RPCCall objects.
 *
 * Instead of deleting the objects, just sets RPCCall::m_alive to false
 */

should be

/// Custom deleter for RPCCall objects.
///
/// Instead of deleting the objects, just sets RPCCall::m_alive to false

EDIT: Oh and the documentation of the template parameter might be nice as well (using Doxygen's @tparam attribute) :)

@McKayJT McKayJT force-pushed the grpc-rework-cleanup branch from 4189af3 to d38cffb Compare February 20, 2020 10:35
@McKayJT
Contributor Author

McKayJT commented Feb 20, 2020

Rebased against master -- I think this is functionally complete (except maybe the build environment should tell it what version of C++ to use? Or I can hack around it). Adding -std=c++11 forces Boost to turn off support for lambda initialization, which is much better than having to use std::bind() to work around it.

I have also added support for LTO and tcmalloc for g++. tcmalloc helps with boost::fibers since it constantly creates small stacks (the other option would be a pool interface, which would be a pain). LTO is just for efficiency. tcmalloc requires LTO -- static builds without it will not work properly on musl.

TL;DR: last call for objections against the design before I start making API-breaking changes to the interface. proto2 is not good for gRPC, and there are lots of missing things, like being able to gracefully close a streaming filter. I also think I need to change some of the signals so that they don't pass raw pointers -- after network time, there is no way to know whether they are still valid.

Also of note:

~/D/murmur-static_x86-1.3.0 $ file murmur.x86 && ls -lh murmur.x86
murmur.x86: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, stripped
-rwxr-xr-x 1 adenosine adenosine 34M Aug 20  2019 murmur.x86*
[root@let ~]# file /usr/bin/murmurd && ls -lh /usr/bin/murmurd
/usr/bin/murmurd: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), statically linked, stripped
-rwxr-xr-x 1 root root 13M Feb 20 10:23 /usr/bin/murmurd

No DBus or ICE, but much smaller, even with tcmalloc linked in, with CONFIG_OPTS="c++17 tcmalloc static no-client no-ice no-bonjour grpc no-dbus".

@McKayJT McKayJT force-pushed the grpc-rework-cleanup branch from d7a1ca2 to d75ff45 Compare February 20, 2020 15:26
@Krzmbrzl Krzmbrzl mentioned this pull request Mar 3, 2020
@Krzmbrzl
Member

@McKayJT is this PR still WIP or is it done by now?
Btw. I do plan to review (or at least start reviewing) this PR in the next couple of days. Huge sorry for the immense delay that it took me to do it 🙈

@Krzmbrzl Krzmbrzl force-pushed the grpc-rework-cleanup branch from 8919e02 to 0b63605 Compare March 30, 2020 09:36
@Krzmbrzl
Member

Krzmbrzl commented Mar 30, 2020

To everyone that wants to build this: Note that you'll have to have the following components installed:

  • libprotoc-dev
  • libgrpc-dev
  • protobuf-compiler-grpc

@McKayJT it would be great if you/we could add checks to the build env that detect whether these components are missing and give a proper error message, instead of letting the compilation fail with more or less cryptic error messages.

src/Utils.h (outdated)
#ifdef USE_GRPC
# include <QDebug>
# include <string>
# include <grpcpp/support/string_ref.h>
Member


On my system this has to be

# include <grpc++/support/string_ref.h>

instead. Did this name change with some version of grpc?

Member

@Krzmbrzl Krzmbrzl left a comment


I am not able to get this to compile.

First I had to get some dependencies that the build system didn't check for (I had to deduce from cryptic error messages what might be missing).
Then I had to build and install Boost myself, as the version packaged with my distro was too old (and incomplete, for that matter). This wasn't checked by the build system either.

As it is, I am unable to build and thus unable to review. For now I'd ask you to try to add some checking of the required libraries and their versions during initial setup.

src/Utils.h
@@ -0,0 +1,1248 @@
// Copyright 2005-2019 The Mumble Developers. All rights reserved.
Member:
Filename is missing a C

/// implementation reads in the message, checks validity, then adds a `std::weak_ptr` along
/// with other details to a container of listeners.
///
template<typename X = RPCType, boost::enable_if_t<std::is_same<X, Unary_t>::value>* = nullptr>
Member:

boost::enable_if_t is a somewhat recent addition to Boost. Recent enough to not be part of the version packaged in the Ubuntu repos (at least not all of them).

Thus you should check which version of Boost or in this case which version of the type_traits module is installed and issue an error if it is not sufficient.

src/murmur/GRPCContainer.h (review threads resolved)
}
using MurmurRPC::Wrapper::RPCCall;
namespace mwc = MurmurRPC::Wrapper::Container;
namespace mw = MurmurRPC::Wrapper;

class MurmurRPCAuthenticator : public ::grpc_impl::AuthMetadataProcessor {
Member:

On my machine grpc_impl:: does not exist. I only have the namespace grpc::

McKayJT (Contributor, Author) commented May 16, 2020

Sorry about the delay in response; I got busy as well.

I am going to set up a container with Ubuntu bionic so I can test against the old versions of gRPC and Boost. As for package checks, is it worth adding them to the current build system? It seems like once the CMake build system is working, it would be far easier to check that the correct versions are in place.
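(Once the CMake port lands, such a check could look roughly like the sketch below. This is illustrative only: the option and variable names are made up here, not taken from this PR or from Mumble's build files; the package names come from the list earlier in this thread.)

```cmake
# Sketch: fail early with a clear message when the gRPC toolchain is missing.
option(grpc "Build the experimental gRPC API" OFF)

if(grpc)
  find_package(Protobuf REQUIRED)                 # libprotoc-dev
  find_library(GRPC_LIBRARY grpc++)               # libgrpc-dev
  find_program(GRPC_CPP_PLUGIN grpc_cpp_plugin)   # protobuf-compiler-grpc

  if(NOT GRPC_LIBRARY OR NOT GRPC_CPP_PLUGIN)
    message(FATAL_ERROR
      "gRPC support was requested but libgrpc-dev and/or "
      "protobuf-compiler-grpc were not found. Install them or "
      "configure with -Dgrpc=OFF.")
  endif()
endif()
```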

I've pushed some fixes for older versions of gRPC, Boost, and Qt. I've included mp11 and callable_traits as submodules (they work standalone, without the rest of Boost). I rely heavily on mp11 and callable_traits, and there is no good workaround other than including them as submodules. You also need the libboost-fiber-dev and libboost-context-dev packages for the fiber libraries. There are some nasty visibility warnings when I build this in an Ubuntu bionic container, but it does build.

McKayJT added 9 commits May 16, 2020:

  • Old versions of Boost do not have these, but they should work as a standalone module.
  • for older Boost compat
  • QRandomGenerator is not in old versions of Qt
  • didn't include this when I changed trees
  • add boost::enable_if_t if not included, based on std::enable_if
  • define BOOST_ATTRIBUTE_NODISCARD to be empty if undefined
  • use QObject::connect instead of the callOnTimeout convenience method introduced in Qt 5.10
  • builds without PCH fail with this not defined
toby63 mentioned this pull request May 17, 2020
Krzmbrzl (Member):

Sorry about the delay in response; I got busy as well.

No problem - I had enough stuff to do in the meantime to not get bored :D

It seems like once the cmake build system is working that it would be far easier to check to make sure that the correct versions are in place.

Yeah this is probably true. Especially since the PR for the cmake port should be opened in the coming days now...

Thanks for the fixup - I'll have to try and compile it again in the coming days ☝️

Krzmbrzl (Member):

Okay I just tried compiling and it worked flawlessly 👍

What would you say is the easiest way of testing the gRPC interface in action? Is there some existing tool or something that I can use to (indirectly) verify stuff seems to be working properly? :)

streaps commented May 25, 2020

Is the API still the same?

There is murmur-cli
https://github.com/layeh/murmur-cli

and a fork of it
https://github.com/McKayJT/murmur-cli
(link to binaries is in the README)

McKayJT (Contributor, Author) commented May 25, 2020

Yeah, I use murmur-cli for testing. It doesn't have support for the AuthenticatorStream endpoint yet, so that feature remains somewhat untested. For stream-stream testing I use TextMessageFilter, combined with true and false responses instead of actually parsing the requests.

Krzmbrzl (Member):

Ha. Once I actually enabled gRPC in the ini file, I got a segmentation fault when starting up murmur:

Thread 1 "murmurd" received signal SIGSEGV, Segmentation fault.
0x00007ffff6416477 in grpc_pollset_size () from /usr/lib/libgrpc++.so.1
(gdb) bt
#0  0x00007ffff6416477 in grpc_pollset_size () from /usr/lib/libgrpc++.so.1
#1  0x00007ffff642d7de in grpc_completion_queue_create () from /usr/lib/libgrpc++.so.1
#2  0x00007ffff63f667d in grpc::ServerBuilder::AddCompletionQueue(bool) () from /usr/lib/libgrpc++.so.1
#3  0x000055555572dbf0 in MurmurRPCImpl::MurmurRPCImpl (this=0x555555ee2d70, address=..., credentials=std::shared_ptr<grpc::ServerCredentials> (empty) = {...}) at MurmurGRPCImpl.cpp:306
#4  0x000055555572cc0d in GRPCStart () at MurmurGRPCImpl.cpp:219
#5  0x0000555555611bd8 in main (argc=1, argv=0x7fffffffdd08) at main.cpp:589

This is my ini file:

loggroupchanges=true
logaclchanges=true

listenersperchannel=1
listenersperuser=2

autobanAttempts=0

ice="tcp -h 127.0.0.1 -p 6502"

; If you want to expose Murmur's experimental gRPC API, you
; need to specify an address to bind on.
; Note: not all builds of Murmur support gRPC. If gRPC is not
; available, Murmur will warn you in its log output.
grpc="127.0.0.1:50051"

logicethos:
Any news on this?

Krzmbrzl (Member):

No - it seems that @McKayJT did not have the time to continue working on this PR yet ☝️

Kissaki marked this pull request as a draft on December 24, 2020
Krzmbrzl (Member):

Closing this as discontinued for now

Krzmbrzl closed this Jun 28, 2021
Krzmbrzl added a commit to Krzmbrzl/mumble that referenced this pull request Feb 6, 2022
The gRPC implementation never left the experimental state and never
reached a properly stable state to the point where we would feel good
about enabling it by default. In addition to that, there have been no
further attempts at finding and fixing the encountered issues in the
implementation (except mumble-voip#3947, but that was discontinued).

As such we had an essentially unmaintained piece of code in our server
implementation that was known to be buggy and that nobody wanted to fix.
In addition, the implementation itself could not be considered
very clean or elegant, and therefore represented a few smelly
corners in our code base.

For this reason, we decided to remove the gRPC support entirely from
Mumble (for now).

What we hope to gain by that is:
- Prevent people from building unstable server versions and then coming
to us complaining that it crashed/misbehaved
- Removing (essentially) dead code
- Reduce the RPC implementation complexity

That last piece is crucial: By removing gRPC support we reduce the
amount of supported RPC frameworks to only one (ignoring DBus for now).
Our future plans include a refactoring of how RPC is being handled and
implemented and only having to worry about maintaining compatibility
with one RPC system is much easier than having to worry about two (with
(slightly) different APIs).
Once the RPC implementation has been rewritten, more RPC backends may be
reintroduced and in that process we might investigate adding a proper
gRPC implementation to the code (that then hopefully is more stable than
the current one).

Fixes mumble-voip#4567
Fixes mumble-voip#4197
Fixes mumble-voip#3496
Fixes mumble-voip#3429
Fixes mumble-voip#3265
Krzmbrzl added a commit to Krzmbrzl/mumble that referenced this pull request Mar 16, 2022
(same commit message as above)
Krzmbrzl added a commit that referenced this pull request Mar 16, 2022
(same commit message as above; fixes #4567, #4197, #3496, #3429, #3265)

Successfully merging this pull request may close these issues: "Some gRPC calls can crash the server"

5 participants