Add profile-guided optimization (PGO) support for engine and for projects executable that use C++ libraries #2610
Comments
Profile-guided optimization, as the name implies, depends on the workload being executed. If the workload matches the one the profile was trained on, you can get a significant performance boost. However, if the workload doesn't match the training workload, performance won't improve at all and may even degrade slightly. On top of that, producing PGO-optimized binaries requires compiling twice, which would slow down the official release process significantly. (Remember that dozens of Godot binaries need to be compiled each time an official release is made.) While it's true that official releases of web browsers make use of PGO, applying this effectively to a game engine that is used in varied ways sounds difficult. As for libraries, I don't think it's feasible to implement this for GDNative. However, it's certainly feasible for statically-compiled C++ modules since these are technically part of the engine source code (and application binary).
Well, you know best; even so, I would love to see such an implementation.
So isn't there a way to make a project that uses every piece of functionality in Godot as a training workload for PGO?
As someone who's interested in performance boosts with Godot, I could possibly work on this if the proposal is approved.
It depends, I guess. If this can help to optimize the GDScript VM further, then it may not necessarily be project-specific anymore.
Mr. Xrayez, thank you. So if I compile the Godot engine with PGO flags, will my executable also use PGO or not?
And how stable would it be to compile the Godot engine's 3.2 branch and use it?
@Abdelilah-Majid I had mostly no issues compiling the stable branch on my host OS for development purposes. But the build complexity depends on how many platforms you're going to target for your project. I'm personally using https://github.com/godotengine/build-containers which makes it easy to compile for all supported platforms once you set the entire thing up. To be honest, I haven't used PGO myself. But yes, you'd have to compile both the editor and the export templates with PGO if you want the performance boost in both, I guess. There would certainly be some differences between debug and release builds at the bytecode level, especially for GDScript.
Thanks, Mr. Xrayez.
Mr. Xrayez, I think you build Godot on your host OS, which I think is Linux.
And for export templates, is there a repo for them that I can clone and build using PGO, or is there none?
Yes. 🙂
An export template is nothing more than a Godot build without the editor.
Thanks, Mr. Xrayez.
Okay, so my internet connection is very slow now.
From what you said, Mr. Xrayez, and from what I guess: when the game is exported, the project template acts like a VM for the project's code and scenes, so the game runs as if you pressed the launch button in the Godot editor, but without the editor itself. That explains why game.x86_64 is big even if the project is empty.
Yes, you can even place export templates (executable) into the project's source code directly (where |
ah, okay thanks |
i actually have a question |
I suggest you go through https://docs.godotengine.org/en/latest/development/compiling/index.html, but yes, you'd have to compile for each platform. For instance, look at https://github.com/godotengine/godot-build-scripts/blob/master/build-linux/build.sh which compiles both the editor and the export templates for Linux in the official build scripts. All export templates are then packaged with https://github.com/godotengine/godot-build-scripts/blob/master/build-release.sh. But again, that's how it's done officially. I'm linking those as a reference; those scripts are not really usable by themselves. That's not really important or needed for this proposal specifically, so let's keep the discussion on topic. If you have more questions, you can ask them in the community channels. 🙂
okay thanks MR. Xrayez |
OK, so I have done some testing, and the results were shocking and could be a game changer, so I will write down the steps I used to get to this point for anyone to follow along. The repo for the testing is https://github.com/Abdelilah-Majid/godot-PGO_test and it contains 3 tests: one for the CPU, one for the GPU, and one for CPU & GPU. I cloned the Godot engine repo and checked out the 3.2 branch.
Then I compiled the Godot project templates using this command:
NOTE: I didn't use 'use_lto' because of the limitations of my laptop (I only have 6 GB of RAM), so there will be a performance gap between running the game on the official Godot engine and on the project template that uses PGO; I will keep this in mind while doing my calculations. I created a GLES 2.0 project because my laptop doesn't support GLES 3.0 and above. After compiling the project templates, I trained them with my test project, and after that I sorted the .gcda files by size to see which are the biggest. This sort was done using: NOTE: the size here is in KB
As you can see from the .gcda files, there are more large .gcda files related to GLES 3.0 than to GLES 2.0, so if the project were using GLES 3.0, the GPU performance gain could be better. Then I changed the commands I had added to the SConstruct file to:
I added '-fprofile-correction' because the Godot engine uses multithreading. Then I recompiled the Godot project templates and ran the same test project on the new project templates that use PGO. Here are the results:
Dear Godot core devs, in case you think that these numbers are a waste of your time: I think I am speaking on behalf of myself and the Godot community when I say that we would love to see Godot stand out from the crowd in everything, especially performance. We hope that Godot won't turn into another bloated Unity, and we would really love to see Godot be the first game engine that uses PGO technology. #peace
OK, so I re-ran the last test (the CPU and GPU test). Before there were 1000 sprites, now there are 10000, and here are the results: cpu_and_gpu_test_without_pgo: 3 fps; cpu_and_gpu_test_with_PGO_enabled: 7 fps; increase from cpu_and_gpu_test_without_pgo to cpu_and_gpu_test_with_PGO_enabled = 133%. I think that in the earlier run with PGO the bottleneck was the GPU, so I guess this is a more precise number.
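The command used above to sort the `.gcda` profile files by size was not preserved in this thread. For anyone who wants to repeat that inspection, here is a minimal sketch of one way to do it. It is not the original command, and the function name `gcda_files_by_size` and the idea of scanning a build directory recursively are assumptions made for illustration.

```python
from pathlib import Path


def gcda_files_by_size(build_dir):
    """Return (size_in_kb, relative_path) pairs for every .gcda file found
    under build_dir, largest first. Hypothetical helper, not part of Godot."""
    root = Path(build_dir)
    entries = []
    for path in root.rglob("*.gcda"):
        size_kb = path.stat().st_size / 1024  # bytes -> KB
        entries.append((size_kb, path.relative_to(root).as_posix()))
    # Largest files first; ties are ordered by path so the output is stable.
    entries.sort(key=lambda item: (-item[0], item[1]))
    return entries


def print_report(build_dir):
    """Print one line per profile file, size in KB followed by its path."""
    for size_kb, name in gcda_files_by_size(build_dir):
        print(f"{size_kb:10.1f} KB  {name}")
```

Calling `print_report(".")` from the directory where the instrumented build wrote its profiles would list the largest profile files first, which gives a rough picture of which parts of the engine the training run exercised.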
The results look interesting, but as Calinou said, the performance boost achieved might only apply to the workload used. There need to be several largely different test projects which do substantially different things, both on the CPU and GPU levels. If all those projects gain significant performance in different domains and the performance doesn't degrade in other cases, then we'll have solved the first equation. Looking at your CPU test projects, the only thing they do is computing the For GPU, yeah, perhaps the performance can be achieved in a more general-purpose way, but it may just depend on the specific hardware/drivers used, so on other machines it may perform worse. For CPU, as I said earlier, I think that PGO could be applied to optimize the GDScript VM, which means the most common GDScript control paths need to be trained to benefit from this kind of optimization in the most common use cases. Once all the above concerns are resolved, the next task is to set up the official build toolchain to do this in an automated manner. I personally see this task as quite insurmountable at the moment. First, the build system would have to spend twice the time to compile all export templates, and it would have to run sample projects for each binary to train. It's not always possible to do that in an automated manner from a single host OS which just cross-compiles to other platforms. Those sample projects would also have to be maintained to ensure that they work properly and never regress during development. Speaking for myself as a user, it currently takes me 12-24 hours to compile Godot for all platforms using the official build scripts with LTO. It means that it would take me more than 48 hours to compile with PGO, unlike official builds which only take like 4 hours on a powerful machine.
That said, this kind of optimization will certainly be useful for everyone interested in this technology for their (specific) projects. That's why I'm suggesting that, at the very least, Godot should provide SCons build options related to PGO; see my previous comment: #2610 (comment). I'm not denying the possibility of using PGO for official builds as well, but for it to be adopted there should be good evidence that PGO is useful for most use cases and won't make things worse for others.
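To make the idea of a PGO-related build option more concrete, here is a rough sketch of how such an option could map to GCC flags. SCons build files are Python, so the sketch is in Python. The option name `pgo` and its values `none`, `generate` and `use` are hypothetical; nothing in this thread shows that Godot's build system has such an option. The GCC flags are the ones quoted in this proposal.

```python
def pgo_flags(mode, multithreaded=True):
    """Map a hypothetical `pgo` build option to GCC compiler/linker flags.

    mode is "none", "generate" or "use"; the option name and its values are
    assumptions for illustration, not an existing Godot build option.
    """
    if mode == "none":
        return {"CCFLAGS": [], "LINKFLAGS": []}
    if mode == "generate":
        # Instrumented build: running the binary writes .gcda profile files.
        # GCC needs this flag when compiling and when linking.
        flags = ["-fprofile-generate"]
        return {"CCFLAGS": flags, "LINKFLAGS": list(flags)}
    if mode == "use":
        flags = ["-fprofile-use"]
        if multithreaded:
            # Lets GCC smooth out inconsistent counters from threaded runs.
            flags.append("-fprofile-correction")
        # Also passed at link time, which matters when LTO defers code
        # generation to the link step.
        return {"CCFLAGS": flags, "LINKFLAGS": list(flags)}
    raise ValueError(f"unknown pgo mode: {mode!r}")
```

In an SConstruct, the returned dictionary could be applied with something like `env.Append(**pgo_flags("generate"))`. Users would then build once with `generate`, run their own project to train, and rebuild with `use`.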
I have an idea: instead of going the hard way with general optimization, why don't we add an official project template that uses '-fprofile-generate' alongside the one that doesn't make use of PGO, and give game devs a simple way of downloading the Godot source code and compiling the project templates with '-fprofile-use' '-fprofile-correction' from the Godot editor? This way they can make use of PGO for their needs, you don't have to train anything yourself, and PGO will be there for people who need some specific optimization.
I have re-run the same CPU and GPU test, this time with 2000 objects: cpu_and_gpu_test_without_pgo: 15 fps; cpu_and_gpu_test_with_PGO_enabled: 35 fps; increase from cpu_and_gpu_test_without_pgo to cpu_and_gpu_test_with_PGO_enabled = 133.33%
Mr. Xrayez, I am so sorry for saying this, but I don't think you get the point. Imagine this: and we have here SuperUserMax, who knows a lot about hardware and technology; he even uses Linux. SimpleJony saw SuperUserMax's game and thinks, "Oh, this game is very beautiful, and I think my 3070 GPU can handle it", and here is where the PGO hero comes in. As we saw in my calculations, PGO can increase performance in these specific cases by 133%, and for SimpleJony that's a lot of FPS. So, did you get the idea? It's not about what this can do for the average user. And don't get me wrong, it can really decrease CPU load a lot.
So if you think that training the project template will take a lot of time, just don't do it. That said, I don't know if you can use -fprofile-use with files that aren't trained, then train those files, and then use project_template_PGO_use_files on them and get some performance. I have never done such a thing and I don't know if it will work.
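As a check on the percentages reported for the two CPU and GPU runs (3 fps to 7 fps, and 15 fps to 35 fps), the relative increase can be worked out as below. The function names are made up for this example; the fps values are the ones posted in this thread.

```python
def percent_increase(baseline_fps, pgo_fps):
    """Relative FPS gain of the PGO build over the baseline, in percent."""
    return (pgo_fps - baseline_fps) / baseline_fps * 100.0


def speedup_factor(baseline_fps, pgo_fps):
    """How many times faster the PGO build renders than the baseline."""
    return pgo_fps / baseline_fps


# Measurements quoted in this thread: (sprite count, fps without PGO, fps with PGO).
runs = [(10000, 3, 7), (2000, 15, 35)]
for sprites, without_pgo, with_pgo in runs:
    gain = percent_increase(without_pgo, with_pgo)
    factor = speedup_factor(without_pgo, with_pgo)
    print(f"{sprites} sprites: +{gain:.2f}% ({factor:.2f}x)")
```

Both runs work out to the same ratio, a gain of about 133.33%, or roughly 2.33 times the baseline frame rate. That matches the reported figures, but both measurements come from one test project on one machine.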
I think you miss an important point: SimpleJony and SuperUserMax will both use PGO if this proposal is implemented. This way, SuperUserMax will take advantage of additional performance gains and push the performance to the max again, while SimpleJony won't be able to keep up in either case. I don't think this problem will ever go away unless SuperUserMax stops being so ambitious and demanding. In fact, SuperUserMax would likely be the one who'd use PGO in the first place. 😛 And your case may be totally different. I don't know exactly what you want to achieve with this in your own project or use case as a developer. If you're already using C++ to develop a game, then this should likely solve 95% of the performance problems (especially when you just want to switch from slow GDScript). Unless you're specifically targeting really low-end hardware/market and audience which cannot afford high-end technologies. This is where I can understand the problem.
That's what general-purpose software has to do. I mean, it's not necessarily an "average Joe" problem, but a question of how many people stumble upon a similar problem to justify an addition to the engine. Again, that's only my opinion; so far I'm the only one who actively participates in the discussion with you. But I'm also someone who's interested in performance gains with Godot. Even then, I haven't really needed anything more from C++ development in Godot. Just being able to use C++ over GDScript for performance-critical tasks resolves quite a lot of limitations already. It might actually be the algorithms and data structures you use that can significantly improve performance, even without resorting to technologies like LTO/PGO. Yet again, having additional performance gains would certainly be nice (that's why official builds use LTO now), but we also have to think in terms of how this will affect daily Godot development and maintenance. Also, Godot's development does not really prioritize performance anyway, but rather usability. The fact that Godot uses a tree architecture for everything already creates some performance penalties in contrast with ECS and whatnot.
well i agree with you |
and for
I am just like Linus Torvalds: I love optimizations for the sake of optimizations.
As for my game, it has a big grid map and a lot of characters that need to find the position of the place they want to go to and calculate the best route to it while avoiding objects that are in the way. So, for each character, they need to loop through the std::vector floor_grid again and again to find the best way to reach the place they are programmed to go to, and I think this will need a lot of CPU power.
Also, I forgot to say something that has to do with the size of the template files. I think this is because of the PGO optimizations, which make programs that use .gcda files smaller, and this could also be useful to reduce the Godot project template size by a lot.
@Abdelilah-Majid I suggest you don't attach yourself too tightly to your preliminary results. As mentioned by both Calinou and Xrayez, there is likely no universal way to optimize everyone's performance with PGO, or with any other means for that matter. You created an arbitrary project that was successfully optimized, but it's far from a complete game, and it doesn't even do, in isolation, the things that most games do. So while impressive, those results are for the most part irrelevant. To get some real use out of PGO, every developer would have to generate their own profile on a per-project basis. Some general optimizations in the engine may be possible, such as the ones mentioned by Xrayez, but those would require carefully designed tests to evaluate them.
okay |
But if the Godot devs don't implement PGO in the Godot engine, I hope that at least they add some options to the SCons file for optionally using PGO.
If anyone is interested, here is the project that I use for testing PGO on Godot: https://github.com/Abdelilah-Majid/godot-PGO_test Note: I won't run your tests for you, because my laptop is too weak to compile the Godot engine (it takes a long time on my 2-core, 2-thread CPU).
I want to add more materials about PGO for possible future developments in this area. Regarding gamedev domain, I know the following results about PGO:
More results about PGO for other pieces of software, including some low-level libraries like |
Describe the project you are working on
A game using C++.
Describe the problem or limitation you are having in your project
I need more performance from my C++ game.
Describe the feature / enhancement and how it helps to overcome the problem or limitation
Implementing Profile-Guided Optimization (PGO) in the Godot engine for more performance (in the range of 15% to 20%), and in the exported game executable so that it can be used with C++ libraries that also use PGO. NOTE: I can't use libraries with PGO unless the executable is also using PGO.
Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams
//quoted from stackoverflow link: https://stackoverflow.com/questions/14492436/g-optimization-beyond-o3-ofast ===========
PGO
GCC has Profile-Guided Optimisations features. There isn't a lot of precise GCC documentation about it, but nevertheless getting it to run is quite straightforward.
first compile your program with -fprofile-generate.
let the program run (the execution time will be significantly slower as the code is also generating profile information into .gcda files).
recompile the program with -fprofile-use. If your application is multi-threaded also add the -fprofile-correction flag.
PGO with GCC can give amazing results and really significantly boost performance (I've seen a 15-20% speed increase on one of the projects I was recently working on). Obviously the issue here is to have some data that is sufficiently representative of your application's execution, which is not always available or easy to obtain.
//quoted from stackoverflow link: https://stackoverflow.com/questions/14492436/g-optimization-beyond-o3-ofast ===========
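The three steps from the quoted answer can be sketched as a small helper that lays out the command sequence. This is only an illustration for a single-file program: the file name `main.cpp`, the output name `./app`, the `-O2` level and the function name `pgo_build_plan` are all assumptions, while the `-fprofile-generate`, `-fprofile-use` and `-fprofile-correction` flags come from the quote.

```python
import shlex


def pgo_build_plan(source="main.cpp", binary="./app", multithreaded=False):
    """Return the three PGO steps as argument lists (nothing is executed).

    source, binary and the -O2 level are placeholders for illustration.
    """
    # Step 1: instrumented build.
    generate = ["g++", "-O2", "-fprofile-generate", source, "-o", binary]
    # Step 2: training run; this is slower and writes .gcda profile files.
    train = [binary]
    # Step 3: rebuild using the collected profile.
    use = ["g++", "-O2", "-fprofile-use"]
    if multithreaded:
        use.append("-fprofile-correction")
    use += [source, "-o", binary]
    return [generate, train, use]


for step in pgo_build_plan(multithreaded=True):
    print(shlex.join(step))
```

The training run in step 2 should exercise the program the way real users would, since the final binary is optimized for whatever that run did.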
Then implement PGO in the exported game executables, whether the game uses GDScript or C++, though I would prefer C++.
If this enhancement will not be used often, can it be worked around with a few lines of script?
It is demonstrated above.
Is there a reason why this should be core and not an add-on in the asset library?
The only reason why this should be in core is the massive performance gain (15-20% more performance).