Splitting the generated C++ Code into multiple files to improve compile-time #2215
Codecov Report

| | master | #2215 | +/- |
|---|---|---|---|
| Coverage | 76.91% | 77.25% | +0.33% |
| Files | 455 | 458 | +3 |
| Lines | 28652 | 29153 | +501 |
| Hits | 22038 | 22521 | +483 |
| Misses | 6614 | 6632 | +18 |
That is great work. I just wonder about the compile-time gains for small changes in programs. We have typedefs for data structures; we have strata with relations as member variables; we have rules that use either relations from their own stratum or relations from already computed strata. If I understand your WIP correctly, the gain is that only the strata whose rules have changed are recompiled, not the whole program. Can we do another PR for the new functor interface? That would be better from a software-engineering standpoint.
Yes, I will discard the changes related to the functor interface and do a dedicated PR later. Isolating each stratum in its own C++/header file is a good first step to address the three points above, because it ensures that any stratum not affected by a Datalog change produces exactly the same C++ code.
To make sure the generated C++ code of an unmodified stratum does not change, I had to remove non-determinism in a few places in Soufflé, in particular:
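As a generic illustration of this kind of non-determinism (not the actual Soufflé changes), a code emitter that iterates over an unordered container can print declarations in an order that varies between runs or library versions. The sketch below assumes a hypothetical `emitRelationDecls` helper and a made-up `Relation <name>;` output format; it sorts the names before printing so the generated text depends only on the set of names.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical emitter, for illustration only: it is not part of Soufflé.
// std::unordered_set has an unspecified iteration order, so the names are
// copied into a vector and sorted before any text is emitted.
std::string emitRelationDecls(const std::unordered_set<std::string>& relationNames) {
    std::vector<std::string> sorted(relationNames.begin(), relationNames.end());
    std::sort(sorted.begin(), sorted.end());

    std::ostringstream out;
    for (const std::string& name : sorted) {
        out << "Relation " << name << ";\n";  // made-up output format
    }
    return out.str();
}
```

With this approach, two runs that see the same relation names produce byte-identical output regardless of insertion order or hashing.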
Is it possible to keep the old functionality (source code in a single file) and to have the new one (splitting the C++ for each stratum) as well? Or is this too much of an effort?
Yes, the idea is to have two modes: a single-file output and a multiple-file output. The single-file output should remain the default, while the other could be for more advanced users?
Is it done?
Sorry, not yet. I still need to clean up the code generation of the data structures a bit.
@b-scholz I believe this pull request is now in a reasonably good state and can be reviewed.
Great job - I will have a look!
I am looking at the changes now. Great work. I like the refactoring. Have you done performance testing? I also do not understand why the interpreter is affected by this change.
The interpreter is affected because I renamed subroutines/strata to avoid using their SCC id number in their name, as the SCC id number changes too often when we modify the datalog code. |
Thanks for the contribution!
On large Soufflé projects, compiling the generated C++ can take a very long time. Every small change in the Datalog code requires recompiling the entire C++ output, which is very inefficient.
This pull request is an attempt at a more flexible synthesiser that can generate the C++ code in multiple files. The code generation decomposes the code into several classes:
The goal is to keep all these classes as decoupled as possible. In particular, we don't want to recompile subroutine 'i' if we only changed the Datalog code of subroutine 'j'. Consequently, subroutine classes are constructed by passing them references to the relations and user-defined functors that they need. A Datalog change that does not modify a given subroutine will produce exactly the same C++ file for that subroutine.
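A minimal sketch of that construction pattern, under stated assumptions: `Relation`, `StratumPath` and `runExample` are hypothetical names, the relation is a plain `std::set` rather than a generated Soufflé data structure, and functors are left out. The point is only that the stratum class owns nothing and sees just the relations it was handed.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

using Tuple = std::pair<int, int>;

// Hypothetical stand-in for a generated relation; not Soufflé's real data structure.
struct Relation {
    std::set<Tuple> tuples;
    bool insert(int a, int b) { return tuples.emplace(a, b).second; }
};

// Hypothetical stratum class for:
//   path(x, y) :- edge(x, y).
//   path(x, z) :- path(x, y), edge(y, z).
// It owns no relations; it only keeps the references handed in by the caller.
class StratumPath {
public:
    StratumPath(const Relation& edge, Relation& path) : edge_(edge), path_(path) {
        assert(&edge_ != &path_ && "input and output relations must be distinct");
    }

    void run() {
        for (const Tuple& e : edge_.tuples) path_.insert(e.first, e.second);
        bool changed = true;
        while (changed) {  // naive fixpoint, kept simple for illustration
            std::vector<Tuple> derived;
            for (const Tuple& p : path_.tuples)
                for (const Tuple& e : edge_.tuples)
                    if (p.second == e.first) derived.emplace_back(p.first, e.second);
            changed = false;
            for (const Tuple& t : derived) changed |= path_.insert(t.first, t.second);
        }
    }

private:
    const Relation& edge_;
    Relation& path_;
};

// Assumed wiring: some top-level object owns the relations and passes them in.
std::size_t runExample() {
    Relation edge, path;
    edge.insert(1, 2);
    edge.insert(2, 3);
    edge.insert(3, 4);
    StratumPath stratum(edge, path);
    stratum.run();
    return path.tuples.size();
}
```

Because `StratumPath` mentions only `edge` and `path`, adding or changing an unrelated relation elsewhere in the program would leave this class's source text untouched in such a design.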
Also, moving the implementation of the specialized data structures into their own .cpp files avoids recompiling them every time their header is included.
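The header/implementation split can be sketched as follows. This is an assumption-laden illustration: `RelationEdge` and the file names in the comments are hypothetical, and both halves are shown in one block only so that the example compiles on its own.

```cpp
#include <cstddef>
#include <set>
#include <utility>

// ---- would live in a header, e.g. a hypothetical "RelationEdge.hpp" ----
// Declarations only: translation units that include this header do not
// compile the member function bodies.
class RelationEdge {
public:
    bool insert(int a, int b);
    std::size_t size() const;

private:
    std::set<std::pair<int, int>> tuples_;
};

// ---- would live in a hypothetical "RelationEdge.cpp", compiled once ----
bool RelationEdge::insert(int a, int b) {
    return tuples_.emplace(a, b).second;  // true if the tuple was new
}

std::size_t RelationEdge::size() const {
    return tuples_.size();
}
```

With the bodies in a separate .cpp file, a change to a stratum that merely includes the header leaves the data-structure object file up to date.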
When generating the code in multiple files, the first compilation takes longer than with a single file. After that, when only a few rules or relations of the Datalog code change, recompilation is much faster than with the single-file version.
To give an idea, on our Soufflé code: