-
-
Notifications
You must be signed in to change notification settings - Fork 21.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve merging of docs on generation #88514
Conversation
Ran some benchmarking with the following: diff --git a/main/main.cpp b/main/main.cpp
index 4c9c159f47..f87e02eb44 100644
--- a/main/main.cpp
+++ b/main/main.cpp
@@ -3257,10 +3257,18 @@ bool Main::start() {
GLOBAL_DEF(PropertyInfo(Variant::INT, "dotnet/project/assembly_reload_attempts", PROPERTY_HINT_RANGE, "1,16,1,or_greater"), 3);
#endif
+ OS::get_singleton()->benchmark_begin_measure("Documentation", "Total");
+
+ OS::get_singleton()->benchmark_begin_measure("Documentation", "Generation");
+
Error err;
DocTools doc;
doc.generate(gen_flags);
+ OS::get_singleton()->benchmark_end_measure("Documentation", "Generation");
+
+ OS::get_singleton()->benchmark_begin_measure("Documentation", "Loading");
+
DocTools docsrc;
HashMap<String, String> doc_data_classes;
HashSet<String> checked_paths;
@@ -3299,24 +3307,44 @@ bool Main::start() {
ERR_FAIL_COND_V_MSG(err != OK, false, "Error loading classes from: " + index_path + ": " + itos(err));
checked_paths.insert(index_path);
+ OS::get_singleton()->benchmark_end_measure("Documentation", "Loading");
+
+ OS::get_singleton()->benchmark_begin_measure("Documentation", "Merging");
+
print_line("Merging docs...");
doc.merge_from(docsrc);
+ OS::get_singleton()->benchmark_end_measure("Documentation", "Merging");
+
+ OS::get_singleton()->benchmark_begin_measure("Documentation", "Erasing");
+
for (const String &E : checked_paths) {
print_line("Erasing old docs at: " + E);
err = DocTools::erase_classes(E);
ERR_FAIL_COND_V_MSG(err != OK, false, "Error erasing old docs at: " + E + ": " + itos(err));
}
+ OS::get_singleton()->benchmark_end_measure("Documentation", "Erasing");
+
+ OS::get_singleton()->benchmark_begin_measure("Documentation", "Saving");
+
print_line("Generating new docs...");
err = doc.save_classes(index_path, doc_data_classes);
ERR_FAIL_COND_V_MSG(err != OK, false, "Error saving new docs:" + itos(err));
+ OS::get_singleton()->benchmark_end_measure("Documentation", "Saving");
+
+ OS::get_singleton()->benchmark_begin_measure("Documentation", "Deleting");
+
print_line("Deleting docs cache...");
if (FileAccess::exists(EditorHelp::get_cache_full_path())) {
DirAccess::remove_file_or_error(EditorHelp::get_cache_full_path());
}
+ OS::get_singleton()->benchmark_end_measure("Documentation", "Deleting");
+
+ OS::get_singleton()->benchmark_end_measure("Documentation", "Total");
+
OS::get_singleton()->set_exit_code(EXIT_SUCCESS);
return false;
} And for me it gave the following two cases: OUTDATED
Which is marginal but is present, but the generation isn't slowed down by the additional sorting, so I'd say this is an improvement in any case just cleaner code The constructor and operator part is minimal in the total performance, so can restore that code if desired, but I'd say it is useful in either case |
Great work! You love to see PRs with more lines removed than added that also speed performance :) I'll go through and review your changes in more detail when I get the chance. |
Since the merging logic is non-trivial now, I might suggesting making a helper function or iterator that you can reuse for the different merge cases. For example, a If you feel that overcomplicates things, I think repeating the iteration code is fine as well. |
I'd say first group them into their own functions (as per above comment). Then, the unification would come natural at a later point, as it would be easier to envision and refactor with "all of the cards on the table". |
The comparison isn't the same in the different cases, so it'd be more complicated Also realized I need to fix some details of this, one moment, the operators and constructors need their own comparison, will fix that Edit: There, cleaned up and created dedicated methods for each case, realized I compared constructors and operators just on name which was incorrect, now it works properly for merging Edit 2: Need to fix some details, realized that the doc files aren't guaranteed to be sorted, so need to handle that The constants will have to be removed from this as they are not sorted in the docs, will fix that |
ca9efd4
to
3cd7c74
Compare
Fixed some of the code, getting late so will continue working tomorrow, should work now, but with some new changes, will elaborate on them tomorrow Will update the benchmarks tomorrow |
3cd7c74
to
3364f83
Compare
Wow, the new benchmarks show a bigger improvement and it seems like most of it is due to the |
There's going to be fluctuations especially with file access, will do some more rigorous benchmarking tomorrow, and some more tweaks to the ordering of constructors, it needs a little more fixing but it's almost ready again Writing down so I don't forget it, will test not sorting the loaded docs tomorrow and instead searching in the generated docs which is already sorted, instead of doing a sort + merge, will see |
So preliminary update (will work later today), will drop the operators and constructors for now, they're not difficult to fix as such but they require some discussion on how to handle the ordering etc. that I think it best left to a follow up PR, and they make up a vanishingly small portion of the documentation data (only built-in types have them, and only a few each) Will push later today and do new benchmarking |
17f70fb
to
3104f96
Compare
Some different changes, will elaborate more later, but here are some basic benchmarking for comparison, the difference isn't major, but it's something, and I think the general code improvements are good in themselves as well
Would be interesting to see some other benchmarking as it's likely specific Will do some further testing, but the current method uses a lookup on one side of the data, rather than a merge, to find valid cases, this avoids sorting the loaded data unnecessarily The performance improvements might be greater in the total case with sorting still, as the loaded docs is expected to be largely sorted, but there's tradeoffs on either side, will look further later and see what method is the best in totality, but going with this method at the moment |
3104f96
to
79962b8
Compare
So some observations and details:
In the end the improvements for performance seem negligible, though it might be more on some hardware, hard to say, but I think the code style improvements are significant here so IMO this change isn't without merit Some changes I did:
Ready for review now, might need some minor tinkering, but works well now, tested messing with documentation files and it doesn't break when you do things like reordering or deleting entries, the checks are all sufficient Discovered there was a sneaky sort in the save operation, removing this redundancy and adding sorting elsewhere instead |
@@ -323,6 +357,8 @@ void DocTools::merge_from(const DocTools &p_data) { | |||
for (int j = 0; j < cf.properties.size(); j++) { | |||
if (cf.properties[j].name == "GodotSharp") { | |||
c.properties.push_back(cf.properties[j]); | |||
c.properties.sort(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needed after removing sorting on saving, could use an insert as well, will see later
@@ -1696,8 +1744,6 @@ Error DocTools::save_classes(const String &p_default_path, const HashMap<String, | |||
_write_method_doc(f, "annotation", c.annotations); | |||
|
|||
if (!c.theme_properties.is_empty()) { | |||
c.theme_properties.sort(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the sorts here as they were mostly unnecessary and done after already sorting, and moved that sorting into the generation where it does not already apply
3bd9c3a
to
d726213
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks sweet overall! Some great quality of life improvements for this code I'd say :)
Does Godot have a type to represent sorted arrays? It might be nice so it's easier to keep track of assumptions about whether or not an array has already been sorted or not.
If it doesn't, would it be worth putting in some assert
statements to ensure the array is sorted when it is expected to be, at least in debug builds?
merge_constructors(c.constructors, cf.constructors); | ||
|
||
merge_methods(c.methods, cf.methods); | ||
|
||
merge_methods(c.signals, cf.signals); | ||
|
||
merge_constants(c.constants, cf.constants); | ||
|
||
merge_methods(c.annotations, cf.annotations); | ||
|
||
merge_properties(c.properties, cf.properties); | ||
|
||
merge_theme_properties(c.theme_properties, cf.theme_properties); | ||
|
||
merge_operators(c.operators, cf.operators); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, now that's some clean lookin' code :)
break; | ||
} | ||
// Check found entry on name. | ||
if (to.name == from.name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this check redundant with the searching above? Wouldn't the searcher only find a match if their names were equal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a binary search, so not guaranteed, it just ensures from[found]
is no greater than to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, we might want to document that somewhere. It's not directly related to this issue, but I looked briefly at the code for SearchArray
and it wasn't immediately apparent that would be the behavior in the case the query isn't found. I'm sure it's a standard thing for those who know, but for those who don't it's kind of surprising.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Granted SearchArray
is a bit more obscure, but it's very internal and the only reason I use it directly here is because Vector::bsearch_custom
is non-const and does mutation checks :)
But might be worth documenting that class more directly, will see about that later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my defense, it wasn't obvious for me to check the docs for the exposed Array
binary search to learn how the unexposed SearchArray
binary search works. That said, you're totally right :)
Edit: saw just now you pointed out the same thing xD
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add notes on those lines to Vector::bsearch
and the bisect
method after this :)
d726213
to
3467e84
Compare
There isn't a dedicated type for that no, and we'd want to not copy memory around so it wouldn't really be helpful IMO, we are operating on the actual doc data which uses |
Realized it doesn't help to sort the loaded data for merge with the general purpose sort algorithm we use, it being sorted or mostly sorted doesn't change anything most likely, haven't studied the algorithm but it's likely a general purpose one with best/worst/average all being |
3467e84
to
02a5d6c
Compare
Will go over this one a final time this week and squash |
It's been a few weeks since this was made, what's the current status and TL;DR on actual improvements? IIUC the performance gain may not be that significantly, but is the code quality significantly better? If so, that should be fine. Are the commits meant to be squashed? |
I'd say that without a more dynamic sorting algorithm (which we might want to add in the future) the performance improvements are likely marginal, but I think the code quality is greatly improved, will go back over it and re familiarise myself with what this does specifically as it's been quite a while since I touched it There are some minor improvements like swapping some orders of comparisons for faster paths which is nice |
Adds dedicated helper methods to perform sorting to clean up code, and uses linear merging instead of iterating over both lists directly
02a5d6c
to
8679ee1
Compare
The performance remains very much unchanged, it only improves the merge step (about halves the time) but it is so minor that it doesn't matter, but the code is now easier to read in my opinion, future improvements could help with some of these parts I'm sure One question: |
Your call, if you expect they'll be useful for further work in the future, they could be added behind a local define. |
I'll add them if I end up working more on it in the future, for now I think this part isn't something that'll be critical to get good benchmarks on |
Thanks! |
Thank you! |
Cleans up and improves (slightly, it seems) performance in merging docs by using sorted lookup