-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
reorder ml-matches to avoid catastrophic performance case #49664
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
vtjnash
force-pushed
the
jn/ml-matches-rewritten
branch
2 times, most recently
from
May 10, 2023 16:30
35b3856
to
8f913e8
Compare
vtjnash
commented
May 11, 2023
vtjnash
force-pushed
the
jn/ml-matches-rewritten
branch
2 times, most recently
from
May 12, 2023 15:18
81c3fad
to
2e90105
Compare
This ordering of the algorithm abandons the elegant insertion in favor of using another copy of Tarjan's SCC code. This enables us to abort the algorithm in O(k*n) time, instead of always running full O(n*n) time, where k is `min(lim,n)`. For example, to sort 1338 methods: Before: julia> @time Base._methods_by_ftype(Tuple{typeof(Core.kwcall), NamedTuple, Any, Vararg{Any}}, 3, Base.get_world_counter()); 0.136609 seconds (22.74 k allocations: 1.104 MiB) julia> @time Base._methods_by_ftype(Tuple{typeof(Core.kwcall), NamedTuple, Any, Vararg{Any}}, -1, Base.get_world_counter()); 0.046280 seconds (9.95 k allocations: 497.453 KiB) julia> @time Base._methods_by_ftype(Tuple{typeof(Core.kwcall), NamedTuple, Any, Vararg{Any}}, 30000, Base.get_world_counter()); 0.132588 seconds (22.73 k allocations: 1.103 MiB) julia> @time Base._methods_by_ftype(Tuple{typeof(Core.kwcall), NamedTuple, Any, Vararg{Any}}, 30000, Base.get_world_counter()); 0.135912 seconds (22.73 k allocations: 1.103 MiB) After: julia> @time Base._methods_by_ftype(Tuple{typeof(Core.kwcall), NamedTuple, Any, Vararg{Any}}, 3, Base.get_world_counter()); 0.001040 seconds (1.47 k allocations: 88.375 KiB) julia> @time Base._methods_by_ftype(Tuple{typeof(Core.kwcall), NamedTuple, Any, Vararg{Any}}, -1, Base.get_world_counter()); 0.039167 seconds (8.24 k allocations: 423.984 KiB) julia> @time Base._methods_by_ftype(Tuple{typeof(Core.kwcall), NamedTuple, Any, Vararg{Any}}, 30000, Base.get_world_counter()); 0.081354 seconds (8.26 k allocations: 424.734 KiB) julia> @time Base._methods_by_ftype(Tuple{typeof(Core.kwcall), NamedTuple, Any, Vararg{Any}}, 30000, Base.get_world_counter()); 0.080849 seconds (8.26 k allocations: 424.734 KiB) And makes inference faster in rare cases (this particular example came up because the expression below occurs appears in `@test` macroexpansion), both before loading loading more packages, such as OmniPackage, and afterwards, where the cost is almost unchanged afterwards, versus increasing about 50x. julia> f() = x(args...; kwargs...); @time @code_typed optimize=false f(); 0.143523 seconds (23.25 k allocations: 1.128 MiB, 99.96% compilation time) # before 0.001172 seconds (1.86 k allocations: 108.656 KiB, 97.71% compilation time) # after
This now chooses the optimal SCC set based on the size of lim, which ensures we can assume this algorithm is now << O(n^2) in all reasonable cases, even though the algorithm we are using is O(n + e), where e may require up to n^2 work to compute in the worst case, but should require only about n*min(lim, log(n)) work in the expected average case. This also further pre-optimizes quick work (checking for existing coverage) and delays unnecessary work (computing for *ambig return).
vtjnash
force-pushed
the
jn/ml-matches-rewritten
branch
from
May 12, 2023 19:06
2e90105
to
ac1cb1c
Compare
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. |
As expected, this should be a big win for inference |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This ordering of the algorithm abandons the elegant insertion in favor of using another copy of Tarjan's SCC code. This enables us to abort the algorithm in O(k*n) time, instead of always running full O(n*n) time, where k is
min(lim,n)
.For example, to sort 1338 methods:
And makes inference faster in rare cases (this particular example came up because the expression below occurs appears in
@test
macroexpansion), both before loading loading more packages, such as OmniPackage, and afterwards, where the cost is almost unchanged afterwards, versus increasing about 50x.