Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reorder ml-matches to avoid catastrophic performance case #49664

Merged
merged 2 commits into from
May 15, 2023

Conversation

vtjnash
Copy link
Member

@vtjnash vtjnash commented May 6, 2023

This ordering of the algorithm abandons the elegant insertion in favor of using another copy of Tarjan's SCC code. This enables us to abort the algorithm in O(k*n) time, instead of always running full O(n*n) time, where k is min(lim,n).

For example, to sort 1338 methods:

Before:
julia> @time Base._methods_by_ftype(Tuple{typeof(Core.kwcall), NamedTuple, Any, Vararg{Any}}, 3, Base.get_world_counter());
  0.136609 seconds (22.74 k allocations: 1.104 MiB)
julia> @time Base._methods_by_ftype(Tuple{typeof(Core.kwcall), NamedTuple, Any, Vararg{Any}}, -1, Base.get_world_counter());
  0.046280 seconds (9.95 k allocations: 497.453 KiB)
julia> @time Base._methods_by_ftype(Tuple{typeof(Core.kwcall), NamedTuple, Any, Vararg{Any}}, 30000, Base.get_world_counter());
  0.132588 seconds (22.73 k allocations: 1.103 MiB)
julia> @time Base._methods_by_ftype(Tuple{typeof(Core.kwcall), NamedTuple, Any, Vararg{Any}}, 30000, Base.get_world_counter());
  0.135912 seconds (22.73 k allocations: 1.103 MiB)

After:
julia> @time Base._methods_by_ftype(Tuple{typeof(Core.kwcall), NamedTuple, Any, Vararg{Any}}, 3, Base.get_world_counter());
  0.001040 seconds (1.47 k allocations: 88.375 KiB)
julia> @time Base._methods_by_ftype(Tuple{typeof(Core.kwcall), NamedTuple, Any, Vararg{Any}}, -1, Base.get_world_counter());
  0.039167 seconds (8.24 k allocations: 423.984 KiB)
julia> @time Base._methods_by_ftype(Tuple{typeof(Core.kwcall), NamedTuple, Any, Vararg{Any}}, 30000, Base.get_world_counter());
  0.081354 seconds (8.26 k allocations: 424.734 KiB)
julia> @time Base._methods_by_ftype(Tuple{typeof(Core.kwcall), NamedTuple, Any, Vararg{Any}}, 30000, Base.get_world_counter());
  0.080849 seconds (8.26 k allocations: 424.734 KiB)

And makes inference faster in rare cases (this particular example came up because the expression below occurs appears in @test macroexpansion), both before loading loading more packages, such as OmniPackage, and afterwards, where the cost is almost unchanged afterwards, versus increasing about 50x.

julia> f() = x(args...; kwargs...); @time @code_typed optimize=false f();
  0.143523 seconds (23.25 k allocations: 1.128 MiB, 99.96% compilation time) # before
  0.001172 seconds (1.86 k allocations: 108.656 KiB, 97.71% compilation time) # after

@vtjnash vtjnash added the compiler:latency Compiler latency label May 6, 2023
@vtjnash vtjnash requested a review from JeffBezanson May 6, 2023 04:41
@vtjnash vtjnash force-pushed the jn/ml-matches-rewritten branch 2 times, most recently from 35b3856 to 8f913e8 Compare May 10, 2023 16:30
src/gf.c Outdated Show resolved Hide resolved
@vtjnash vtjnash force-pushed the jn/ml-matches-rewritten branch 2 times, most recently from 81c3fad to 2e90105 Compare May 12, 2023 15:18
vtjnash added 2 commits May 12, 2023 15:02
This ordering of the algorithm abandons the elegant insertion in favor
of using another copy of Tarjan's SCC code. This enables us to abort the
algorithm in O(k*n) time, instead of always running full O(n*n) time,
where k is `min(lim,n)`.

For example, to sort 1338 methods:

Before:
julia> @time Base._methods_by_ftype(Tuple{typeof(Core.kwcall), NamedTuple, Any, Vararg{Any}}, 3, Base.get_world_counter());
  0.136609 seconds (22.74 k allocations: 1.104 MiB)
julia> @time Base._methods_by_ftype(Tuple{typeof(Core.kwcall), NamedTuple, Any, Vararg{Any}}, -1, Base.get_world_counter());
  0.046280 seconds (9.95 k allocations: 497.453 KiB)
julia> @time Base._methods_by_ftype(Tuple{typeof(Core.kwcall), NamedTuple, Any, Vararg{Any}}, 30000, Base.get_world_counter());
  0.132588 seconds (22.73 k allocations: 1.103 MiB)
julia> @time Base._methods_by_ftype(Tuple{typeof(Core.kwcall), NamedTuple, Any, Vararg{Any}}, 30000, Base.get_world_counter());
  0.135912 seconds (22.73 k allocations: 1.103 MiB)

After:
julia> @time Base._methods_by_ftype(Tuple{typeof(Core.kwcall), NamedTuple, Any, Vararg{Any}}, 3, Base.get_world_counter());
  0.001040 seconds (1.47 k allocations: 88.375 KiB)
julia> @time Base._methods_by_ftype(Tuple{typeof(Core.kwcall), NamedTuple, Any, Vararg{Any}}, -1, Base.get_world_counter());
  0.039167 seconds (8.24 k allocations: 423.984 KiB)
julia> @time Base._methods_by_ftype(Tuple{typeof(Core.kwcall), NamedTuple, Any, Vararg{Any}}, 30000, Base.get_world_counter());
  0.081354 seconds (8.26 k allocations: 424.734 KiB)
julia> @time Base._methods_by_ftype(Tuple{typeof(Core.kwcall), NamedTuple, Any, Vararg{Any}}, 30000, Base.get_world_counter());
  0.080849 seconds (8.26 k allocations: 424.734 KiB)

And makes inference faster in rare cases (this particular example came
up because the expression below occurs appears in `@test`
macroexpansion), both before loading loading more packages, such as
OmniPackage, and afterwards, where the cost is almost unchanged
afterwards, versus increasing about 50x.

julia> f() = x(args...; kwargs...); @time @code_typed optimize=false f();
  0.143523 seconds (23.25 k allocations: 1.128 MiB, 99.96% compilation time) # before
  0.001172 seconds (1.86 k allocations: 108.656 KiB, 97.71% compilation time) # after
This now chooses the optimal SCC set based on the size of lim, which
ensures we can assume this algorithm is now << O(n^2) in all reasonable
cases, even though the algorithm we are using is O(n + e), where e may
require up to n^2 work to compute in the worst case, but should require
only about n*min(lim, log(n)) work in the expected average case.

This also further pre-optimizes quick work (checking for existing
coverage) and delays unnecessary work (computing for *ambig return).
@vtjnash vtjnash force-pushed the jn/ml-matches-rewritten branch from 2e90105 to ac1cb1c Compare May 12, 2023 19:06
@vtjnash
Copy link
Member Author

vtjnash commented May 12, 2023

@nanosoldier runbenchmarks(!"scalar" && !"union", vs=":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@vtjnash
Copy link
Member Author

vtjnash commented May 13, 2023

As expected, this should be a big win for inference

@vtjnash vtjnash merged commit fbbe9ed into master May 15, 2023
@vtjnash vtjnash deleted the jn/ml-matches-rewritten branch May 15, 2023 14:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:latency Compiler latency
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants