minor documentation improvements #254

Merged: 2 commits, Aug 11, 2021
2 changes: 1 addition & 1 deletion docs/src/index.md
@@ -86,7 +86,7 @@ For developers who can use Julia 1.6+, the recommended sequence is:
2. Record inference data with [`@snoopi_deep`](@ref). Analyze the data to:
+ adjust method specialization in your package or its dependencies
+ fix problems in type inference
-+ add precompile directives
++ add `precompile` directives

Under 2, the first two sub-points can often be done at the same time; the last item is best done as a final step, because the specific
precompile directives needed depend on the state of your code, and a few fixes in specialization
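A compact sketch of step 2 above (`MyPkg` and `typical_workload` are hypothetical names):

```julia
using SnoopCompile   # run in a fresh session, so inference actually fires

# Record inference data for a representative workload
tinf = @snoopi_deep MyPkg.typical_workload()

staleinstances(tinf)    # check for invalidation-related stale code first
fg = flamegraph(tinf)   # then visualize where inference spends its time
```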
6 changes: 3 additions & 3 deletions docs/src/pgdsgui.md
@@ -6,7 +6,7 @@ so while specialization often improves runtime performance, that has to be weigh
There are also cases in which [overspecialization can hurt both run-time and compile-time performance](https://docs.julialang.org/en/v1/manual/performance-tips/#The-dangers-of-abusing-multiple-dispatch-(aka,-more-on-types-with-values-as-parameters)).
Consequently, an analysis of specialization can be a powerful tool for improving package quality.

-SnoopCompile ships with an interactive tool, [`pgdsgui`](@ref), short for "Profile-guided despecialization."
+`SnoopCompile` ships with an interactive tool, [`pgdsgui`](@ref), short for "Profile-guided despecialization."
The name is a reference to a related technique, [profile-guided optimization](https://en.wikipedia.org/wiki/Profile-guided_optimization) (PGO).
Both PGO and PGDS use runtime profiling to help guide decisions about code optimization.
PGO is often used in languages whose default mode is to avoid specialization, whereas PGDS seems more appropriate for
@@ -130,7 +130,7 @@ julia> collect_for(mref[], tinf)

So we can see that one `MethodInstance` for each type in `Ts` was generated.

-If you see a list of MethodInstances, and the first is extremely costly in terms of inclusive time, but all the rest are not, then you might not need to worry much about over-specialization:
+If you see a list of `MethodInstance`s, and the first is extremely costly in terms of inclusive time, but all the rest are not, then you might not need to worry much about over-specialization:
your inference time will be dominated by that one costly method (often, the first time the method was called), and the fact that lots of additional specializations were generated may not be anything to worry about.
However, in this case, the distribution of time is fairly flat, with each specialization contributing a small portion to the overall time.
In such cases, over-specialization may be a problem.
@@ -229,4 +229,4 @@
```
julia> methodinstances(m) # let's see what specializations we have
MethodInstance for save(::String, ::Array)
```

-In this case we have 7 MethodInstances (some of which are clearly due to poor inferrability of the caller) when one might suffice.
+In this case we have 7 `MethodInstance`s (some of which are clearly due to poor inferrability of the caller) when one might suffice.
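Putting the pieces of this page together, a minimal sketch of the workflow (`my_workload` is a hypothetical function, and `pgdsgui` additionally needs a plotting backend installed; see the package docs):

```julia
using SnoopCompile, Profile

tinf = @snoopi_deep my_workload()   # inference profile of the workload
@profile my_workload()              # runtime profile of the same workload

mref, ax = pgdsgui(tinf)            # scatter plot: runtime cost vs inference cost per method
# Click a dot in the GUI; `mref[]` then holds the selected method:
collect_for(mref[], tinf)           # its InferenceTimingNodes, one per specialization
```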
16 changes: 8 additions & 8 deletions docs/src/snoopi_deep.md
@@ -8,15 +8,15 @@ For that reason, efforts at reducing latency should be informed by measuring the
Moreover, because all code needs to be type-inferred before undergoing later stages of code generation, monitoring this "entry point" can give you an overview of the entire compile chain.

On older versions of Julia, [`@snoopi`](@ref) allows you to make fairly coarse measurements on inference;
-starting with Julia 1.6, the recommended tool is `@snoopi_deep`, which collects a much more detailed picture of type-inference's actions.
+starting with Julia 1.6, the recommended tool is [`@snoopi_deep`](@ref), which collects a much more detailed picture of type-inference's actions.

The rich data collected by `@snoopi_deep` are useful for several different purposes;
on this page, we'll describe the basic tool and show how it can be used to profile inference.
On later pages we'll show other ways to use the data to reduce the amount of type-inference or cache its results.

## Collecting the data

-Like [`@snoopr`](@ref), `@snoopi_deep` is exported by both SnoopCompileCore and SnoopCompile, but in this case there is not as much reason to do the data collection by a very minimal package. Consequently here we'll just load SnoopCompile at the outset.
+Like [`@snoopr`](@ref), `@snoopi_deep` is exported by both `SnoopCompileCore` and `SnoopCompile`, but in this case there is not as much reason to do the data collection by a very minimal package. Consequently here we'll just load `SnoopCompile` at the outset.

To see `@snoopi_deep` in action, we'll use the following demo:
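(The demo module's source is collapsed in this diff view; [`SnoopCompile.flatten_demo()`](@ref), mentioned below, recreates it and collects the data:)

```julia
using SnoopCompile

tinf = SnoopCompile.flatten_demo()   # redefines FlattenDemo and returns the @snoopi_deep result
```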

@@ -55,7 +55,7 @@ InferenceTimingNode: 0.00932195/0.010080857 on InferenceFrameInfo for Core.Compi
!!! tip
Inference gets called only on the *first* invocation of a method with those specific types. You have to redefine the `FlattenDemo` module (by just re-executing the command we used to define it) if you want to collect data with `@snoopi_deep` on the same code a second time.

-To make it easier to perform these demonstrations and use them for documentation purposes, SnoopCompile includes a function [`SnoopCompile.flatten_demo()`](@ref) that redefines the module and returns `tinf`.
+To make it easier to perform these demonstrations and use them for documentation purposes, `SnoopCompile` includes a function [`SnoopCompile.flatten_demo()`](@ref) that redefines the module and returns `tinf`.

This may not look like much, but there's a wealth of information hidden inside `tinf`.

@@ -77,7 +77,7 @@ A non-empty list might indicate method invalidations, which can be checked (in a
!!! tip
Your workload may load packages and/or (re)define methods; these can be sources of invalidation and therefore non-empty output
from `staleinstances`.
-One trick that may cirumvent some invalidation is to load the packages and make the method definitions before launching `@snoopi_deep`, because it ensures the methods are in place
+One trick that may circumvent some invalidation is to load the packages and make the method definitions before launching `@snoopi_deep`, because it ensures the methods are in place
before your workload triggers compilation.

## Viewing the results
@@ -123,7 +123,7 @@
```
MethodInstance for FlattenDemo.packintype(::Int64)
```

Each node in this tree is accompanied by a pair of numbers.
-The first number is the *exclusive* inference time (in seconds), meaning the time spent inferring the particular MethodInstance, not including the time spent inferring its callees.
+The first number is the *exclusive* inference time (in seconds), meaning the time spent inferring the particular `MethodInstance`, not including the time spent inferring its callees.
The second number is the *inclusive* time, which is the exclusive time plus the time spent on the callees.
Therefore, the inclusive time is always at least as large as the exclusive time.
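For example, a node whose own inference takes 0.2 ms (exclusive) and whose callees account for a further 0.8 ms has an inclusive time of 1.0 ms.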

@@ -133,7 +133,7 @@ Almost all of that was code-generation, but it also includes the time needed to
Just 0.76ms was needed to run type-inference on this entire series of calls.
As you will quickly discover, inference takes much more time on more complicated code.

-We can also display this tree as a flame graph, using the [ProfileView](https://github.com/timholy/ProfileView.jl) package:
+We can also display this tree as a flame graph, using the [ProfileView.jl](https://github.com/timholy/ProfileView.jl) package:

```jldoctest flatten-demo; filter=r":\d+"
julia> fg = flamegraph(tinf)
```

@@ -154,15 +154,15 @@ Users are encouraged to read the ProfileView documentation to understand how to

- the horizontal axis is time (wide boxes take longer than narrow ones), the vertical axis is call depth
- hovering over a box displays the method that was inferred
-- left-clicking on a box causes the full MethodInstance to be printed in your REPL session
+- left-clicking on a box causes the full `MethodInstance` to be printed in your REPL session
- right-clicking on a box opens the corresponding method in your editor
- ctrl-click can be used to zoom in
- empty horizontal spaces correspond to activities other than type-inference
- any boxes colored red (there are none in this particular example, but you'll see some later) correspond to *non-precompilable* `MethodInstance`s, in which the method is owned by one module but the types are from another unrelated module.

You can explore this flamegraph and compare it to the output from `display_tree`.
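Opening the graph is a one-liner (a sketch; assumes ProfileView.jl is installed):

```julia
using ProfileView

ProfileView.view(fg)   # interactive window; boxes respond as described in the list above
```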

-Finally, [`flatten`](@ref), on its own or together with [`accumulate_by_source`](@ref), allows you to get an sense for the cost of individual MethodInstances or Methods.
+Finally, [`flatten`](@ref), on its own or together with [`accumulate_by_source`](@ref), allows you to get a sense for the cost of individual `MethodInstance`s or `Method`s.
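A minimal sketch of that usage, with `tinf` from this demo:

```julia
julia> tms = flatten(tinf)          # one InferenceTiming per MethodInstance, sorted by time

julia> accumulate_by_source(tms)    # aggregate the inference times by Method
```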

The tools here allow you to get an overview of where inference is spending its time.
Sometimes, this information alone is enough to show you how to change your code to reduce latency: perhaps your code is spending a lot of time inferring cases that are not needed in practice and could be simplified.
26 changes: 13 additions & 13 deletions docs/src/snoopi_deep_analysis.md
@@ -5,16 +5,16 @@ As indicated in the [workflow](@ref), the recommended steps to reduce latency ar
- check for invalidations
- adjust method specialization in your package or its dependencies
- fix problems in type inference
-- add precompile directives
+- add `precompile` directives

The importance of fixing "problems" in type-inference was indicated in the [tutorial](@ref): successful precompilation requires a chain of ownership, but runtime dispatch (when inference cannot predict the callee) results in breaks in this chain. By improving inferrability, you can convert short, unconnected call-trees into a smaller number of large call-trees that all link back to your package(s).

In practice, it also turns out that opportunities to adjust specialization are often revealed by analyzing inference failures, so this page is complementary to the previous one.

-Throughout this page, we'll use the `OptimizeMe` demo, which ships with SnoopCompile.
+Throughout this page, we'll use the `OptimizeMe` demo, which ships with `SnoopCompile`.

!!! note
-To understand what follows, it's essential to refer to [OptimizeMe source code](https://github.com/timholy/SnoopCompile.jl/blob/master/examples/OptimizeMe.jl) as you follow along.
+To understand what follows, it's essential to refer to [`OptimizeMe` source code](https://github.com/timholy/SnoopCompile.jl/blob/master/examples/OptimizeMe.jl) as you follow along.

```julia
julia> using SnoopCompile
```

@@ -58,12 +58,12 @@ From the standpoint of precompilation, this has some obvious problems:
- even though we called a single method, `OptimizeMe.main()`, there are many distinct flames separated by blank spaces. This indicates that many calls are being made by runtime dispatch: each separate flame is a fresh entrance into inference.
- several of the flames are marked in red, indicating that they are not precompilable. While SnoopCompile does have the capability to automatically emit `precompile` directives for the non-red bars that sit on top of the red ones, in some cases the red extends to the highest part of the flame. In such cases there is no available precompile directive, and therefore no way to avoid the cost of type-inference.

-Our goal will be to improve the design of OptimizeMe to make it more precompilable.
+Our goal will be to improve the design of `OptimizeMe` to make it more precompilable.

## Analyzing inference triggers

We'll first extract the "triggers" of inference, which is just a repackaging of part of the information contained within `tinf`.
-Specifically an [`InferenceTrigger`](@ref) captures callee/caller relationships that straddle a fresh entrance to type-inference, allowing you to identify which calls were made by runtime dispatch and what MethodInstance they called.
+Specifically an [`InferenceTrigger`](@ref) captures callee/caller relationships that straddle a fresh entrance to type-inference, allowing you to identify which calls were made by runtime dispatch and what `MethodInstance` they called.

```julia
julia> itrigs = inference_triggers(tinf)
```

@@ -78,7 +78,7 @@ This indicates that a whopping 76 calls were (1) made by runtime dispatch and (2
(There was a 77th call that had to be inferred, the original call to `main()`, but by default [`inference_triggers`](@ref) excludes calls made directly from top-level. You can change that through keyword arguments.)

!!! tip
-In the REPL, SnoopCompile displays `InferenceTrigger`s with yellow coloration for the callee, red for the caller method, and blue for the caller specialization. This makes it easier to quickly identify the most important information.
+In the REPL, `SnoopCompile` displays `InferenceTrigger`s with yellow coloration for the callee, red for the caller method, and blue for the caller specialization. This makes it easier to quickly identify the most important information.

In some cases, this might indicate that you'll need to fix 76 separate callers; fortunately, in many cases fixing the origin of inference problems can fix a number of later callees.

@@ -155,7 +155,7 @@
```
Inference triggered to call MethodInstance for (::Base.var"#cat_t##kw")(::NamedT
```

This is useful if you want to analyze a method via [`ascend`](@ref ascend-itrig).
-Method-based triggers, which may aggregate many different individual triggers, are particularly useful mostly because tools like Cthulhu show you the inference results for the entire MethodInstance, allowing you to fix many different inference problems at once.
+`Method`-based triggers, which may aggregate many different individual triggers, are particularly useful because tools like [Cthulhu.jl](https://github.com/JuliaDebug/Cthulhu.jl) show you the inference results for the entire `MethodInstance`, allowing you to fix many different inference problems at once.

### Trigger trees
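The code that builds the tree is collapsed in this diff; a minimal sketch, assuming `itrigs` from above:

```julia
julia> itree = trigger_tree(itrigs)   # nest triggers by caller/callee relationship

julia> using AbstractTrees

julia> print_tree(itree)              # display the nested structure
```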

@@ -193,7 +193,7 @@ We're going to march through these systematically. Let's start with the first of

### `suggest` and a fix involving manual `eltype` specification

-Because the analysis of inference failures is somewhat complex, SnoopCompile attempts to `suggest` an interpretation and/or remedy for each trigger:
+Because the analysis of inference failures is somewhat complex, `SnoopCompile` attempts to [`suggest`](@ref) an interpretation and/or remedy for each trigger:

```
julia> suggest(itree.children[1])
```

@@ -289,7 +289,7 @@
```
julia> suggest(itree.children[2])
lotsa_containers() at OptimizeMe.jl:14
```

-While this tree is attributed to broadcast, you can see several references here to `OptimizeMe.jl:14`, which contains:
+While this tree is attributed to `broadcast`, you can see several references here to `OptimizeMe.jl:14`, which contains:

```julia
cs = Container.(list)
```

@@ -331,7 +331,7 @@
```julia
cs = Container{Any}.(list)
```

This 5-character change ends up eliminating 45 of our original 76 triggers.
-Not only did we eliminate the triggers from broadcasting, but we limited the number of different `show(::IO, ::Container{T})` MethodInstances we need from later calls in `main`.
+Not only did we eliminate the triggers from broadcasting, but we limited the number of different `show(::IO, ::Container{T})` `MethodInstance`s we need from later calls in `main`.

When the `Container` constructor does more complex operations, in some cases you may find that `Container{Any}(args...)` still gets specialized for different types of `args...`.
In such cases, you can create a special constructor that instructs Julia to avoid specialization in specific instances, e.g.,
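(The concrete example is collapsed in this diff; the sketch below illustrates the technique with Base's `@nospecialize` and a hypothetical helper name.)

```julia
struct Container{T}
    value::T
end

# Hypothetical helper: `@nospecialize` asks Julia to compile a single instance
# of this method for all argument types, rather than one per concrete type.
make_any_container(@nospecialize(x)) = Container{Any}(x)
```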
@@ -597,7 +597,7 @@ Inference triggered to call MethodInstance for show(::IOContext{Base.TTY}, ::MIM
In this case we see that the method is `#38`. This is a `gensym`, or generated symbol, indicating that the method was generated during Julia's lowering pass, and might indicate a macro, a `do` block or other anonymous function, the generator for a `@generated` function, etc.

!!! warning
-It's particularly worth your while to improve inferrability for gensym-methods. The number assiged to a gensymmed-method may change as you or other developers modify the package (possibly due to changes at very difference source-code locations), and so any explicit `precompile` directives involving gensyms may not have a long useful life.
+It's particularly worthwhile to improve inferrability for gensym-methods. The number assigned to a gensymmed-method may change as you or other developers modify the package (possibly due to changes at very different source-code locations), and so any explicit `precompile` directives involving gensyms may not have a long useful life.

But not all methods with `#` in their name are problematic: methods ending in `##kw` or that look like `##funcname#39` are *keyword* and *body* methods, respectively, for methods that accept keywords. They can be obtained from the main method, and so `precompile` directives for such methods will not be outdated by incidental changes to the package.

@@ -623,12 +623,12 @@ end
The generated method corresponds to the `do` block here.
The call to `show` comes from `show(io, mime, x[])`.
This implementation uses a clever trick, wrapping `x` in a `Ref{Any}(x)`, to prevent specialization of the method defined by the `do` block on the specific type of `x`.
-This trick is designed to limit the number of MethodInstances inferred for this `display` method.
+This trick is designed to limit the number of `MethodInstance`s inferred for this `display` method.

Unfortunately, from the standpoint of precompilation we have something of a conundrum.
It turns out that this trigger corresponds to the first of the big red flames in the flame graph.
`show(::IOContext{Base.TTY}, ::MIME{Symbol("text/plain")}, ::Vector{Main.OptimizeMe.Container{Any}})` is not precompilable because `Base` owns the `show` method for `Vector`;
-we might own the element type, but we're leveraging the generic machinery in Base and consequently it owns the method.
+we might own the element type, but we're leveraging the generic machinery in `Base` and consequently it owns the method.
If these were all packages, you might request its developers to add a `precompile` directive, but that will work only if the package that owns the method knows about the relevant type.
In this situation, Julia's `Base` module doesn't know about `OptimizeMe.Container{Any}`, so we're stuck.

2 changes: 1 addition & 1 deletion docs/src/snoopi_deep_parcel.md
@@ -1,4 +1,4 @@
-# Using `@snoopi_deep` results to generate precompile directives
+# Using `@snoopi_deep` results to generate `precompile` directives

Improving inferrability, specialization, and precompilability may sometimes feel like "eating your vegetables": really good for you, but it sometimes feels like work. (Depending on tastes, of course; I love vegetables.)
While we've already gotten some payoff, now we're going to collect an additional reward for our hard work: the "dessert" of adding `precompile` directives.
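As a preview of the mechanics (a sketch, assuming `tinf` was collected from your workload; the output directory is arbitrary):

```julia
julia> ttot, pcs = SnoopCompile.parcel(tinf)   # bin precompilable calls by the package that owns them

julia> SnoopCompile.write("/tmp/precompiles", pcs)   # emit one precompile_*.jl file per package
```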