Dispatch Analysis
Motivation
When Julia compiles your code and type inference on it was not so successful, the compiler is likely to be unable to determine which method should be called at each generic function callsite, and then it will be looked up at runtime. That is called "runtime dispatch", which is known as a common source of performance problem – the compiler can't do various kinds of optimizations including inlining when it can't determine a single matching method, and method lookup itself can also be a bottleneck if the call happens many times.
In order to avoid this problem, we usually use code_typed
, inspect its output, and check if there is anywhere type is not well inferred (i.e. where is "type-instable") and optimization was not successful. But the problem is that code_typed
can only present the "final" output of inference or optimization, and we can't inspect an entire call graph and may not be able to find where a problem happened and how the "type instability" has been propagated.
There is a nice package called Cthulhu.jl, which allows us to inspect the output of code_typed
by descending into a call tree, recursively and interactively. The workflow with Cthulhu is much more powerful, but still, it's tedious.
So, why not automate it ? We can use JET's pluggable analysis framework and create such an analyzer that automatically analyzes your code and alarms you when it detects anywhere Julia can't determine matching method statically and thus runtime dispatch will happen at runtime.
Implementation
In this analysis, the analyzer will be designed to detect:
- where Julia compiler gives up optimization
- where a runtime dispatch will happen
The case 1. will happen when there are (mutually) recursive calls and Julia compiler decided not to do inference in order to make sure the inference's termination. In such a case, optimization won't happen and method dispatches aren't resolved statically, so we will just report it (as OptimizationFailureReport
). In order to detect the case 2., we will inspect the optimized IR and look for :call
expressions. :call
expressions are such calls that were not resolved statically and will be dispatched at runtime (as opposed to :invoke
expressions, that represent staticall resolved generic function calls).
We will define DispatchAnalyzer <: AbstractAnalyzer
, and overload some of Core.Compiler
methods with it:
Core.Compiler.finish(frame::CC.InferenceState, analyzer::DispatchAnalyzer)
to check if optimization will happen or not (the case 1.)Core.Compiler.finish!(analyzer::DispatchAnalyzer, caller::CC.InferenceResult)
to inspect an optimized IR (the case 2.)
using JET.JETInterface
const CC = Core.Compiler
using JET: JET
struct DispatchAnalyzer{T} <: AbstractAnalyzer
state::AnalyzerState
analysis_cache::AnalysisCache
opts::BitVector
frame_filter::T # a predicate, which takes `CC.InfernceState` and returns whether we want to analyze the call or not
end
# AbstractAnalyzer API requirements
JETInterface.AnalyzerState(analyzer::DispatchAnalyzer) = analyzer.state
JETInterface.AbstractAnalyzer(analyzer::DispatchAnalyzer, state::AnalyzerState) = DispatchAnalyzer(state, analyzer.analysis_cache, analyzer.opts, analyzer.frame_filter)
JETInterface.ReportPass(analyzer::DispatchAnalyzer) = DispatchAnalysisPass()
JETInterface.AnalysisCache(analyzer::DispatchAnalyzer) = analyzer.analysis_cache
struct DispatchAnalysisPass <: ReportPass end
# ignore all reports defined by JET, since we'll just define our own reports
(::DispatchAnalysisPass)(T::Type{<:InferenceErrorReport}, @nospecialize(_...)) = return
function CC.finish!(analyzer::DispatchAnalyzer, frame::CC.InferenceState)
caller = frame.result
# get the source before running `finish!` to keep the reference to `OptimizationState`
src = caller.src
if src isa CC.OptimizationState{typeof(analyzer)}
# allow the following analysis passes to see the optimized `CodeInfo`
caller.src = CC.ir_to_codeinf!(src)
end
if analyzer.frame_filter(frame.linfo)
if isa(src, Core.Const) # the optimization was very successful, nothing to report
elseif isnothing(src) # means, compiler decides not to do optimization
ReportPass(analyzer)(OptimizationFailureReport, analyzer, caller, src)
elseif isa(src, CC.OptimizationState) # the compiler optimized it, analyze it
ReportPass(analyzer)(RuntimeDispatchReport, analyzer, caller, src)
else # and thus this pass should never happen
# as we should already report `OptimizationFailureReport` for this case
throw("got $src, unexpected source found")
end
end
return @invoke CC.finish!(analyzer::AbstractAnalyzer, frame::CC.InferenceState)
end
@jetreport struct OptimizationFailureReport <: InferenceErrorReport end
function JETInterface.print_report_message(io::IO, ::OptimizationFailureReport)
print(io, "failed to optimize due to recursion")
end
function (::DispatchAnalysisPass)(::Type{OptimizationFailureReport}, analyzer::DispatchAnalyzer, result::CC.InferenceResult)
add_new_report!(analyzer, result, OptimizationFailureReport(result.linfo))
end
@jetreport struct RuntimeDispatchReport <: InferenceErrorReport end
function JETInterface.print_report_message(io::IO, ::RuntimeDispatchReport)
print(io, "runtime dispatch detected")
end
function (::DispatchAnalysisPass)(::Type{RuntimeDispatchReport}, analyzer::DispatchAnalyzer, caller::CC.InferenceResult, opt::CC.OptimizationState)
(; sptypes, slottypes) = opt
for (pc, x) in enumerate(opt.src.code)
if Base.Meta.isexpr(x, :call)
ft = CC.widenconst(CC.argextype(first(x.args), opt.src, sptypes, slottypes))
ft <: Core.Builtin && continue # ignore `:call`s of the builtin intrinsics
add_new_report!(analyzer, caller, RuntimeDispatchReport((opt, pc)))
end
end
end
Usages
So we defined our analyzer. Let's set up utility analysis entries first:
using InteractiveUtils # to use `gen_call_with_extracted_types_and_kwargs`
# the constructor for creating a new configured `DispatchAnalyzer` instance
function DispatchAnalyzer(world::UInt = Base.get_world_counter();
frame_filter = x::Core.MethodInstance->true,
jetconfigs...)
state = AnalyzerState(world; jetconfigs...)
# just for the sake of simplicity, create a fresh code cache for each `DispatchAnalyzer` instance (i.e. don't globalize the cache)
analysis_cache = AnalysisCache()
return DispatchAnalyzer(state, analysis_cache, BitVector(), frame_filter)
end
function report_dispatch(args...; jetconfigs...)
@nospecialize args jetconfigs
analyzer = DispatchAnalyzer(; jetconfigs...)
return analyze_and_report_call!(analyzer, args...; jetconfigs...)
end
macro report_dispatch(ex0...)
return InteractiveUtils.gen_call_with_extracted_types_and_kwargs(__module__, :report_dispatch, ex0)
end
@report_dispatch (macro with 1 method)
Now we can just call @report_dispatch f(args...)
and check if there are any problematic part within the entire call tree of f(args...)
.
Simple cases
First, let's play with simple and factitious examples and check if DispatchAnalyzer
works as expected.
getsomething(x::Any) = x
getsomething(x::Array) = x[]
getsomething(::Nothing) = throw(ArgumentError("nothing is nothing"))
getsomething(::Missing) = throw(ArgumentError("too philosophical"))
getsomething (generic function with 4 methods)
If callsite is type-stable (i.e. dispatched with concretely-typed arguments), any problem shouldn't be reported:
@report_dispatch getsomething(42) # should be ok
No errors detected
But if the argument isn't well-typed, compiler can't determine which method to call, and it will lead to runtime dispatch:
report_dispatch((Any,)) do a
getsomething(a) # runtime dispatch !
end
═════ 1 possible error found ═════
┌ (::Main.var"#7#8")(a::Any) @ Main ./dispatch_analysis.md:184
│ runtime dispatch detected: Main.getsomething(a::Any)::Any
└────────────────────
Note that even if a call is not "well-typed" (i.e. it's not a concrete call), runtime dispatch won't happen as far as a single method can be resolved statically:
report_dispatch((AbstractString,)) do a
getsomething(a) # this call isn't very concrete, but ok, Julia can optimize it
end
No errors detected
Ok, working nicely so far. Let's move on to a bit more complicated examples. When working on inherently-untyped code base where we somehow need to deal with arbitrarily-typed objects at runtime (as like Julia's high-level compiler), the @nospecialize
annotation can be very useful – it helps us avoids excessive code specialization by suppressing runtime dispatches with runtime object types. For example, let's assume we have a vector of arbitrary untyped objects used within an user-program and need to check if its element is Type
-like object with the following logic:
function isTypelike(x)
if isa(x, DataType)
return isa(x, DataType) && x.name === Type.body.name
elseif isa(x, Union)
return isTypelike(x.a) && isTypelike(x.b)
elseif isa(x, UnionAll)
return isTypelike(x.body)
else
return false
end
end
isTypelike (generic function with 1 method)
But without @nospecialize
, we gonna see runtime dispatches at the recursive call sites as they will be specialized at runtime. In this setup, we can suppress the runtime dipsatches and achieve a best performance by applying @nospecialize
annotation to the argument x
:
function isTypelike′(@nospecialize x)
if isa(x, DataType)
return isa(x, DataType) && x.name === Type.body.name
elseif isa(x, Union)
return isTypelike′(x.a) && isTypelike′(x.b)
elseif isa(x, UnionAll)
return isTypelike′(x.body)
else
return false
end
end
isTypelike′ (generic function with 1 method)
We can confirm the effect of @nospecialize
with DispatchAnalyzer
:
report_dispatch((Vector{Any},)) do xs
x = xs[1]
r = isTypelike(x) # this call will be runtime-dispatched
r′ = isTypelike′(x) # this call will be statically resolved (not runtime-dispatched)
return r, r′
end
═════ 4 possible errors found ═════
┌ (::Main.var"#11#12")(xs::Vector{Any}) @ Main ./dispatch_analysis.md:244
│┌ isTypelike(x::Any) @ Main ./dispatch_analysis.md:212
││ runtime dispatch detected: Main.isTypelike(%11::Any)::Bool
│└────────────────────
│┌ isTypelike(x::Any) @ Main ./dispatch_analysis.md:212
││ runtime dispatch detected: Main.isTypelike(%15::Any)::Bool
│└────────────────────
│┌ isTypelike(x::Any) @ Main ./dispatch_analysis.md:214
││ runtime dispatch detected: Main.isTypelike(%22::Any)::Bool
│└────────────────────
┌ (::Main.var"#11#12")(xs::Vector{Any}) @ Main ./dispatch_analysis.md:244
│ runtime dispatch detected: Main.isTypelike(%17::Any)::Bool
└────────────────────
We can assert this report by looking at the output of code_typed
, where isTypelike(x)
remains as :call
expression (meaning it will be dispatched at runtime) while isTypelike′(x)
has been statically resolved and even inlined:
code_typed((Vector{Any},)) do xs
x = xs[1]
r = isTypelike(x) # this call will be runtime-dispatched
r′ = isTypelike′(x) # this call will be statically resolved (not runtime-dispatched)
return r, r′
end
1-element Vector{Any}:
CodeInfo(
1 ── %1 = $(Expr(:boundscheck, true))::Bool
└─── goto #5 if not %1
2 ── %3 = Base.sub_int(1, 1)::Int64
│ %4 = Base.bitcast(Base.UInt, %3)::UInt64
│ %5 = Base.getfield(xs, :size)::Tuple{Int64}
│ %6 = $(Expr(:boundscheck, true))::Bool
│ %7 = Base.getfield(%5, 1, %6)::Int64
│ %8 = Base.bitcast(Base.UInt, %7)::UInt64
│ %9 = Base.ult_int(%4, %8)::Bool
└─── goto #4 if not %9
3 ── goto #5
4 ── %12 = Core.tuple(1)::Tuple{Int64}
│ invoke Base.throw_boundserror(xs::Vector{Any}, %12::Tuple{Int64})::Union{}
└─── unreachable
5 ┄─ %15 = Base.getfield(xs, :ref)::MemoryRef{Any}
│ %16 = Base.memoryrefnew(%15, 1, false)::MemoryRef{Any}
│ %17 = Base.memoryrefget(%16, :not_atomic, false)::Any
└─── goto #6
6 ── %19 = Main.isTypelike(%17)::Bool
│ %20 = (%17 isa Main.DataType)::Bool
└─── goto #8 if not %20
7 ── %22 = π (%17, DataType)
│ %23 = Base.getfield(%22, :name)::Core.TypeName
│ %24 = (%23 === $(QuoteNode(typename(Type))))::Bool
└─── goto #15
8 ── %26 = (%17 isa Main.Union)::Bool
└─── goto #12 if not %26
9 ── %28 = π (%17, Union)
│ %29 = Base.getfield(%28, :a)::Any
│ %30 = invoke Main.isTypelike′(%29::Any)::Bool
└─── goto #11 if not %30
10 ─ %32 = π (%17, Union)
│ %33 = Base.getfield(%32, :b)::Any
│ %34 = invoke Main.isTypelike′(%33::Any)::Bool
└─── goto #15
11 ─ goto #15
12 ─ %37 = (%17 isa Main.UnionAll)::Bool
└─── goto #14 if not %37
13 ─ %39 = π (%17, UnionAll)
│ %40 = Base.getfield(%39, :body)::Any
│ %41 = invoke Main.isTypelike′(%40::Any)::Bool
└─── goto #15
14 ─ goto #15
15 ┄ %44 = φ (#7 => %24, #10 => %34, #11 => false, #13 => %41, #14 => false)::Bool
│ %45 = Core.tuple(%19, %44)::Tuple{Bool, Bool}
└─── return %45
) => Tuple{Bool, Bool}
Real-world targets
Let's run DispatchAnalyzer
on real-world code and check how it works. Here we will test with Julia's Base
module.
Random number generation might be one of the most important feature for numerical computations, and so Julia's rand
function should run fast. Let's see if it is free from runtime-dispatch.
@report_dispatch rand(1:1000)
No errors detected
Oh no, runtime dispatch happens there even in Base
. Well, actually, this specific dispatch is expected. Especially, https://github.com/JuliaLang/julia/pull/35982 implements an heuristic to intentionally disable inference (and so succeeding optimizations too) in order to ease the latency problem, a.k.a. "first-time-to-plot". The report trace certainly suggests a dispatch was detected where ArgumentError
can be thrown. We can turn off the heuristic by turning off the unoptimize_throw_blocks::Bool
configuration, and this time any runtime dispatch won't be reported:
@report_dispatch unoptimize_throw_blocks=false rand(1:1000) # nothing should be reported
No errors detected
Finally, let's see an example of very "type-instable" code maintained within Base
. Typically, anything involving I/O is written in a very dynamic way for good reasons, and certainly, e.g. println(QuoteNode(nothing))
will yield bunch of runtime dispatches:
@report_dispatch println(QuoteNode(nothing))
═════ 585 possible errors found ═════
┌ @ coreio.jl:4 Base.println(Core.tuple(Core.typeassert(Base.stdout, Base.IO)), xs...)
│┌ @ strings/io.jl:73 Base.print(Core.tuple(io), xs, Core.tuple("\n")...)
││┌ @ strings/io.jl:43 Base.lock(io)
│││┌ @ show.jl:334 Base.lock(Base.getproperty(io, :io))
# so many reports follow ...
But what if your code contains a single println
call, which you're absolutely okay with the type instabilities involved with it (e.g. it's only called once or only in debug mode, or such), but still you want to assert that any other part of code is type-stable and dispatch-free ?
# problem: when ∑1/n exceeds 30 ?
function compute(x)
r = 1
s = 0.0
n = 1
@time while r < x
s += 1/n
if s ≥ r
println("round $r/$x has been finished") # we're not interested type-instabilities within this call
r += 1
end
n += 1
end
return n, s
end
compute (generic function with 1 method)
DispatchAnalyzer
's frame_filter
option can be useful for this, by allowing us to specify where it should and shouldn't run analysis. For example, we can limit the scope of analysis to the current module like this:
function module_filter(m) # filter by module
return function (linfo::Core.MethodInstance)
def = linfo.def
isa(def, Method) ? def.module === m : def === m
end
end
# NOTE:
# `compute(30)` will take more than hours in actual execution, according to https://twitter.com/genkuroki/status/1401332946707963909,
# but `@report_dispatch` will just do abstract interpretation of the call, so will finish instantly
@report_dispatch frame_filter=module_filter(@__MODULE__) compute(30)
═════ 1 possible error found ═════
┌ compute(x::Int64) @ Main ./timing.jl:322
│ runtime dispatch detected: Core.kwcall([quote]::@NamedTuple{msg::Nothing}, Base.time_print, %140::IO, %134::Float64, %112::Int64, %127::Int64, %137::Int64, %89::Int64, %138::Float64, %139::Float64, true)::Any
└────────────────────
This page was generated using Literate.jl.