Dispatch Analysis
When Julia compiles your code and type inference is not very successful, the compiler is often unable to resolve which method should be called at each generic function call site, so the method has to be looked up at runtime. This is called "runtime dispatch", and it is a common source of performance problems: the compiler can't perform optimizations such as inlining when it doesn't know the matching methods, and the method lookup itself can become a bottleneck if it happens many times.
To avoid this problem, we usually use code_typed or its family, inspect the output, and check whether there is any place where types are not well inferred (i.e. where the code is "type-unstable") and optimization was not successful. The problem is that these tools only present the "final" output of inference or optimization for a single call. We can't inspect the entire call graph, so we may not be able to find where a problem originated or how the type instability propagated.
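As a rough illustration of this kind of single-call inspection, the sketch below uses only Base reflection (`Base.return_types`) rather than JETTest. The names `factor`, `unstable_scaled`, and `stable_scaled` are hypothetical and chosen for this example only.

```julia
# Hypothetical example names; nothing here is part of JETTest.
factor = 2                               # non-constant global, so inference treats it as `Any`

unstable_scaled(x) = x * factor          # reads the global directly
stable_scaled(x, s) = x * s              # receives the value as an argument instead

# `Base.return_types` returns the inferred return type of each matching method
rt_unstable = only(Base.return_types(unstable_scaled, Tuple{Int}))
rt_stable = only(Base.return_types(stable_scaled, Tuple{Int,Int}))

println("unstable_scaled(::Int) infers as ", rt_unstable)
println("stable_scaled(::Int, ::Int) infers as ", rt_stable)
```

This only reports the result for the one signature that was asked about; it says nothing about where inside the call graph the imprecision came from.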
There is a nice package called Cthulhu.jl, which allows us to inspect the output of code_typed by descending into a call tree, recursively and interactively. The workflow with Cthulhu is much more powerful, but it is still tedious.
So why not automate it? JETTest.jl implements an analyzer that inspects the optimized IR of your code and automatically detects any place where the compiler failed to optimize, or couldn't resolve the matching methods so that dispatch will happen at runtime.
Quick Start
@report_dispatch analyzes the entire call graph of a given generic function call, and then reports detected optimization failures and runtime dispatch points:
julia> using JETTest
julia> n = rand(Int);
julia> make_vals(n) = n ≥ 0 ? (zero(n):n) : (n:zero(n));
julia> function sumup(f)
           vals = make_vals(n) # this function uses the non-constant global variable here and it makes everything very type-unstable
           s = zero(eltype(vals))
           for v in vals
               s += f(v)
           end
           return s
       end;
julia> @report_dispatch sumup(sin) # runtime dispatches will be reported
═════ 7 possible errors found ═════
┌ @ none:2 Main.__atexample__named__quickstart.make_vals(%1)
│ runtime dispatch detected: Main.__atexample__named__quickstart.make_vals(%1::Any)
└──────────
┌ @ none:3 Main.__atexample__named__quickstart.eltype(%2)
│ runtime dispatch detected: Main.__atexample__named__quickstart.eltype(%2::Any)
└──────────
┌ @ none:3 Main.__atexample__named__quickstart.zero(%3)
│ runtime dispatch detected: Main.__atexample__named__quickstart.zero(%3::Any)
└──────────
┌ @ none:4 Base.iterate(%2)
│ runtime dispatch detected: Base.iterate(%2::Any)
└──────────
┌ @ none:5 f(%11)
│ runtime dispatch detected: f::typeof(sin)(%11::Any)
└──────────
┌ @ none:5 Main.__atexample__named__quickstart.+(%10, %13)
│ runtime dispatch detected: Main.__atexample__named__quickstart.+(%10::Any, %13::Any)
└──────────
┌ @ none:5 Base.iterate(%2, %12)
│ runtime dispatch detected: Base.iterate(%2::Any, %12::Any)
└──────────
Any
julia> function sumup(f, n) # we can pass parameters as a function argument, and then everything is type-stable
           vals = make_vals(n)
           s = zero(eltype(vals))
           for v in vals
               s += f(v) # we may get a union type, but Julia can optimize away small unions (thus no dispatch here)
           end
           return s
       end;
julia> @report_dispatch sumup(sin, rand(Int)) # now free of runtime dispatch!
No errors !
Union{Float64, Int64}
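The `Union{Float64, Int64}` at the end of the transcript above is the inferred return type of the call. As a sketch using only Base reflection (no JETTest), the same type can be checked as follows. The definitions are repeated so that the snippet is self-contained, and `sumup_float` is a hypothetical variant, not taken from the original, that starts the accumulator as a `Float64` so the result becomes a single concrete type.

```julia
make_vals(n) = n ≥ 0 ? (zero(n):n) : (n:zero(n))

function sumup(f, n)
    vals = make_vals(n)
    s = zero(eltype(vals))      # starts as an Int, becomes a Float64 after the first `s += sin(v)`
    for v in vals
        s += f(v)
    end
    return s
end

# Hypothetical variant: the accumulator is a Float64 from the start
function sumup_float(f, n)
    vals = make_vals(n)
    s = 0.0
    for v in vals
        s += f(v)
    end
    return s
end

# Only inference is queried here; the functions are not run on a random `n`
rt_union = only(Base.return_types(sumup, Tuple{typeof(sin),Int}))
rt_float = only(Base.return_types(sumup_float, Tuple{typeof(sin),Int}))

println(rt_union)
println(rt_float)
```

Whether the small union is worth removing is a judgment call; as the transcript shows, it does not cause runtime dispatch here.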
With the frame_filter configuration, we can focus on type instabilities within the specific modules we are interested in:
julia> # problem: when does ∑1/n exceed `x`?
       function compute(x)
           r = 1
           s = 0.0
           n = 1
           @time while r < x
               s += 1/n
               if s ≥ r
                   # the `println` call is full of runtime dispatches for good reasons,
                   # and we're not interested in type instabilities within this call
                   # since we know it's only called a few times
                   println("round $r/$x has been finished")
                   r += 1
               end
               n += 1
           end
           return n, s
       end
compute (generic function with 1 method)
julia> @report_dispatch compute(30) # a bunch of reports will come from the `println` call
═════ 21 possible errors found ═════
┌ @ none:12 Main.__atexample__named__quickstart.println(Base.string("round ", r, "/", x, " has been finished"))
│┌ @ coreio.jl:4 Base.println(Core.tuple(Core.typeassert(Base.stdout, Base.IO)), xs...)
││┌ @ strings/io.jl:73 Base.print(Core.tuple(io), xs, Core.tuple("\n")...)
│││┌ @ strings/io.jl:43 Base.lock(io)
││││┌ @ show.jl:334 Base.lock(Base.getproperty(io, :io))
│││││┌ @ stream.jl:282 Base.lock(Base.getproperty(s, :lock))
││││││┌ @ lock.jl:100 Base.wait(Base.getproperty(rl, :cond_wait))
│││││││┌ @ condition.jl:112 Base.wait()
││││││││┌ @ task.jl:823 Base.try_yieldto(Base.ensure_rescheduled)
│││││││││┌ @ task.jl:761 Base.getproperty(%7, :result)
││││││││││ runtime dispatch detected: Base.getproperty(%7::Task, :result::Symbol)
│││││││││└───────────────
│││││││││┌ @ task.jl:762 Base.setproperty!(%7, :result, Base.nothing)
││││││││││ runtime dispatch detected: Base.setproperty!(%7::Task, :result::Symbol, Base.nothing)
│││││││││└───────────────
│││││││││┌ @ task.jl:763 Base.setproperty!(%7, :_isexception, false)
││││││││││ runtime dispatch detected: Base.setproperty!(%7::Task, :_isexception::Symbol, false)
│││││││││└───────────────
│││││││┌ @ condition.jl:114 Base.list_deletefirst!(%63, %57)
││││││││ runtime dispatch detected: Base.list_deletefirst!(%63::Any, %57::Task)
│││││││└────────────────────
│││┌ @ strings/io.jl:49 Base.unlock(io)
││││┌ @ show.jl:335 Base.unlock(Base.getproperty(io, :io))
│││││┌ @ stream.jl:283 Base.unlock(Base.getproperty(s, :lock))
││││││┌ @ lock.jl:132 Base.notify(Base.getproperty(rl, :cond_wait))
│││││││┌ @ condition.jl:130 #self#(c, Base.nothing)
││││││││┌ @ condition.jl:130 Base.#notify#550(true, false, #self#, c, arg)
│││││││││┌ @ condition.jl:130 Base.notify(c, arg, all, error)
││││││││││┌ @ condition.jl:136 Core.kwfunc(Base.schedule)(Core.apply_type(Core.NamedTuple, (:error,))(Core.tuple(error)), Base.schedule, t, arg)
│││││││││││┌ @ task.jl:684 Base.#schedule#571(error, _3, t, arg)
││││││││││││┌ @ task.jl:686 %10(%11, t)
│││││││││││││ runtime dispatch detected: %10::typeof(Base.list_deletefirst!)(%11::Any, t::Task)
││││││││││││└───────────────
│┌ @ coreio.jl:4 Base.println(%3, %4)
││ runtime dispatch detected: Base.println(%3::IO, %4::String)
│└───────────────
┌ @ timing.jl:214 Base.time_print(elapsedtime, Base.getproperty(diff, :allocd), Base.getproperty(diff, :total_time), Base.gc_alloc_count(diff), compile_elapsedtime, true)
│┌ @ timing.jl:120 Base.sprint(#852)
││┌ @ strings/io.jl:106 Base.#sprint#418(Core.tuple(Base.nothing, 0, #self#, f), args...)
│││┌ @ strings/io.jl:112 f(Core.tuple(s), args...)
││││┌ @ timing.jl:123 Base.!=(%32, 0)
│││││ runtime dispatch detected: Base.!=(%32::Any, 0)
││││└─────────────────
││││┌ @ timing.jl:125 Base.!=(%65, 0)
│││││ runtime dispatch detected: Base.!=(%65::Any, 0)
││││└─────────────────
││││┌ @ timing.jl:126 Base.prettyprint_getunits(%73, %75, 1000)
│││││ runtime dispatch detected: Base.prettyprint_getunits(%73::Any, %75::Int64, 1000)
││││└─────────────────
││││┌ @ timing.jl:127 Base.==(%80, 1)
│││││ runtime dispatch detected: Base.==(%80::Any, 1)
││││└─────────────────
││││┌ @ timing.jl:128 Base.Int(%88)
│││││ runtime dispatch detected: Base.Int(%88::Any)
││││└─────────────────
││││┌ @ timing.jl:128 Base.getindex(Base._cnt_units, %80)
│││││ runtime dispatch detected: Base.getindex(Base._cnt_units, %80::Any)
││││└─────────────────
││││┌ @ timing.jl:128 Base.==(%96, 1)
│││││ runtime dispatch detected: Base.==(%96::Any, 1)
││││└─────────────────
││││┌ @ timing.jl:128 Base.print(io, %89, %90, %101)
│││││ runtime dispatch detected: Base.print(io::IOBuffer, %89::Any, %90::Any, %101::String)
││││└─────────────────
││││┌ @ timing.jl:130 Base.Float64(%110)
│││││ runtime dispatch detected: Base.Float64(%110::Any)
││││└─────────────────
││││┌ @ timing.jl:130 %104(%111, 2)
│││││ runtime dispatch detected: %104::typeof(Base.Ryu.writefixed)(%111::Any, 2)
││││└─────────────────
││││┌ @ timing.jl:130 Base.getindex(Base._cnt_units, %80)
│││││ runtime dispatch detected: Base.getindex(Base._cnt_units, %80::Any)
││││└─────────────────
││││┌ @ timing.jl:130 Base.print(io, %112, %113, " allocations: ")
│││││ runtime dispatch detected: Base.print(io::IOBuffer, %112::String, %113::Any, " allocations: ")
││││└─────────────────
││││┌ @ timing.jl:135 Base.!=(%138, 0)
│││││ runtime dispatch detected: Base.!=(%138::Any, 0)
││││└─────────────────
││││┌ @ timing.jl:141 Base.!=(%172, 0)
│││││ runtime dispatch detected: Base.!=(%172::Any, 0)
││││└─────────────────
│┌ @ timing.jl:148 Base.print(str)
││┌ @ coreio.jl:3 Base.print(%3, %4)
│││ runtime dispatch detected: Base.print(%3::IO, %4::String)
││└───────────────
Tuple{Int64, Float64}
julia> this_module_filter(sv) = sv.mod === @__MODULE__;
julia> @report_dispatch frame_filter=this_module_filter compute(30) # focus on what we wrote, and no error should be reported
No errors !
Tuple{Int64, Float64}
@test_nodispatch can be used to assert that a given function call is free from type instabilities, within the Test standard library's unit-testing infrastructure:
julia> @test_nodispatch sumup(cos)
Dispatch Test Failed at none:1
Expression: #= none:1 =# JETTest.@test_nodispatch sumup(cos)
═════ 7 possible errors found ═════
┌ @ none:2 Main.__atexample__named__quickstart.make_vals(%1)
│ runtime dispatch detected: Main.__atexample__named__quickstart.make_vals(%1::Any)
└──────────
┌ @ none:3 Main.__atexample__named__quickstart.eltype(%2)
│ runtime dispatch detected: Main.__atexample__named__quickstart.eltype(%2::Any)
└──────────
┌ @ none:3 Main.__atexample__named__quickstart.zero(%3)
│ runtime dispatch detected: Main.__atexample__named__quickstart.zero(%3::Any)
└──────────
┌ @ none:4 Base.iterate(%2)
│ runtime dispatch detected: Base.iterate(%2::Any)
└──────────
┌ @ none:5 f(%11)
│ runtime dispatch detected: f::typeof(cos)(%11::Any)
└──────────
┌ @ none:5 Main.__atexample__named__quickstart.+(%10, %13)
│ runtime dispatch detected: Main.__atexample__named__quickstart.+(%10::Any, %13::Any)
└──────────
┌ @ none:5 Base.iterate(%2, %12)
│ runtime dispatch detected: Base.iterate(%2::Any, %12::Any)
└──────────
ERROR: There was an error during testing
julia> @test_nodispatch frame_filter=this_module_filter compute(30)
Test Passed
Expression: #= none:2 =# JETTest.@test_nodispatch frame_filter = this_module_filter compute(30)
julia> using Test
julia> @testset "check type-stabilities" begin
           @test_nodispatch sumup(cos) # should fail
           n = rand(Int)
           @test_nodispatch sumup(cos, n) # should pass
           @test_nodispatch frame_filter=this_module_filter compute(30) # should pass
           @test_nodispatch broken=true compute(30) # should pass with the "broken" annotation
       end
check type-stabilities: Dispatch Test Failed at none:3
Expression: #= none:3 =# JETTest.@test_nodispatch sumup(cos)
═════ 7 possible errors found ═════
┌ @ none:2 Main.__atexample__named__quickstart.make_vals(%1)
│ runtime dispatch detected: Main.__atexample__named__quickstart.make_vals(%1::Any)
└──────────
┌ @ none:3 Main.__atexample__named__quickstart.eltype(%2)
│ runtime dispatch detected: Main.__atexample__named__quickstart.eltype(%2::Any)
└──────────
┌ @ none:3 Main.__atexample__named__quickstart.zero(%3)
│ runtime dispatch detected: Main.__atexample__named__quickstart.zero(%3::Any)
└──────────
┌ @ none:4 Base.iterate(%2)
│ runtime dispatch detected: Base.iterate(%2::Any)
└──────────
┌ @ none:5 f(%11)
│ runtime dispatch detected: f::typeof(cos)(%11::Any)
└──────────
┌ @ none:5 Main.__atexample__named__quickstart.+(%10, %13)
│ runtime dispatch detected: Main.__atexample__named__quickstart.+(%10::Any, %13::Any)
└──────────
┌ @ none:5 Base.iterate(%2, %12)
│ runtime dispatch detected: Base.iterate(%2::Any, %12::Any)
└──────────
Test Summary: | Pass Fail Broken Total
check type-stabilities | 2 1 1 4
ERROR: Some tests did not pass: 2 passed, 1 failed, 0 errored, 1 broken.
Entry Points
These macros and functions are the entry points of dispatch analysis:
JETTest.@report_dispatch — Macro
@report_dispatch [jetconfigs...] f(args...)
Evaluates the arguments to the function call, determines their types, and then calls report_dispatch on the resulting expression. As with @code_typed and its family, any of the JET configurations or dispatch-analysis-specific configurations can be given as optional arguments, like this:
# reports `rand(::Type{Bool})` with `unoptimize_throw_blocks` configuration turned on
julia> @report_dispatch unoptimize_throw_blocks=true rand(Bool)
JETTest.report_dispatch — Function
report_dispatch(f, types = Tuple{}; jetconfigs...) -> result_type::Any
report_dispatch(tt::Type{<:Tuple}; jetconfigs...) -> result_type::Any
Analyzes the generic function call with the given type signature, prints the detected optimization failures and runtime dispatch points to stdout, and finally returns the result type of the call.
JETTest.@analyze_dispatch — Macro
@analyze_dispatch [jetconfigs...] f(args...)
Evaluates the arguments to the function call, determines their types, and then calls analyze_dispatch on the resulting expression. As with @code_typed and its family, any of the JET configurations or dispatch-analysis-specific configurations can be given as optional arguments, like this:
# reports `rand(::Type{Bool})` with `unoptimize_throw_blocks` configuration turned on
julia> @analyze_dispatch unoptimize_throw_blocks=true rand(Bool)
JETTest.analyze_dispatch — Function
analyze_dispatch(f, types = Tuple{}; jetconfigs...) -> (analyzer::DispatchAnalyzer, frame::Union{InferenceFrame,Nothing})
analyze_dispatch(tt::Type{<:Tuple}; jetconfigs...) -> (analyzer::DispatchAnalyzer, frame::Union{InferenceFrame,Nothing})
Analyzes the generic function call with the given type signature, and returns:
analyzer::DispatchAnalyzer: contains the analyzed optimization failures and runtime dispatch points
frame::Union{InferenceFrame,Nothing}: the final state of the abstract interpretation, or nothing if the analyzed function is a generator and its code generation has failed
JETTest.@test_nodispatch — Macro
@test_nodispatch [jetconfigs...] [broken=false] [skip=false] f(args...)
Tests that the generic function call f(args...) is free from runtime dispatch. Returns a Pass result if it is, a Fail result if it contains any location where runtime dispatch or optimization failure happens, or an Error result if this macro encounters an unexpected error. When the test fails, the abstract call stack to each problem location is also printed to stdout.
julia> @test_nodispatch sincos(10)
Test Passed
Expression: #= none:1 =# JETTest.@test_nodispatch sincos(10)
As with @report_dispatch or @analyze_dispatch, any of the JET configurations or dispatch-analysis-specific configurations can be given as optional arguments, like this:
julia> function f(n)
           r = sincos(n)
           println(r) # `println` is full of runtime dispatches, but we can ignore the corresponding reports from `Base` with an explicit frame filter
           return r
       end;
julia> this_module_filter(x) = x.mod === @__MODULE__;
julia> @test_nodispatch frame_filter=this_module_filter f(10)
Test Passed
Expression: #= none:1 =# JETTest.@test_nodispatch frame_filter = this_module_filter f(10)
@test_nodispatch is fully integrated with the Test standard library's unit-testing infrastructure. This means that the result of @test_nodispatch is included in the final @testset summary, and that it supports the skip and broken annotations just as the @test macro does.
julia> using JETTest, Test
julia> f(params) = sin(params.value); # type-stable
julia> params = (; value = 10); # non-constant global variable
julia> g() = sin(params.value); # very type-unstable
julia> @testset "check optimizations" begin
           @test_nodispatch f((; value = 10)) # pass
           @test_nodispatch g() # fail
           @test_nodispatch broken=true g() # annotated as broken, thus still "pass"
       end
check optimizations: Dispatch Test Failed at none:3
Expression: #= none:3 =# JETTest.@test_nodispatch g()
═════ 2 possible errors found ═════
┌ @ none:1 Base.getproperty(%1, :value)
│ runtime dispatch detected: Base.getproperty(%1::Any, :value::Symbol)
└──────────
┌ @ none:1 Main.sin(%2)
│ runtime dispatch detected: Main.sin(%2::Any)
└──────────
Test Summary: | Pass Fail Broken Total
check optimizations | 1 1 1 3
ERROR: Some tests did not pass: 1 passed, 1 failed, 0 errored, 1 broken.
JETTest.test_nodispatch — Function
test_nodispatch(f, types = Tuple{}; broken::Bool = false, skip::Bool = false, jetconfigs...)
test_nodispatch(tt::Type{<:Tuple}; broken::Bool = false, skip::Bool = false, jetconfigs...)
Tests that the generic function call with the given type signature is free from runtime dispatch. Except that it takes a type signature rather than a call expression, this function works in the same way as @test_nodispatch.
Configurations
JETTest.DispatchAnalyzer — Type
Every entry point of dispatch analysis can accept any of the JET configurations as well as the following additional configurations that are specific to dispatch analysis.
frame_filter = x::Union{Core.Compiler.InferenceState, Core.Compiler.OptimizationState}->true:
A predicate which takes an InferenceState or OptimizationState and returns false to skip analysis on the frame.
# only checks code within the current module:
julia> mymodule_filter(x) = x.mod === @__MODULE__;
julia> @test_nodispatch frame_filter=mymodule_filter f(args...)
...
function_filter = @nospecialize(ft)->true:
A predicate which takes a function type and returns false to skip analysis on the call.
# ignores `Core.Compiler.widenconst` calls (since it's designed to be runtime-dispatched):
julia> myfunction_filter(@nospecialize(ft)) = ft !== typeof(Core.Compiler.widenconst)
julia> @test_nodispatch function_filter=myfunction_filter f(args...)
...
skip_nonconcrete_calls::Bool = true:
Julia's runtime dispatch is "powerful" because it can always compile code with concrete runtime arguments, so that a "kernel" function runs very efficiently even if it is called from a type-unstable call site. This means that we (really) often accept that some parts of our code are not inferred statically, and instead rely on information that is only available at runtime. To model this programming style, the dispatch analyzer does NOT report any optimization failures or runtime dispatches detected within non-concrete calls under the default configuration. We can turn off the skip_nonconcrete_calls configuration to get reports of type instabilities within non-concrete calls.
# the following examples are adapted from https://docs.julialang.org/en/v1/manual/performance-tips/#kernel-functions
julia> function fill_twos!(a)
           for i = eachindex(a)
               a[i] = 2
           end
       end;
julia> function strange_twos(n)
           a = Vector{rand(Bool) ? Int64 : Float64}(undef, n)
           fill_twos!(a)
           return a
       end;
# by default, only type instabilities within concrete calls (i.e. `strange_twos(3)`) are reported
# and those within non-concrete calls (`fill_twos!(a)`) are not reported
julia> @report_dispatch strange_twos(3)
═════ 2 possible errors found ═════
┌ @ REPL[2]:2 %45(Main.undef, n)
│ runtime dispatch detected: %45::Type{Vector{_A}} where _A(Main.undef, n::Int64)
└─────────────
┌ @ REPL[2]:3 Main.fill_twos!(%46)
│ runtime dispatch detected: Main.fill_twos!(%46::Vector)
└─────────────
Vector (alias for Array{_A, 1} where _A)
# we can get reports from non-concrete calls with `skip_nonconcrete_calls=false`
julia> @report_dispatch skip_nonconcrete_calls=false strange_twos(3)
═════ 4 possible errors found ═════
┌ @ REPL[2]:3 Main.fill_twos!(a)
│┌ @ REPL[1]:3 Base.setindex!(a, 2, %14)
││ runtime dispatch detected: Base.setindex!(a::Vector, 2, %14::Int64)
│└─────────────
│┌ @ REPL[1]:3 Base.setindex!(a, 2, i)
││┌ @ array.jl:877 Base.convert(_, x)
│││ runtime dispatch detected: Base.convert(_::Any, x::Int64)
││└────────────────
┌ @ REPL[2]:2 %45(Main.undef, n)
│ runtime dispatch detected: %45::Type{Vector{_A}} where _A(Main.undef, n::Int64)
└─────────────
┌ @ REPL[2]:3 Main.fill_twos!(%46)
│ runtime dispatch detected: Main.fill_twos!(%46::Vector)
└─────────────
Vector (alias for Array{_A, 1} where _A)
skip_unoptimized_throw_blocks::Bool = true:
By default, Julia's native compilation pipeline intentionally disables inference (and therefore the subsequent optimizations too) on "throw blocks", which are code blocks that will eventually lead to throw calls, in order to ease the compilation latency problem, a.k.a. "time-to-first-plot". Accordingly, the dispatch analyzer also ignores runtime dispatches detected within those blocks, since we usually don't mind if code involved in error handling isn't optimized. If skip_unoptimized_throw_blocks is set to false, the analyzer doesn't ignore them and will report type instabilities detected within "throw blocks".
See also https://github.com/JuliaLang/julia/pull/35982.
# by default, unoptimized "throw blocks" are not analyzed
julia> @test_nodispatch sin(10)
Test Passed
Expression: #= none:1 =# JETTest.@test_nodispatch sin(10)
# we can turn on the analysis on unoptimized "throw blocks" with `skip_unoptimized_throw_blocks=false`
julia> @test_nodispatch skip_unoptimized_throw_blocks=false sin(10)
Dispatch Test Failed at none:1
Expression: #= none:1 =# JETTest.@test_nodispatch skip_unoptimized_throw_blocks = false sin(10)
═════ 1 possible error found ═════
┌ @ math.jl:1221 Base.Math.sin(xf)
│┌ @ special/trig.jl:39 Base.Math.sin_domain_error(x)
││┌ @ special/trig.jl:28 Base.Math.DomainError(x, "sin(x) is only defined for finite x.")
│││ runtime dispatch detected: Base.Math.DomainError(x::Float64, "sin(x) is only defined for finite x.")
││└──────────────────────
ERROR: There was an error during testing
# we can also turn off the heuristic itself
julia> @test_nodispatch unoptimize_throw_blocks=false skip_unoptimized_throw_blocks=false sin(10)
Test Passed
Expression: #= none:1 =# JETTest.@test_nodispatch unoptimize_throw_blocks = false skip_unoptimized_throw_blocks = false sin(10)
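Returning to the "kernel" function idea behind skip_nonconcrete_calls: the sketch below uses only Base reflection (no JETTest) to show why the default is often acceptable. It repeats `fill_twos!` and `strange_twos` from the example above so that it is self-contained. The variable names `rt_outer`, `rt_kernel_int`, and `rt_kernel_float` are introduced here for illustration.

```julia
function fill_twos!(a)
    for i in eachindex(a)
        a[i] = 2
    end
end

function strange_twos(n)
    a = Vector{rand(Bool) ? Int64 : Float64}(undef, n)
    fill_twos!(a)       # dispatched at runtime on the concrete type of `a`
    return a
end

# The outer call cannot be inferred to a concrete type...
rt_outer = only(Base.return_types(strange_twos, Tuple{Int}))

# ...but the kernel is inferred precisely once its argument type is concrete,
# which is what happens after the runtime dispatch
rt_kernel_int = only(Base.return_types(fill_twos!, Tuple{Vector{Int64}}))
rt_kernel_float = only(Base.return_types(fill_twos!, Tuple{Vector{Float64}}))

println("outer result concrete? ", isconcretetype(rt_outer))
println("kernel results: ", rt_kernel_int, ", ", rt_kernel_float)
```

The single dispatch at the `fill_twos!` call site is paid once per call, while the loop inside the kernel runs on fully specialized code.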