Internals of JET.jl
Abstract Interpretation Based Analysis
JET.jl overloads functions of the Core.Compiler.AbstractInterpreter interface to customize the abstract interpretation routine. The overloads are defined on JETInterpreter <: AbstractInterpreter, so that typeinf(::JETInterpreter, ::InferenceState) performs the customized abstract interpretation and collects type errors.
Most overloads use invoke reflection, which lets JETInterpreter dispatch to the original AbstractInterpreter's abstract interpretation methods while still passing itself on to the subsequent (possibly overloaded) callees (see the JET.@invoke macro).
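The dispatch pattern described above can be illustrated with a small, self-contained sketch. All types and functions here (AbstractInterpSketch, JETLikeInterp, infer_call, hook) are hypothetical stand-ins and not part of JET or Core.Compiler; only Base's invoke is real.

```julia
# Hypothetical stand-ins for AbstractInterpreter and JETInterpreter.
abstract type AbstractInterpSketch end
struct JETLikeInterp <: AbstractInterpSketch
    reports::Vector{String}
end

# "Native" routine written against the abstract type. It calls `hook`,
# which a subtype may overload.
infer_call(interp::AbstractInterpSketch, f::Symbol) = "inferred $(f) [$(hook(interp))]"
hook(::AbstractInterpSketch) = "default hook"

# Overloads for the customized interpreter.
hook(::JETLikeInterp) = "JET hook"
function infer_call(interp::JETLikeInterp, f::Symbol)
    # Call down to the generic method, but keep passing `interp` itself,
    # so the generic method still dispatches to the overloaded `hook`.
    result = invoke(infer_call, Tuple{AbstractInterpSketch,Symbol}, interp, f)
    push!(interp.reports, "analyzed $(f)")
    return result
end
```

Calling infer_call(JETLikeInterp(String[]), :sin) runs the generic method body, yet the nested hook call still reaches the JETLikeInterp overload because the interpreter object itself is passed through.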
Core.Compiler.bail_out_toplevel_call — Function
bail_out_toplevel_call(interp::JETInterpreter, ...)
An overload for abstract_call_gf_by_type(interp::JETInterpreter, ...), which keeps inference running on non-concrete call sites in a toplevel frame created by virtual_process.
Core.Compiler.bail_out_call — Function
bail_out_call(interp::JETInterpreter, ...)
With this overload, abstract_call_gf_by_type(interp::JETInterpreter, ...) doesn't bail out of inference even after the current return type has grown to Any, and so collects as many error points as possible. Of course this slows down inference, but hopefully it stays at a "practical" speed since the number of matching methods is limited beforehand.
Core.Compiler.add_call_backedges! — Function
add_call_backedges!(interp::JETInterpreter, ...)
An overload for abstract_call_gf_by_type(interp::JETInterpreter, ...), which always adds backedges (even if a new method can't refine a return type that has already grown to Any). This is because a new method definition always has the potential to change the JET analysis result.
Core.Compiler.const_prop_entry_heuristic — Function
const_prop_entry_heuristic(interp::JETInterpreter, @nospecialize(rettype), sv::InferenceState, edgecycle::Bool)
An overload for abstract_call_method_with_const_args(interp::JETInterpreter, ...), which forces constant propagation even if the inference result can't be improved anymore, e.g. when rettype is already Const; this is because constant propagation can still produce a more accurate analysis by cutting off unreachable control flow and thereby throwing away false positive error reports.
JET.analyze_task_parallel_code! — Function
analyze_task_parallel_code!(interp::JETInterpreter, @nospecialize(f), argtypes::Vector{Any}, sv::InferenceState)
Adds a special-cased analysis pass for task parallelism (xref: https://github.com/aviatesk/JET.jl/issues/114). In Julia's task parallelism implementation, parallel code is represented as a closure wrapped in a Task object. NativeInterpreter runs neither type inference nor optimization on the bodies of those closures when compiling code that creates parallel tasks, but JET will try to run an additional analysis pass by recursing into the closures.
JET won't do anything other than JET analysis here, e.g. it won't annotate the return type of the wrapped code block, in order not to confuse the original AbstractInterpreter routine; track https://github.com/JuliaLang/julia/pull/39773 for the changes in the native abstract interpretation routine.
JET.is_from_same_frame — Function
is_from_same_frame(parent_linfo::MethodInstance, current_linfo::MethodInstance) -> (report::InferenceErrorReport) -> Bool
Returns a function that checks whether a given InferenceErrorReport is generated from current_linfo. It also checks that current_linfo is a "lineage" of parent_linfo (i.e. entered from it).
This function is supposed to be used, when entering the constant analysis, to filter out reports collected from the analysis of current_linfo without constants. As such, this function assumes that when a report should be filtered out, the first element of its virtual stack frame st is for parent_linfo and the second element is for current_linfo.
Example: Assume linfo2
will produce a report for some reason.
entry
└─ linfo1
   ├─ linfo2 (report1: linfo2)
   ├─ linfo3 (report1: linfo1->linfo2, report2: linfo3->linfo2)
   │  └─ linfo2 (report1: linfo1->linfo2, report2: linfo2)
   └─ linfo3′ (report1: linfo1->linfo2, ~~report2: linfo1->linfo3->linfo2~~)
In the example analysis above, report2 will be filtered out on re-entering into linfo3′ (i.e. when we're analyzing linfo3 with constant arguments), because is_from_same_frame(linfo1, linfo3)(report2) returns true. Note that report1 is still kept because of the lineage check, i.e. is_from_same_frame(linfo1, linfo3)(report1) returns false.
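The filtering rule can be sketched as a closure over the two frames. This is a simplified illustration, not JET's implementation: FrameSketch and ReportSketch are hypothetical types, and Symbols stand in for MethodInstance objects.

```julia
# Hypothetical stand-ins: a Symbol plays the role of a MethodInstance.
struct FrameSketch
    linfo::Symbol
end
struct ReportSketch
    st::Vector{FrameSketch}   # frames ordered from entry call site to error point
    msg::String
end

# Returns a predicate that is true only for reports whose first frame is
# `parent_linfo` and whose second frame is `current_linfo`.
function is_from_same_frame_sketch(parent_linfo::Symbol, current_linfo::Symbol)
    return function (report::ReportSketch)
        st = report.st
        length(st) > 1 || return false
        return st[1].linfo === parent_linfo && st[2].linfo === current_linfo
    end
end

# The two reports from the example tree, as seen from linfo1.
report1 = ReportSketch([FrameSketch(:linfo1), FrameSketch(:linfo2)], "report1")
report2 = ReportSketch([FrameSketch(:linfo1), FrameSketch(:linfo3), FrameSketch(:linfo2)], "report2")

# Drop the reports that came from the non-constant analysis of linfo3.
kept = filter(!is_from_same_frame_sketch(:linfo1, :linfo3), [report1, report2])
```

Here kept contains only report1, matching the example: report2 went through linfo3 and is discarded before the constant analysis re-collects it.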
JET.AbstractGlobal — Type
mutable struct AbstractGlobal
    t::Any     # analyzed type
    iscd::Bool # whether this abstract global variable is declared as constant or not
end
Wraps a global variable whose type is analyzed by abstract interpretation. An AbstractGlobal object is actually evaluated into the context module, and a later analysis may refer to its type or alter it on another assignment.
The type of the wrapped global variable will be propagated only when in a toplevel frame, and thus we don't care about the analysis cache invalidation on a refinement of the wrapped global variable, since JET doesn't cache the toplevel frame.
JET.JET_REPORT_CACHE — Constant
JET_REPORT_CACHE::IdDict{UInt64, IdDict{Core.MethodInstance, Vector{JET.InferenceErrorReportCache}}}
Keeps the JET report cache for each MethodInstance. Reports are cached when JETInterpreter exits from _typeinf.
JET.JET_CODE_CACHE — Constant
JET_CODE_CACHE::IdDict{UInt64, IdDict{Core.MethodInstance, Core.CodeInstance}}
Keeps the CodeInstance cache associated with mi::MethodInstance, which represents the result of an inference on mi performed by JETInterpreter. This cache is completely separated from the NativeInterpreter's global cache, so that JET analysis never interacts with actual code execution.
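The two-level shape of these caches can be sketched as follows. This is only an illustration of the nested IdDict layout: MISketch, REPORT_CACHE_SKETCH and both helper functions are hypothetical. The source does not say what the outer UInt64 key represents, so the sketch treats it as an opaque identifier.

```julia
# Hypothetical stand-in for Core.MethodInstance.
struct MISketch
    name::Symbol
end

# Outer key: an opaque UInt64 identifier (its meaning is an assumption here).
# Inner key: the method instance. Value: the reports cached for it.
const REPORT_CACHE_SKETCH = IdDict{UInt64, IdDict{MISketch, Vector{String}}}()

# Returns the per-identifier cache, creating it on first use.
function reports_for(cache_key::UInt64)
    if !haskey(REPORT_CACHE_SKETCH, cache_key)
        REPORT_CACHE_SKETCH[cache_key] = IdDict{MISketch, Vector{String}}()
    end
    return REPORT_CACHE_SKETCH[cache_key]
end

# Appends a report to the entry for `mi` under the given identifier.
function cache_report!(cache_key::UInt64, mi::MISketch, report::String)
    per_mi = reports_for(cache_key)
    push!(get!(per_mi, mi, String[]), report)
    return per_mi
end
```

Because each outer key owns its own inner dictionary, entries stored under one identifier are never visible under another, which mirrors the separation the text describes.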
Top-level Analysis
JET.virtual_process — Function
virtual_process(s::AbstractString,
                filename::AbstractString,
                interp::JETInterpreter,
                config::ToplevelConfig,
                ) -> res::VirtualProcessResult
Simulates Julia's toplevel execution and collects error points, and finally returns res::VirtualProcessResult:
- res.included_files::Set{String}: files that have been analyzed
- res.toplevel_error_reports::Vector{ToplevelErrorReport}: toplevel errors found during the text parsing or partial (actual) interpretation; these reports are "critical" and should have precedence over inference_error_reports
- res.inference_error_reports::Vector{InferenceErrorReport}: possible error reports found by JETInterpreter
- res.toplevel_signatures: signatures of methods defined within the analyzed files
- res.actual2virtual::Pair{Module, Module}: keeps the actual and virtual module
This function first parses s::AbstractString into toplevelex::Expr and then iterates the following steps on each code block (blk) of toplevelex:
- if blk is a :module expression, recursively enters analysis into a newly defined virtual module
- lowers blk into a :thunk expression lwr (macros are also expanded in this step)
- if the context module is virtualized, replaces self-references of the original context module with the virtualized one: see fix_self_references
- ConcreteInterpreter partially interprets some statements in lwr that should not be abstracted away (e.g. a :method definition); see also partially_interpret!
- finally, JETInterpreter analyzes the remaining statements by abstract interpretation
In order to process the toplevel code sequentially as the Julia runtime does, virtual_process splits the entire code into blocks and then iterates a simulation process on each of them. With this approach, we can't track dependencies between code blocks, and so a partial interpretation of toplevel definitions will fail if it needs access to global variables defined in other code blocks that are not interpreted but just abstracted. We can circumvent this issue using JET's concretization_patterns configuration, which allows us to customize JET's concretization strategy. See ToplevelConfig for more details.
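The block-by-block loop can be sketched with Base's parser. This is a heavily simplified illustration, not JET's algorithm: classify_blocks_sketch is a hypothetical function, and it classifies surface syntax only, whereas JET works on lowered code. Only Meta.parseall and Meta.isexpr are real APIs.

```julia
# Parses `s` into a `:toplevel` expression and decides, per code block,
# which of the steps above would handle it.
function classify_blocks_sketch(s::AbstractString, filename::AbstractString = "string")
    toplevelex = Meta.parseall(s; filename = filename)
    actions = Symbol[]
    for blk in toplevelex.args
        blk isa LineNumberNode && continue   # skip source location markers
        if Meta.isexpr(blk, :module)
            push!(actions, :enter_module)    # would recurse into a virtual module
        elseif Meta.isexpr(blk, (:function, :struct, :abstract, :primitive))
            push!(actions, :concretize)      # definition: interpret for real
        else
            push!(actions, :abstract)        # everything else: abstract interpretation
        end
    end
    return actions
end

example_code = """
module M end
struct S
    x::Int
end
function getx(s)
    return s.x
end
getx(S(1))
"""
```

classify_blocks_sketch(example_code) only parses the text and evaluates nothing; short-form definitions such as f(x) = x are not recognized by this simplification.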
JET.virtualize_module_context — Function
virtualize_module_context(actual::Module)
HACK: Returns a module where the context of actual is virtualized.
The virtualization is done in the 2 steps below:
- loads the module context of actual into a sandbox module, and exports the whole context from there
- then uses the names exported from the sandbox
This way, JET's runtime simulation in the virtual module context will be able to define a name that is already defined in actual without causing a "cannot assign a value to variable ... from module ..." error, etc. It allows JET to virtualize the context of an already-existing module other than Main.
Currently this function relies on Base.names, and thus it can't restore the names imported via using.
JET.ConcreteInterpreter — Type
ConcreteInterpreter
The trait to inject code into JuliaInterpreter's interpretation process; JET.jl overloads:
- JuliaInterpreter.step_expr! to add an error report pass for module usage expressions and to support package analysis
- JuliaInterpreter.evaluate_call_recurse! to special-case include calls
- JuliaInterpreter.handle_err to wrap an error that happened during interpretation into ActualErrorWrapped
JET.partially_interpret! — Function
partially_interpret!(interp::ConcreteInterpreter, mod::Module, src::CodeInfo)
Partially interprets statements in src using JuliaInterpreter.jl:
- concretizes "toplevel definitions", i.e. :method, :struct_type, :abstract_type and :primitive_type expressions and their dependencies
- concretizes user-specified toplevel code (see ToplevelConfig)
- directly evaluates module usage expressions and reports errors for invalid module usages (TODO: enter into the loaded module and keep JET analysis)
- special-cases include calls so that top-level analysis recursively enters the included file
Error Report Interface
JET.VirtualFrame — Type
VirtualFrame
Stack information representing virtual execution context:
- file::Symbol: the path to the file containing the virtual execution context
- line::Int: the line number in the file containing the virtual execution context
- sig::Vector{Any}: a signature of this frame
- linfo::MethodInstance: the MethodInstance containing the execution context
This type is very similar to Base.StackTraces.StackFrame, but its execution context is collected during abstract interpretation, not from actual execution.
JET.VirtualStackTrace — Type
VirtualStackTrace
Represents a virtual stack trace in the form of a vector of VirtualFrame. The vector holds VirtualFrames in order of "from entry call site to error point", i.e. the first element is the VirtualFrame of the entry call site, and the last element is the one that contains the error.
JET.InferenceErrorReport — Type
InferenceErrorReport
An interface type of error reports that JET collects by abstract interpretation. If T implements this interface, the following requirements should be satisfied:
- Required fields
  T should have the following fields, which explain where and why this error is reported:
  - vst::VirtualStackTrace: a virtual stack trace of the error
  - msg::String: explains why this error is reported
  - sig::Vector{Any}: a signature of the error point
  Note that T can still have additional fields specific to it.
- A constructor interface to create T from abstract interpretation
  T<:InferenceErrorReport has the default constructor
      T(::JETInterpreter, sv::InferenceState, spec_args...)
  which works when T is reported while sv's program counter (sv.currpc) points to the statement where the error may happen. In that case T just needs to overload
      get_msg(::Type{T}, ::JETInterpreter, ::InferenceState, spec_args...) -> msg::String
  to provide the message that describes why this error is reported (otherwise a meaningless default message will be used).
  If T is reported when sv's program counter (sv.currpc) may not point to the error location, or when sv::InferenceState isn't even available, T can implement its own constructor method.
- A constructor interface to create T from the global report cache
  In order to be cached and restored from JET_REPORT_CACHE, T must implement the following interfaces:
  - spec_args(::T) -> Tuple{...}: returns the fields that are specific to T, which is used internally by the caching logic
  - T(vst::VirtualStackTrace, msg::String, sig::Vector{Any}, spec_args::Tuple{...}) -> T: constructor to create T from the cache, which should expand spec_args into each specific field
Satisfying these requirements manually would be very tedious. JET internally uses the @reportdef utility macro, which takes the struct definition of an InferenceErrorReport and automatically defines the struct itself and the cache interfaces.
See also: VirtualStackTrace, VirtualFrame
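The cache interface can be sketched by hand for a single report type. This is an illustration under stated assumptions, not what @reportdef generates: UndefNameReportSketch, its typ and name fields, and spec_args_sketch are hypothetical, and a Vector{Symbol} stands in for VirtualStackTrace.

```julia
# Hypothetical report type: three required fields plus two specific ones.
struct UndefNameReportSketch
    vst::Vector{Symbol}   # stand-in for VirtualStackTrace
    msg::String
    sig::Vector{Any}
    typ::Type             # report-specific field
    name::Symbol          # report-specific field
end

# Returns the fields that are specific to this report type.
spec_args_sketch(r::UndefNameReportSketch) = (r.typ, r.name)

# Cache constructor: expands the tuple back into the specific fields.
UndefNameReportSketch(vst, msg, sig, spec_args::Tuple) =
    UndefNameReportSketch(vst, msg, sig, spec_args...)

# Round trip: store only plain data, then rebuild the report from it.
original = UndefNameReportSketch([:entry, :f], "name is not defined", Any[:f, Int], Int, :x)
cached = (original.vst, original.msg, original.sig, spec_args_sketch(original))
restored = UndefNameReportSketch(cached...)
```

The cached tuple holds everything needed to rebuild the report, so the caching logic never has to know which specific fields a given report type carries.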
JET.ToplevelErrorReport — Type
ToplevelErrorReport
An interface type of error reports that JET collects during top-level concrete interpretation. Every ToplevelErrorReport should have the following fields:
- file::String: the path to the file containing the interpretation context
- line::Int: the line number in the file containing the interpretation context
See also: virtual_process, ConcreteInterpreter
Utilities
JET.@invoke — Macro
@invoke f(arg::T, ...; kwargs...)
Provides a convenient way to call invoke; @invoke f(arg1::T1, arg2::T2; kwargs...) will be expanded into invoke(f, Tuple{T1,T2}, arg1, arg2; kwargs...). When an argument's type annotation is omitted, it's treated as an Any argument, e.g. @invoke f(arg1::T, arg2) will be expanded into invoke(f, Tuple{T,Any}, arg1, arg2).
This can be used to call down to NativeInterpreter's abstract interpretation method of f while passing JETInterpreter, so that subsequent calls of abstract interpretation functions overloaded against JETInterpreter can be called from the native method of f; e.g. to call down to NativeInterpreter's abstract_call_gf_by_type method:
@invoke abstract_call_gf_by_type(interp::AbstractInterpreter, f, argtypes::Vector{Any}, atype, sv::InferenceState,
                                 max_methods::Int)
JET.@invokelatest — Macro
@invokelatest f(args...; kwargs...)
Provides a convenient way to call Base.invokelatest. @invokelatest f(args...; kwargs...) will simply be expanded into Base.invokelatest(f, args...; kwargs...).
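For context on why Base.invokelatest is useful at all, here is a minimal sketch of the situation it addresses. describe_sketch and add_method_and_call_sketch are hypothetical names; only Base.invokelatest and @eval are real.

```julia
describe_sketch(x) = "generic"

function add_method_and_call_sketch()
    # Define a more specific method while this function is already running.
    @eval describe_sketch(x::Int) = "integer"
    # A direct call here may still dispatch to the methods that existed when
    # this function was entered; invokelatest looks up the newest method table.
    return Base.invokelatest(describe_sketch, 1)
end
```

With the expansion given above, @invokelatest describe_sketch(1) would be shorthand for the Base.invokelatest call in this sketch.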
JET.@withmixedhash — Macro
@withmixedhash (mutable) struct T
    fields ...
end
Defines struct T while automatically defining its Base.hash(::T, ::UInt) method, which mixes the hashes of all of T's fields (and also the corresponding Base.:(==)(::T, ::T) method).
This macro is supposed to abstract the following kind of pattern:
https://github.com/aviatesk/julia/blob/999973df2850d6b2e0bd4bcf03ef90a14217b63c/base/pkgid.jl#L3-L25
struct PkgId
    uuid::Union{UUID,Nothing}
    name::String
end
==(a::PkgId, b::PkgId) = a.uuid == b.uuid && a.name == b.name
function hash(pkg::PkgId, h::UInt)
    h += 0xc9f248583a0ca36c % UInt
    h = hash(pkg.uuid, h)
    h = hash(pkg.name, h)
    return h
end
with @withmixedhash:
@withmixedhash struct PkgId
    uuid::Union{UUID,Nothing}
    name::String
end
See also: EGAL_TYPES
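How such a macro might be put together can be sketched as follows. This is a simplified, hypothetical macro (@mixedhash_sketch), not JET's @withmixedhash: it assumes a plain struct Name without type parameters or a supertype, and it derives the hash seed from the type name, which is a choice made only for this sketch.

```julia
macro mixedhash_sketch(typedef)
    Meta.isexpr(typedef, :struct) || error("expected a struct definition")
    name = typedef.args[2]
    name isa Symbol || error("this sketch only supports a plain `struct Name`")
    # Collect field names, skipping line number nodes.
    fields = Symbol[]
    for x in typedef.args[3].args
        x isa LineNumberNode && continue
        push!(fields, Meta.isexpr(x, :(::)) ? x.args[1] : x)
    end
    seed = hash(name)   # per-type constant mixed into the hash
    hash_steps = [:(h = Base.hash(getfield(x, $(QuoteNode(f))), h)) for f in fields]
    eq_terms = [:(getfield(a, $(QuoteNode(f))) == getfield(b, $(QuoteNode(f)))) for f in fields]
    eq_body = isempty(eq_terms) ? true : foldl((l, r) -> :($l && $r), eq_terms)
    return esc(quote
        $typedef
        function Base.hash(x::$name, h::UInt)
            h += $seed
            $(hash_steps...)
            return h
        end
        Base.:(==)(a::$name, b::$name) = $eq_body
    end)
end

# Usage: without the macro, two distinct mutable instances would not be `==`.
@mixedhash_sketch mutable struct KeySketch
    id::Int
    name::String
end
```

With these definitions, two KeySketch values with equal fields compare equal and hash identically, so they behave as the same key in a Dict or Set.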
JET.@jetconfigurable — Macro
@jetconfigurable function config_func(args...; configurations...)
    ...
end
This macro asserts that there's no configuration naming conflict across the @jetconfigurable functions, so that a configuration for one @jetconfigurable function doesn't affect the others. This macro also adds a dummy splatted keyword argument (jetconfigs...) to the function definition, so that any configuration of other @jetconfigurable functions can be passed on to it.
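The effect of the dummy jetconfigs... argument can be sketched by writing it out by hand. The function names and the strict and color options below are hypothetical; the sketch shows only the keyword pass-through, not the naming conflict check.

```julia
# Each function declares its own configuration and silently accepts the rest.
parse_step_sketch(src; strict::Bool = false, jetconfigs...) =
    strict ? "strict:$src" : "lenient:$src"

print_step_sketch(res; color::Bool = true, jetconfigs...) =
    color ? "[color] $res" : res

# One flat set of configurations is forwarded to every step unchanged.
function run_pipeline_sketch(src; configs...)
    parsed = parse_step_sketch(src; configs...)
    return print_step_sketch(parsed; configs...)
end
```

Because every step swallows unknown keywords, run_pipeline_sketch("x"; strict = true, color = false) can hand the same keyword set to both functions without either one raising a MethodError for an unsupported keyword.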