Internals of JET.jl

Abstract Interpretation

To perform type-level program analysis, JET.jl uses the Base.Compiler.AbstractInterpreter interface and customizes abstract interpretation by overloading a subset of Base.Compiler functions. These functions were originally developed for the Julia compiler's type inference and optimizations, which aim at generating efficient native code for CPU execution.

JET.AbstractAnalyzer overloads a set of Base.Compiler functions to implement the "core" functionality of JET's analysis, including inter-procedural error report propagation and caching of analysis results. Each plugin analyzer (e.g. JET.JETAnalyzer) overloads further Base.Compiler functions so that it can perform its own program analysis on top of the core AbstractAnalyzer infrastructure.

Most overloads use invoke, which allows AbstractAnalyzer to dispatch to the original AbstractInterpreter's abstract interpretation methods while still passing the AbstractAnalyzer itself to the subsequent (possibly overloaded) callees.
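The invoke pattern described above can be illustrated with a minimal, self-contained sketch. The types DemoInterpreter and DemoAnalyzer and the functions analyze_call and analyze_stmt are hypothetical stand-ins invented for this example; they are not part of JET or Base.Compiler.

```julia
# Hypothetical stand-ins; these are not JET or Base.Compiler types.
abstract type DemoInterpreter end

struct DemoAnalyzer <: DemoInterpreter
    log::Vector{String}
end

# "Original" methods, defined against the abstract interface.
analyze_stmt(::DemoInterpreter, x::Int) = x + 1
analyze_call(interp::DemoInterpreter, x::Int) = analyze_stmt(interp, x)

# Overloads for the analyzer: record something, then reuse the original logic.
function analyze_stmt(analyzer::DemoAnalyzer, x::Int)
    push!(analyzer.log, "analyze_stmt overload")
    return invoke(analyze_stmt, Tuple{DemoInterpreter,Int}, analyzer, x)
end

function analyze_call(analyzer::DemoAnalyzer, x::Int)
    push!(analyzer.log, "analyze_call overload")
    # Dispatches to the DemoInterpreter method, but `analyzer` is still the
    # argument, so the nested analyze_stmt call reaches the overload above.
    return invoke(analyze_call, Tuple{DemoInterpreter,Int}, analyzer, x)
end

analyzer = DemoAnalyzer(String[])
result = analyze_call(analyzer, 1)
```

Here the generic analyze_call method runs unchanged, yet the analyzer's analyze_stmt overload is still reached, because invoke only selects the method for the outer call and leaves the argument's concrete type intact.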

How `AbstractAnalyzer` manages caches

JET.AnalysisResult — Type

AnalysisResult

Container for error reports collected during analysis of a specific InferenceResult.

AbstractAnalyzer manages InferenceErrorReport instances by associating them with their corresponding InferenceResult. Reports found during the analysis of result::InferenceResult can be accessed via get_reports(analyzer, result).

JET.CachedAnalysisResult — Type

CachedAnalysisResult

Cached version of AnalysisResult stored in the global analyzer cache.

When an AnalysisResult is cached into the global cache maintained by AbstractAnalyzer, it's transformed into this type. That is, when codeinf::CodeInstance = Compiler.code_cache(analyzer::AbstractAnalyzer)[mi::MethodInstance], the codeinf.inferred field will contain a CachedAnalysisResult instance.

JET.AnalysisToken — Type

mutable struct AnalysisToken
    AnalysisToken() = new()
end

A unique token object used to identify and separate caches of analysis results.

Each AbstractAnalyzer implementation should use a consistent token to enable proper caching behavior. The identity of the token determines whether cached analysis results can be reused between analyzer instances.
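As a rough illustration of why token identity matters, the sketch below keys caches by object identity. DemoToken, demo_caches and cache_for are hypothetical names made up for this example; JET's actual cache layout may differ.

```julia
# Hypothetical token type mirroring the shape of AnalysisToken:
# a mutable struct with no fields, so every instance has its own identity.
mutable struct DemoToken
    DemoToken() = new()
end

# Caches are looked up by token identity (===), not by structural equality.
const demo_caches = IdDict{DemoToken,Dict{Symbol,Int}}()
cache_for(token::DemoToken) = get!(() -> Dict{Symbol,Int}(), demo_caches, token)

shared = DemoToken()   # reused by several "analyzer instances"
other = DemoToken()    # a separate token gets a separate cache

cache_for(shared)[:f] = 1   # result cached under the shared token
```

Looking up cache_for(shared) again returns the same dictionary, while cache_for(other) starts out empty. This is the sense in which a consistent token enables cache reuse across instances.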

Top-level Analysis

JET.virtual_process — Function

virtual_process(interp::ConcreteInterpreter,
                x::Union{AbstractString,JS.SyntaxNode},
                filename::AbstractString,
                config::ToplevelConfig;
                overrideex::Union{Nothing,Expr}=nothing) -> res::VirtualProcessResult

Simulates Julia's toplevel execution, collects error points, and finally returns a VirtualProcessResult.

This function first parses x (when it is given as an AbstractString) into toplevelnode::JS.SyntaxNode and then iterates the following steps on each code block (blk) of toplevelnode:

if blk is a :module expression, recursively enters analysis into a newly defined virtual module
lowers blk into :thunk expression lwr (macros are also expanded in this step)
if the context module is virtualized, replaces self-references of the original context module with virtualized one: see fix_self_references
ConcreteInterpreter partially interprets some statements in lwr that should not be abstracted away (e.g. a :method definition); see also partially_interpret!
finally, ToplevelAbstractAnalyzer analyzes the remaining statements by abstract interpretation
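The parse-then-lower part of the steps above can be sketched with Base tooling alone. This is only an approximation: it uses Meta.parseall and Meta.lower rather than the JuliaSyntax-based parsing that virtual_process uses, and it neither interprets nor analyzes anything. The source text and variable names are made up for the example.

```julia
# Made-up toplevel source with three code blocks.
code = """
module Inner
g() = 1
end
f(x) = x + 1
y = f(2)
"""

# Parse the whole text, then drop the line-number nodes between blocks.
toplevel = Meta.parseall(code; filename = "demo.jl")
blocks = filter(blk -> !(blk isa LineNumberNode), toplevel.args)

# Classify each block: :module blocks would be entered recursively, everything
# else is lowered (macros are expanded here) into a :thunk expression.
kinds = map(blocks) do blk
    if Meta.isexpr(blk, :module)
        :module
    else
        lwr = Meta.lower(Main, blk)
        Meta.isexpr(lwr, :thunk) ? :thunk : :other
    end
end
```

Lowering does not evaluate the code, so f does not have to exist when the block y = f(2) is lowered. Deciding which lowered statements must actually run is the job of the partial interpretation step.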

Warning

In order to process the toplevel code sequentially as the Julia runtime does, virtual_process splits the entire code and then iterates a simulation process on each code block. With this approach, we can't track dependencies between code blocks, so a partial interpretation of toplevel definitions will fail if it needs access to global variables defined in other code blocks that are not interpreted but just abstracted. We can circumvent this issue using JET's concretization_patterns configuration, which allows us to customize JET's concretization strategy. See ToplevelConfig for more details.

JET.VirtualProcessResult — Type

res::VirtualProcessResult

res.analyzed_files::Dict{String,AnalyzedFileInfo}: files that have been analyzed with their corresponding module analyzed_files attached.
res.toplevel_error_reports::Vector{ToplevelErrorReport}: toplevel errors found during the text parsing or partial (actual) interpretation; these reports are "critical" and should have precedence over inference_error_reports
res.inference_error_reports::Vector{InferenceErrorReport}: possible error reports found by ToplevelAbstractAnalyzer
res.toplevel_signatures: signatures of methods defined within the analyzed files
res.actual2virtual::Pair{Module, Module}: the pair of the actual module and its virtual counterpart

JET.virtualize_module_context — Function

virtualize_module_context(actual::Module)

HACK to return a module where the context of actual is virtualized.

The virtualization is done in the following two steps:

loads the module context of actual into a sandbox module, and exports the whole context from there
then uses names exported from the sandbox

This way, JET's runtime simulation in the virtual module context is able to define a name that is already defined in actual without causing errors such as "cannot assign a value to variable ... from module ...". It allows JET to virtualize the context of an already-existing module other than Main.

TODO

Currently this function relies on Base.names, and thus it can't restore names that were brought in with using.

JET.ConcreteInterpreter — Type

abstract type ConcreteInterpreter <: JuliaInterpreter.Interpreter end

An interface to inject code into JET's virtual process via JuliaInterpreter's interpretation.

Subtypes are expected to implement:

InterpretationState(interp::T) -> InterpretationState - return the interpreter state
ConcreteInterpreter(interp::T, state::InterpretationState) -> T - create new interpreter with state
ToplevelAbstractAnalyzer(interp::T) -> analyzer::ToplevelAbstractAnalyzer - return the analyzer for this interpreter

JET.partially_interpret! — Function

partially_interpret!(interp::ConcreteInterpreter, concretize::BitVector, mod::Module, src::CodeInfo)

Partially interprets statements in src using JuliaInterpreter.jl:

concretizes "toplevel definitions", i.e. :method, :struct_type, :abstract_type and :primitive_type expressions and their dependencies
concretizes user-specified toplevel code (see ToplevelConfig)
directly evaluates module usage expressions and reports errors for invalid module usages (TODO: enter into the loaded module and keep JET analysis)
special-cases include calls so that top-level analysis recursively enters the included file
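To give a feel for the concretize::BitVector argument, the sketch below marks only the :method statements of a lowered definition. It is a deliberately simplified assumption-laden illustration: the real selection also covers the other definition kinds listed above and their dependencies, and demo_f is a made-up name.

```julia
# Lower a made-up method definition; nothing is evaluated here.
lwr = Meta.lower(Main, :(demo_f(x) = x + 1))
src = lwr.args[1]::Core.CodeInfo   # the CodeInfo wrapped by the :thunk

# One bit per lowered statement: true means "interpret this statement
# concretely", false means "leave it to abstract interpretation".
concretize = falses(length(src.code))
for (i, stmt) in enumerate(src.code)
    if Meta.isexpr(stmt, :method)
        concretize[i] = true
    end
end
```

A selection like this would miss the statements that compute the method signature, which is why dependency tracking is needed before the selected statements can actually be interpreted.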

Analysis Result

JET.JETToplevelResult — Type

res::JETToplevelResult

Represents the result of JET's analysis on a top-level script.

res.analyzer::AbstractAnalyzer: AbstractAnalyzer used for this analysis
res.res::VirtualProcessResult: VirtualProcessResult collected from this analysis
res.source::AbstractString: the identity key of this analysis
res.jetconfigs: configurations used for this analysis

JETToplevelResult implements show methods for each frontend. An appropriate show method is chosen automatically to render the analysis result.

JET.JETCallResult — Type

res::JETCallResult

Represents the result of JET's analysis on a function call.

res.result::InferenceResult: the result of this analysis
res.analyzer::AbstractAnalyzer: AbstractAnalyzer used for this analysis
res.source::AbstractString: the identity key of this analysis
res.jetconfigs: configurations used for this analysis

JETCallResult implements show methods for each frontend. An appropriate show method is chosen automatically to render the analysis result.

Splitting and filtering reports

Both JETToplevelResult and JETCallResult can be split into individual failures for integration with tools like Cthulhu:

JET.get_reports — Function

rpts = JET.get_reports(result::JETCallResult)

Split result into a vector of reports, one per issue.

JET.reportkey — Function

reportkey(report::InferenceErrorReport)

Returns an identifier for the runtime-dispatched call site of report.

If you have a long list of reports to analyze, urpts = unique(reportkey, rpts) may remove "duplicates" that arrive at the same runtime dispatch from different entry points.
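The deduplication idiom relies only on Base's unique(f, itr), which keeps the first element for each distinct key. The sketch below uses a hypothetical DemoReport type and demo_key function in place of real InferenceErrorReport values and reportkey.

```julia
# Hypothetical report type: where analysis started and which call site failed.
struct DemoReport
    entry::Symbol
    callsite::Tuple{Symbol,Int}   # (function name, line)
end

# Stand-in for reportkey: identify a report by its call site only.
demo_key(report::DemoReport) = report.callsite

rpts = [
    DemoReport(:main, (:foo, 10)),
    DemoReport(:other, (:foo, 10)),   # same call site, different entry point
    DemoReport(:main, (:bar, 3)),
]

# Keep one report per call site; the first occurrence wins.
urpts = unique(demo_key, rpts)
```

Here urpts holds two reports: the second one is dropped because it arrives at the same call site as the first from a different entry point.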

Error Report Interface

JET.VirtualFrame — Type

VirtualFrame

Stack information representing virtual execution context:

file::Symbol: the path to the file containing the virtual execution context
line::Int: the line number in the file containing the virtual execution context
sig::Signature: a signature of this frame
linfo::MethodInstance: the MethodInstance containing the execution context

This type is very similar to Base.StackTraces.StackFrame, but its execution context is collected during abstract interpretation rather than from actual execution.

JET.VirtualStackTrace — Type

VirtualStackTrace

Represents a virtual stack trace in the form of a vector of VirtualFrame. The vector holds VirtualFrames in order of "from entry call site to error point", i.e. the first element is the VirtualFrame of the entry call site, and the last element is the one that contains the error.
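The ordering convention can be sketched with a simplified frame type. DemoFrame is a hypothetical stand-in with only file and line fields and a string signature; it is not the actual VirtualFrame definition.

```julia
# Hypothetical, simplified frame; the real VirtualFrame also carries a
# Signature and a MethodInstance.
struct DemoFrame
    file::Symbol
    line::Int
    sig::String
end

# Ordered from the entry call site down to the error point.
vst = DemoFrame[
    DemoFrame(:demo_script, 1, "main()"),
    DemoFrame(:demo_script, 7, "helper(x)"),
    DemoFrame(:demo_script, 12, "undefined_op(x)"),
]

entry_frame = first(vst)   # where the analysis entered
error_frame = last(vst)    # the frame that contains the error
```

A frontend that wants to print the innermost frame first, as runtime stack traces usually do, would iterate over reverse(vst).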

JET.Signature — Type

Signature

Represents an expression signature. print_signature implements the frontend functionality for showing this type.

JET.InferenceErrorReport — Type

abstract type InferenceErrorReport end

An interface type of error reports collected by JET's abstract interpretation based analysis. All InferenceErrorReports have the following fields, which explain where and how the error is reported:

vst::VirtualStackTrace: a virtual stack trace of the error
sig::Signature: a signature of the error point

Note that some InferenceErrorReports may have additional fields other than vst and sig to explain why they are reported.

JET.ToplevelErrorReport — Type

ToplevelErrorReport

An interface type of error reports that JET collects during top-level concrete interpretation. All ToplevelErrorReports should have the following fields:

file::String: the path to the file containing the interpretation context
line::Int: the line number in the file containing the interpretation context

See also: virtual_process, ConcreteInterpreter