"Unstable API" Analysis

Motivation

Julia doesn't have any facility to truly hide module internals. This means we can always access whatever is defined within a module and use it freely, but some of those definitions may be considered the module's "internals" and are subject to change. When possible, we want to avoid using them for better maintainability in the future. But the problem is: how can we automatically find where they are already used in existing code?

This analysis is motivated by this discussion.

Implementation

Let's define "unstable API"s as bindings that are

  • undefined, or
  • neither exported nor documented, if defined

Now we can implement an analyzer that detects code matching the definition above using JET.jl's pluggable-analysis framework.
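As a rough illustration of the definition above, the following sketch classifies the bindings of a toy module. The module `Demo`, its functions, and the helper `classify` are hypothetical names introduced only for this example. The docstring lookup assumes that docstrings are stored in the module's metadata keyed by `Docs.Binding`; it is not part of JET's API.

```julia
# A toy module (hypothetical) with one exported, one documented, and one internal function
module Demo
export stable
stable() = 1
"Documented but not exported."
documented() = 2
internal() = 3
end

# Classify a binding following the two-part definition above
function classify(mod::Module, name::Symbol)
    isdefined(mod, name) || return :undefined   # category 1: undefined binding
    Base.isexported(mod, name) && return :stable
    # assumption: docstrings live in the module's metadata, keyed by `Docs.Binding`
    haskey(Base.Docs.meta(mod), Base.Docs.Binding(mod, name)) && return :stable
    return :unstable                            # category 2: not exported nor documented
end

classify(Demo, :stable)       # exported
classify(Demo, :documented)   # documented, though not exported
classify(Demo, :internal)     # neither exported nor documented
classify(Demo, :nonexistent)  # not defined at all
```

Only `internal` and `nonexistent` fall under the definition; the analyzer built in this page makes the same distinction, but on bindings discovered during abstract interpretation rather than on names passed in by hand.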

The implementation below is almost sound, under the assumption that the bindings are resolved statically. One thing to note is that the analysis implements a heuristic to avoid false positives from "language intrinsics", for example, Base.indexed_iterate and Base.Broadcast.broadcasted. They're usually introduced into your code implicitly by Julia's iteration protocols and such, and we're not responsible for their details (and thus not interested in their usages). The caveat is that the analyzer below doesn't distinguish between intrinsics introduced by the language and those we wrote explicitly, even though in the latter case we certainly use an "unstable API" under the definition above.
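To see how such intrinsics end up in code that never mentions them, one can inspect the lowered form of ordinary syntax. This is a small sketch using `Meta.lower`; the variable names are placeholders, and the exact printed form of lowered code may differ between Julia versions.

```julia
# Lower two innocuous-looking expressions and render the result as text.
# The names `a`, `b`, `t`, `f`, and `x` are placeholders; lowering does not require them to exist.
destructuring = string(Meta.lower(Main, :((a, b) = t)))
broadcasting  = string(Meta.lower(Main, :(f.(x))))

# Neither source expression mentions these functions, yet lowering introduces them.
uses_indexed_iterate = occursin("indexed_iterate", destructuring)  # iteration protocol
uses_broadcasted     = occursin("broadcasted", broadcasting)       # broadcast protocol
```

Because these references come from lowering rather than from the programmer, a heuristic that skips them avoids reports the user could not act on.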

using JET
using JET.JETInterface   # to load APIs of the pluggable analysis framework
const CC = Core.Compiler # to inject a customized report pass
Core.Compiler

First off, we define UnstableAPIAnalyzer, which is a new AbstractAnalyzer and will implement the customized report pass:

struct UnstableAPIAnalyzer{T} <: AbstractAnalyzer
    state::AnalyzerState
    analysis_cache::AnalysisCache
    is_target_module::T
end
JETInterface.AnalyzerState(analyzer::UnstableAPIAnalyzer) = analyzer.state
JETInterface.AbstractAnalyzer(analyzer::UnstableAPIAnalyzer, state::AnalyzerState) =
    UnstableAPIAnalyzer(state, analyzer.analysis_cache, analyzer.is_target_module) # the struct has three fields, so carry the cache over too
JETInterface.ReportPass(analyzer::UnstableAPIAnalyzer) = UnstableAPIAnalysisPass()
JETInterface.AnalysisCache(analyzer::UnstableAPIAnalyzer) = analyzer.analysis_cache

const UNSTABLE_API_ANALYZER_CACHE = IdDict{UInt, AnalysisCache}()
IdDict{UInt64, JET.AnalysisCache}()

Next, we overload some of Core.Compiler's abstract interpretation methods and inject a customized analysis pass (which we will name UnstableAPIAnalysisPass). In this analysis, we are interested in whether a binding that appears in the target code is an "unstable API" or not, so we can simply check whether each abstract element that appears during abstract interpretation meets our criteria for "unstable API". For that purpose, it suffices to overload Core.Compiler.abstract_eval_special_value and Core.Compiler.builtin_tfunction. To inject a report pass, we use the ReportPass(::AbstractAnalyzer) interface.

struct UnstableAPIAnalysisPass <: ReportPass end

function CC.abstract_eval_special_value(analyzer::UnstableAPIAnalyzer, @nospecialize(e), vtypes::CC.VarTable, sv::CC.InferenceState)
    if analyzer.is_target_module(sv.mod) # we care only about what we wrote
        ReportPass(analyzer)(UnstableAPI, analyzer, sv, e)
    end

    # recurse into JET's default abstract interpretation routine
    return @invoke CC.abstract_eval_special_value(analyzer::AbstractAnalyzer, e, vtypes::CC.VarTable, sv::CC.InferenceState)
end

function CC.builtin_tfunction(analyzer::UnstableAPIAnalyzer, @nospecialize(f), argtypes::Vector{Any}, sv::CC.InferenceState)
    if f === getfield
        if length(argtypes) ≥ 2
            a1, a2 = argtypes[1:2]
            if isa(a1, Core.Const) && (v1 = a1.val; isa(v1, Module))
                if isa(a2, Core.Const) && (v2 = a2.val; isa(v2, Symbol))
                    if analyzer.is_target_module(sv.mod) || # we care only about what we wrote, but with relaxed filter
                       (parent = sv.parent; isa(parent, CC.InferenceState) && analyzer.is_target_module(parent.mod))
                        ReportPass(analyzer)(UnstableAPI, analyzer, sv, GlobalRef(v1, v2))
                    end
                end
            end
        end
    end

    # recurse into JET's default abstract interpretation routine
    return @invoke CC.builtin_tfunction(analyzer::AbstractAnalyzer, f, argtypes::Vector{Any}, sv::CC.InferenceState)
end
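The nested checks in the `getfield` overload boil down to one question: are both the module and the field name known as constants? The following standalone sketch isolates that pattern match. The helper name `const_module_ref` is hypothetical and is not part of JET or Core.Compiler; it only assumes that `Core.Const` wraps a constant in its `val` field, as the code above does.

```julia
# Return a GlobalRef when the first two argument types are a constant Module and a
# constant Symbol; otherwise return nothing. (Hypothetical helper, for illustration only.)
function const_module_ref(argtypes::Vector{Any})
    length(argtypes) ≥ 2 || return nothing
    a1, a2 = argtypes[1], argtypes[2]
    (a1 isa Core.Const && a2 isa Core.Const) || return nothing
    m, s = a1.val, a2.val
    (m isa Module && s isa Symbol) || return nothing
    return GlobalRef(m, s)
end

ref = const_module_ref(Any[Core.Const(Base), Core.Const(:sin)])  # both constant
unknown = const_module_ref(Any[Module, Core.Const(:sin)])        # module not known as a constant
```

When the module is only known by type, as in the second call, there is no binding to look up statically, which is why the analysis is sound only under the assumption that bindings are resolved statically.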

Additionally, we can cut off the performance cost of the native compiler's optimization passes, which this analysis doesn't need:

CC.may_optimize(analyzer::UnstableAPIAnalyzer) = false

Now we implement the body of our analysis. Recall that we defined "unstable API"s as bindings that are:

  1. undefined, or
  2. neither exported nor documented, if defined

and we're not interested in any program property other than whether our code contains "unstable API"s.

So in our report pass, we would like to ignore all the reports implemented by JET.jl by default

(::UnstableAPIAnalysisPass)(T::Type{<:InferenceErrorReport}, analyzer, state, @nospecialize(spec_args...)) = return

except for the report of undefined global references (i.e. UndefVarErrorReport). This overload allows us to find code that falls into category 1.

function (::UnstableAPIAnalysisPass)(T::Type{JET.UndefVarErrorReport}, analyzer, state, @nospecialize(spec_args...))
    JET.BasicPass()(T, analyzer, state, spec_args...) # forward to JET's default report pass
end
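The "ignore everything except one report type" behaviour relies only on Julia's method specificity for callable structs. Here is a minimal sketch of that mechanism with toy types; `ToyReport`, `ToyUndefVar`, `ToyOther`, and `ToyPass` are hypothetical stand-ins and not JET types.

```julia
# Toy stand-ins for a report hierarchy and a report pass (all names hypothetical)
abstract type ToyReport end
struct ToyUndefVar <: ToyReport end
struct ToyOther <: ToyReport end
struct ToyPass end

# Fallback: any report type is ignored by default
(::ToyPass)(::Type{<:ToyReport}) = :ignored
# More specific method: this one report type is forwarded
(::ToyPass)(::Type{ToyUndefVar}) = :reported

pass = ToyPass()
pass(ToyOther)     # hits the fallback
pass(ToyUndefVar)  # hits the more specific method
```

The method taking `Type{ToyUndefVar}` is more specific than the one taking `Type{<:ToyReport}`, so dispatch selects it without any explicit branching.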

And now we define a new InferenceErrorReport type, UnstableAPI, which represents category 2, and implement a report pass to detect it.

@jetreport struct UnstableAPI <: InferenceErrorReport
    g::GlobalRef
end
function JETInterface.print_report_message(io::IO, (; g)::UnstableAPI)
    (; mod, name) = g
    mod = Base.binding_module(mod, name) # report the module that owns the binding
    print(io, "usage of unstable API `", mod, '.', name, "` found")
end
JETInterface.report_color(::UnstableAPI) = :yellow

function (::UnstableAPIAnalysisPass)(::Type{UnstableAPI}, analyzer::UnstableAPIAnalyzer, sv, @nospecialize(e))
    if isa(e, GlobalRef)
        (; mod, name) = e
        isdefined(mod, name) || return false # this global reference falls into the category 1, should be caught by `UndefVarErrorReport` instead

        mod = Base.binding_module(mod, name)
        analyzer.is_target_module(mod) && return # we don't care about what we defined ourselves

        if isunstable(mod, name)
            add_new_report!(analyzer, sv.result, UnstableAPI(sv, e))
        end
    end
end

In the report pass above, isunstable does the heavy lifting of finding "unstable API"s. Here we implement isunstable according to the definition above, but with some heuristics to exclude language intrinsics, which can be included into our code automatically and usually aren't of interest to us.

function isunstable(mod, name)
    # exclude language intrinsics
    mod === Core && return false
    x = getfield(mod, name)
    x isa Core.Builtin && return false
    (x === Base.indexed_iterate || x === Base.SizeUnknown) && return false # iteration protocol
    (x === Base.Iterators.Filter || x === Base.Iterators.Flatten) && return false # iterator protocol
    x === Base.Broadcast.broadcasted && return false # broadcast protocol
    x === Base.kwerr && return false # ignore keyword lowering

    return !isexported(mod, name) && !hasdoc(mod, name)
end

function isexported(mod, name)
    mod = Base.binding_module(mod, name)
    return Base.isexported(mod, name)
end
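The reason isexported first resolves the owning module with `Base.binding_module` is that a name can be visible in one module while being owned, and exported, by another. The sketch below shows this with two toy modules; `Provider` and `Consumer` are hypothetical names used only here.

```julia
module Provider
export f
f() = 1
end

module Consumer
using ..Provider
g() = f()   # `f` is visible here via `using`, but it is owned by `Provider`
end

# Resolve which module actually owns the binding `f` as seen from `Consumer`
owner = Base.binding_module(Consumer, :f)

exported_by_consumer = Base.isexported(Consumer, :f)  # `Consumer` itself does not export `f`
exported_by_owner    = Base.isexported(owner, :f)     # the owning module does
```

Checking exports against the module where the reference happens to appear would misclassify `f` as unexported, so the check is made against the owner instead.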

# adapted from https://github.com/JunoLab/CodeTools.jl/blob/56e7f0b514a7476864c27523bcf9d4bc04699ce1/src/summaries.jl#L24-L34

using Base.Docs
function hasdoc(mod, name)
    binding = Docs.Binding(mod, name)
    for m in Docs.modules
        meta = Docs.meta(m)
        haskey(meta, binding) && return true
        (; mod, var) = binding
        isdefined(mod, var) && haskey(meta, getfield(mod, var)) && return true
    end
    return false
end
hasdoc (generic function with 1 method)

Usages

Now our analyzer is set up. Lastly we are going to set up analysis entry points using the analyzer.

using InteractiveUtils # to use `gen_call_with_extracted_types_and_kwargs`

# the constructor for creating a new configured `UnstableAPIAnalyzer` instance
function UnstableAPIAnalyzer(world::UInt = Base.get_world_counter();
    is_target_module = ==(@__MODULE__),
    jetconfigs...)
    state = AnalyzerState(world; jetconfigs...)
    # use a global code cache, with one entry per `InferenceParams` configuration
    cache_key = JET.compute_hash(state.inf_params)
    analysis_cache = get!(AnalysisCache, UNSTABLE_API_ANALYZER_CACHE, cache_key)
    return UnstableAPIAnalyzer(state, analysis_cache, is_target_module)
end
function report_unstable_api(args...; jetconfigs...)
    @nospecialize args jetconfigs
    analyzer = UnstableAPIAnalyzer(; jetconfigs...)
    return analyze_and_report_call!(analyzer, args...; jetconfigs...)
end
macro report_unstable_api(ex0...)
    return InteractiveUtils.gen_call_with_extracted_types_and_kwargs(__module__, :report_unstable_api, ex0)
end
@report_unstable_api (macro with 1 method)
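The constructor above shares one analysis cache among all analyzers created with the same inference parameters by using `get!` with a constructor as the default. The following sketch shows that caching pattern in isolation. `TOY_CACHE` and `config_key` are hypothetical, and a plain `Vector{Symbol}` stands in for `AnalysisCache`.

```julia
# A global cache keyed by a hash, analogous in shape to UNSTABLE_API_ANALYZER_CACHE
const TOY_CACHE = IdDict{UInt, Vector{Symbol}}()

# Hypothetical stand-in for hashing an analyzer configuration
config_key(config::NamedTuple) = hash(config)

# `get!` calls `Vector{Symbol}()` only when the key is not in the cache yet
cache_a = get!(Vector{Symbol}, TOY_CACHE, config_key((; depth = 1)))
cache_b = get!(Vector{Symbol}, TOY_CACHE, config_key((; depth = 1)))  # same configuration
cache_c = get!(Vector{Symbol}, TOY_CACHE, config_key((; depth = 2)))  # different configuration

push!(cache_a, :seen)   # a result recorded through one handle ...
shared   = cache_a === cache_b   # ... is visible through the other, since they are the same object
separate = cache_a !== cache_c   # a different configuration gets its own cache
```

Keying by configuration keeps results computed under one set of parameters from being reused under another, while still letting repeated calls with the same parameters benefit from earlier work.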

Simple cases

Let's first use the interactive analysis entries and try simple test cases.

UnstableAPIAnalyzer can find an "unstable" function:

function some_reflection_code(@nospecialize(f))
    return any(Base.hasgenerator, methods(f)) # Base.hasgenerator is unstable
end
@report_unstable_api some_reflection_code(sin)
No errors detected

UnstableAPIAnalyzer can find an "unstable" global variable:

module foo; bar = 1 end
report_unstable_api((Any,)) do a
    foo.bar + a # foo.bar is unstable
end
═════ 1 possible error found ═════
(::Main.var"#3#4")(a::Any) @ Main ./find_unstable_api.md:247
│ usage of unstable API `Main.foo.bar` found: Main.foo.bar
└────────────────────

UnstableAPIAnalyzer can detect "unstable API"s even if they're imported bindings or nested references (which will be resolved to getproperty):

import Base: hasgenerator
report_unstable_api((Any,)) do mi
    # NOTE every function call appearing here is unstable
    ci = hasgenerator(mi) ? Core.Compiler.get_staged(mi) : Base.uncompressed_ast(mi)
end
═════ 2 possible errors found ═════
(::Main.var"#5#6")(mi::Any) @ Main ./find_unstable_api.md:258
│ usage of unstable API `Base.hasgenerator` found: Main.hasgenerator(mi::Any)
└────────────────────
(::Main.var"#5#6")(mi::Any) @ Main ./find_unstable_api.md:258
│ usage of unstable API `Base.uncompressed_ast` found: Base.uncompressed_ast
└────────────────────
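The remark about nested references can be checked by looking at how dotted access is lowered. This is a small sketch; `Foo`, `bar`, and `baz` are placeholder names that need not exist, and the printed form of lowered code may vary across Julia versions.

```julia
# Lower a nested dotted reference and render it as text
lowered = string(Meta.lower(Main, :(Foo.bar.baz)))

# Each dot becomes a `getproperty` call in the lowered code
has_getproperty = occursin("getproperty", lowered)
```

Since the reference is a call rather than a plain global reference, the analyzer has to look at what that call resolves to on a module, which is the job of the `builtin_tfunction` overload defined earlier.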

Analyze a real-world package

Finally we can use JET's top-level analysis entry points to analyze a whole script or package.

Here we will run UnstableAPIAnalyzer on IRTools.jl, which uses Base.isgenerated. That function was renamed to Base.hasgenerator in Julia v1.7, which prompted the discussion at https://github.com/JuliaLang/julia/pull/40745#issuecomment-850876150. In particular, IRTools.jl uses Base.isgenerated here, and you can see that the analyzer correctly detects it if you run the following code with IRTools@v0.4.2 installed.

# define an entry point for analyzing a package
function report_package_unstable_api(args...; jetconfigs...)
    analyzer = UnstableAPIAnalyzer(; jetconfigs...)
    return analyze_and_report_package!(analyzer, args...; jetconfigs...)
end

report_package_unstable_api("IRTools";
                            # to only find errors detected within the module context of `IRTools`
                            target_defined_modules=true)
 Warning: IRTools isn't installed in the current environment at /home/runner/work/JET.jl/JET.jl/docs/Project.toml
 @ Main find_unstable_api.md:286
═════ 59 possible errors found ═════
┌ @ /Users/aviatesk/.julia/packages/IRTools/aSVI5/src/reflection/reflection.jl:39 Core.kwfunc(IRTools.Inner.invoke_meta)(Core.apply_type(Core.NamedTuple, (:world,))(Core.tuple(world)), IRTools.Inner.invoke_meta, T)
│┌ @ /Users/aviatesk/.julia/packages/IRTools/aSVI5/src/reflection/reflection.jl:69 IRTools.Inner.#invoke_meta#6(world, _3, T)
││┌ @ /Users/aviatesk/.julia/packages/IRTools/aSVI5/src/reflection/reflection.jl:74 Core.kwfunc(IRTools.Inner.meta)(Core.apply_type(Core.NamedTuple, (:types, :world))(Core.tuple(S, world)), IRTools.Inner.meta, T)
│││┌ @ /Users/aviatesk/.julia/packages/IRTools/aSVI5/src/reflection/reflection.jl:38 IRTools.Inner.#meta#1(types, world, _3, T)
││││┌ @ /Users/aviatesk/.julia/packages/IRTools/aSVI5/src/reflection/reflection.jl:43 Base._methods_by_ftype
│││││ usage of unstable API `Base._methods_by_ftype` found
││││└─────────────────────────────────────────────────────────────────────────────────
││││┌ @ /Users/aviatesk/.julia/packages/IRTools/aSVI5/src/reflection/reflection.jl:49 Base.isgenerated
│││││ usage of unstable API `Base.isgenerated` found
││││└─────────────────────────────────────────────────────────────────────────────────
││││┌ @ /Users/aviatesk/.julia/packages/IRTools/aSVI5/src/reflection/reflection.jl:49 Base.uncompressed_ast
│││││ usage of unstable API `Base.uncompressed_ast` found
││││└─────────────────────────────────────────────────────────────────────────────────
││││┌ @ /Users/aviatesk/.julia/packages/IRTools/aSVI5/src/reflection/reflection.jl:54
... # many other "unstable API"s detected

This page was generated using Literate.jl.