Inference Convergence Algorithm in Julia – Revisited

Adventures in Type Inference Convergence: 2017 edition

Corrected Convergence

In my last post on type inference convergence,
I described a correct inference algorithm.
However, while this was a tremendous improvement over being wrong, I was unhappy with it.
Indeed, I wrote that “a full discussion of these heuristics will have to wait for a future blog post”.
But what I didn’t say is that I had already written many notes for that post,
and had reached a point where I understood that the current algorithm wouldn’t actually permit
reliable, well-tuned heuristics.

Plus, the algorithm required lots of global state (to manage the work-queue),
which required tricky locks and coordination to ensure it stayed consistent.

Convergence Algorithm 2.0

The improved convergence algorithm in PR #21677
maintains all of the correctness guarantees of the existing algorithm,
but uses a completely revised cycle-detection algorithm that provides much stronger guarantees
about the order in which work will be done.

Compared to the current algorithm, the revised one has half as many states.
They are:

  1. processing: any leaf function of the call-stack (usually there’s just one at a time)
  2. on the call-stack: the tree (a directed acyclic graph, DAG) of functions, with edges representing inferred potential calls
  3. in a cycle: groups of function nodes discovered not to form a DAG (represented instead as an unsorted set)
  4. finished: done

In outline, the algorithm that manages these states always maintains the call-stack as a simple tree,
permitting only the existence of nodes with forward edges.
During the course of running inference, if adding an edge would cause a cycle in this graph,
the algorithm instead replaces all nodes that are participating in the cycle
with a single node that represents that cyclic set.
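
To make that invariant concrete, here is a minimal sketch under assumed names and data structures (InferenceNode, newnode, and add_edge! are illustrative inventions, not Julia’s actual internals) of a call-stack kept as a tree, where an edge that would close a cycle instead merges every participant into one cycle set:

mutable struct InferenceNode
    sig::Any                            # the method specialization being inferred
    parent::Union{InferenceNode,Nothing}
    cycle::Set{InferenceNode}           # the nodes converging together with this one
end
newnode(sig, parent=nothing) =
    (n = InferenceNode(sig, parent, Set{InferenceNode}()); push!(n.cycle, n); n)

function add_edge!(caller::InferenceNode, sig)
    node = caller
    while node !== nothing              # walk toward the root, looking for sig
        node.sig == sig && break
        node = node.parent
    end
    node === nothing && return newnode(sig, caller)  # no cycle: extend the tree
    merged = Set{InferenceNode}()       # cycle detected: merge every frame from
    frame = caller                      # the caller back to the repeated node
    while true
        union!(merged, frame.cycle)
        frame === node && break
        frame = frame.parent
    end
    for f in merged
        f.cycle = merged                # all participants now share one set
    end
    return nothing                      # the caller must converge with that set
end

Note how inspecting an edge has exactly two possible outcomes here – a fresh leaf node to infer, or a cycle merge – which is the property the rest of this section builds on.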

Conceptually, this set is similar to either the “fixed-point” set in the current algorithm,
or the “active lines” set.
It differs from the “fixed-point” set because it can also guarantee that only nodes in the cycle
will be in the set.
The “active lines” set only allows describing the active set within a single method –
the new representation expands that to iterate convergence of an entire set of methods.

Because the algorithm maintains an acyclic call-stack,
it now becomes possible to rely on the inspection of an edge in inference
producing exactly one of two results: an inferred return type,
or a cycle detection that replaces the current node with a convergence set.
This greatly increases the options available for integrating new inference heuristics!

There’s one additional nice benefit to this new representation:
improved inlining heuristics!
It may not be obvious how improved cycle-detection during inference convergence
would impact the optimization afterwards,
but one of the unintuitive products of inference is the inlining order.
Deciding on the profitability of inlining is dependent upon knowing the
precise structure (ordering) of the call-stack and any cycles in it.
Since the new inference structure provides precise information on which
functions potentially participate in cycles,
the inliner can take advantage of that information to decide which functions to inline,
and where to terminate the inlining due to recursion.

Undecidability Heuristics

Without heuristics imposed on top of the existing inference algorithm,
it would already be Turing-complete[^turing],
with just the current support for recursively inspecting dispatch over computed types.

This means that (without some additional constraints added) inference might attempt to
compute anything (including the halting problem, busy beaver, infinite recursion).
In some cases, code is written that explicitly tries to make use of this
using “lisp-y style” function application.
However, code written in that manner often quickly runs into limits
that are inherently required by the compiler to avoid other very similar
code patterns where the inferred recursion is either unintentional
(it won’t affect runtime performance / behavior)
or unreachable (solving for reachability is commonly referred to as the halting problem).
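
For a sense of what that “lisp-y style” can look like (an illustration of my own, not code from the original discussion), dispatch over computed tuple types already gives inference a list-like recursion to evaluate:

car(t::Tuple) = t[1]               # head of the "list"
cdr(t::Tuple) = Base.tail(t)       # tail of the "list"
list_length(::Tuple{}) = 0
list_length(t::Tuple) = 1 + list_length(cdr(t))

list_length((1, 2.0, "three"))     # inference recurses over ever-shorter tuple types

Each recursive call sees a strictly shorter tuple type, so this particular example terminates; nothing in the mechanism itself guarantees that in general.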

It is suboptimal for this inference system to be undecidable (or even slow),
since its sole purpose is to find optimizations to make your program run faster.
And inference is going to be painfully slow when it is being used as an interpreter
(rather than just running the program unspecialized).
So, there needs to be some mechanism or heuristics for preventing this from happening.

It’s also important to realize that these heuristics need to be tuned precisely
to the capabilities of inference.

Turing-completeness requires several features:

  • loops
  • conditionals
  • arbitrarily large (but finite) memory

Removing any one of these would be sufficient to make the system decidable.
But loops (recursion), conditionals (dispatch), and memory (apply-type)
are all essential to getting useful results from inference
and thus it is generally not desirable to simply remove them.
Additionally, even if it were possible now,
as inference becomes more capable
(e.g. inter-procedural constant propagation, effect-free computation,
speculative evaluation of constant arithmetic),
new forms of loops and memory would appear that would also need to be addressed.
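
For instance (a hypothetical of my own), once inference propagates integer constants through calls, ordinary integers become a new kind of memory, and comparisons over them a new kind of loop:

countdown(n) = n == 0 ? 0 : countdown(n - 1)
countdown(1_000_000)   # inferring this as the constant 0 would take a million steps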

Current limits

Since Julia’s inference algorithm currently only operates on Julia’s types
(and does not support inference of constant expressions like 1 + 1),
we only need to consider how types can be used to express an arbitrarily large memory.
That requires both the ability to form arbitrarily large constants and the ability to do arithmetic (conditional loops) on them.

There are a finite number of type names.
But there are two primary mechanisms by which a type can be used to express an arbitrary value.
For any type, the parameters can be arbitrarily nested in depth.
Additionally, there are tuples and unions, which (in addition to depth), can have arbitrary length.
We can express this complexity as:

depth(type) = isempty(type.parameters) ? 1 : 1 + maximum(depth, type.parameters)
length(type) = length(type.parameters)

It is possible, using either property (or a combination of both), to represent any number.
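
Here is a runnable sketch of those two measures (typedepth and typelength are names of my own choosing, not Julia internals); non-type parameters, such as the 3 in Array{Int,3}, simply count as leaves:

typedepth(t) = t isa DataType && !isempty(t.parameters) ?
               1 + maximum(typedepth, t.parameters) : 1
typelength(t) = t isa DataType ? length(t.parameters) : 0

typedepth(Ref{Ref{Ref{Int}}})    # 4 – nesting depth can encode a number
typelength(Tuple{Int,Int,Int})   # 3 – tuple arity can encode a number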

The current heuristics in inference prohibit recursion which increases either depth or length.
Without this ability, a program cannot express any number larger than its input,
forcing it to (eventually) terminate.
Additionally, there are various (semi-arbitrary) limits placed on the lengths and depths of
the types that are flowing through the system.
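
A hedged sketch of the shape of such a limit, reusing the typedepth/typelength sketch above (the real logic in Julia’s inference implementation is considerably more nuanced):

grows(newsig, oldsig) =
    typedepth(newsig) > typedepth(oldsig) ||
    typelength(newsig) > typelength(oldsig)

If a recursive edge’s new signature grows relative to the frame already on the call-stack, inference widens it (for example, by replacing the growing parameters with their supertypes), so the recursion can only visit a finite set of signatures and must terminate.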

Independence of the cached result

For predictability, it is preferable that the heuristic limits imposed depend only on local properties,
computable from the function itself or its forward edges (the functions that it calls).
This remains an unsolved problem, since the heuristics are based on examining the call-stack (following back-edges).
This means that the order in which functions enter inference can affect how precisely they are permitted to compute their results.

(Figure: a diagram showing how the context-sensitivity of the specialization type signature can create repeating patterns instead of simple recursion)

To illustrate what I mean by wanting to use only “local properties”, consider what happens if fuchsia enters inference first:
inference will see the widening recursion at fuchsia′,
and green′ will not be encountered.
Thus, the result for green will be different than if fuchsia had not been in inference
(the recursion would instead have been detected at green′, changing the results assigned and cached for both –
including for fuchsia, which would later be inferred using the cached return type for green).
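
A hypothetical pair of definitions with the shape the diagram describes – each call wraps its argument in another Ref, so the specialization signatures keep growing and the pattern repeats with period two instead of recursing directly:

fuchsia(x) = green(Ref(x))      # fuchsia(::T) calls green(::Ref{T})
green(x) = fuchsia(Ref(x))      # green(::Ref{T}) calls fuchsia(::Ref{Ref{T}})

Starting from fuchsia(1), inference visits fuchsia(::Int), then green(::Ref{Int}), then fuchsia(::Ref{Ref{Int}}): the widening recursion is first noticed at a second fuchsia specialization (fuchsia′), and a second green specialization (green′) is never reached.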

An optimal limit would notice the recursion at fuchsia′, but change the topmost function (fuchsia)
to use an approximation of green while canceling type-inference on the originally attempted edges.
This would preserve the property that the cached result of type-inferring green doesn’t
depend on its having been called from fuchsia.

Looking forward

A downside of the current arbitrary limits is that they penalize valid code which uses complex types,
even when those types are not changing under recursion.

Another downside is that these limits only take into account the possibility of expressing memory
via the type-system “length” and “depth” definitions above.

These challenges remain, and will need to be addressed in a future update to this blog
(and to the implementation!).