
SCIP plugins and the cut selection interface

By: Julia on μβ

Re-posted from: https://matbesancon.xyz/post/2022-10-03-cutselection/

This is a short post on the cut selection mechanism in the mixed-integer optimization solver
SCIP and things I used for its implementation in the SCIP.jl Julia wrapper.
You can check out the corresponding pull request for completeness.

Callbacks?

The space of mixed-integer optimization solvers is mostly divided between
commercial, closed-source solvers and open-source academic solvers.
In the second cluster, SCIP stands out for the tunability of the solving
process: like all solvers through some parameters, but more importantly through callbacks.

Callbacks are functions that the user passes to a solver (or, more generally, to another function) with an expected behavior.
Conceptually, they are the most elementary building block of Inversion of Control, letting the user
define part of the behaviour of the solver through their own code and not only through fixed parameters.

A basic callback system implemented in many solvers is a printing or logging callback:
the user function is called at every iteration of the solving process with some iteration-specific information to print or log.
Here is a Julia example with gradient descent:

function my_solver(x0::AbstractVector, gradient_function::Function, callback::Function)
    x = x0
    terminated = false
    while !terminated
        g = gradient_function(x)
        stepsize = compute_stepsize(x)
        # the callback only observes the current state, it does not modify the solving process
        callback(x, g, stepsize)
        x = x - stepsize * g
        terminated = ... # some termination criterion
    end
    return x
end

In this example, the callback is not expected to modify the solving process but contains all the information
about the current state and can record or print data.

The C version of it would be something like:

#include <stdbool.h>

// defining the function types
typedef void (*Gradient)(double* gradient, double* x);
typedef void (*Callback)(double* x, double* gradient, double stepsize);

void my_solver(double* x, Gradient gradient_function, Callback callback) {
    double* gradient = initialize_gradient(x);
    double stepsize;
    bool terminated = false;
    while (!terminated) {
        gradient_function(gradient, x);
        stepsize = compute_stepsize(gradient, x);
        callback(x, gradient, stepsize);
        update_iterate(x, gradient, stepsize);
        terminated = ...;
    }
}

SCIP plugins

SCIP plugins are generic interfaces for certain components of the solver, such as cutting plane generators
(also called separators), heuristics, or lazy constraints.
Think of plugins as a bundle of functions that have a grouped logic. Compared to callbacks,
they are another level of Inversion of Control, often referred to as Dependency Injection.
Since C does not have a native mechanism for such a concept (think C++ abstract classes, Haskell type classes, Rust traits, Java interfaces, Scala traits),
the SCIP developers just cooked up their own, with macros providing the syntactic sugar of an interface.

The available SCIP plugins are listed on the documentation page explaining how to add them.

Cut selection

A cut is a linear inequality $\alpha^T x \leq \beta$ such that:

  1. at least one optimal solution remains feasible with that cut (in general, cuts will not remove optimal solutions),
  2. a part of the feasible region of the convex relaxation is cut off (otherwise, the cut is trivial and useless).

In SCIP 8, a cut selector plugin was added; see the description in the SCIP 8 release report.
It was originally motivated by this paper on adaptive cut selection, co-authored by a subset of the SCIP 8 authors,
showing that a fixed selection rule can perform poorly.

There is ongoing research on cut selection at ZIB and other places, having seen that smarter rules do make a difference.

The selection problem can be stated as follows: given a set of previously generated cuts (some might be locally valid at the current node only),
which ones should be added to the linear relaxation before continuing the branching process?

Intuitively, a cut should be added only if it improves the current relaxation. If the current linear programming relaxation solution
is not cut off by a cut, that cut is probably not relevant at the moment, even though it might cut off another part of the polytope.
Examples of criteria currently used to determine whether a cut should be added are:

  • efficacy: how far the current LP relaxation solution is from the hyperplane defined by the cut (see the formula below),
  • sparsity: how many non-zero coefficients the cut has,
  • orthogonality (to other constraints): a cut that is parallel to another cut means that one of them is redundant.
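
For instance, one common way to compute the efficacy of a cut $\alpha^T x \leq \beta$ is the distance from the current LP relaxation solution $x^*$ to the cut hyperplane, $(\alpha^T x^* - \beta) / \|\alpha\|$, which is positive exactly when the cut separates $x^*$.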

Instead of trying to come up with fixed metrics and a fixed rule, the cut selector allows users to define their own rule
by examining all cuts and the current state of the solver.

Cut selector interface

I will focus here on the Julia interface; some parts are very similar to what would be implemented
by a C or C++ user, except for memory management, which is handled automatically here.

The cut selector interface is pretty simple; on the Julia side, it consists of:

  • a structure that needs to be a subtype of AbstractCutSelector,
  • one key function that has to be implemented.

The low-level cut selection function that SCIP expects has the following signature;
I will give the Julia version, but the C one is strictly identical:

function select_cut_lowlevel(
    scip::Ptr{SCIP_},
    cutsel_::Ptr{SCIP_CUTSEL},
    cuts_::Ptr{Ptr{SCIP_ROW}},
    ncuts::Cint,
    forced_cuts_::Ptr{Ptr{SCIP_ROW}},
    nforced_cuts::Cint,
    root_::SCIP_Bool,
    maxnselectedcuts::Cint,
    nselectedcuts_::Ptr{Cint},
    result_::Ptr{SCIP_RESULT}
)::SCIP_RETCODE

The function takes a pointer to the SCIP model, a pointer to our cut selection plugin that
is stored within SCIP, a vector of cuts (passed as a pointer and a length),
a vector of forced cuts (that is, cuts that will be added to the linear relaxation independently of the
cut selection procedure), whether we are at the root node of the branch-and-bound tree, and the maximum number of cuts
we are allowed to accept.

Forced cuts are interesting to have because they let us avoid adding redundant cuts.
This function is expected to sort the array of cuts, putting the selected cuts first,
and to update the values pointed to by nselectedcuts_ and result_.
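
Concretely, once the cuts array has been reordered so that the selected cuts come first, the outputs are reported through the pointer arguments. A minimal sketch of that last step (assuming k cuts were selected):

# minimal sketch: write the outputs through the pointers received from SCIP
unsafe_store!(nselectedcuts_, Cint(k))   # number of cuts kept at the front of the array
unsafe_store!(result_, SCIP_SUCCESS)     # tell SCIP the selection ran successfully
return SCIP_OKAY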

This interface is quite low-level from a Julia perspective, and passing all arguments C-style is cumbersome.
The SCIP.jl wrapper thus lets users define their selector with a single function to implement:

function select_cuts(
    cutsel::AbstractCutSelector,
    scip::Ptr{SCIP_},
    cuts::Vector{Ptr{SCIP_ROW}},
    forced_cuts::Vector{Ptr{SCIP_ROW}},
    root::Bool,
    maxnselectedcuts::Integer,
    )
end

This function returns the output values in a tuple (retcode, nselectedcuts, result)
instead of passing them as arguments, and lets the user manipulate vectors instead of raw pointers.
The raw function can be passed to C, but the user only sees the idiomatic Julia one.
On each of the Ptr{SCIP_ROW}, the user can call any of the C functions; all SCIP C functions are available in
the SCIP.LibSCIP submodule. They can, for instance, compute parallelism between rows, get the number of non-zeros,
or get the coefficients $\alpha$ and the left- and right-hand sides (rows are two-sided in SCIP) and compute quantities of interest themselves.
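
For instance, assuming the generated bindings expose the usual row accessors from the SCIP C API (SCIProwGetNNonz, SCIProwGetLhs, SCIProwGetRhs), a small sketch of querying basic cut data could look like:

# hedged sketch: query basic data of a cut through the LibSCIP bindings
# (accessor names assumed from the SCIP C API)
function cut_summary(row::Ptr{SCIP_ROW})
    nnonz = SCIP.LibSCIP.SCIProwGetNNonz(row)  # number of non-zero coefficients
    lhs = SCIP.LibSCIP.SCIProwGetLhs(row)      # left-hand side (rows are two-sided)
    rhs = SCIP.LibSCIP.SCIProwGetRhs(row)      # right-hand side
    return (nnonz, lhs, rhs)
end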

Here is the complete example for a cut selector that never selects any cut:

# the struct needs to be mutable here
mutable struct PickySelector <: SCIP.AbstractCutSelector
end

function SCIP.select_cuts(
        cutsel::PickySelector, scip, cuts::Vector{Ptr{SCIP_ROW}},
        forced_cuts::Vector{Ptr{SCIP_ROW}}, root::Bool, maxnselectedcuts::Integer,
    )
    # return code, number of cuts, status
    return (SCIP.SCIP_OKAY, 0, SCIP.SCIP_SUCCESS)
end

We have now defined a cut selector that implements the interface but SCIP does not know about it yet.
In the Julia interface, we added a wrapper function that takes care of the plumbing parts:

cutselector = PickySelector()
o = SCIP.Optimizer()
SCIP.include_cutsel(o, cutselector)

Some C-Julia magic

The simplicity of the interface is enabled by some nice-to-have features.

@cfunction lets us take a Julia function that is compatible with C, that is,
one that accepts arguments compatible with the C type system, and produces a function pointer for it.
In our case, a function pointer is precisely what we need to pass to SCIP.
But to create a C function pointer, we need the full concrete type declared ahead of time;
@cfunction thus takes the return type and a tuple of the argument types to create the pointer:

func_pointer = @cfunction(
    select_cut_lowlevel,
    SCIP_RETCODE,
    (
        Ptr{SCIP_}, Ptr{SCIP_CUTSEL},
        Ptr{Ptr{SCIP_ROW}}, Cint, Ptr{Ptr{SCIP_ROW}},
        Cint, SCIP_Bool, Cint, Ptr{Cint}, Ptr{SCIP_RESULT}
    ),
)

The other nice-to-have feature here is wrapping a Julia Vector around a raw data pointer without copying data.
Remember that in the low-level interface, cuts are passed as a pointer and a number of elements
(cuts::Ptr{Ptr{SCIP_ROW}}, ncuts::Cint).
We can wrap a Vector around it directly:

cut_vector = unsafe_wrap(Vector, cuts, ncuts)

A very useful use case for this is shown in the test: one can get the cut vector and then sort it in-place
with a custom criterion:

sort!(cut_vector, by=my_selection_criterion)

This will sort the elements in-place, thus modifying the array passed as a double pointer.
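
For example, my_selection_criterion could be a simple sparsity score (a hypothetical choice), so that sparser cuts come first:

# hypothetical criterion: prefer cuts with fewer non-zero coefficients
my_selection_criterion(row::Ptr{SCIP_ROW}) = SCIP.LibSCIP.SCIProwGetNNonz(row)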

Pruning the expression tree with recursive value identification

By: Julia on μβ

Re-posted from: https://matbesancon.xyz/post/2022-04-29-expression-trees/

Today was the release of SCIP.jl v0.11, the first release switching to SCIP 8.
The major change in this (massive) release was the rewrite of the nonlinear optimization part, using a so-called expression framework.
The rewrite of the wrapper had some fairly tedious parts; debugging C shared libraries quickly becomes a mess of cryptic error messages.
But the nonlinear rewrite gave me the opportunity to tweak, in a minor way, how Julia expressions are passed to SCIP.

SCIP expressions

I will not go in depth into the new expression framework and will instead refer to these slides
and, more importantly, the SCIP 8 release report.

The key part is that in a nonlinear expression, each operand is defined as an expression handler, and new ones can be introduced by users.
Several specialized constraint types (constraint handlers in SCIP terminology) were also removed, replaced by the expression framework with
a generic nonlinear constraint.

The initial framework of the Julia wrapper

As a Lisp-inspired language (some would even say a Lisp dialect),
Julia is a homoiconic language: valid Julia code can always be represented and stored in a primitive data structure.
In this case, the tree-like structure is Expr, with fields head and args:

julia> expr = :(3 + 1/x)
:(3 + 1 / x)

julia> expr.head
:call

julia> expr.args
3-element Vector{Any}:
  :+
 3
  :(1 / x)

The SCIP.jl wrapper recursively destructures the Julia expression and builds up corresponding SCIP
expressions, a SCIP data structure defined either as a leaf (a simple value or a variable)
or as an operand and a number of subexpressions.
This is done through a push_expr! function (sketched below) which either:

  • Creates and returns a single variable expression if the expression is a variable
  • Creates and returns a single value expression if the expression is a constant
  • If the expression is a function f(arg1, arg2...), calls push_expr! on all arguments, and then creates and returns the SCIP expression corresponding to f.
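
A minimal sketch of this recursion (not the actual SCIP.jl code; push_value!, push_variable! and push_operator! are hypothetical helpers standing in for the SCIP expression constructors):

function push_expr!(scip, expr)
    if expr isa Number                      # a constant leaf, e.g. 4.3
        return push_value!(scip, expr)
    elseif expr isa Symbol                  # a variable leaf, e.g. :x
        return push_variable!(scip, expr)
    elseif expr isa Expr && expr.head === :call
        f = expr.args[1]                    # the operator, e.g. :+ or :exp
        children = [push_expr!(scip, arg) for arg in expr.args[2:end]]
        return push_operator!(scip, f, children)
    end
    error("unsupported expression: $expr")
end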

One part remains problematic. Imagine an expression like 3 * exp(x) + 0.5 * f(4.3), where f
is not a primitive supported by SCIP. Indeed, it should not have to be, because that part of the expression
can be evaluated when the expression is constructed. But when walking down the expression sub-parts,
there was no way to know that a given part is a pure value: the expression-constructing procedure would
first create a SCIP expression for 4.3 and then try to find a function for f to apply with this expression
pointer as argument. This was the use case initially reported in this issue,
at a time when SCIP did not support trigonometric functions yet.

Another motivation for solving this issue is the computational and memory burden.
Imagine your expression is now 3 * exp(x) + 0.1 * cos(0.1) + 0.2 * cos(0.2) + ... + 100.0 * cos(100.0).
This requires producing 2 * 1000 constant expressions, declared, allocated, and passed down to SCIP.
The solver will then likely preprocess all constant expressions to reduce them, so it ends up being a lot of
work done on one end only to be undone immediately on the other.

A lazified expression declaration

Make push_expr! return two values (scip_expr, pure_value), with the second being a Boolean for whether the expression is a pure value or not.
Then, at any node computing f(arg1, arg2...), proceed as follows (a sketch is given after these two cases).

If all arguments are pure_value, do not compute the expression and just return a null pointer; pure_value is true for this expression.

If at least one of the arguments is not a pure_value, we need to compute the actual expression. None of the pure_value arguments were declared as SCIP expressions yet, so we create a leaf value expression for them with Meta.eval(arg_i). The non-pure-value arguments already have a correct corresponding SCIP expression pointer. pure_value is false for this expression.
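
A minimal sketch of the lazified handling of a call node, with the same hypothetical helpers as above and push_expr! now assumed to return a (pointer, pure_value) pair:

function push_call!(scip, f, args)
    children = [push_expr!(scip, arg) for arg in args]
    if all(last, children)
        # all arguments are pure values: do not declare anything, the parent can fold us as well
        return (C_NULL, true)
    end
    child_pointers = map(zip(args, children)) do (arg, child)
        ptr, pure = child
        # pure-value arguments were never declared: evaluate them in Julia and create a value leaf
        pure ? push_value!(scip, Meta.eval(arg)) : ptr
    end
    return (push_operator!(scip, f, child_pointers), false)
end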

Note here that we are traversing some sub-expressions twice: once when walking down the tree, and once more, hidden in Meta.eval(arg_i), which computes the value of said expression, delegating the expression value computation to Julia. An alternative would be to return a triplet (expr_pointer, pure_value, val) from every push_expr! call and evaluate at
each pure_value node the value of f(args...), with the values of the arguments already computed. This would however complicate the code in the wrapper with no advantage at runtime:
expression evaluation is not a bottleneck for expressions that can realistically be tackled by a global optimization solver like SCIP.

Mutability, scope, and separation of concerns in library code

By: Julia on μβ

Re-posted from: https://matbesancon.xyz/post/2021-12-11-mutability-library/

It has been about a year since I joined the Zuse Institute
to work on optimization methods and computation.
One of the key projects of the first half of 2021 has been building up
FrankWolfe.jl,
a framework for nonlinear optimization in Julia using Frank-Wolfe
methods. You can find a paper introducing the package here.
This was an opportunity to experiment with different design choices
for efficient, scalable, and flexible optimization tools
while keeping the code simple to read and close to the algorithms.

I will go over a few approaches we experimented with, looking for what reads and works best to achieve these goals.

No mutability

Probably the simplest pattern to follow. It is also a good one when
the created objects are light, ideally stack-allocated.

This is typically the code you would write to get the leanest version
of the Frank-Wolfe algorithm:

x = initialize(feasible_set)
terminated = false
while !terminated
    direction = grad(x)
    v = compute_extreme_point(feasible_set, direction)
    γ = find_stepsize(stepsize_strategy, x, v)
    x = x + γ * (v - x)
    terminated = check_termination(x)
end

This program is highly generic: users can define their own
grad function, and typically implement
compute_extreme_point and find_stepsize methods for their custom
feasible set and step size strategy types.
If you push it further, you can use a custom abstract vector type
for x and v. This does not have to be a vector in the programming sense: you can use weird vector spaces
as long as addition and scaling are defined.

What would be the problem then? If you have seen high-performance code
before, you are probably screaming at the allocations happening all over the place. Every line allocates a new object in memory,
first the direction, then the extreme point v.
The worst might be the x update step, which allocates three vectors
because of intermediate expressions.
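
To make this concrete, here is a small sketch (not from the package) contrasting the allocating update with a fused in-place version:

function update_allocating(x, v, γ)
    # builds (v - x), then γ * (v - x), then the sum: three temporary vectors
    return x + γ * (v - x)
end

function update_inplace!(x, v, γ)
    # single fused loop writing into x, no temporary vector
    x .= x .+ γ .* (v .- x)
    return x
end
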
If you come from another performance-enabling programming environment
(Fortran, C, C++, Rust), what I am saying is probably obvious.
If you come from interpreted languages like Python or R, you may wonder why bother with all this. If you do not need performance, maybe you indeed shouldn't, but when developing a library, users will probably expect not
to have to rewrite your code for a larger-scale use case.
Also, these interpreted languages are typically slow across the board
when performing operations in the language itself and not moving them
to external kernels written in a compiled language (or being lucky with Numba).
In Julia, operations will typically be as fast as they can get if you pay
attention to minor things, so the bottleneck quickly becomes
the allocations of large objects.
The other thing people may object to is that it is the role of the compiler
to take high-level expressions and reformulate them to avoid allocations.
This is a common argument in some functional programming circles:
everything is immutable because the compiler will figure everything out.
To some extent this is true, of course, but pushing too much program
transformation onto the compiler imposes some complexity
on all users, not just the ones focusing on performance.
You may typically get bitten by iterator methods (filter, map)
in Rust yielding a result of a custom type which changes if
a type annotation is given first.
Without this type annotation, when expecting a consistent type
to be inferred, one can get an error complaining about a weird
type generated by the chaining of all operations.
Finally, pushing this on the compiler means that you expect it to optimize
your code consistently and always in the way you would do it, because in
most cases “overriding” the compiler behaviour is far from trivial
and even verifying the decisions the compiler took will require inspecting
lowered emitted code (down to LLVM IR, assembly).

Finally, worrying about the performance of the inner loop is also a consequence
of the nature of the algorithm itself: Frank-Wolfe, as is typical for
first-order methods, will perform a lot of iterations that are relatively
cheap, as opposed to, say, interior point methods, which will typically
converge in a few iterations, each of them doing significant
work. In the latter case, allocating a few vectors might be fine because
linear algebra will dominate runtime, but not in FW, where each
individual operation is relatively cheap compared to allocations.

Passing containers

This would be the typical signature of C functions, receiving almost all
heap-allocated containers as arguments.
A typical example would be replacing the gradient computation with:

grad!(storage, x)

which would compute the gradient at x in-place in the storage container.
Note the ! which is just a Julia idiom to indicate a function that mutates
one of its arguments. Adding storage arguments to function calls is also
used in Optim.jl
for the definition of a gradient or Hessian or in
DifferentialEquations.jl to pass the function describing a dynamic.
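
As a minimal sketch (assuming a quadratic objective x' * A * x with a symmetric matrix A available in scope), an in-place gradient can avoid any temporary vector:

using LinearAlgebra

# minimal sketch: the gradient of x' * A * x is 2 * A * x for symmetric A
function grad!(storage, x)
    mul!(storage, A, x)   # storage = A * x, written in place
    storage .*= 2
    return storage
end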

This has the strong advantage of making it possible for users to reduce
their memory consumption and runtime. This also means that composing calls
can be made performant: typically, library developers pay attention
to hot loops which they spend more time optimizing. But what if your main
top-level algorithm is someone else’s hot loop? Then they need to be able
to control that setup cost in some way.

Why not use these additional arguments and in-place notation everywhere then?

Consider the same gradient with multiple intermediate expressions that must be held in different structures: where should one hold the storage?
Adding more storage arguments:

grad!(storage1, storage2, storage3, x)

means users would need to implement a function with a very large number of arguments
that they would mostly not use.
Remember we cannot just bundle all the storage containers into one,
because the “main” one is supposed to contain the actual gradient
value at x.
Alternatively, all additional storage elements could be passed as keyword arguments, but that also quickly makes for obscure signatures.

Dedicated workspace

This is the approach taken in Nonconvex.jl: all temporary containers required by an algorithm are defined
in a type specific to that algorithm:

struct MyAlgorithm end

mutable struct MyAlgorithmWorkspace
    v::Vector{Float64}
end

prepare_workspace(::MyAlgorithm, x) = MyAlgorithmWorkspace(similar(x))

function run_algorithm(alg, x0; workspace = prepare_workspace(alg, x0))
    compute_step!(workspace.v, x0)
end

This pattern avoids the monstrous signature with 10+ arguments
that are not “real” inputs of the function in a mathematical sense,
provides a good default for most users (the keyword is initialized automatically),
and allows more advanced users to pass down the workspace if required.
The workspace of a given algorithm can also contain multiple containers
of different types without requiring changes to the other algorithms.
This is exactly the path experimented with for the step size computation
in FrankWolfe.jl#259.
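
A usage sketch of that pattern, with the hypothetical types defined above:

x0 = randn(10)
alg = MyAlgorithm()

run_algorithm(alg, x0)                  # default: the workspace is created internally

ws = prepare_workspace(alg, x0)         # advanced: preallocate once...
run_algorithm(alg, x0; workspace = ws)  # ...and reuse it across calls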

Functors

Sadly, the workspace pattern is not a silver bullet, even if it covers a lot of cases.
What if one needs not only some internal workspace, but also to return a large object?

The Frank-Wolfe algorithm is also composed of different building blocks: gradient computation,
linear oracle, step size. Should we have a dedicated workspace for each of them?
That would also be a load we place on all advanced users defining a new component;
acceptable for the oracles, which typically have a dedicated type, but it quickly becomes
awkward for something like the objective function.
Would we expect users to do something like the following?

using LinearAlgebra
f(x, workspace) = dot(x, workspace.A, x)
struct FWorkspace{T}
    A::Matrix{T}
end
build_objective_workspace(::typeof(f), x) = ...

It reads oddly and will be hard to explain to users relatively new to Julia,
while not being a very advanced use of the package: defining an objective function is
part of the basic workflow.

A typical pattern would be to use closures for such parameters:

function build_objective(A)
    f(x) = dot(x, A, x)
    function grad!(storage, x)
        storage .= 2 * A * x
    end
    return (f, grad!)
end

(f, grad!) = build_objective(A)
# ...

The two functions close over a common parameter A which can be accessed from within them.
But what if you need to access A outside build_objective, once the functions are created?

You actually can do f.A, but it is definitely a hack relying on an implementation
detail more than on a feature; do not reproduce this at home!
Users or library contributors might also be confused when trying to see where the f.A
accessor is defined.

Instead, we should transparently define the fields we access and use functors or callable structures.

struct GradientWithMatrix{T} <: Function
    A::Matrix{T}
end

# defines the behaviour of a call to GradientWithMatrix
function (gm::GradientWithMatrix)(storage, x)
    storage .= 2 * gm.A * x
end

grad! = GradientWithMatrix(ones(3,3))
# ...

grad! is named like our previous function and can be called in the same way,
but it is also a struct of type GradientWithMatrix{Float64}.
Furthermore, this parameterized gradient type can be set up by the user, so we,
the package developers, let the user implement an allocation-free version by setting up their gradient only
once.

This pattern can also come in handy for dynamic parameters evolving with the iterations,
like a count of gradient calls:

mutable struct GradientWithIterationCounter{T} <: Function
    A::Matrix{T}
    counter::Int
end

GradientWithIterationCounter(A) = GradientWithIterationCounter(A, 0)

# a call to GradientWithIterationCounter increases the counter and updates the storage
function (gm::GradientWithIterationCounter)(storage, x)
    gm.counter += 1
    storage .= 2 * gm.A * x
end

grad! = GradientWithIterationCounter(ones(3, 3))
# ...

This allows us, in a non-intrusive way for the algorithm code, to add an iteration tracking feature
to Frank-Wolfe.
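
A usage sketch: after the run, the tracking data lives on the callable itself:

grad! = GradientWithIterationCounter(ones(3, 3))
# ... run the algorithm, passing grad! as the gradient oracle ...
grad!.counter   # number of gradient evaluations performed so far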

Thanks Wikunia for proofreading and feedback!