Building our own graph type in Julia

By: Julia on μβ

Re-posted from: https://matbesancon.github.io/post/2018-08-17-abstract_graph/


This is an adapted post on the talk we gave with James
at JuliaCon 2018 in London. You can see the
original slides,
the video still requires a bit of post-processing.

Last week JuliaCon in London was a great and very condensed experience.
The two talks on LightGraphs.jl
received a lot of positive feedback and more than that, we saw
how people are using the library for a variety of use cases which is a great
signal for the work on the JuliaGraphs ecosystem
(see the lightning talk).

I wanted to re-build the same graph for people who prefer a post version to
my clumsy live explanations on a laptop not handling dual-screen well
(those who prefer the latter are invited to see the live-stream of the talk).

Why abstractions?

The LightGraphs library is built to contain as few elements as possible to get
anyone going with graphs. This includes:

  • The interface a graph type has to comply with to be used
  • Essential algorithms implemented by any graph respecting that interface
  • A simple, battery-included implementation based on adjacency lists

The thing is, if you design an abstraction which in fact has just one
implementation, you’re doing abstraction wrong. This talks was also a
reality-check for LightGraphs, are we as composable, extensible as we promised?

The reason for abstraction is also that minimalism has its price.
The package was designed as the least amount of complexity required to get
graphs working. When people started to use it, obviously they needed more
features, some of which they could code themselves, some other required
extensions built within LightGraphs. By getting the core abstractions right,
you guarantee people will be able to use it and to build on top with minimal
friction, while keeping it simple to read and contribute to.

Our matrix graph type

Let’s recall that a graph is a collection of nodes and a collection of
edges between these nodes. To keep it simple, for a graph of $n$ edges,
we will consider they are numbered from 1 to n. An edge connects a node $i$
to a node $j$, therefore all the information of a graph can be kept as an
adjacency matrix $M_{ij}$ of size $n \times n$:

$$M_{ij} = \begin{cases} 1, & \mbox{if edge (i $\rightarrow$ j) exists} \\ 0 & \mbox{otherwise}\end{cases}$$

We don’t know what the use cases for our type will be, and therefore,
we will parametrize the graph type over the matrix type:

import LightGraphs; const lg = LightGraphs
mutable struct MatrixDiGraph{MT <: AbstractMatrix{Bool}} <: lg.AbstractGraph{Int}
  matrix::MT
end

The edges are simply mapping an entry (i,j) to a boolean (whether there is an
edge from i to j). Even though creating a graph type that can be directed
or undirected depending on the situation is possible, we are creating a type
that will be directed by default.

Implementing the core interface

We can now implement the core LightGraphs interface for this type, starting
with methods defined over the type itself, of the form function(g::MyType)

I’m not going to re-define each function here, their meaning can be found
by checking the help in a Julia REPL: ?LightGraphs.vertices or on the
documentation page.

lg.is_directed(::MatrixDiGraph) = true
lg.edgetype(::MatrixDiGraph) = lg.SimpleGraphs.SimpleEdge{Int}
lg.ne(g::MatrixDiGraph) = sum(g.m)
lg.nv(g::MatrixDiGraph) = size(g.m)[1]

lg.vertices(g::MatrixDiGraph) = 1:nv(g)

function lg.edges(g::MatrixDiGraph)
    n = lg.nv(g)
    return (lg.SimpleGraphs.SimpleEdge(i,j) for i in 1:n for j in 1:n if g.m[i,j])
end

Note the last function edges, for which the documentation specifies that we
need to return an iterator over edges. We don’t need to collect the comprehension
in a Vector, returning a lazy generator.

Some operations have to be defined on both the graph and a node, of the form
function(g::MyType, node).

lg.outneighbors(g::MatrixDiGraph, node) = [v for v in 1:lg.nv(g) if g.m[node, v]]
lg.inneighbors(g::MatrixDiGraph, node) = [v for v in 1:lg.nv(g) if g.m[v, node]]
lg.has_vertex(g::MatrixDiGraph, v::Integer) = v <= lg.nv(g) && v > 0

Out MatrixDiGraph type is pretty straight-forward to work with and all
required methods are easy to relate to the way information is stored in the
adjacency matrix.

The last step is implementing methods on both the graph and an edge of the
form function(g::MatrixDiGraph,e). The only one we need here is:

lg.has_edge(g::MatrixDiGraph,i,j) = g.m[i,j]

Optional mutability

Mutating methods were removed from the core interace some time ago,
as they are not required to describe a graph-like behavior.
The general behavior for operations mutating a graph is to return whether
the operation succeded. They consist in adding or removing elements from
either the edges or nodes.

import LightGraphs: rem_edge!, rem_vertex!, add_edge!, add_vertex!

function add_edge!(g::MatrixDiGraph, e)
    has_edge(g,e) && return false
    n = nv(g)
    (src(e) > n || dst(e) > n) && return false
    g.m[src(e),dst(e)] = true
end

function rem_edge!(g::MatrixDiGraph,e)
    has_edge(g,e) || return false
    n = nv(g)
    (src(e) > n || dst(e) > n) && return false
    g.m[src(e),dst(e)] = false
    return true
end

function add_vertex!(g::MatrixDiGraph)
    n = nv(g)
    m = zeros(Bool,n+1,n+1)
    m[1:n,1:n] .= g.m
    g.m = m
    return true
end

Testing our graph type on real data

We will use the graph type to compute the PageRank of

import SNAPDatasets
data = SNAPDatasets.loadsnap(:ego_twitter_d)
twitter_graph = MatrixDiGraph(lg.adjacency_matrix(data)[1:10,1:10].==1);
ranks = lg.pagerank(twitter_graph)

Note the broadcast check .==1, adjacency_matrix is specified to yield a
matrix of Int, so we use this to cast the entries to boolean values.

I took only the first 10 nodes of the graph, but feel free to do the same with
500, 1000 or more nodes, depending on what your machine can stand 🙈

Overloading non-mandatory functions

Some methods are already implemented for free by implementing the core interface.
That does not mean it should be kept as-is in every case. Depending on your
graph type, some functions might have smarter implementations, let’s see one
example. What MatrixDiGraph is already an adjacency_matrix, so we know
there should be no computation required to return it (it’s almost a no-op).

using BenchmarkTools: @btime

@btime adjacency_matrix(bigger_twitter)
println("why did that take so long?")
lg.adjacency_matrix(g::MatrixDiGraph) = Int.(g.m)
@btime A = lg.adjacency_matrix(bigger_twitter)
println("that's better.")

This should yield roughly:

13.077 ms (5222 allocations: 682.03 KiB)
why did that take so long?
82.077 μs (6 allocations: 201.77 KiB)
that's better.

You can fall down to a no-op by storing the matrix entries as Int directly,
but the type ends up being a bit heavier in memory, your type, your trade-off.

Conclusion

We’ve implemented a graph type suited to our need in a couple lines of Julia,
guided by the LightGraphs interface specifying how to think about our
graph instead of getting in the way of what to store. A lighter version
of this post can be read as slides.

As usual, ping me on Twitter for any
question or comment.

Bonus

If you read this and want to try building your own graph type, here are two
implementations you can try out, put them out in a public repo and show them off
afterwards:
1. We created a type just for directed graphs, why bothering so much? You can create your own type which can be directed or not,
either by storing the information in the struct or by parametrizing the type
and getting the compiler to do the work for you.
2. We store the entries as an AbstractMatrix{Bool}, if your graph is dense
enough (how dense? No idea), it might be interesting to store entries as as
BitArray.


Image source: GraphPlot.jl