Category Archives: Julia

Where for (loop) ARt Thou?

By: Jonathan Carroll

Re-posted from: https://jcarroll.com.au/2022/04/22/where-for-loop-art-thou/

I’ve long been interested in exactly how R works – not quite enough for me to learn all the
internals, but I was surprised that I could not find a clear guide towards exactly how vectorization works at the deepest level.

Let’s say we want to add two vectors which we’ve defined as x and y

x <- c(2, 4, 6)
y <- c(1, 3, 2)

One way to do this (the verbose, elementwise way) would be to add each pair of elements

c(x[1] + y[1], x[2] + y[2], x[3] + y[3])
## [1] 3 7 8

but if you are familiar with not repeating yourself, you might write this in a loop. Best practice
involves pre-filling the result to the correct size

res <- c(NA, NA, NA)
for (i in 1:3) {
  res[i] <- x[i] + y[i]
}
res
## [1] 3 7 8

There, we wrote a for loop.

Now, if you’ve ever seen R tutorials or discussions, you’ve probably had it drilled into you that
this is “bad” and should be replaced with something else. One of those options is an apply family
function

plus <- function(i) {
  x[i] + y[i]
}
sapply(1:3, plus)
## [1] 3 7 8

or something ‘tidier’

purrr::map_dbl(1:3, plus)
## [1] 3 7 8

(yes, yes, these functions are ‘bad’ because they don’t take x or y as arguments) but this stil
performs the operation elementwise. If you’ve seen even more R tutorais/discussions you’ve
probably been seen that vectorization is very handy – The R function + knows what to do with objects that
aren’t just a single value, and does what you might expect

x + y
## [1] 3 7 8

Now, if you’ve really read a lot about R, you’ll know that ‘under the hood’ a for-loop
is involved in every one of these, but it’s “lower down”, “at the C level”. Jenny Bryan makes the
point that “Of course someone has to write loops / It doesn’t have to be you” and for this reason,
vectorization in R is of great benefit.

So, there is a loop, but where exactly does that happen?

At some point, the computer needs to add the elements of x to the elements of y, and the simplest versions of this
happens one element at a time, in a loop. There’s a big sidetrack here about SIMD
which I’ll try to avoid, but I will mention that the Microsoft fork of R (artist, formerly known as Revolution R) running on Intel chips can do SIMD in MKL.

So, let’s start at the operator.

`+`
## function (e1, e2)  .Primitive("+")

Digging into primitives is a little tricky, but {pryr} can help

pryr::show_c_source(.Primitive("+"))
+ is implemented by do_arith with op = PLUSOP

We can browse a copy of the source for do_arith (in arithmetic.c) here where we see some
logic paths for scalars and vectors. Let’s assume we’re working with our example which has length(x) == length(y) > 1. With two non-scalar arguments

if !IS_SCALAR and argc == length(arg) == 2

This leads us to call R_binary

Depending on the class of the arguments, we need to call different functions, but for the sake of our example let’s say we have non-integer real numbers so we fork to real_binary. This takes a code argument for which type of operation we’re performing, and in our case it’s PLUSOP (noted above). There’s a case branch for this in which case, provided the arguments are of the same length (n1 == n2) we call

R_ITERATE_CHECK(NINTERRUPT, n, i, da[i] = dx[i] + dy[i];);

That’s starting to look a lot like a loop – there’s an iterator i and we’re going to call another function.

This jumps us over to a different file where we see LOOP_WITH_INTERRUPT_CHECK definitely performs some sort of loop. This takes the body above and the argument LOOP_ITERATE_CORE which is finally the actual loop!

#define R_ITERATE_CORE(n, i, loop_body) do {    \
   for (; i < n; ++i) { loop_body } \
} while (0)

so, that’s where the actual loop in a vectorized R call happens! ALL that sits behind the innocent-looking +.

That was thoroughly satisfying, but I did originally have in mind comparing R to another language – one where loops aren’t frowned upon because of performance, but rather encouraged… How do Julia loops differ?

Julia is not a vectorized language per se, but it has a neat ability to “vectorize” any operation, though in Julia syntax it’s “broadcasting”.

Simple addition can combine scalar values

3+4
## 7

Julia actually has scalar values (in R, even a single value is just a vector of length 1) so a single value could be

typeof(3)
## Int64

whereas several values need to be an Array, even if it only has 1 dimension

Vector{Int64}([1, 2, 3])
## 3-element Array{Int64,1}:
##  1
##  2
##  3

Trying to add two Arrays does work

[1, 2, 3] + [4, 5, 6]
## 3-element Array{Int64,1}:
##  5
##  7
##  9

but only because a specific method has been written for this case, i.e.

methods(+, (Array, Array))
## # 1 method for generic function "+":
## [1] +(A::Array, Bs::Array...) in Base at arraymath.jl:43

One thing I particularly like is that we can see exactly which method was called using the @which macro

@which [1, 2, 3, 4] + [1, 2, 3, 4]
+(A::Array, Bs::Array...) in Base at arraymath.jl:43

something that I really wish was easier to do in R. The @edit macro even jumps us right into the actual code for this dispatched call.

This ‘add vectors’ problem can be solved through broadcasting, which performs an operation elementwise

[1, 2, 3] .+ [4, 5, 6]
## 3-element Array{Int64,1}:
##  5
##  7
##  9

The fun fact about this I recently learned was that broadcasting works on any operation, even if that’s the pipe itself

["a", "list", "of", "strings"] .|> [uppercase, reverse, titlecase, length]
## 4-element Array{Any,1}:
##   "A"
##   "tsil"
##   "Of"
##  7

Back to our loops, the method for + on two Arrays points us to arraymath.jl (linked to current relevant line) which contains

function +(A::Array, Bs::Array...)
    for B in Bs
        promote_shape(A, B) # check size compatibility
    end
    broadcast_preserving_zero_d(+, A, Bs...)
end

The last part of that is the meaningful part, and that leads to Broadcast.broadcast_preserving_zero_d.

This starts to get out of my depth, but essentially

@inline function broadcast_preserving_zero_d(f, As...)
    bc = broadcasted(f, As...)
    r = materialize(bc)
    return length(axes(bc)) == 0 ? fill!(similar(bc, typeof(r)), r) : r
end

@inline broadcast(f, t::NTuple{N,Any}, ts::Vararg{NTuple{N,Any}}) where {N} = map(f, t, ts...)

involves a map operation to achieve the broadcasting.

Perhaps that’s a problem to tackle when I’m better at digging through Julia.

As always, comments, suggestions, and anything else welcome!

devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 4.1.2 (2021-11-01)
##  os       Pop!_OS 21.04               
##  system   x86_64, linux-gnu           
##  ui       X11                         
##  language en_AU:en                    
##  collate  en_AU.UTF-8                 
##  ctype    en_AU.UTF-8                 
##  tz       Australia/Adelaide          
##  date     2022-04-22                  
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package     * version date       lib source        
##  blogdown      1.8     2022-02-16 [1] CRAN (R 4.1.2)
##  bookdown      0.24    2021-09-02 [1] CRAN (R 4.1.2)
##  brio          1.1.1   2021-01-20 [3] CRAN (R 4.0.3)
##  bslib         0.3.1   2021-10-06 [1] CRAN (R 4.1.2)
##  cachem        1.0.3   2021-02-04 [3] CRAN (R 4.0.3)
##  callr         3.7.0   2021-04-20 [1] CRAN (R 4.1.2)
##  cli           3.2.0   2022-02-14 [1] CRAN (R 4.1.2)
##  crayon        1.5.0   2022-02-14 [1] CRAN (R 4.1.2)
##  desc          1.4.1   2022-03-06 [1] CRAN (R 4.1.2)
##  devtools      2.4.3   2021-11-30 [1] CRAN (R 4.1.2)
##  digest        0.6.27  2020-10-24 [3] CRAN (R 4.0.3)
##  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.1.2)
##  evaluate      0.14    2019-05-28 [3] CRAN (R 4.0.1)
##  fastmap       1.1.0   2021-01-25 [3] CRAN (R 4.0.3)
##  fs            1.5.0   2020-07-31 [3] CRAN (R 4.0.2)
##  glue          1.6.1   2022-01-22 [1] CRAN (R 4.1.2)
##  htmltools     0.5.2   2021-08-25 [1] CRAN (R 4.1.2)
##  jquerylib     0.1.4   2021-04-26 [1] CRAN (R 4.1.2)
##  jsonlite      1.7.2   2020-12-09 [3] CRAN (R 4.0.3)
##  JuliaCall   * 0.17.4  2021-05-16 [1] CRAN (R 4.1.2)
##  knitr         1.37    2021-12-16 [1] CRAN (R 4.1.2)
##  lifecycle     1.0.1   2021-09-24 [1] CRAN (R 4.1.2)
##  magrittr      2.0.1   2020-11-17 [3] CRAN (R 4.0.3)
##  memoise       2.0.0   2021-01-26 [3] CRAN (R 4.0.3)
##  pkgbuild      1.2.0   2020-12-15 [3] CRAN (R 4.0.3)
##  pkgload       1.2.4   2021-11-30 [1] CRAN (R 4.1.2)
##  prettyunits   1.1.1   2020-01-24 [3] CRAN (R 4.0.1)
##  processx      3.5.2   2021-04-30 [1] CRAN (R 4.1.2)
##  ps            1.5.0   2020-12-05 [3] CRAN (R 4.0.3)
##  purrr         0.3.4   2020-04-17 [3] CRAN (R 4.0.1)
##  R6            2.5.0   2020-10-28 [3] CRAN (R 4.0.2)
##  Rcpp          1.0.6   2021-01-15 [3] CRAN (R 4.0.3)
##  remotes       2.4.2   2021-11-30 [1] CRAN (R 4.1.2)
##  rlang         1.0.1   2022-02-03 [1] CRAN (R 4.1.2)
##  rmarkdown     2.13    2022-03-10 [1] CRAN (R 4.1.2)
##  rprojroot     2.0.2   2020-11-15 [3] CRAN (R 4.0.3)
##  rstudioapi    0.13    2020-11-12 [3] CRAN (R 4.0.3)
##  sass          0.4.0   2021-05-12 [1] CRAN (R 4.1.2)
##  sessioninfo   1.1.1   2018-11-05 [3] CRAN (R 4.0.1)
##  stringi       1.5.3   2020-09-09 [3] CRAN (R 4.0.2)
##  stringr       1.4.0   2019-02-10 [3] CRAN (R 4.0.1)
##  testthat      3.1.2   2022-01-20 [1] CRAN (R 4.1.2)
##  usethis       2.1.5   2021-12-09 [1] CRAN (R 4.1.2)
##  withr         2.5.0   2022-03-03 [1] CRAN (R 4.1.2)
##  xfun          0.30    2022-03-02 [1] CRAN (R 4.1.2)
##  yaml          2.2.1   2020-02-01 [3] CRAN (R 4.0.1)
## 
## [1] /home/jono/R/x86_64-pc-linux-gnu-library/4.1
## [2] /usr/local/lib/R/site-library
## [3] /usr/lib/R/site-library
## [4] /usr/lib/R/library

SuiteSparseGraphBLAS.jl: An Introduction

By: Will Kimmerer

Re-posted from: https://wimmerer.github.io/blog/2022/03/SuiteSparseGraphBLAS_intro/index.html

SuiteSparseGraphBLAS.jl: An Introduction

This blog post serves a couple purposes. The first is to introduce Julia users to GraphBLAS, and the features and performance of the SuiteSparseGraphBLAS.jl package. The second is an update on new features in versions 0.6 and 0.7 as well as a roadmap for the near future. That's a lot for one blogpost so let's dive right in!

What is SuiteSparseGraphBLAS.jl

SuiteSparseGraphBLAS.jl is a sparse linear algebra library with some special features oriented to graph algorithms. It's an interface to the SuiteSparse implementation of the GraphBLAS standard, by Tim Davis. SuiteSparseGraphBLAS.jl serves two purposes: (1) it can replace SparseArrays.jl with faster parallel methods when using the plus-times semiring conventional linear algebra, and (2) it serves as a Julia interface to the full set of GraphBLAS features. GraphBLAS is a powerful tool for writing high-performance graph algorithms at a high level of abstraction, using linear algebraic operations on sparse adjacency matries with different semirings.

The headline feature is taking the matrix multiplication function, and turning it into a higher order function. Instead of the normal arithmetic operations + and * a user could substitute other binary operators, like max or &&.

When we multiply two matrices A and B as normal we use the following index expression:

\[ C_{ik} = \sum\limits_{j=1}^{n}{A_{ij} \times B_{jk}} \]

This expression uses the arithmetic semiring \((+, \times)\). A semiring is an abstract algebra structure consisting of two binary operators that interact in a particular way.

There are plenty of other interesting semirings to choose from, and no reason to limit ourselves to the arithmetic one! We could, for instance, use the max-plus semiring \((\max, +)\) (one of the tropical semirings). Then our index expression above looks like:

\[ C_{ik} = \max\limits_{j=1}^{n}{(A_{ij} + B_{jk})} \]

This semiring has many applications to dynamic programming and graphs, but to wrap up this short intro, GraphBLAS lets you perform this operation, particularly on sparse matrices:

\[ C_{ik} = \bigoplus\limits_{j=1}^{n}{(A_{ij} \otimes B_{jk})} \]

where \(\oplus\) is some binary operator (a monoid in particular) which takes the place of the \(\sum\) reduction, and \(\otimes\) is some binary operator which takes the place of \(\times\) in typical arithmetic.

Let's see some code!!

What does a simple operation look like? The equivalence between a matrix-vector multiplication and breadth-first search is a key component of linear algebraic graph algorithms.

 Adjacency matrix of a directed graph and a single iteration of BFS
Adjacency matrix of a directed graph and a single iteration of BFS
julia> using SuiteSparseGraphBLASjulia> A = GBMatrix([1,1,2,2,3,4,4,5,6,7,7,7], [2,4,5,7,6,1,3,6,3,3,4,5], [1:12...])
7x7 GraphBLAS int64_t matrix, bitmap by row
  12 entries, memory: 832 bytes    (1,2)   1
    (1,4)   2
    (2,5)   3
    (2,7)   4
    (3,6)   5
    (4,1)   6
    (4,3)   7
    (5,6)   8
    (6,3)   9
    (7,3)   10
    (7,4)   11
    (7,5)   12julia> v = GBVector([4], [10]; nrows = 7)
7x1 GraphBLAS int64_t matrix, bitmap by col
  1 entry, memory: 272 bytes
  iso value:   10    (4,1)   10julia> A * v
7x1 GraphBLAS int64_t matrix, bitmap by col
  2 entries, memory: 328 bytes    (1,1)   20
    (7,1)   110

This looks exactly like a matrix-vector multiplication with any other array type in Julia. With a different semiring, the result of A*v can return the parent node id of nodes 1 and 7, so that a BFS tree can be constructed.

Where possible, SuiteSparseGraphBLAS.jl will behave exactly as any other array type in Julia. But there are some places where SuiteSparseGraphBLAS.jl has significant extra functionality. I'll illustrate that here by finding the number of triangles in the graph above.

For those that haven't seen triangle counting before, a triangle in a graph is a set of three vertices whose edges form a triangle. We'll pretend like the graph is undirected, and then you can see two triangles: \((2, 5, 7)\) and \((3, 4, 7)\).

julia> using SuiteSparseGraphBLAS: pairjulia> function cohen(A)
         U = triu(A)
         L = tril(A)
         return reduce(+, *(L, U, (+, pair); mask=A)) ÷ 2
       end
cohen (generic function with 1 method)julia> function sandia(A)
         L = tril(A)
         return reduce(+, *(L, L, (+, pair); mask=L))
       end
sandia (generic function with 1 method)julia> M = eadd(A, A', +) #Make undirected/symmetric
7x7 GraphBLAS int64_t matrix, bitmap by row
  20 entries, memory: 832 bytes    (1,2)   1
    (1,4)   8
    (2,1)   1
    (2,5)   3
    (2,7)   4
    (3,4)   7
    (3,6)   14
    (3,7)   10
    (4,1)   8
    (4,3)   7
    (4,7)   11
    (5,2)   3
    (5,6)   8
    (5,7)   12
    (6,3)   14
    (6,5)   8
    (7,2)   4
    (7,3)   10
    (7,4)   11
    (7,5)   12julia> cohen(M)
2julia> sandia(M)
2

There are a couple unique features of SuiteSparseGraphBLAS.jl used in the two methods, cohen and sandia above.

The first is the pair function. This function returns 1 whenever both arguments x and y are explicit stored values, otherwise it returns nothing (an implicit zero). To illustrate:

julia> using SuiteSparseGraphBLASjulia> u = GBVector([2,4,5], [4,5,6])
5x1 GraphBLAS int64_t matrix, bitmap by col
  3 entries, memory: 328 bytes    (2,1)   4
    (4,1)   5
    (5,1)   6julia> v = GBVector([1,3,5], [1,2,3])
5x1 GraphBLAS int64_t matrix, bitmap by col
  3 entries, memory: 328 bytes    (1,1)   1
    (3,1)   2
    (5,1)   3julia> SuiteSparseGraphBLAS.pair.(u, v)
5x1 GraphBLAS int64_t matrix, bitmap by col
  1 entry, memory: 272 bytes
  iso value:   1    (5,1)   1

This function is primarily a performance enhancement. In an algorithm like triangle counting we don't care about the weight of a particular edge, just that it exists. pair lets us avoid a costly multiplication by just checking that x is a stored value in uand y is a stored value in v.

The second feature is the semirings discussed in What is SuiteSparseGraphBLAS.jl above. The tuple (+, pair) in *(L, U, (+, pair); mask=A) indicates the the * function is using the +pair semiring:

\[ C_{ik} = \sum\limits_{j=1}^{n}{\text{pair}(A_{ij}, B_{jk})} \]

Finally, *(L, U, (+, pair); mask=A) uses A as a mask. The mask prevents values from being placed in the result where the mask is false (or true if complemented). This is a powerful algorithmic and performance tool.

Summary of Primary Functions

Most of the SuiteSparse:GraphBLAS interface is accessible through the expected Julia functions, alongside some new functions like emul or subassign!.

The complete documentation of supported operations can be found in Operations.

GraphBLAS Operation Julia
mxm, mxv, vxm \(\bf C \langle M \rangle = C \odot AB\) mul! or *
eWiseMult \(\bf C \langle M \rangle = C \odot (A \otimes B)\) emul[!] or . broadcasting
eWiseAdd \(\bf C \langle M \rangle = C \odot (A \oplus B)\) eadd[!]
extract \(\bf C \langle M \rangle = C \odot A(I,J)\) extract[!], getindex
subassign \(\bf C (I,J) \langle M \rangle = C(I,J) \odot A\) subassign[!] or setindex!
assign \(\bf C \langle M \rangle (I,J) = C(I,J) \odot A\) assign[!]
apply \({\bf C \langle M \rangle = C \odot} f{\bf (A)}\) apply[!], map[!] or . broadcasting
\({\bf C \langle M \rangle = C \odot} f({\bf A},y)\)
\({\bf C \langle M \rangle = C \odot} f(x,{\bf A})\)
select \({\bf C \langle M \rangle = C \odot} f({\bf A},k)\) select[!]
reduce \({\bf w \langle m \rangle = w \odot} [{\oplus}_j {\bf A}(:,j)]\) reduce[!]
\(s = s \odot [{\oplus}_{ij} {\bf A}(i,j)]\)
transpose \(\bf C \langle M \rangle = C \odot A^{\sf T}\) gbtranspose[!], lazy: transpose, '
kronecker \(\bf C \langle M \rangle = C \odot \text{kron}(A, B)\) kron[!]

where \(\bf M\) is a GBArray mask, \(\odot\) is a binary operator for accumulating into \(\bf C\), and \(\otimes\) and \(\oplus\) are a binary operation and commutative monoid respectively. \(f\) is either a unary or binary operator.

Show Me the Numbers!

SuiteSparseGraphBLAS.jl has loads of extensions to normal sparse linear algebra, but it's also fast and multithreaded. Let's look at some numbers!

As always, benchmark things yourself. Most operations will be faster in SuiteSparseGraphBLAS.jl, particularly when the matrices are large enough that multithreading kicks in.

However, maintaining good performance can be tricky in any numerical package, and there's plenty of ways to accidentally reduce performance. For instance, below you'll notice that when A is stored in RowMajor format it can be quite a bit faster than operations where A is stored in ColMajor format. This isn't always the case, some operations favor column orientation.

Always feel free to ask for performance tips in the #graphblas Julia Slack channel or open an issue on GitHub. And check out the SuiteSparse:GraphBLAS User Guide, especially the section on performance. Also see a recent submission to the ACM Transactions on Mathemetical Software (under revision) with more performance results.

The plots below show the runtime of several operations in SuiteSparseGraphBLAS.jl normalized to the runtime of SparseArrays.SparseMatrixCSC (orange bars) performing the same operations. SuiteSparseGraphBLAS.jl is shown in both column (blue bars) and row (red bars) orientation where necessary, and with 1, 2, and 16 threads.

Matrix multiplication results are discussed briefly after the 3rd plot, all matrices are drawn from the SuiteSparse Matrix Collection unless randomly generated.

Sparse Matrix \(\cdot\) Dense Vector

Sparse Matrix \(\cdot\) (n \(\times\) 2) Dense Matrix

Sparse Matrix \(\cdot\) (n \(\times\) 32) Dense Matrix

When the dense matrix is low-dimensional and only a single thread is used, Julia compares very favorably with, and often beats, SuiteSparseGraphBLAS.jl. However, when 2 threads are available and the sparse matrix is stored in row-major orientation SuiteSparseGraphBLAS.jl begins to pull ahead significantly. Once a full 16 threads are used SuiteSparseGraphBLAS.jl on multiplication of row-major matrices is between 8 and 31 times faster.

SuiteSparse:GraphBLAS uses well over a dozen subalgorithms for matrix multiplication internally to achieve this performance. In particular when A is a sparse row-oriented matrix, and B is a dense column oriented matrix SuiteSparse:GraphBLAS will switch to a highly optimized dot-product algorithm, which is often much faster than the saxpy based algorithm used when A is column oriented. This is noticeable in cases above where the row-major runtimes can be > 2x faster than the column-major ones.

Transpose

SparseArrays.jl maintains performance parity for most problems when SuiteSparseGraphBLAS.jl is restricted to a single thread, but can be as much as 10x slower when 16 threads are used. This test takes advantage of some special SuiteSparseGraphBLAS.jl features like iso-valued matrices, but we restrict some significant optimizations like quick transposition (reinterpreting a by-column matrix as by-row) to remain fair.

Submatrix Assignment

 Note the log scale
Note the log scale
Benchmark Group Size (Density) of C Size (Density) of A
A \(10,000^{2} \quad (0.001)\) \(2000^{2} \quad (0.1)\)
B \(1,000,000^{2} \quad (0.0005)\) \(5000^{2} \quad (0.005)\)
C \(25,000,000^{2} \quad (1 \times 10^{-7})\) \(5000^{2} \quad (0.002)\)
D \(50,000,000^{2} \quad (1 \times 10^{-7})\) \(100,000^{2} \quad (0.001)\)
E \(50,000,000^{2} \quad (1 \times 10^{-7})\) \(1000^{2} \quad (1.0)\)

The expression C[I,J] = A assigns a matrix A into a submatrix of C, a difficult function to optimize. For a large sparse random matrix C (25 million by 25 million with 62 million entries), and with I and J selected at random, and A of size 5000-by-5000 with 50,000 entries, Julia takes 0.87 seconds using SparseArrays.jl, while using SuiteSparseGraphBLAS.jl with a single thread the time is just 0.04 seconds.

The runtime for a GBMatrix by-row or by-column is equivalent, so only by-column is shown here.

A New Version of SuiteSparseGraphBLAS.jl

Versions 0.6 and 0.7 of SuiteSparseGraphBLAS.jl brought with them several new features and many under the hood changes. The changes under the hood will support faster development in the future and otherwise cleaned up the codebase. The type system was completely overhauled to enable new matrix types with special extensions, the low level wrapper was scrubbed of any human-written contamination, and the operator/user-defined-function system was rewritten.

New Features

User Defined Fill Value

The default GBMatrix returns nothing when indexing into an "implicit-zero" or non-stored value. This better matches the GraphBLAS method of attaching the implicit values to operators like max and *. It also is more natural for graphs, where there is a semantic difference between a stored zero (an edge with weight zero), and an implicit zero (no edge).

Users may now directly set the fill value best suited to their application. This can be done in 3 ways:

  1. On construction: a new keyword argument for constructors, fill.

  2. setfill!: a new mutating function which can be used to change the fill value. This function may only change the fill value to another value within the same domain as the original fill value to maintain type stability.

  3. setfill: a new non-mutating function which returns a new GBMatrix which aliases the original, but with a new fill value. This new value may be of any type.

These changes allow the GBMatrix to be seamlessly used for general scientific applications that expect implicit zeros to be zero, while still adhering to the original design intended for graph algorithms.

mmread Function

Previously reading Matrix Market files was supported only by first reading into a SparseArrays.SparseMatrixCSC before converting, which was slow and memory intensive. We now have a native SuiteSparseGraphBLAS.mmread function. It remains unexported to avoid clashes with other packages. In the future it will likely be superseded (but not replaced) by some function which can automatically detect and read from a number of formats.

Serialization

Using the Serialization Julia standard library users can now easily serialize and deserialize a GraphBLAS data structures. The serialized array is compressed using LZ4 making both read and write operations much faster than operating on a Matrix Market file.

StorageOrders

A new dependency on StorageOrders.jl makes it easier to parametrize new AbstractGBArray types, for instance by restricting the orientation to StorageOrders.RowMajor() or StorageOrders.ColMajor(). The experimental OrientedGBMatrix and GBMatrixC/GBMatrixR subtypes take advantage of this.

Users may now also call gbset(A, :format, RowMajor()), instead of gbset(A, :format, :byrow) avoiding the use of magic symbols. This interface will continue to improve in the future.

apply and Deprecation of Scalar Argument map

The new function apply[!] is now the direct wrapper of the GrB_apply function. map still functions as expected, but no longer accepts a scalar argument. Previously, when map was the direct wrapper of GrB_apply, map(+, A, 3) was legal and equivalent to A .+ 3. This caused many (mostly obscure) ambiguities between map(op::Function, A::GBArray, x), map(op::Function, x, A::GBArray) and several external methods.

apply now performs this function, although the recommended method remains dot-broadcasting.

wait Function Fully Implemented

To fully support calling SuiteSparseGraphBLAS.jl from multiple user threads the wait function has been fully implemented, which allows users to use a matrix from multiple threads simultaneously.

Experimental Functionality

as Function

The new as functions grew out of an internal need to safely and quickly view a GBMatrix as a DenseMatrix/SparseMatrixCSC or vice-versa.

julia> A = GBMatrix([[1, 2] [3,4]])
2x2 GraphBLAS int64_t matrix, full by col
  4 entries, memory: 288 bytes    (1,1)   1
    (2,1)   2
    (1,2)   3
    (2,2)   4julia> SuiteSparseGraphBLAS.as(Matrix, A) do mat
           display(mat)
           mat .+= 1
           sum(mat)
       end
2×2 Matrix{Int64}:
 1  3
 2  4
14julia> A
2x2 GraphBLAS int64_t matrix, full by col
  4 entries, memory: 288 bytes    (1,1)   2
    (2,1)   3
    (1,2)   4
    (2,2)   5

Note that this functionality is currently somewhat dangerous. If mat escapes the scope of as in some way, for instance by returning the Transpose of mat, the underlying memory may be freed by SuiteSparseGraphBLAS.jl. If the user attempts to return mat directly the as function will gracefully copy the matrix rather than return an array that may be invalidated in the future.

Automatic Differentiation

Automatic Differentiation support continues to improve, with additional constructor rules, and rules for the second and first families of semirings. The biggest remaining missing pieces are the tropical semirings, user-defined functions and more rigorous testing of user use-cases.

Roadmap

There's lots of upcoming features, extensions, and JuliaLab Compiler Trickery™ on the horizon but here's a few important updates to look out for:

SuiteSparse Support

Supporting the SuiteSparse solvers is 1st on the list, and support will be released sometime in April 2022. This is a relatively simple addition, but involves quite a bit of duplicated code from SuiteSparse.jl, which assumes the use of SparseMatrixCSC.

GBPirate.jl Full Release

GBPirate.jl allows users of SparseArrays.jl to pirate certain methods of that package to use the more performant ones found in SuiteSparseGraphBLAS.jl. Because this involves copying on output it is not always faster, and the remaining development involves finding a heuristic for when this is beneficial.

User Defined Types

User defined types are not currently part of the public interface of SuiteSparseGraphBLAS.jl while the interface is improved. This feature will enable users to use any statically sized isbits type, like dual-numbers, colors, as elements of a GBMatrix. Release is planned for late April 2022.

GBGraphs.jl

A new Graphs.jl backend built on SuiteSparseGraphBLAS.jl will allow users access to the full Graphs.jl ecosystem, while letting them develop fast algorithms in the language of linear algebra!

Conclusion

If you've got any need for sparse matrix operations give SuiteSparseGraphBLAS.jl a try, and feel free to open an issue or contact me directly if you run into problems or have a feature request!

– Will Kimmerer

Acknowledgements

Thanks to Tim Davis for his help on the design of SuiteSparseGraphBLAS.jl and for the underlying C library SuiteSparse:GraphBLAS.

SuiteSparseGraphBLAS.jl was created originally by Abhinav Mehndiratta. Recent versions were developed by Will Kimmerer under the mentorship of Viral Shah and Miha Zgubic.

Some special cases of method dispatch in Julia

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2022/04/15/dispatch.html

Introduction

As I have recently written in this post the
Julia for Data Analysis
book that I am preparing is already available in MEAP preview.
Manning is releasing its chapters sequentially (approximately one chapter per
several weeks). I thought that when some chapter gets released in MEAP I will
write on the blog about related things that are not included in the book.

This week chapter 3 was released. It includes discussion of multiple dispatch
in Julia. Therefore, in this post I thought that I will comment on some specific
cases of method dispatch that I found important to understand.

This post was tested under Julia 1.7.2.

Dispatch with Unions

In the Julia Manual you can read in the Methods section that:

When a function is applied to a particular tuple of arguments,
the most specific method applicable to those arguments is applied.

It is important to remember that this rule also applies to Unions. Why is this
relevant?

Consider that you want to add a method to a function that already has the
following methods defined:

fun(x, y) = "slow"
fun(x::Int, y::Number) = "faster"

Your additional method is very fast, but expects y to be Int and x to be
either Int or Float64. The first approach would be to add the following
definition:

fun(x::Union{Int, Float64}, y::Int) = "fastest"

However, it will lead to dispatch ambiguity:

julia> fun(1, 1)
ERROR: MethodError: fun(::Int64, ::Int64) is ambiguous. Candidates:
  fun(x::Int64, y::Number) in Main at REPL[2]:1
  fun(x::Union{Float64, Int64}, y::Int64) in Main at REPL[3]:1
Possible fix, define
  fun(::Int64, ::Int64)

The reason is that we have two conflicting methods that could be applied
to this set of arguments and neither of them is more specific. This is an
expected behavior from a dispatch perspective. However, it means
that you need to duplicate code and instead of
fun(x::Union{Int, Float64}, y::Int) = "fastest"
write two definitions:

fun(x::Float64, y::Int) = "fastest"
fun(x::Int, y::Int) = "fastest"

and now all will work as expected. The point is that using Union is not the
same as writing several separate definitions “unrolling” the Union from a
dispatch perspective.

However, there is one slight problem. In our case the definition of fun was
quite short, but what if it were 100 LOC? Doing copy-paste would lead to a lot
of code duplication (which is not recommended). One of the solutions in such
cases is to use the @eval macro and do the required definitions like this:

for T in (Int, Float64)
    @eval fun(x::$T, y::Int) = "fastest"
end

Dispatch with keyword arguments

In the Julia Manual you can read in the Note on Optional and keyword Arguments section that:

Keyword arguments behave quite differently from ordinary positional arguments.
In particular, they do not participate in method dispatch. Methods are
dispatched based only on positional arguments, with keyword arguments
processed after the matching method is identified.

The reality is that in some cases keyword arguments are taken into account
when deciding about dispatch. This situation happens if you have some methods
that define keyword arguments and some other methods that do not define them.

Again, let me illustrate it with an example. I define the following methods:

julia> g(x::Int) = "int"
g (generic function with 1 method)

julia> g(x; kwarg) = "any"
g (generic function with 2 methods)

julia> g(1)
"int"

julia> g(1; kwarg=nothing)
"any"

In this case the fact that one method had no keyword arguments and
the other had a keyword argument influenced dispatch.

However, consider the following scenario, which is similar:

julia> h(x::Int; kwarg2) = "int"
h (generic function with 3 methods)

julia> h(x; kwarg) = "any"
h (generic function with 4 methods)

julia> h(1; kwarg=nothing)
ERROR: UndefKeywordError: keyword argument kwarg2 not assigned

This time, since the first method defined some keyword argument dispatch
selected it, and next produced an error since the name of the keyword argument
was incorrect.

Let me give you one more example that is related to methods design with
keyword arguments. It happens in practice so it is worth remembering.

julia> f(args...) = "positional"
f (generic function with 1 method)

julia> f(; kwargs...) = "keyword"
f (generic function with 2 methods)

julia> methods(f)
# 2 methods for generic function "f":
[1] f(; kwargs...) in Main at REPL[2]:1
[2] f(args...) in Main at REPL[1]:1

As you can see these definitions indeed result in two separate methods for f.
Now, a question to you is what will happen if you write f()?

As you might have predicted the keyword argument method is invoked because
it is more specific in terms of positional arguments:

julia> f()
"keyword"

Conclusions

The examples I have chosen are not artificial. All of them were relevant
(and lead to bugs in development) when I was working on DataFrames.jl.

I hope you will find them useful. I omit them in my book as I feel
they would distract readers’ attention, but once one goes deeper into Julia
they can be encountered in practice.