Author Archives: Júlio Hoffimann

Where are the Julians?

By: Júlio Hoffimann

Re-posted from: https://juliohm.github.io/dataviz/where-are-the-julians/



Instructions

  • Pan and zoom with the mouse.
  • Click on a bubble to open profiles on GitHub.
  • Alt + click to remove a bubble.

You may need to unblock popups in your browser to have multiple profiles opening as tabs. Removing a bubble can be useful for revealing other bubbles.

Julians are presented in decreasing order of contributions. An arc is drawn between locations X and Y in the map whenever a Julian in X and a Julian in Y have contributed to a common package.

Want to be on the map?

If your nickname is listed below and you want to appear on the map, please consider typing your address on GitHub:

Data

The data was extracted from METADATA. It only includes members
of the community that have contributed to a registered Julia package (e.g. issues, pull requests) up
until 27-June-2017.

The Jupyter notebook used for data extraction is available in our
JuliaGraphsTutorials repository.

Say hello to a Julian near you. #JuliansInTheGlobe

The DAG of Julia packages

By: Júlio Hoffimann

Re-posted from: https://juliohm.github.io/dataviz/DAG-of-Julia-packages/

If your package is listed below, please consider fixing it:


Instructions

This interactive visualization made with D3 shows the Directed Acyclic Graph (DAG) of all registered Julia packages up until 27-June-2017.

The size of a node represents its influence (i.e. out degree) in the DAG. The color represents the required Julia version.

Hover the mouse over the elements to get more information.

Data

The data was extracted from METADATA with the following script:

UPDATE: An updated version of the script can be found in the JuliaGraphsTutorials repository in the form of a Jupyter notebook. Additional information about the Julia package system is extracted therein with the GitHub.jl package.

using JSON
using LightGraphs
using ProgressMeter

# find all packages in METADATA
pkgs = readdir(Pkg.dir("METADATA"))
filterfunc = p -> isdir(joinpath(Pkg.dir("METADATA"), p)) && p  [".git",".test"]
pkgs = filter(filterfunc, pkgs)

# assign each package an id
pkgdict = Dict{String,Int}()
for (i,pkg) in enumerate(pkgs)
  push!(pkgdict, pkg => i)
end

# build DAG
G = DiGraph(length(pkgs))
@showprogress 1 "Building graph..." for pkg in pkgs
  children = Pkg.dependents(pkg)
  for c in children
    add_edge!(G, pkgdict[pkg], pkgdict[c])
  end
end

# find required Julia version
juliaversions = String[]
for pkg in pkgs
  versiondir = joinpath(Pkg.dir("METADATA"), pkg, "versions")
  if isdir(versiondir)
    latestversion = readdir(versiondir)[end]
    reqfile = joinpath(versiondir, latestversion, "requires")
    juliaversion = string(get(Pkg.Reqs.parse(reqfile), "julia", "NA"))
    push!(juliaversions, juliaversion)
  else
    push!(juliaversions, "BOGUS")
  end
end

# construct JSON
nodes = [Dict("id"=>pkgs[v],
              "indegree"=>indegree(G,v),
              "outdegree"=>outdegree(G,v),
              "juliaversion"=>juliaversions[v]) for v in vertices(G)]
links = [Dict("source"=>pkgs[u], "target"=>pkgs[v]) for (u,v) in edges(G)]
data = Dict("nodes"=>nodes, "links"=>links)

# write to file
open("DAG-Julia-Pkgs.json", "w") do f
  JSON.print(f, data, 2)
end

Super fast pattern search: The FFT trick

By: Júlio Hoffimann

Re-posted from: https://juliohm.github.io/science/fft-trick/

In today’s post, I would like to document an old trick in image processing sometimes referred to as “The FFT trick” for exhaustive pattern search.

Although I would like to give credit to the person who introduced the trick in the literature, I don’t have this information with me. Please leave a comment if you have references.

Learning with a sliding window

Filtering an image with a sliding window is a routine operation in various disciplines like image processing, computer vision, geostatistics, and deep learning (e.g. convolutional neural networks), to name a few.

Sliding window on my breakfast.

Depending on the application, this operation serves different purposes. It is used everywhere as illustrated in the table:

Field of study Typical window Purposes
Image processing
  • Sobel kernel: $$\small\begin{bmatrix}1 & 0 & -1 \\ 2 & 0 & -2\\ 1 & 0 & -1\end{bmatrix}$$
  • Gaussian blur: $$\small\frac{1}{16}\begin{bmatrix}1 & 2 & 1 \\ 2 & 4 & 2\\ 1 & 2 & 1\end{bmatrix}$$
  • Edge detection
  • Blurring
  • Denoising
Computer vision Same as in Image processing
  • Image descriptors (e.g. GIST, HoG)
  • Invariance and alignment
Geostatistics A spatial pattern:
  • Pattern search
Deep learning Random weights:
  • Convolutional neural networks

BRAINSTORM: Why sliding windows? Couldn’t we learn more by using a different technique? What does a sliding window misses in terms of learning? :thinking:

Given its importance, a great deal of time was devoted by the scientific community to optimizing software and hardware for filtering images (or arrays) with sliding windows.

This post is about a clever trick that exploits legacy high-performance filtering algorithms to speed up pattern search on very large images.

FFT filtering

Consider a first naive implementation of the filtering operation in Julia that loops over the input image, extracts a subimage, multiplies it with the window entrywise, and adds the results:

function naivefilter(image::AbstractArray, window::AbstractArray)
  nx, ny = size(image)
  wx, wy = size(window)

  [sum(window .* image[i:i+wx-1,j:j+wy-1]) for i=1:nx-wx+1, j=1:ny-wy+1]
end

It works well, but is relatively slow for industry standards. Compare it to the efficient implementation available in Images.jl:

using Images
using BenchmarkTools

image = zeros(100,100)
window = zeros(3,3)

@benchmark naivefilter(image, window)
@benchmark imfilter(image, window)

# BenchmarkTools.Trial: (naivefilter)
#   memory estimate:  7.99 MiB
#   allocs estimate:  192093
#   --------------
#   minimum time:     6.512 ms (0.00% GC)
#   median time:      6.638 ms (0.00% GC)
#   mean time:        7.209 ms (7.67% GC)
#   maximum time:     10.569 ms (15.77% GC)
#   --------------
#   samples:          693
#   evals/sample:     1
#   time tolerance:   5.00%
#   memory tolerance: 1.00%
#
# BenchmarkTools.Trial: (imfilter)
#   memory estimate:  204.03 KiB
#   allocs estimate:  268
#   --------------
#   minimum time:     378.637 μs (0.00% GC)
#   median time:      388.839 μs (0.00% GC)
#   mean time:        406.210 μs (2.51% GC)
#   maximum time:     2.035 ms (78.63% GC)
#   --------------
#   samples:          10000
#   evals/sample:     1
#   time tolerance:   5.00%
#   memory tolerance: 1.00%

From the output, we notice turning into s and MiB turning into KiB. Besides the cache-friendly code that Tim Holy and others have written, I would like to comment on another optimization that Images.jl does that is not available in other image processing packages and programming languages.

Image filtering can alternatively be implemented with Fast Fourier Transforms (FFTs). The details of the algorithm and the theory behind it are beyond the scope of this post. The message that I want to convey is that depending on the window size, particularly for large windows, the FFT algorithm is much faster.

In Images.jl you can call the FFT algorithm explicitly by using:

imfilter(image, window, Algorithm.FFT())

but you don’t need to, Images.jl will call the faster algorithm depending on the window size you pass to it! :raised_hands:

The switch point between algorithms was set by means of benchmarks on a few modern processors. Feel free to submit a pull request if you feel that the switch point isn’t quite right for your hardware.

Now that we’ve seen that image filtering is a super efficient operation (specially in Julia), let’s exploit it to find patterns in images.

Given an image and a spatial pattern , an exhaustive pattern search consists of computing a distance (or dissimilarity) between and all subimages of . The subimages for which the distance is smallest are called matches. If the computed distance is zero, the match is called a perfect match.

Now is the trick. Consider the Euclidean distance and relax the notation to mean

Loop over all subimages of and compute

where and have the same size. It turns out that we can implement this operation by rewriting

and replacing the original inefficient loop with two FFT filtering + one reduction:

 = imfilter(A.^2, ones(B))
AB = imfilter(A, B)
 = sumabs2(B)

D = abs( - 2AB + ) # here "abs" takes care of floating point errors

Isn’t that awesome? This trick was crucial in my ImageQuilting.jl package, it allowed me to process large 3D images in minutes instead of hours.

Finally, I would like to add that high-performance GPU implementations of FFT filtering are available in Julia. You can set an option in your package to use CLFFT.jl and have the fastest software in the market. :blush: