Super fast pattern search: The FFT trick

Re-posted from: https://juliohm.github.io/science/fft-trick/

In today’s post, I would like to document an old trick in image processing sometimes referred to as “The FFT trick” for exhaustive pattern search.

Although I would like to give credit to the person who introduced the trick in the literature, I don’t have this information with me. Please leave a comment if you have references.

Learning with a sliding window

Filtering an image with a sliding window is a routine operation in various disciplines like image processing, computer vision, geostatistics, and deep learning (e.g. convolutional neural networks), to name a few.

Depending on the application, this operation serves different purposes. It is used everywhere as illustrated in the table:

Field of study Typical window Purposes
Image processing
• Sobel kernel: $$\small\begin{bmatrix}1 & 0 & -1 \\ 2 & 0 & -2\\ 1 & 0 & -1\end{bmatrix}$$
• Gaussian blur: $$\small\frac{1}{16}\begin{bmatrix}1 & 2 & 1 \\ 2 & 4 & 2\\ 1 & 2 & 1\end{bmatrix}$$
• Edge detection
• Blurring
• Denoising
Computer vision Same as in Image processing
• Image descriptors (e.g. GIST, HoG)
• Invariance and alignment
Geostatistics A spatial pattern: $%
• Pattern search
Deep learning Random weights: $%
• Convolutional neural networks

BRAINSTORM: Why sliding windows? Couldn’t we learn more by using a different technique? What does a sliding window misses in terms of learning? :thinking:

Given its importance, a great deal of time was devoted by the scientific community to optimizing software and hardware for filtering images (or arrays) with sliding windows.

This post is about a clever trick that exploits legacy high-performance filtering algorithms to speed up pattern search on very large images.

FFT filtering

Consider a first naive implementation of the filtering operation in Julia that loops over the input image, extracts a subimage, multiplies it with the window entrywise, and adds the results:

function naivefilter(image::AbstractArray, window::AbstractArray)
nx, ny = size(image)
wx, wy = size(window)

[sum(window .* image[i:i+wx-1,j:j+wy-1]) for i=1:nx-wx+1, j=1:ny-wy+1]
end


It works well, but is relatively slow for industry standards. Compare it to the efficient implementation available in Images.jl:

using Images
using BenchmarkTools

image = zeros(100,100)
window = zeros(3,3)

@benchmark naivefilter(image, window)
@benchmark imfilter(image, window)

# BenchmarkTools.Trial: (naivefilter)
#   memory estimate:  7.99 MiB
#   allocs estimate:  192093
#   --------------
#   minimum time:     6.512 ms (0.00% GC)
#   median time:      6.638 ms (0.00% GC)
#   mean time:        7.209 ms (7.67% GC)
#   maximum time:     10.569 ms (15.77% GC)
#   --------------
#   samples:          693
#   evals/sample:     1
#   time tolerance:   5.00%
#   memory tolerance: 1.00%
#
# BenchmarkTools.Trial: (imfilter)
#   memory estimate:  204.03 KiB
#   allocs estimate:  268
#   --------------
#   minimum time:     378.637 μs (0.00% GC)
#   median time:      388.839 μs (0.00% GC)
#   mean time:        406.210 μs (2.51% GC)
#   maximum time:     2.035 ms (78.63% GC)
#   --------------
#   samples:          10000
#   evals/sample:     1
#   time tolerance:   5.00%
#   memory tolerance: 1.00%


From the output, we notice $ms$ turning into $\mu$s and MiB turning into KiB. Besides the cache-friendly code that Tim Holy and others have written, I would like to comment on another optimization that Images.jl does that is not available in other image processing packages and programming languages.

Image filtering can alternatively be implemented with Fast Fourier Transforms (FFTs). The details of the algorithm and the theory behind it are beyond the scope of this post. The message that I want to convey is that depending on the window size, particularly for large windows, the FFT algorithm is much faster.

In Images.jl you can call the FFT algorithm explicitly by using:

imfilter(image, window, Algorithm.FFT())


but you don’t need to, Images.jl will call the faster algorithm depending on the window size you pass to it! :raised_hands:

The switch point between algorithms was set by means of benchmarks on a few modern processors. Feel free to submit a pull request if you feel that the switch point isn’t quite right for your hardware.

Now that we’ve seen that image filtering is a super efficient operation (specially in Julia), let’s exploit it to find patterns in images.

Given an image $A$ and a spatial pattern $B$, an exhaustive pattern search consists of computing a distance (or dissimilarity) between $B$ and all subimages of $A$. The subimages for which the distance is smallest are called matches. If the computed distance is zero, the match is called a perfect match.

Now is the trick. Consider the Euclidean distance and relax the notation $D = \left(A - B\right)^2$ to mean

Loop over all subimages $A_{sub}$ of $A$ and compute $D_{sub} = \left(A_{sub} - B\right)^2$

where $A_{sub}$ and $B$ have the same size. It turns out that we can implement this operation by rewriting

and replacing the original inefficient loop with two FFT filtering + one reduction:

A² = imfilter(A.^2, ones(B))
AB = imfilter(A, B)
B² = sumabs2(B)

D = abs(A² - 2AB + B²) # here "abs" takes care of floating point errors


Isn’t that awesome? This trick was crucial in my ImageQuilting.jl package, it allowed me to process large 3D images in minutes instead of hours.

Finally, I would like to add that high-performance GPU implementations of FFT filtering are available in Julia. You can set an option in your package to use CLFFT.jl and have the fastest software in the market. :blush: