Category Archives: Julia

Hunting for bugs in Julia for Data Analysis

Re-posted from: https://bkamins.github.io/julialang/2023/03/03/errata.html

Introduction

A few months ago my Julia for Data Analysis book was released.
I tried very hard to make it correct. Regrettably, I failed.

Fortunately my readers are more careful than me and they found several issues
in my book. I keep their log in the Errata section of the GitHub
repository of the book.

In this post I want to share you my experience about the kinds of bugs
slipped into the book and comment how I think they could be avoided.

My experience is that there are three important classes of issues:

various misprints and typographical errors;
consistency of code execution across Julia versions;
factual errors.

Let me go through these classes in sequence.

Misprints and typographical errors

Such errors sneak in for many reasons. Let me highlight some that have bitten
me and could be avoided.

First is printing of non-standard UTF-8 characters. These are nasty. In my
version of the book things look nice, but when it went through printing
preparation process, somehow on the way some characters got messed up.
For example in Chapter 6, section 6.4.1, page 132
codeunits("ε") is printed as codeunits("?").

Takeaway: before your book goes to press always carefully check these parts of
the text where you use non-standard UTF-8 characters (a common case in Julia).

Second is the side effect of using multi-selection or auto-replace in your
editor. For example in Chapter 3, section 3.2.3, page 59 I have
an issue that I write sort(v::AbstractVector; kwthe.) instead of
sort(v::AbstractVector; kws...). It is clear that I must have used some
pattern matching and replaced s.. with the. I do not even remember why.

Takeaway: multi-selection and auto-replace are nice, but never do them
globally; it is best to review every change before it is applied (do not make
them automatically in one shot; it is hart to resist, but this is what you
really should not do).

Consistency across versions of Julia

There are many flavors of this issue. It can mostly be resolved by strict use
of Project.toml and Manifest.toml files, but sometimes unexpected things happen.

For example in Chapter 1, section 1.2.1, page 7 I show the following snippet:

julia> function sum_n(n)
           s = 0
           for i in 1:n
               s += i
           end
           return s
       end
sum_n (generic function with 1 method)

julia> @time sum_n(1_000_000_000)
  0.000001 seconds
500000000500000000

This timing is surprisingly fast (and the reason is explained in the book).
The issue is that this is the situation under Julia 1.7 as this is the version
of the language I used in the book.

Under Julia 1.8 and Julia 1.9 running the same code takes longer
(tested under Julia 1.9-beta4):

julia> @time sum_n(1_000_000_000)
  2.265569 seconds
500000000500000000

The reason for this inconsistency is a bug in the @time macro introduced in
Julia 1.8 release. The sum_n(1_000_000_000) call (without @time) is executed
fast. Here is a simplified benchmark (run under Julia 1.9-beta4) showing this:

julia> let
           start = time_ns()
           v = sum_n(1_000_000_000)
           stop=time_ns()
           v, Int(stop - start)
       end
(500000000500000000, 1000)

Unfortunately there is an issue with the @time macro used in global scope
that causes the timing to be inaccurate. This bug needs to be resolved in
Base Julia. See this issue for details.

Takeaway: things can change as software evolves; always make sure to
explicitly tell your readers which version and configuration of software they
should use to run your code.

Factual errors

This one is most problematic, as I really feel bad, when I find that I have
written something that is not fully correct. Let me give you an example
from Chapter 2, section 2.3.1, page 30.

The Julia Manual in the Short Circuit Evaluation section states:

Instead of if <cond> <statement> end, one can write <cond> && <statement>
(which could be read as: <cond> and then <statement>).
Similarly, instead of if if ! <cond> <statement> end, one can write
<cond> || <statement> (which could be read as: <cond> or else <statement>).

Similarly, in my book I have considered the following expressions:

x > 0 && println(x)

and

if x > 0
    println(x)
end

where x = -7.

I write in the book that Julia interprets them both in the same way.

Indeed it is true that the same expressions get evaluated in both cases.
However, in general if statement and doing short-circuit evaluation are
not equivalent.

What is the difference? If the condition would be false then the value of if
statement (without else) is nothing and the value of the expression using
short-circuiting is false. Here is an example:

julia> x = -7
-7

julia> show(x > 0 && println(x))
false
julia> show(if x > 0
           println(x)
       end)
nothing

The difference is subtle, but in cases when you would use the produced
value later in your code it could become important.

Takeaway: always carefully consider all aspects of code that you present.

Conclusions

As a conclusion I would like to ask all readers of my book to share with me
your feedback. I will try to incorporate it in the next release of the book
in the best way I can.

Metal.jl 0.2: Metal Performance Shaders

By: Tim Besard

Re-posted from: https://juliagpu.org/post/2023-03-03-metal_0.2/index.html

Metal.jl 0.2 marks a significant milestone in the development of the Metal.jl package. The release comes with initial support for the Metal Perform Shaders (MPS) framework for accelerating common operations like matrix multiplications, as well as various improvements for writing Metal kernels in Julia.

Metal Performance Shaders

Quoting the Apple documentation, The Metal Performance Shaders (MPS) framework contains a collection of highly optimized compute and graphics shaders for use in Metal applications. With Metal.jl 0.2, we have added initial support for this framework, and used it to accelerate the matrix multiplication operation:

julia> n = p = m = 2048
julia> flops = n*m*(2p-1)
17175674880julia> a = MtlArray(rand(Float32, n, p));
julia> b = MtlArray(rand(Float32, p, m));
julia> c = MtlArray(zeros(Float32, n, m));julia> bench = @benchmark Metal.@sync mul!(c, a, b)
BenchmarkTools.Trial: 518 samples with 1 evaluation.
 Range (min … max):  9.366 ms …  13.354 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     9.629 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.646 ms ± 192.169 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%               ▃▂▅▅▆▆▆▇█▇▇▆▅▄▄▁▁ ▁
  ▄▁▄▄▄▄▆▆▆▄▄▁▇█████████████████▄█▄▁▆▁▄▁▆▁▇▁▄▄▁▁▄▄▇▁▄▆▄▁▁▁▁▁▄ █
  9.37 ms      Histogram: log(frequency) by time      10.1 ms < Memory estimate: 352 bytes, allocs estimate: 12.julia> flops / (minimum(bench.times)/1e9)
1.83e12

The benchmark above shows that on an 8-core M1 Pro matrix multiplication now reaches 1.8 TFLOPS (out of the 2.6TFLOPS of theoretical performance). The accelerated matrix multiplication is available for a variety of input types, incuding mixed-mode operations, and as shown above is integrated with the LinearAlgebra.jl mul! interface.

Of course, the MPS framework offers more than just matrix multiplication, and we expect to support more of it in the future. If you have a specific operation you would like to use from Julia, please let us know by opening an issue on the Metal.jl repository.

GPU profiling support

To support the development of Metal kernels, Max Hawkins has added support for GPU profiling. Similar to how this works in CUDA.jl, you can run code under the Metal.@profile macro to record its execution. However, this does first require setting the METAL_CAPTURE_ENABLED environment flag before import Metal.jl:

julia> ENV["METAL_CAPTURE_ENABLED"] = 1julia> using Metaljulia> a = mtl(rand(1024, 1024))
julia> Metal.@profile sum(a)
[ Info: GPU frame capture saved to jl_metal.gputrace/

The resulting capture can be opened with Xcode, presenting a timeline that's similar to other profilers:

Other improvements

Julia 1.9 is supported, but requires an up-to-date macOS version (issues have been encountered on macOS 12.4);
An mtl function has been added for converting Julia arrays to Metal arrays, similar to the cu function in CUDA.jl;
Multiple GPUs are supported, and the device! function can be used to select one;
Coverage for SIMD Group functions has been improved, so it's is now possible to use simdgroup_load, simdgroup_store, simdgroup_multiply, and simdgroup_multiply_accumulate in kernels functions.

Future work

Although Metal.jl is now usable for a variety of applications, there is still work to be done before it can be considered production-ready. In particular:

there are known performance issues with mapreduce, and other operations that realy on CartesianIndices;
the libcmt wrapper library for interfacing with the Metal APIs is cumbersome to use and improve, and we are looking into native ObjectiveC FFI instead;
the MPS wrappers are incomplete, and similar to the Metal APIs requires a replacement to libcmt to be improved;
support for atomic operations is missing, which is required to implement a full-featured KernelAbstractions.jl back-end.

Once (most of) these issues are addressed, we should be able to release Metal.jl 1.0.

More Extreme Moves in a Downtrending Market?

By: Dean Markwick's Blog -- Julia

Re-posted from: https://dm13450.github.io/2023/03/01/Downtrend-Extremes.html

I’m exploring whether there are more extreme moves in the equity
market when we are in a downtrend. I think anecdotally we
all notice these big red numbers when the market has been grinding
lower over a period as it is the classic loss aversion of human
psychology. This is loosely related to my Ph.D. in point processes and
also a blog post from last year when I investigated trend following –
Trend Following with ETFs. I’m
going to take two approaches, a simple binomial model and a Hawkes
process. For the data, we will be pulling the daily data from Alpaca Markets using my AlpacaMarkets.jl package.

Enjoy these types of posts? Then you should sign up for my newsletter. It’s a short monthly recap of anything and everything I’ve found interesting recently plus
any posts I’ve written. So sign up and stay informed!

A few packages to get started and I’m running Julia 1.8 for this
project.

using AlpacaMarkets
using DataFrames, DataFramesMeta
using Dates
using Plots, PlotThemes, StatsPlots
using RollingFunctions, Statistics, StatsBase
using GLM

All good data analysis starts with the data. I’m downloading the
daily statistics of SPY the S&P 500 stock index ETF which will represent
the overall stock market.

function parse_date(t)
   Date(string(split(t, "T")[1]))
end

function clean(df, x) 
    df = @transform(df, :Date = parse_date.(:t), :Ticker = x, :NextOpen = [:o[2:end]; NaN])
   @select(df, :Date, :Ticker, :c, :o, :NextOpen)
end

spyPrices = stock_bars("SPY", "1Day"; startTime = now() - Year(10), limit = 10000, adjustment = "all")[1]
spyPrices = clean(spyPrices, "SPY")
last(spyPrices, 3)

3×5 DataFrame

Row	Date	Ticker	c	o	NextOpen
	Date	String	Float64	Float64	Float64
1	2023-02-22	SPY	398.54	399.52	401.56
2	2023-02-23	SPY	400.66	401.56	395.42
3	2023-02-24	SPY	396.38	395.42	NaN

I’m doing the usual close-to-close returns and then taking the 100-day moving average as my trend signal.

spyPrices = @transform(spyPrices, :Return = [missing; diff(log.(:c))])
spyPrices = @transform(spyPrices, :Avg = lag(runmean(:Return, 100), 1))
spyPrices = @transform(spyPrices, :BigMove = abs.(:Return) .>= 0.025)
dropmissing!(spyPrices);

sp = scatter(spyPrices[spyPrices.BigMove, :].Date, spyPrices[spyPrices.BigMove, :].Return, legend = :none)
sp = scatter!(sp, spyPrices[.!spyPrices.BigMove, :].Date, spyPrices[.!spyPrices.BigMove, :].Return)

plot(sp, plot(spyPrices.Date, spyPrices.Avg), layout = (2,1), legend=:none)

Daily big moves and trend signal

By calling a ‘big move’ anything greater than \(\pm\) 0.025 (in log terms)
we can see that they, the blue dots, are slightly clustered around
common periods. In the plot below, the 100-day rolling average of
the returns, our trend signal, also appears to be slightly correlated with these big returns.

scatter(spyPrices.Avg, abs.(spyPrices.Return), label = :none,
            xlabel = "Trend Signal", ylabel = "Daily Return")

Trend signal vs daily return

Here we have the 100-day rolling average on the x-axis and the
absolute return on that day on the y-axis. If we squint a little we
can imagine there is a slight quadratic pattern, or at the least, these
down trends appear to correspond with the more extreme day moves. We
want to try and understand if this is a significant effect.

A Binomial Model

We will start by looking at the probability that each day might
have a ‘large move’. We first split into a train/test split of 70/30.

trainData = spyPrices[1:Int(floor(nrow(spyPrices)*0.7)), :]
testData = spyPrices[Int(ceil(nrow(spyPrices)*0.7)):end, :];

The GLM.jl package lets you write out the formula and fit a wide
variety of linear models. We have two models, the proper one that uses
the Avg column (our trend signal) as our features and a null model
that just fits an intercept.

binomialModel = glm(@formula(BigMove ~ Avg + Avg^2), trainData, Binomial())
nullModel = glm(@formula(BigMove ~ 1), trainData, Binomial())

spyPrices[!, :Binomial] = predict(binomialModel, spyPrices);

To look at the model we can plot the output of the model relative to
the signal at the time.

plot(scatter(spyPrices.Avg, spyPrices[!, :Binomial], label ="Response Function"), 
    plot(spyPrices.Date, spyPrices[!, :Binomial], label = "Probability of a Large Move"), layout = (2,1))

From the top graph, we see the higher probability of an extreme move comes from when the moving average is a large negative number. The probability then flatlines beyond zero, which suggests there isn’t that much of an effect for large moves when the momentum in the market is positive.

We also plot the daily probability of a large move and see that it
has been pretty bad in the few months lots of big moves!

Binomial Intensity

We need to check if the model is any good though. We will just check
the basic accuracy.

using Metrics
binary_accuracy(predict(binomialModel, testData), testData.BigMove)
binary_accuracy(predict(nullModel)[1] .* ones(nrow(testData)), testData.BigMove)

93

95

So the null model has an accuracy of 95% on the test set, but the
fitted model has an accuracy of 93%. Not good, looks like the trend
signal isn’t adding anything. We might be able to salvage the model
with a robust windowed fit and test procedure or look at a
single stock name but overall, I think it’s more of a testament to how
hard it is to model this data rather than anything too specific.

We could also consider the self-exciting nature of these large moves. If one happens, is there a higher probability of another happening? Given my Ph.D. was in Hawkes processes, I have done lots of writing around them before and this is just another example of how they can be applied.

Hawkes Processes

Hawkes processes! The bane of my life for four years. Still, I am
forever linked with them now so might as well put that Ph.D. to use. If
you haven’t come across Hawkes processes before it is a self-exciting
point process where the occurrence of one event can lead to further
events. In our case, this means one extreme event can cause further
extreme events, something we are trying to use the downtrend to
predict. With the Hawkes process, we are checking whether the events
are just self-correlated.

I’ve built the HawkesProcesses.jl package to make it easy to work
with Hawkes processes.

using HawkesProcesses, Distributions

Firstly, we get the data in the right shape by pulling the number of
days since the start of the data of each big event.

startDate = minimum(spyPrices.Date)
allEvents = getfield.(spyPrices[spyPrices.BigMove, :Date] .- startDate, :value);
allDatesNorm = getfield.(spyPrices.Date .- startDate, :value);
maxT = getfield.(maximum(spyPrices[spyPrices.BigMove, :Date]) .- startDate, :value)

We then fit the Hawkes process using the standard Bayesian method for
5,000 iterations.

bgSamps1, kappaSamps1, kernSamps1 = HawkesProcesses.fit(allEvents .+ rand(length(allEvents)), maxT, 5000)
bgSamps2, kappaSamps2, kernSamps2 = HawkesProcesses.fit(allEvents .+ rand(length(allEvents)), maxT, 5000)

bgEst = mean(bgSamps1[2500:end])
kappaEst = mean(kappaSamps1[2500:end])
kernEst = mean(kernSamps1[2500:end])

intens = HawkesProcesses.intensity(allDatesNorm, allEvents, bgEst, kappaEst, Exponential(1/kernEst));
spyPrices[!, :Intensity] = intens;

We get three parameters out of the Hawkes process. The background rate
\(\mu\), the self-exciting parameter \(\kappa\) and an exponential
parameter that describes how long each event has an impact on the
probability of another event, \(\beta\).

(bgEst, kappaEst, kernEst)

(0.005, 0.84, 0.067)

We get \(\kappa = 0.84\) and \(\beta = 0.07\) which we can interpret
as a high probability that another large move follows and that takes
around 14 days (business days) to decay. So with each large move,
expect another large move within 3 weeks.

When we compare the Hawkes intensity to the previous binomial
intensity we get a similar shape between both models.

plot(spyPrices.Date, spyPrices.Binomial, label = "Binomial")
plot!(spyPrices.Date, intens, label = "Hawkes")

Binomial and Hawkes Intensity

They line up quite well, which is encouraging and shows they are on a similar path. If we zoom in specifically to 2022.

plot(spyPrices[spyPrices.Date .>= Date("2022-01-01"), :].Date, 
     spyPrices[spyPrices.Date .>= Date("2022-01-01"), :].Binomial, label = "Binomial")
plot!(spyPrices[spyPrices.Date .>= Date("2022-01-01"), :].Date, 
      spyPrices[spyPrices.Date .>= Date("2022-01-01"), :].Intensity, label = "Hawkes")

2022 Intensity

Here we can see the binomial intensity stays higher for longer whereas
the Hawkes process goes through quicker bursts of intensity. This is
intuitive as the binomial model is using a 100-day moving average
under the hood, whereas the Hawkes process is much more reactive to
the underlying events.

To check whether the Hawkes process is any good we compare its
likelihood to a null likelihood of a constant Poisson process.

We first fit the null point process model by optimising the
null_likelihood across the events.

using Optim
null_likelihood(events, lambda, maxT) = length(events)*log(lambda) - lambda*maxT

opt = optimize(x-> -1*null_likelihood(allEvents, x[1], maxT),  0, 10)
Optim.minimizer(opt)

0.031146179404103084

Which gives a likelihood of:

null_likelihood(allEvents, Optim.minimizer(opt), maxT)

-335.1797769669301

Whereas the Hawkes process has a likelihood of:

likelihood(allEvents, bgEst, kappaEst, Exponential(1/kernEst), maxT)

-266.63091365640366

A substantial improvement, so all in the Hawkes process looks pretty
good.

Overall, the Hawkes model subdues quite quickly, but the binomial model can remain elevated. They are covering two different behaviours. The Hawkes model can describe what happens after one of these large moves happens. The binomial model is mapping the momentum onto a probability of a large event.

How do we combine both the binomial and the Hawkes process model?

Point Process Model

To start with, we need to consider a point process with variable intensity. This is known as an inhomogeneous point process. In our case, these events depend on the value of the trend signal.

\[\lambda (t) \propto \hat{r} (t)\]

\[\lambda (t) = \beta _0 + \beta _1 \hat{r} (t) + \beta_2 \hat{r} ^2 (t)\]

Like the binomial model, we will use a quadratic combination of the values. Then, given we know how to write the likelihood for a point process, we can do some maximum likelihood estimation to find the appropriate parameters.

Our rhat function need to return the signal at a given time.

function rhat(t, spy)
    dt = minimum(spy.Date) + Day(Int(floor(t)))
    spy[spy.Date .<= dt, :Avg][end]
end

And our likelihood which uses the rhat function, plus making it compatible with arrays.

function lambda(t, params, spy)
   exp(params[1] + params[1] * rhat(t, spy) + params[2] * rhat(t, spy) * rhat(t, spy))
end

lambda(t::Array{<:Number}, params::Array{<:Number}, spy::DataFrame) = map(x-> lambda(x, params, spy), t)

The likelihood of a point process is

\[\mathcal{L} = \sum _{t_i} log(\lambda (t_i)) – \int _0 ^T \lambda (t) \mathrm{d}\]

We have to use numerical integration to do the second half of the equation which is where the QuadGK.jl package comes in. We pass it a function and it will do the integration for us. Job done!

function likelihood(params, rate, events, maxT, spy)
    sum(log.(rate(events, params, spy))) - quadgk(t-> rate(t, params, spy), 0, maxT)[1]
end

With all the functions ready we can optimise and find the correct parameters.

using Optim, QuadGK
opt = optimize(x-> -1*likelihood(x, lambda, allEvents, maxT, spyPrices), rand(3))
Optim.minimizer(opt)

3-element Vector{Float64}:
 -3.4684622926014783
  1.6204408269570916
  2.902098418452392

This also has a maximum likelihood of -334. Which if you scroll up
isn’t much better compared to the null model. So warning bells should
be ringing that this isn’t a good model.

plot(minimum(spyPrices.Date) + Day.(Int.(collect(0:maxT))), 
     lambda(collect(0:maxT), Optim.minimizer(opt), spyPrices), label = :none,
     title = "Poisson Intensity")

Poisson Intensity

The intensity isn’t showing too much structure over time.

To check the fit of this model we simulate some events with the same
intensity pattern.

lambdaMax = maximum(lambda(collect(0:0.1:maxT), Optim.minimizer(opt), spyPrices)) * 1.1
rawEvents = rand(Poisson(lambdaMax * maxT), 1)[1]
unthinnedEvents = sort(rand(Uniform(0, maxT), rawEvents))
acceptProb = lambda(unthinnedEvents, Optim.minimizer(opt), spyPrices) / lambdaMax
events = unthinnedEvents[rand(length(unthinnedEvents)) .< acceptProb];
histogram(events,label= "Simulated", bins = 100)
histogram!(allEvents, label = "True", bins = 100)

Simulated Events

It’s not a great model as the simulated events don’t line up with the
true events. Looking back at the intensity function we can see it
doesn’t vary much around 0.03, so whilst the intensity function looks
varied, zooming out shows it is quite flat.

Next Steps

I wanted to integrate the variable background into the Hawkes process
so we could combine both models. As my Hawkes sampling is Bayesian I
have an old blog post to turn the above from an MLE to full Bayesian
estimation, but that code doesn’t work anymore. You need to use the
LogDensityProblems.jl package to get it working, so I’m going to
have to invest some time in learning that. I’ll be honest, I’m not
sure how bothered I can be, I’ve got a long list of other things I
want to explore and learning some abstract interface doesn’t feel like
it’s a good use of my time. Frustrating because the whole point of
Julia is composability, I could write a pure Julia function and
use HCMC on it, but now I’ve got to get another package involved. I’m sure
there is good reason and the LogDensityProblems package solves some issues but it feels a bit like the Javascript ecosystem where everything changes and the way to do something is outdated the minute it is pushed to main.

Conclusion

So overall we’ve shown that the large moves don’t happen more often in
down-trending markets, at least in the broad S&P500 view of the market.
Both a binomial and point process model showed no improvement on a
null model for predicting these extreme days whereas the Hawkes model shows that they are potentially self-exciting.

juliabloggers.com

A Julia Language Blog Aggregator