Is Makie.jl up to speed?

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2023/12/01/plot.html

Introduction

Makie is a plotting ecosystem for the Julia language that is extremely feature-packed and actively developed.
Recently its core package Makie.jl reached version 0.20. Its core developer Simon told me that
the package now loads much faster than it used to.

The “time to first plot” issue is often raised by new users of the Julia ecosystem as important.
Therefore a lot of work has been put in, both by Julia core developers and by package maintainers, to reduce it.

In this post I thought it would be interesting to check how CairoMakie.jl compares to Plots.jl.
The Plots.jl package is another great plotting ecosystem for Julia. It is more lightweight, so in the past it was seen
as faster but less feature-rich. Let us see where things stand currently.
From the Makie ecosystem I have chosen CairoMakie.jl as I typically need 2-D production-quality plots.

The code in this post was tested under Julia Version 1.10.0-rc1, CairoMakie.jl 0.11.2 and Plots.jl 1.39.0.

Installing the package

This test, and the ones that follow, are done in separate Julia sessions and in separate project environments
for each plotting ecosystem.
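
To reproduce this setup, a minimal sketch (my addition, not the author's exact commands) is to activate a throwaway environment and time the installation, which includes precompilation of all dependencies:

julia> using Pkg

julia> Pkg.activate(mktempdir())  # fresh, isolated project environment

julia> @time Pkg.add("CairoMakie")  # covers download, installation and precompilation

Repeating this in a second temporary environment with Pkg.add("Plots") gives comparable numbers.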

The first timing we make is package installation. The results are the following:

  • CairoMakie.jl 0.11.2: 241 dependencies successfully precompiled in 227 seconds
  • Plots.jl v1.39.0: 62 dependencies successfully precompiled in 78 seconds

CairoMakie.jl takes about three times longer to install and has roughly four times as many dependencies.
Installation is a one-time cost; however, there are two considerations to keep in mind:

  • New users face installation time first, so it is noticeable (this is especially relevant with Pluto.jl).
  • Since CairoMakie.jl has many more dependencies, it is more likely to require recompilation when any of them gets updated.

Loading the package

After installing packages we can check how long it takes to load them:

julia> @time using CairoMakie
  3.580779 seconds (2.75 M allocations: 181.590 MiB, 4.39% gc time, 1.71% compilation time: 49% of which was recompilation)

vs

julia> @time using Plots
  1.296026 seconds (1.05 M allocations: 73.541 MiB, 6.00% gc time, 2.19% compilation time)

CairoMakie.jl takes around three times more time to load. This difference is noticeable, but I think it is not a show-stopper in most cases.
Having to wait 3.5 seconds for a package to load should be acceptable unless someone expects to run a really short-lived Julia session.
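
If you want to see where that load time goes, a quick option (my addition to the post) is the @time_imports macro from InteractiveUtils, available since Julia 1.8. Run it in a fresh session, since an already-loaded package will not be timed again:

julia> @time_imports using CairoMakie  # prints the load time of every dependency

julia> @time_imports using Plots  # same check, in a separate fresh session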

Simple plotting

Now it is time to compare plotting performance. Start with CairoMakie.jl:

julia> x = range(0, 10, length=100);

julia> y = sin.(x);

julia> @time lines(x, y)
  0.559800 seconds (476.77 k allocations: 32.349 MiB, 3.13% gc time, 94.66% compilation time)

julia> @time lines(x, y)
  0.012473 seconds (32.40 k allocations: 2.128 MiB)

vs Plots.jl:

julia> x = range(0, 10, length=100);

julia> y = sin.(x);

julia> @time plot(x, y)
  0.082866 seconds (9.16 k allocations: 648.188 KiB, 97.32% compilation time)

julia> @time plot(x, y)
  0.000508 seconds (484 allocations: 45.992 KiB)

The situation repeats. CairoMakie.jl is visibly slower, but having to wait half a second for the first plot to be sent to the plotting engine is, I think, acceptable.
Note that subsequent plots are much faster as they do not require compilation.

Conclusions

Given the timings I have obtained, my judgment is as follows:

  • CairoMakie.jl is still visibly slower than Plots.jl.
  • Yet, CairoMakie.jl is in my opinion currently fast enough not to annoy users by requiring them to wait excessively long for a plot.

I think the Makie maintainers, together with the core Julia developers,
have done a fantastic job of improving time-to-first-plot in this ecosystem.

I have decided to switch to Makie as my default plotting tool for larger projects.
However, for now I will probably still use Plots.jl in scenarios where I just want to start Julia and make a single quick plot
(especially on a machine where it has yet to be installed).

Leverage an Analytic Approach to Improve CMS Star Rating Performance

By: Jeff Dixon

Re-posted from: https://glcs.hashnode.dev/strategies-for-tracking-and-improving-cms-star-rating-performance

The CMS Star Rating program was developed to raise the quality of services provided to Medicare Advantage (MA) plan members and to empower consumers to select the highest quality plans. As the rating system continues to evolve, it is becoming increasingly difficult for providers and insurers to maintain and improve their ratings due to a variety of factors:

  • Retroactive guidelines are applied at the end of the year.

  • Annual changes to quality measures.

  • Annual changes to ratings cut point thresholds.

  • Changes in measure and category weighting factors (1x to 5x).

Plan administrators must deal with these challenges while providing great service to members and performing above average against other MA plans across 40 different quality metrics!

More specific information on the Star Ratings program can be found here:

https://www.cms.gov/stars-fact-sheet-2024.pdf

Why should plan administrators focus on Star Ratings?

Achieving an above average Star rating (4-star or above) is important for Medicare Advantage plans for many reasons. Here are a few statistics to highlight my point:

  • Only 42% of MA-PD contracts earned a 4-star rating (or above) in the 2024 ratings year. This is nine percentage points lower than in the 2023 ratings!

  • 74% of MA-PD enrollees will select contracts with 4 or more stars in 2024.

  • Roughly $9 Billion in bonus payments are distributed annually to high-performing plans – an average of $417 per enrolled member.

Introducing the GLCS MA Stars Analytics Platform

To give providers and plan administrators the insights they need to track, forecast and improve Star ratings, Great Lakes Consulting Services (GLCS) has developed an MA Stars Analytics Platform.

Core Principles

The platform was built on two core principles. First, success in Stars requires a consolidated view of quality metrics throughout each plan year. Second, a successful platform must provide plan administrators with a prospective view of ratings based on statistical and scenario modeling.

The GLCS platform is built on a dimensional data model to account for the flexibility required to track annual changes to measures, weights and cut-point thresholds. The model is informed by 10+ years of published Star rating results and paired with a set of prebuilt analytics and Tableau dashboards.

We aim to empower plan administrators, with just a few clicks, to review measure and rating projections, view historical trends by category, and spotlight areas for improvement.

The platform equips users with actionable insights needed to strategically target resources, engage clinicians, and close gaps in member care to improve health outcomes. Through data visualizations, advanced analytics and predictive modeling, we want to help plan administrators focus on the highest impact measures and take the guesswork out of year-end ratings results.

Key Analytic Categories

The GLCS platform leverages 10+ years of published rating results from CMS to predict cut-point thresholds and project ratings for each applicable quality measure. For convenience, the results are summarized from 8 Health Plan and Drug Plan categories into 5 measurement groupings and aligned to the Stars rating system:

  • Clinical HEDIS

  • Member Perception

  • Clinical Pharmacy

  • Improvement

  • Operations

To learn more about the GLCS Stars Analytics Platform, please follow the link below:

https://www.glcs.io/2023/11/GLCS_Stars_Analytics.pdf

Why GLCS?

Before our watershed partnership with JuliaHub (formerly Julia Computing) and our foray into Scientific Modeling, Simulation and Machine Learning, Great Lakes Consulting was formed from a renegade band of data wranglers. We tackled problems large and small and helped clients with data-driven solutions, including:

  • Enterprise Revenue Modeling

  • AR Valuation

  • Revenue Variance Analysis

  • Denial Analytics

  • Pricing Optimization

  • Population Health Risk Modeling

We thrive on the numerous challenges in the Healthcare space. It is always rewarding to wrestle disparate data into shape to create actionable insights and meaningful outcomes. This is still at the core of everything we do. That said, there is no bigger data challenge than MA Stars.

Looking to the Future

CMS is continuing to advance the Star ratings program. Achieving above-average Star ratings will be progressively more difficult in the upcoming years. In the measurement year 2024, CMS will:

  • Introduce new quality measures (Improving Physical & Mental Health).

  • Change the weighting factors on existing measures.

  • Begin using online surveys for participants.

  • Replace Reward Factors with a Health Equity Index (HEI).

I think we are up for the challenge. We will strive to advance our platform and make every effort to help providers and plan administrators achieve and maintain high star ratings. I look forward to more blog posts to share our experiences along the way.

Free Finance Data Sets for the Quants

By: Dean Markwick's Blog -- Julia

Re-posted from: https://dm13450.github.io/2023/11/25/Free-Finance-Datasets-for-Quants.html

Now and then I am asked how to get started in quant finance, and my advice has always been to just get hold of some data and play about with different models. Getting that data is the first step, and this post takes you through several different sources, hopefully giving you a launchpad to start poking around with financial data.


I’ve tried to cover different assets and frequencies to hopefully inspire the different types of quants out there.

High-Frequency FX Market Data

My day-to-day job is in FX, so naturally that’s where I think all the best data can be found. TrueFX provides tick-by-tick data timestamped to the millisecond, so high-frequency data is available for free and across lots of different currencies. If you are interested in working out how to deal with large amounts of data efficiently (1 month of EURUSD is 600MB), this source is a good place to start.

As a demo, I’ve downloaded the USDJPY October dataset.

using CSV, DataFrames, DataFramesMeta, Dates, Statistics
using Plots

It’s a big CSV file, so this isn’t the best way to store the data; instead, stick it into a database like QuestDB that is made for time-series data.
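
As a rough sketch of that route (my addition, not from the original post), assuming a local QuestDB instance listening on its default InfluxDB line protocol port 9009, each tick can be pushed over a plain socket:

using Sockets, Dates

# Write one tick as an InfluxDB line protocol row: table, tag, fields, nanosecond timestamp.
function write_tick(sock, ccy, time::DateTime, bid, ask)
    ts_ns = Dates.value(time - DateTime(1970)) * 1_000_000  # ms since the Unix epoch -> ns
    write(sock, "fx_ticks,ccy=$ccy bid=$bid,ask=$ask $ts_ns\n")
end

sock = connect("localhost", 9009)
write_tick(sock, "USDJPY", DateTime("2023-10-01T21:04:56.931"), 149.298, 149.612)
close(sock)

Looping write_tick over the rows of the CSV (or batching them) then gives you a queryable time-series table instead of a 600MB flat file.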

usdjpy = CSV.read("USDJPY-2023-10.csv", DataFrame,
                 header = ["Ccy", "Time", "Bid", "Ask"])
usdjpy.Time = DateTime.(usdjpy.Time, dateformat"yyyymmdd HH:MM:SS.sss")
first(usdjpy, 4)
4×4 DataFrame
 Row │ Ccy      Time                     Bid      Ask
     │ String7  DateTime                 Float64  Float64
─────┼─────────────────────────────────────────────────────
   1 │ USD/JPY  2023-10-01T21:04:56.931  149.298  149.612
   2 │ USD/JPY  2023-10-01T21:04:56.962  149.298  149.782
   3 │ USD/JPY  2023-10-01T21:04:57.040  149.589  149.782
   4 │ USD/JPY  2023-10-01T21:04:58.201  149.608  149.782

It’s simple data, just a bid and ask price with a time stamp.

usdjpy = @transform(usdjpy, :Spread = :Ask .- :Bid, 
                            :Mid = 0.5*(:Ask .+ :Bid), 
                            :Hour = round.(:Time, Minute(10)))

usdjpyHourly = @combine(groupby(usdjpy, :Hour), :open = first(:Mid), :close = last(:Mid), :avg_spread = mean(:Spread))
usdjpyHourly.Time = Time.(usdjpyHourly.Hour)

plot(usdjpyHourly.Hour, usdjpyHourly.open, lw =1, label = :none, title = "USDJPY Price Over October")

Looking at the hourly price over the month gives you flat periods
over the weekend.

USDJPY October price chart

Let’s look at the average spread (ask – bid) throughout the day.

hourlyAvgSpread = sort(@combine(groupby(usdjpyHourly, :Time), :avg_spread = mean(:avg_spread)), :Time)

plot(hourlyAvgSpread.Time, hourlyAvgSpread.avg_spread, lw =2, title = "USDJPY Intraday Spread", label = :none)

USDJPY average intraday spread

We see a big spike at 10 pm because of the day roll, when the secondary markets go offline briefly, which pollutes the data a bit. Looking at just midnight to 8 pm gives a more indicative picture.

plot(hourlyAvgSpread[hourlyAvgSpread.Time .<= Time("20:00:00"), :].Time, 
     hourlyAvgSpread[hourlyAvgSpread.Time .<= Time("20:00:00"), :].avg_spread, label = :none, lw=2,
     title = "USDJPY Intraday Spread")

USDJPY average intraday spread zoomed

In October spreads have generally been wider in the later part of the
day compared to the morning.

There is much more that can be done with this data across the
different currencies though. For example:

  1. How stable are correlations across currencies at different time frequencies?
  2. Can you replicate my microstructure noise post? How does the microstructure noise change between currencies?
  3. Price updates are irregular, so what are their statistical properties? As a starting point, see the sketch after this list.
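
For idea 3, a minimal sketch (my addition) reusing the usdjpy frame loaded above is to look at the distribution of inter-arrival times between ticks:

gaps = diff(usdjpy.Time)  # Millisecond gaps between consecutive updates
gap_ms = Dates.value.(gaps)  # convert to integer milliseconds
println("median gap: ", median(gap_ms), " ms, 99th percentile: ", quantile(gap_ms, 0.99), " ms")
histogram(log10.(gap_ms .+ 1), label = :none, title = "USDJPY Tick Inter-Arrival Times (log10 ms)")

The contrast between sub-second bursts and the long quiet periods (weekends, the daily roll mentioned above) is the kind of irregularity a duration or point-process model would try to capture.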

Daily Futures Market Data

Let’s zoom out a little bit now, decrease the frequency, and widen the asset pool. Futures cover many asset classes: oil, coal, currencies, metals, agriculture, stocks, bonds, interest rates, and probably something else I’ve missed. This data is daily and roll-adjusted, so you have a continuous time series of an asset for many years. This means you can look at the classic momentum/mean-reversion portfolio models and have a real stab at long-term trends.

The data is part of the Nasdaq Data Link product (formerly Quandl), and once you sign up for an account you have access to the free data. This futures dataset is Wiki Continuous Futures, and after about 50 clicks, logging in, re-logging in, and a few 2FA codes, you can view the pages.

To get the data you can go through one of the API packages in your favourite language. In Julia, this means the QuandlAccess.jl package, which keeps things simple.

using QuandlAccess

futuresMeta = CSV.read("continuous.csv", DataFrame)
futuresCodes = futuresMeta[!, "Quandl Code"] .* "1"

quandl = Quandl("QUANDL_KEY")

function get_data(code)
    futuresData = quandl(TimeSeries(code))
    futuresData.Code .= code
    futuresData
end
futureData = get_data.(rand(futuresCodes, 4));

We have an array of all the available contracts in futuresCodes and sample 4 of them at random to see what the data looks like.

p = []
for df in futureData
    push!(p, plot(df.Date, df.Settle, label = df.Code[1]))
end

plot(p..., layout = 4)

Futures examples

  • ABY – WTI Brent Bullet – Spread between two oil futures on different
    exchanges.
  • TZ6 – Transco Zone 6 Non-N.Y. Natural Gas (Platts IFERC) Basis – Spread between
    two different natural gas contracts
  • PG – PG&E Citygate Natural Gas (Platts IFERC) Basis – Again, spread between
    two different natural gas contracts
  • FMJP – MSCI Japan Index – Index containing Japanese stocks

I’ve managed to randomly select 3 energy futures and one stock
index.

Project ideas with this data:

  1. Cross-asset momentum and mean reversion (a naive single-contract sketch follows this list).
  2. Cross-asset correlations: does the price of oil drive some equity indexes?
  3. Macro regimes: can you pick out commonalities of market factors over the years?
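
To make idea 1 concrete, here is a rough sketch (my addition, not a proper backtest) of a naive time-series momentum rule applied to one of the contracts fetched above: go long when the trailing one-year return is positive, short otherwise. It assumes the Settle column is clean and strictly positive.

df = sort(futureData[1], :Date)  # one roll-adjusted contract
rets = diff(log.(df.Settle))  # daily log returns
lookback = 252  # roughly one trading year
# The signal for day t uses only returns observed up to day t-1.
signal = [sum(rets[max(1, t - lookback):t - 1]) > 0 ? 1.0 : -1.0 for t in 2:length(rets)]
stratRets = signal .* rets[2:end]  # apply yesterday's signal to today's return
plot(df.Date[3:end], cumsum(stratRets), label = "naive momentum", title = df.Code[1])

No costs, no volatility scaling and a single contract, but it is enough to start comparing momentum behaviour across the different asset classes in futuresCodes.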

Equity Order Book Data

Out there in the wild is the FI2010 dataset, which is essentially a sample of the full order book for five different stocks on the Nordic stock exchange over 10 days. You get 10 levels of prices and volumes and so can reconstruct the order book throughout the day. It is the benchmark dataset for limit order book prediction and you will see it referenced in papers that are trying to implement new prediction models. For example, Benchmark Dataset for Mid-Price Forecasting of Limit Order Book Data with Machine Learning Methods references some basic methods on the dataset and how they perform when predicting the mid-price.

I found the dataset (as a Python package) here
https://github.com/simaki/fi2010 but it’s just stored as a CSV which
you can lift easily.

fi2010 = CSV.read(download("https://raw.githubusercontent.com/simaki/fi2010/main/data/data.csv"),DataFrame);

Update on 7/01/2024

Since posting this, the above link has gone offline and the user has deleted their GitHub account! Instead, the dataset can be found here:
https://etsin.fairdata.fi/dataset/73eb48d7-4dbc-4a10-a52a-da745b47a649/data
I’ve not verified whether it’s in the same format, so there might be some additional work going from the raw data to how this blog post sets it up. Thanks to the commenters below for pointing this out.

The data is wide (each column is a depth level of the price or volume), so I turn it into a long dataset and add the level, side, and variable as new columns.

fi2010Long = stack(fi2010, 4:48, [:Column1, :STOCK, :DAY])
fi2010Long = @transform(fi2010Long, :a = collect.(eachsplit.(:variable, "_")))
fi2010Long = @transform(fi2010Long, :var = first.(:a), :level = last.(:a), :side = map(x->x[2], :a))
fi2010Long = @transform(groupby(fi2010Long, [:STOCK, :DAY]), :Time = collect(1:length(:Column1)))
first(fi2010Long, 4)

The ‘book depth’ is the sum of the liquidity available at all the levels and indicates how easy it is to trade the stock. As a quick example, we can take the average depth of each stock per day and use that as a proxy for the ease of trading these stocks.

intraDayDepth = @combine(groupby(fi2010Long, [:STOCK, :DAY, :var]), :avgDepth = mean(:value))
intraDayDepth = @subset(intraDayDepth, :var .== "VOLUME");
plot(intraDayDepth.DAY, intraDayDepth.avgDepth, group=intraDayDepth.STOCK, 
     marker = :circle, title = "Avg Daily Book Depth - FI2010")

FI2010 Intraday book depth

Stocks 3 and 4 have the highest average depth, so they are most likely the easiest to trade, whereas Stock 1 has the thinnest depth. Stock 2 has an interesting switch between liquid and illiquid.

So if you want to look beyond top-of-book data, this dataset provides the extra level information needed and is closer to what a professional shop is using. Better than trying to predict daily Yahoo Finance mid-prices with neural nets, at least.

Build Your Own Crypto Datasets

If you want to take a further step back then being able to build the
tools that take in streaming data directly from the exchanges and
save that into a database is another way you can build out your
technical capabilities. This means you have full control over what you
download and save. Do you want just the top of book every update, the
full depth of the book, or just the reported trades?
I’ve written about this before, Getting Started with High Frequency Finance using Crypto Data and Julia, and learned a lot in the
process. Doing things this way means you have full control over the entire
process and can fully understand the data you are saving and any
additional quirks around the process.
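
As a flavour of what that involves, here is a rough sketch (my addition, not the code from that post) that subscribes to Binance's public book-ticker stream for BTCUSDT and prints top-of-book updates. It assumes the HTTP.jl and JSON3.jl packages; the field names follow Binance's public stream documentation.

using HTTP, JSON3, Dates

HTTP.WebSockets.open("wss://stream.binance.com:9443/ws/btcusdt@bookTicker") do ws
    for msg in ws
        tick = JSON3.read(msg)
        # "b"/"a" are the best bid/ask prices, "B"/"A" the corresponding sizes
        println(now(), " bid=", tick["b"], " (", tick["B"], ")  ask=", tick["a"], " (", tick["A"], ")")
    end
end

From there, the write path could be the same line-protocol approach sketched in the FX section, swapping the println for a database insert.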

Conclusion

Plenty to get stuck into and learn from. Getting the data and loading it into an environment is always the first challenge, and learning how to do that with all these different types of data should help you understand what these types of jobs entail.