
Leverage an Analytic Approach to Improve CMS Star Rating Performance

By: Jeff Dixon

Re-posted from: https://glcs.hashnode.dev/strategies-for-tracking-and-improving-cms-star-rating-performance

The CMS Star Rating program was developed to raise the quality of services provided to Medicare Advantage (MA) plan members and to empower consumers to select the highest quality plans. As the rating system continues to evolve, it is becoming increasingly difficult for providers and insurers to maintain and improve their ratings due to a variety of factors:

  • Retroactive guidelines are applied at the end of the year.

  • Annual changes to quality measures.

  • Annual changes to ratings cut point thresholds.

  • Changes in measure and category weighting factors (1x to 5x).

Plan administrators must deal with these challenges while providing great service to members and performing above average against other MA plans across 40 different quality metrics!
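
To make the role of cut points and weights concrete, here is a minimal sketch in Julia. It assumes, purely for illustration, that each measure is mapped to 1 to 5 stars by four cut-point thresholds and that the summary rating is a weighted average of the measure-level stars. The scores, cut points, weights and the helper `star_for` are all hypothetical, and the real CMS methodology includes further adjustments that are not modeled here.

```julia
# Hypothetical helper: cut points are the minimum scores for 2, 3, 4 and 5 stars.
star_for(score, cutpoints) = 1 + count(c -> score >= c, cutpoints)

# Hypothetical measures; real cut points differ by measure and change every year.
measures = [
    (score = 78.0, cutpoints = [50.0, 62.0, 71.0, 80.0], weight = 1),
    (score = 88.0, cutpoints = [60.0, 70.0, 80.0, 85.0], weight = 3),
    (score = 64.0, cutpoints = [55.0, 65.0, 75.0, 85.0], weight = 5),
]

stars = [star_for(m.score, m.cutpoints) for m in measures]
weights = [m.weight for m in measures]

# Weighted average of the measure-level stars (simplifying assumption).
weighted_rating = sum(stars .* weights) / sum(weights)
```

In this toy example the 5x measure sits at 2 stars and pulls the average down to roughly 3.2, which shows why a shift in either a weight or a cut point can move a plan across the 4-star line.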

More specific information on the Star Ratings program can be found here:

https://www.cms.gov/stars-fact-sheet-2024.pdf

Why should plan administrators focus on Star Ratings?

Achieving an above average Star rating (4-star or above) is important for Medicare Advantage plans for many reasons. Here are a few statistics to highlight my point:

  • Only 42% of MA-PD contracts earned a 4-star rating (or above) in the 2024 ratings year. This is a 9% decrease from the 2023 ratings!

  • 74% of MA-PD enrollees will select contracts with 4 or more stars in 2024.

  • Roughly $9 Billion in bonus payments are distributed annually to high-performing plans – an average of $417 per enrolled member.

Introducing the GLCS MA Stars Analytics Platform

To give providers and plan administrators the insights they need to track, forecast and improve Star ratings, Great Lakes Consulting Services (GLCS) has developed an MA Stars Analytics Platform.

Core Principles

The platform was built on two core principles. First, success in Stars requires a consolidated view of quality metrics throughout each plan year. Second, a successful platform must provide plan administrators with a prospective view of ratings based on statistical and scenario modeling.

The GLCS platform is built on a dimensional data model to account for the flexibility required to track annual changes to measures, weights and cut-point thresholds. The model is informed by 10+ years of published Star rating results and paired with a set of prebuilt analytics and Tableau dashboards.

We aim to empower plan administrators, with just a few clicks, to review measure and rating projections, view historical trends by category, and spotlight areas for improvement.

The platform equips users with actionable insights needed to strategically target resources, engage clinicians, and close gaps in member care to improve health outcomes. Through data visualizations, advanced analytics and predictive modeling, we want to help plan administrators focus on the highest impact measures and take the guesswork out of year-end ratings results.

Key Analytic Categories

The GLCS platform leverages 10+ years of published rating results from CMS to predict cut-point thresholds and project ratings for each applicable quality measure. For convenience, the results are summarized from 8 Health Plan and Drug Plan categories into 5 measurement groupings and aligned to the Stars rating system:

  • Clinical HEDIS

  • Member Perception

  • Clinical Pharmacy

  • Improvement

  • Operations

To learn more about the GLCS Stars Analytics Platform, please follow the link below:

https://www.glcs.io/2023/11/GLCS_Stars_Analytics.pdf

Why GLCS?

Before our watershed partnership with JuliaHub (formerly Julia Computing) and our foray into Scientific Modeling, Simulation and Machine Learning, Great Lakes Consulting was formed from a renegade band of data wranglers. We tackled problems large and small and helped clients with data-driven solutions, including:

  • Enterprise Revenue Modeling

  • AR Valuation

  • Revenue Variance Analysis

  • Denial Analytics

  • Pricing Optimization

  • Population Health Risk Modeling

We thrive on the numerous challenges in the Healthcare space. It is always rewarding to wrestle and conform disparate data to create actionable insights and meaningful outcomes. This is still at the core of everything we do. That said, there is no bigger data challenge than MA Stars.

Looking to the Future

CMS is continuing to advance the Star ratings program. Achieving above-average Star ratings will be progressively more difficult in the upcoming years. In the measurement year 2024, CMS will:

  • Introduce new quality measures (Improving Physical & Mental Health).

  • Change the weighting factors on existing measures.

  • Begin using online surveys for participants.

  • Replace Reward Factors with a Health Equity Index (HEI).

I think we are up for the challenge. We will strive to advance our platform and make every effort to help providers and plan administrators achieve and maintain high star ratings. I look forward to more blog posts to share our experiences along the way.

Free Finance Data Sets for the Quants

By: Dean Markwick's Blog -- Julia

Re-posted from: https://dm13450.github.io/2023/11/25/Free-Finance-Datasets-for-Quants.html

Now and then I am asked how to get started in quant finance and
my advice has always been to just get hold of some data and play about
with different models. The first step is to get some data and this post takes you
through several different sources and hopefully gives you the
launchpad to start poking around with financial data.




I’ve tried to cover different assets and frequencies to hopefully
inspire the various types of quants
out there.

High-Frequency FX Market Data

My day-to-day job is in FX so naturally, that’s where I think all the
best data can be found. TrueFX provides free tick-by-tick data with
millisecond timestamps across lots of different currencies.
So if you are interested in working out how to deal with large amounts
of data (1 month of EURUSD is 600MB) efficiently, this source is a
good place to start.

As a demo, I’ve downloaded the USDJPY October dataset.

using CSV, DataFrames, DataFramesMeta, Dates, Statistics
using Plots

It’s a big CSV file, so a flat file isn’t the best way to store the data.
Instead, stick it into a database like QuestDB
that is made for time series data.

usdjpy = CSV.read("USDJPY-2023-10.csv", DataFrame,
                 header = ["Ccy", "Time", "Bid", "Ask"])
usdjpy.Time = DateTime.(usdjpy.Time, dateformat"yyyymmdd HH:MM:SS.sss")
first(usdjpy, 4)
4×4 DataFrame
 Row │ Ccy      Time                     Bid      Ask
     │ String7  DateTime                 Float64  Float64
─────┼────────────────────────────────────────────────────
   1 │ USD/JPY  2023-10-01T21:04:56.931  149.298  149.612
   2 │ USD/JPY  2023-10-01T21:04:56.962  149.298  149.782
   3 │ USD/JPY  2023-10-01T21:04:57.040  149.589  149.782
   4 │ USD/JPY  2023-10-01T21:04:58.201  149.608  149.782

It’s simple data, just a bid and ask price with a time stamp.

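# Add the spread, the mid price and a time bucket to each tick.
# Note: despite its name, :Hour holds 10-minute buckets.
usdjpy = @transform(usdjpy, :Spread = :Ask .- :Bid, 
                            :Mid = 0.5*(:Ask .+ :Bid), 
                            :Hour = round.(:Time, Minute(10)))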

usdjpyHourly = @combine(groupby(usdjpy, :Hour), :open = first(:Mid), :close = last(:Mid), :avg_spread = mean(:Spread))
usdjpyHourly.Time = Time.(usdjpyHourly.Hour)

plot(usdjpyHourly.Hour, usdjpyHourly.open, lw =1, label = :none, title = "USDJPY Price Over October")

Plotting these 10-minute buckets over the month shows flat periods
over the weekends.

USDJPY October price chart

Let’s look at the average spread (ask – bid) throughout the day.

hourlyAvgSpread = sort(@combine(groupby(usdjpyHourly, :Time), :avg_spread = mean(:avg_spread)), :Time)

plot(hourlyAvgSpread.Time, hourlyAvgSpread.avg_spread, lw =2, title = "USDJPY Intraday Spread", label = :none)

USDJPY average intraday spread

We see a big spike at 10 pm because of the day roll and because the
secondary markets go offline briefly, which pollutes the data
a bit. Looking at just midnight to 8 pm gives a more indicative picture.

plot(hourlyAvgSpread[hourlyAvgSpread.Time .<= Time("20:00:00"), :].Time, 
     hourlyAvgSpread[hourlyAvgSpread.Time .<= Time("20:00:00"), :].avg_spread, label = :none, lw=2,
     title = "USDJPY Intraday Spread")

USDJPY average intraday spread zoomed

In October spreads have generally been wider in the later part of the
day compared to the morning.

There is much more that can be done with this data across the
different currencies though. For example:

  1. How stable are correlations across currencies at different time frequencies?
  2. Can you replicate my microstructure noise post? How does the microstructure noise change between currencies?
  3. Price updates are irregular. What are some of their statistical properties?
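
As a starting point for the third question, the sketch below summarises the gaps between consecutive price updates. The timestamps are made up and stand in for the Time column of the tick data; `gaps_ms` and `gap_stats` are illustrative names, not part of any package.

```julia
using Dates, Statistics

# Hypothetical tick timestamps standing in for the :Time column.
offsets_ms = [0, 31, 109, 1270, 1275, 2400]
times = DateTime(2023, 10, 2, 9, 0, 0) .+ Millisecond.(offsets_ms)

# Gaps between consecutive updates, converted to plain integers (milliseconds).
gaps_ms = Dates.value.(diff(times))

# A few summary statistics of the inter-update times.
gap_stats = (mean = mean(gaps_ms), median = median(gaps_ms), max = maximum(gaps_ms))
```

On real data the mean gap is typically much larger than the median, because quiet periods produce a few very long gaps, so a histogram on a log scale is usually a sensible next step.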

Daily Futures Market Data

Let’s zoom out a little bit now, decrease the frequency, and widen the
asset pool. Futures cover many asset classes: oil, coal, currencies,
metals, agriculture, stocks, bonds, interest rates, and probably
something else I’ve missed. This data is daily and roll adjusted, so
you have a continuous time series of an asset for many years. This means you can look at the classic momentum/mean reversion portfolio models and have a real stab at long-term trends.

The data is part of the Nasdaq Data Link product (formerly Quandl)
and once you sign up for an account you have access to the free
data. This futures dataset is
Wiki Continuous Futures
and after about 50 clicks, logging in, re-logging in, and 2FA codes
you can view the pages.

To get the data you can go through one of the API packages in
your favourite language. In Julia, this means the QuandlAccess.jl
package which keeps things simple.

using QuandlAccess

futuresMeta = CSV.read("continuous.csv", DataFrame)
futuresCodes = futuresMeta[!, "Quandl Code"] .* "1"

quandl = Quandl("QUANDL_KEY")

function get_data(code)
    futuresData = quandl(TimeSeries(code))
    futuresData.Code .= code
    futuresData
end
futureData = get_data.(rand(futuresCodes, 4));

We have an array of all the available contracts futuresCodes and
sample 4 of them randomly to see what the data looks like.

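p = []
for df in futureData
    # push! stores each plot as a single element of the vector
    push!(p, plot(df.Date, df.Settle, label = df.Code[1]))
end

# splat the four plots into one figure with a 2x2 layout
plot(p..., layout = 4)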

Futures examples

  • ABY – WTI Brent Bullet – Spread between two oil futures on different
    exchanges.
  • TZ6 – Transco Zone 6 Non-N.Y. Natural Gas (Platts IFERC) Basis – Spread between
    two different natural gas contracts
  • PG – PG&E Citygate Natural Gas (Platts IFERC) Basis – Again, spread between
    two different natural gas contracts
  • FMJP – MSCI Japan Index – Index containing Japanese stocks

I’ve managed to randomly select 3 energy futures and one stock
index.

Project ideas with this data:

  1. Cross-asset momentum and mean reversion.
  2. Cross-asset correlations, does the price of oil drive some equity indexes?
  3. Macro regimes, can you pick out commonalities of market factors over
    the years?
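
For the first idea, a very simple momentum rule can be sketched as follows. The prices are invented and stand in for the Settle column of one contract; `trailing_return` and `momentum_signal` are hypothetical helper names, and the rule is a toy illustration rather than a tested strategy.

```julia
# Hypothetical daily settle prices for one continuous contract.
settle = [100.0, 102.0, 104.0, 103.0, 101.0, 99.0, 102.0, 106.0]

# Return over the previous `lookback` days, defined from day lookback + 1 onwards.
trailing_return(prices, lookback) =
    [prices[t] / prices[t - lookback] - 1 for t in (lookback + 1):length(prices)]

# Momentum: +1 (long) after a positive trailing return, -1 (short) after a negative one.
# A mean-reversion rule would flip the sign.
momentum_signal(prices, lookback) = sign.(trailing_return(prices, lookback))

signal = momentum_signal(settle, 3)
```

Running the same function over every contract in futureData and comparing the signals is one way into the cross-asset version of the question.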

Equity Order Book Data

Out there in the wild is the FI2010 dataset which is essentially a
sample of the full order book for five different
stocks on the Nordic stock exchange for 10 days. You have 10 levels of
prices and volumes and so can reconstruct the order book throughout the
day. It is the benchmark dataset for limit order book prediction and you will see it referenced
in papers that are trying to implement new prediction models. For
example, "Benchmark Dataset for Mid-Price Forecasting of Limit
Order Book Data with Machine Learning Methods"
references some basic methods on the dataset and how they perform when
predicting the mid-price.

I found the dataset (as a Python package) here
https://github.com/simaki/fi2010 but it’s just stored as a CSV which
you can lift easily.

fi2010 = CSV.read(download("https://raw.githubusercontent.com/simaki/fi2010/main/data/data.csv"),DataFrame);

Update on 7/01/2024

Since posting this, the above link has gone offline and the user has
deleted their GitHub account! Instead, the data set can be found here:
https://etsin.fairdata.fi/dataset/73eb48d7-4dbc-4a10-a52a-da745b47a649/data
. I’ve not verified if it’s in the same format, so there might be some
additional work going from the raw data to how this blog post sets it
up. Thanks to the commenters below for pointing this out.

The data is wide (each column is a depth level of the price and
volume) so I turn each into a long data set and add the level, side
and variable as a new column.

fi2010Long = stack(fi2010, 4:48, [:Column1, :STOCK, :DAY])
fi2010Long = @transform(fi2010Long, :a = collect.(eachsplit.(:variable, "_")))
fi2010Long = @transform(fi2010Long, :var = first.(:a), :level = last.(:a), :side = map(x->x[2], :a))
fi2010Long = @transform(groupby(fi2010Long, [:STOCK, :DAY]), :Time = collect(1:length(:Column1)))
first(fi2010Long, 4)

The ‘book depth’ is the sum of the liquidity available at all the
levels and indicates how easy it is to trade the stock. As a
quick example, we can take the average of each stock per day and use
that as a proxy for the ease of trading these stocks.

intraDayDepth = @combine(groupby(fi2010Long, [:STOCK, :DAY, :var]), :avgDepth = mean(:value))
intraDayDepth = @subset(intraDayDepth, :var .== "VOLUME");
plot(intraDayDepth.DAY, intraDayDepth.avgDepth, group=intraDayDepth.STOCK, 
     marker = :circle, title = "Avg Daily Book Depth - FI2010")

FI2010 Intraday book depth

Stocks 3 and 4 have the highest average depth, so they are most likely the
easiest to trade, whereas Stock 1 has the thinnest depth. Stock 2 has
an interesting switch between liquid and not liquid.

So if you want to look beyond top-of-book data, this dataset provides
the extra level information needed and is closer to what a
professional shop is using. Better than trying to predict daily Yahoo
finance mid-prices with neural nets at least.
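
To illustrate the difference between top-of-book and full-depth quantities, here is a small sketch on a made-up three-level snapshot. The prices, volumes and variable names are hypothetical and are not taken from FI2010.

```julia
# Hypothetical three-level order book snapshot, best level first.
bids = [(price = 99.9, volume = 500), (price = 99.8, volume = 300), (price = 99.7, volume = 200)]
asks = [(price = 100.1, volume = 400), (price = 100.2, volume = 100), (price = 100.3, volume = 100)]

# Top-of-book quantities only use the first level on each side.
mid = (first(bids).price + first(asks).price) / 2

# Book depth sums the volume across all levels.
bid_depth = sum(level.volume for level in bids)
ask_depth = sum(level.volume for level in asks)

# Imbalance in [-1, 1]: positive when more volume is resting on the bid side.
imbalance = (bid_depth - ask_depth) / (bid_depth + ask_depth)
```

Features such as this imbalance need the deeper levels, which is exactly the information a top-of-book feed throws away.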

Build Your Own Crypto Datasets

If you want to take a further step back, you can build the
tools that take in streaming data directly from the exchanges and
save it into a database, which is another way to build out your
technical capabilities. This means you have full control over what you
download and save. Do you want just the top of book every update, the
full depth of the book, or just the reported trades?
I’ve written about this before in Getting Started with High Frequency Finance using Crypto Data and Julia, and learned a lot in the
process. Doing things this way means you fully understand the data
you are saving and any
additional quirks around the process.

Conclusion

Plenty to get stuck into and learn from. Getting the data
and loading it into an environment is always the first challenge, and
learning how to do that with all these different types of data should
help you understand what these types of jobs entail.

Boost your time to market with Julia

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2023/11/24/pe104.html

Introduction

Today my post was inspired by an interesting write-up by Christopher Rackauckas:
ChatGPT performs better on Julia than Python (and R) for Large Language Model (LLM) Code Generation. Why?.
The general takeaway of this text is that writing Julia code with LLM support is efficient.

A thing that I think is relevant here is that when you write code you typically want to get your job done as fast as possible.
Getting the job done typically involves three steps (possibly repeated):

  • Thinking about the algorithm.
  • Implementing it.
  • Running the code.

Each of these steps takes time. And you usually want to get a solution in the minimum possible total time.

My experience is that Julia is really fast to code with once you learn it (and, as Chris explained, using LLMs
speeds this up even further).

Since Julia is compiled it also offers fast code execution times.

So we are left with algorithm design time. And this is the point that led me to write this post.
My experience is that I often decide to use non-optimal algorithms, because
with Julia I know my code will be reasonably fast anyway. This is especially the case if I can
see a simple solution that uses basic functionality built into Julia and does not require too much thought.

To test-drive this assertion, today I chose a Project Euler problem that I had not solved before and decided
to write down my thought process when solving it. My choice was problem 104, because it was the first relatively easy
problem (so I assumed I would not need to spend too much time on it) that I had not solved yet.

The code was written under Julia 1.9.2.

The problem

The Project Euler problem 104 is stated as follows (abbreviated):

Find the number of the smallest Fibonacci number for which
its first nine digits contain all the digits 1 to 9 (not necessarily in order)
and the same condition holds for its last nine digits.

Thinking about the algorithm

The observations I made are the following:

  • I assumed that the problem is not trivial – I need to do some optimizations to solve it in a reasonable time.
  • The last nine digits are easier; I can just track values modulo 10^9 and easily fit the data into Int type.
  • The number formed by the last nine digits must be divisible by 9, as the sum of the digits from 1 to 9 is 45.
  • The first nine digits are harder. I could get them by using Binet’s formula, but this would require analysis of the round-off error of floating-point operations.
    This would take thinking time, so maybe it is enough to just use the BigInt values and check them only if the Int based test passes.
  • The good thing is that Julia ships with the digits function so I can easily get a vector of digits of a given number.
  • I need to combine Int and BigInt calculations to cut down the processing time. Most of the time it should be enough to analyze just Int values.
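
For comparison, the Binet-based route that was set aside above could be sketched as follows. This is not the approach used in this post: it relies on the approximation log10(F(k)) ≈ k·log10(φ) − log10(√5), it only makes sense when F(k) has at least nine digits, and no round-off analysis has been done, so the floating-point result is only cross-checked against exact BigInt arithmetic for a few values of k. The function names are hypothetical.

```julia
# Estimate the nine leading digits of F(k) from Binet's formula in log space.
# Assumption: Float64 precision is sufficient for the k of interest (not proven here).
function leading9_float(k)
    φ = (1 + sqrt(5)) / 2
    x = k * log10(φ) - log10(5) / 2     # approximately log10(F(k))
    frac = x - floor(x)                 # fractional part fixes the leading digits
    return floor(Int, 10^(frac + 8))
end

# Exact nine leading digits using BigInt, used only as a reference.
function leading9_exact(k)
    a, b = big(1), big(1)               # F(1), F(2)
    for _ in 3:k
        a, b = b, a + b
    end
    return parse(Int, first(string(b), 9))
end

# Cross-check the two approaches for a few indices.
agree = all(leading9_float(k) == leading9_exact(k) for k in (100, 200))
```

The floating-point version avoids BigInt additions entirely, but it can be off by one in the last digit when the true value lies very close to an integer boundary, which is the round-off analysis mentioned above.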

As you can see, I decided not to invest too much into the thinking time,
hoping that the implementation of the above algorithm will be easy and its run time will be acceptable.

Implementing it

The following code contains the implementation of the algorithm following the observations I have described above:

function pe104()
    big1, big2 = big(1), big(1)
    small1, small2 = 1, 1
    k = 2
    while true
        k += 1
        big1, big2 = big1 + big2, big1
        small1, small2 = (small1 + small2) % 10^9, small1

        small1 % 9 == 0 &&
        sort!(digits(small1)) == 1:9 &&
        sort!(last(digits(big1), 9)) == 1:9 &&
        return k
    end
end

Note that we perform computations both on Int values (small1 and small2) and on BigInt values (big1 and big2).
In the code we do three tests to decide if a given k is good. The tests are listed in increasing order of computational cost.
Note that since the Int values are computed modulo 10^9 I do not need to do any trimming of vector of digits returned by digits(small1).
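
The three tests rely on how digits orders its output, which the following small sketch illustrates. The helper name is_pandigital and the sample numbers are only for illustration and do not appear in the solution above.

```julia
# digits returns the least significant digit first.
d = digits(1234)                      # [4, 3, 2, 1]

# A number is 1-9 pandigital when its sorted digits are exactly 1, 2, ..., 9.
is_pandigital(n) = sort!(digits(n)) == 1:9

# Any 1-9 pandigital number has digit sum 45 and is therefore divisible by 9,
# so the cheap modulo test never rejects a valid candidate.
n = 918273645
checks = (n % 9 == 0, is_pandigital(n))

# Because of the digit order, last(digits(x), m) holds the m leading digits
# (in reversed order, which does not matter once they are sorted).
lead = last(digits(31415926535897), 3)
```

This is why the solution can take last(digits(big1), 9) to get the first nine digits of the Fibonacci number.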

Running the code

Let us check how much time it takes to find the solution:

julia> @time pe104();
 16.634432 seconds (696.82 k allocations: 4.441 GiB, 3.01% gc time)

Note that I put ; at the end of the line to suppress printing of the solution (in the hope of encouraging you to run the code yourself).
The point is that the run time of the code is under 20 seconds. The code is clearly not optimal and could be significantly faster.
However, I am sure that speeding it up would take me more time than I would save in run time. Thus, I am happy with what I got.

Conclusions

In general Julia allows you to write efficient code. There are many applications where this is crucial and it is worth spending a lot of time
on optimization of run time. This is especially true if you write your code once and run it millions of times.

However, in many cases, like the one presented today, Julia is fast enough so even code that is not fully performant is good enough.
Often such simpler code takes: less time to design, less time to implement, and less time to prove its correctness.