Semantic Versioning (Semver) is flawed, and Downgrade CI is required to fix it

By: Christopher Rackauckas

Re-posted from: http://www.stochasticlifestyle.com/semantic-versioning-semver-is-flawed-and-downgrade-ci-is-required-to-fix-it/

Semantic versioning is great. If you don’t know what it is, it’s just a versioning scheme for software that goes MAJOR.MINOR.PATCH, where

  1. MAJOR version when you make incompatible API changes
  2. MINOR version when you add functionality in a backward compatible manner
  3. PATCH version when you make backward compatible bug fixes

That’s all it is, but it’s a pretty good system. If you see someone has updated their package from v3.2.0 to v3.2.1, then you know that you can just take that update because it’s just a patch, it won’t break your code. You can easily accept patch updates. Meanwhile, if you see they released v3.3.0, then you know that some new features were added, but it’s safe for you to update. This allows you to be compatible with v3.3.0 so that if a different package requires it, great you can both use it! Thus a lot of version updates to your dependencies can be accepted without even thinking about it. However, when you see that v4.0.0, you know that your dependency broke some APIs, so you need to do that compatibility bump automatically. Thus the semvar system makes it much easier to maintain large organizations of packages since the number of manual version bumps that you need to do are rather small.

Because of how useful this can be, many package managers have incorporated a form of semantic versioning into its system. Julia’s package manager, Rust’s package manager, Node’s package manager, and more all have ways that integrate semantic versioning into its systems, making it easy to automatically accept dependency updates and thus keeper a wider set of compatibility than can effectively done manually. It’s a vital part of our current dependency system.

Okay if it’s great, then how can it be flawed?

Semver is flawed for two reasons:

  1. The definition of “breaking” is vague and ill-defined at its edges
  2. Current tooling does not accurately check for Semver compatibility

Breaking: a great concept but with unclear boundaries

The first point is somewhat known and is best characterized by the classic XKCD comic:

Any change can break code. It’s really up to the definition of “what is breaking”. There’s many nuances:

  • “Breaking” only applies to the “public facing API”, i.e. things that users interact with. If anything changing was considered breaking then every change would be a breaking change, so in order for semver to work you have to have some sense of what is considered internals and what is considered public. Julia in its next recent version has a new “public” keyword to declare certain internals as public, i.e. things which are exported and specifically chosen values in a package module are considered internal by default. If you have many users, you will still find someone say “but I use function __xxxxyyyz_internal because the API doesn’t allow me to pass mycacheunsafemathbeware optimally”, but at least you can blame them for it. This is the most solvable issue of semver and simply requires due diligence and sticking to a clear system for what’s exposed and what’s not. That’s a bit harder in dynamic languages, but as shown there are systems in place for this.
  • What is considered “breaking” in terms of functionality can have some fuzzy edges. I work on numerical solvers and connections to machine learning (scientific machine learning or SciML). If someone calls the ODE solver with abstol=1e-6 and reltol=1e-3, then what is returned is an approximation to the ODE’s solution with a few digits of accuracy (there’s some details in here I will ignore). If a change is made internally to the package, say more SIMD for better performance, which causes the result to change in the 12 digit, is that breaking? Because the ODE solver only is guaranteeing at most 3-6 digits of accuracy, probably not. But what if the 6th digit changes? The 5th? If the built-in sin function in the language changes in the 15th digit of accuracy, is that breaking? Most documentation strings don’t explicitly say “this is computed to 1ulp (units in the last place)”, so it’s not always clear what one is truly guaranteed from a numerical function. If someone improves the performance of a random number generator and now the random numbers for the same seed are different, is this breaking? Were you guaranteed that wasn’t going to change? People will argue all day about some of these edge cases, “it broke my tests because I set the random number seed and now it broke”. Look at any documentation, for example numpy.random.rand, and it won’t clarify these details on you can rely on to change and not change. This granularity is a discussion with a vague boundary.
  • One man’s bug fix is another man’s breaking change. You may have intended for all instantiations of f(x::T) (or T.f(x)) to return an integer, but one of them returned a float. So what do you do? You go fix it, make them all return floats, and add documentation on the interface that they all return a floating point value and implement some interface to enforce it across all functions… and then the issues roll in “you broke my code because I required that this version of the function returned an integer!”. A bugfix is by definition correcting an unintended behavior. However, someone has to define “unintended”, and your users may not be able to read your brain and may consider what was “intended” differently. I’m not sure there really is a solution to this because a bug is by definition unintended: if you knew it was there then you would have either fixed it or documented it earlier. But left with no documentation on what to do, the user may thing the behavior is intentional and use it.
  • Adding new functionality may have unintended consequences. You may have previously threw an error in a given case, but now return an approximation. The user may only want exact solutions to some math function f(x), so they relied on the error throw before in order to know if the solution would have been exactly calculable. Your new approximation functionality that you just released with a nice blog post thus just broke somebody’s code. So is it a major update, or a minor update? You never “intended” for only giving exact solutions, the error message might’ve even said “we intend to add this case in the near future with an approximation”, but you still broke their code.

As Churchill said, “democracy is the worst form of government, except for all the others”. In this case, semver is great because it conveys useful information, but we shouldn’t get ahead of ourselves and thus assume it does everything perfectly. Its definitions can be vague and it requires discussion to figure out whether something is breaking or a patch sometimes.

But if it does fail, hopefully our tooling can help us know. We all have continuous integration and continuous deployment (CI/CD), that helps us handle semver… right?

Standard CI/CD Systems are Insufficient to Check Semver Compatibility

I’m no chump so I set my versioning to use semantic versioning. My Project.tomls are all setup to put lower bounds, for example I list out all of my version requirements like a champ (if you’re not familiar with Julia’s package manager, everything defaults to semver and thus DiffEqBase = “6.41” in the compat implicitly means any 6.x with x>41, but 7 is not allowed). We laugh in the face of the Python PyPI system because our package registration system rejects any package (new or new version) which does not have an upper bound. Every package is required to have compatibilities specified, and thus random breakage is greatly reduced. We have forced all package authors to “do the right thing” and users have ultimately one. Package it up, we’re done here.

But then… users… see some breakage? They make a post where they show you that user your package failed. How could that happen? Well it goes back to part one that there are some edges in semantic versioning that may have creeped in somewhere. But many times what has happened is that the authors have simply forgotten what their lower bound means. v3.3.0 introduced the function f(x) in PkgA so when you started to use that function from the dependency, you set the lower bound there and life is good. g(x) was introduced in v3.4.0 and a few years later PkgA is at 3.11.2 you learn about it and go “cool PkgA is great!”, you start using g(x), your CI system says everything is fine, and then a user pops up and says your package is broken for them. When digging into the logs, you see that there’s some other package that only allows The Missing Link: Downgrade CI

The real core issue here is that semantic versioning is generally inadequately tested. In theory, if I put a lower bound saying I accept v3.3.0 and anything above it until v4, then I might be saying I am allowing more than 100 versions of PkgA. If I also have a PkgB with similar semantic versioning, I could be allowing 100 variations of that as well. However, the way everyone’s CI/CD system runs is to take the latest version of the packages. Okay, maybe for some major dependency like the standard library you list ‘Programming Language :: Python :: 3.8’, ‘Programming Language :: Python :: 3.9’, … to test multiple versions of that, but are you checking the 100,000 permutations of all allowed dependency versions? Almost certainly not. Not only have I not seen this, it’s also just infeasible to do in practice.

But as a shortcut, what you should be doing is at least checking the most edgy of edge cases. If I state v3.3.0 is my allowed lower bound, most CI systems will simply grab the latest v3.y.z with y and z as big as possible. However, I should at least have one run with v3.3.0 to see if it’s still sensible. This would have caught that g(x) was not defined in v3.4.0. While this wouldn’t fix all issues with semantic versioning, it can at least identify many of them pretty straightforwardly.

We call this scheme “Downgrade CI”, i.e. downgrade all dependencies to their minimum versions and run it. Most users will only ever see the maximum versions so sure it doesn’t matter to most people, but as people add more and more into their environment they will start to see earlier versions, and it’s these minimum versions that are the true key to whether your package will give a sensible environment, not the version maximums which semver puts so much effort into!

Setting up Downgrade CI

Okay, so hopefully I’ve convinced you that semver is not a magical solution to all compatibility problems, it’s a nice tool but not a silver bullet, and you need to have some form of downgrade CI. How do you actually accomplish this? Thankfully the Julia ecosystem has a julia-downgrade-compat-action which sets up Github Actions CI to automatically run the package versions with this downgrade idea in mind. If you’re scared of trying to figure that out, don’t worry and just copy-paste a script out of SciML. For example, from SciMLBase.jl:

name: Downgrade
on:
  pull_request:
    branches:
      - master
    paths-ignore:
      - 'docs/**'
  push:
    branches:
      - master
    paths-ignore:
      - 'docs/**'
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        version: ['1']
    steps:
      - uses: actions/checkout@v4
      - uses: julia-actions/setup-julia@v1
        with:
          version: ${{ matrix.version }}
      - uses: cjdoris/julia-downgrade-compat-action@v1
        with:
          skip: Pkg,TOML
      - uses: julia-actions/julia-buildpkg@v1
      - uses: julia-actions/julia-runtest@v1

This will add a new set of CI tests which run in the downgraded form and ensure your lower bounds are up to date. Will this solve all version compatibility issues? No, but hopefully this catches most of the major classes of issues.

Conclusion

In conclusion, use downgrade CI because semver isn’t perfect and while it does give a decent idea as to handling of upper bounds, lower bounds still need to be handled quite manually and “manual” is synonym for “can break”.

The post Semantic Versioning (Semver) is flawed, and Downgrade CI is required to fix it appeared first on Stochastic Lifestyle.

One puzzle, two solutions

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2024/02/09/pe116.html

Introduction

Today, I wanted to switch back to a lighter subject.
Therefore I decided to have a look at my favorite Project Euler website.

I picked the problem 116 as I have not tried to solve it yet.
Interestingly, it turned out that there are two ways to approach this puzzle,
so I thought to share them here.

The post was written under Julia 1.10.0.

The puzzle

The Project Euler puzzle 116 can be briefly stated as follows:

Given a row of 50 grey squares is to have a number of its tiles replaced with
coloured oblong tiles chosen from red (length two),
green (length three), or blue (length four).
How many different ways can the grey tiles be replaced if colours
cannot be mixed and at least one coloured tile must be used?

(If you want to see some visual examples of valid tilings, I encourage you to
visit the puzzle 116 page.)

The first approach

When we think of this problem, it is natural to generalize it. By C(n, d) we can
define the number of ways that n gray squares can be replaced with tiles of length d.
Then the solution to our problem is C(n, 2) + C(n, 3) + C(n, 4).
So let us focus on computing C(n, d) (assuming d is positive).

The first approach is to ask how many tiles of length d can be put. There must be at least
1, and we cannot put more than n ÷ d (here I use the ÷ notation taken from Julia that
denotes integer division; in other words the integer part of n / d).

So now assume that we want to put i blocks of length d (assuming i is valid). In
how many ways can we do it. Well, we put i long blocks and we are left with n - d*i gray blocks.
In total we have i + (n - d*i) blocks. You can then think of it as you have that many slots
from which you need pick i slots to put the long blocks. The number of ways you can do it is
given by the value of binomial coefficient. In Julia notation it is:
binomial(BigInt(i + (n - d*i)), BigInt(i)).

Now you might ask why I put the BigInt wrapper around the passed numbers? The reason is
that binomial coefficient gets large pretty quickly, so I want to make sure I will not
have issues with integer overflow.

Given these considerations the first function that produces C(n, d) can be defined as:

function C1(n::Integer, d::Integer)
    @assert d > 0 && n >= 0
    return sum(i -> binomial(BigInt(i + (n - d*i)), BigInt(i)), 1:n ÷ d; init=big"0")
end

Note that I use the init=big"0" initialization statement in the sum to ensure the
correct handling of the cases when n < d when we are given an empty collection to sum over.

The second approach

However, there is a different way how we can think of computing C(n, d).

Assume we know the values of C(n, d) for values of n smaller than the requested one.

We look at the last tile in our row.

If it is empty, then we are down to n-1 tiles to be filled.
This can be done in C(n-1, d) ways (remember that this value takes care
of the fact that at least one block of length d has to be used).

But what if the last tile in our row is filled with a block of length d?
Then we have two options. Either all other blocks are left gray (which gives us 1 combination)
or we are left with n-d tiles that are filled with at least one block of length d. The
second value is exactly C(n-d, d).

In summary we get that C(n, d) = C(n-1, d) + C(n-d, d) + 1.

This formula assumes n is at least d. But clearly for n < d
we have 0 ways to arrange the blocks.

Let us write down the code that performs the required computation:

function C2(n::Integer, d::Integer)
    @assert d > 0 && n >= 0
    npos = Dict{Int,BigInt}(i => 0 for i in 0:d-1)
    for j in d:n
        npos[j] = npos[j-1] + npos[j-d] + 1
    end
    return npos[n]
end

Note in the code that I used the npos dictionary to flexibly allow
for any potential integer values of n. The dictionary has
Dict{Int,BigInt} type, again, to ensure that the results of the computations
are stored correctly even if they are large.

Testing

Now we have two functions C1 and C2 that look completely differently.
Do they produce the same results. Let us check:

julia> using Test

julia> @testset "test C1 and C2 equality" begin
           for n in 0:200, d in 1:20
               @test C1(n, d) == C2(n, d)
           end
       end;
Test Summary:           | Pass  Total  Time
test C1 and C2 equality | 4020   4020  0.9s

Indeed we see that both C1 and C2 functions produce the same results.

To convince ourselves that using arbitrary precision integers was indeed needed
let us check some example values of the functions:

julia> C1(200, 2)
453973694165307953197296969697410619233825

julia> C2(200, 2)
453973694165307953197296969697410619233825

julia> typemax(Int)
9223372036854775807

Indeed, if we were not careful, we would have an integer overflow issue.

Conclusions

As usual I will not show the value of the solution to the problem to encourage you
to run the code yourself. You can get it by executing either
sum(d -> C1(50, d), 2:4) or sum(d -> C2(50, d), 2:4).
(We have just checked that the value produced in both cases is the same).

My First Julia Package – TriangulArt.jl

By: Jonathan Carroll

Re-posted from: https://jcarroll.com.au/2024/02/04/my-first-julia-package-triangulart-jl/

I’ve tried to get this same image transformation working at least three times now, but
I can finally celebrate that it’s working! I’ve been (re-)learning Julia and I still
love the language, so it was time to take my learning to the next level and actually
build a package.

I’ve tried to get this same image transformation working at least three times now, but
I can finally celebrate that it’s working! I’ve been (re-)learning Julia and I still
love the language, so it was time to take my learning to the next level and actually
build a package.

For those not familiar, Julia is a much newer language than my daily-driver R, and with
that comes the freedom to take a lot of good features from other languages and implement
them. There are some features that R just won’t ever get, but they’re available in Julia and
they’re very nice to use.

I’ve written solutions to the first 20
or so Project Euler problems in Julia … wow, 5 years ago.

More recently, I have solved the first 18 days of Advent of Code 2023 in Julia
(my solutions
are in a fork of a package that I’m not using, so they more or less run independently).

With those under my belt, I revisited a project I’ve tried to implement several times. I like
the low-poly look and wanted to recreate it – it’s just an image transformation,
right? I’m even somewhat familiar with Delaunay Triangulation, or at least its dual the
Voronoi Tesselation from my days building spatial maps
of fishing areas.

It sounds like a simple enough problem; choose some points all over the image, triangulate between
all of them, then shade the resulting triangles with the average colour of the pixels they enclose.

I found this nice image of a rainbow lorikeet (these frequent my backyard)

Rainbow Lorikeet

so I got to work trying to chop it up into triangles.

Well, the naive approach is simple enough, but it produces some terrible results. I’ve built that version into what I did eventually get working, and it’s… not what I want

It works, but at what cost?

The problem is that by randomly selecting points across the image, you lose all the structure. With
enough triangles you might recover some of that, but then you have a lot of triangles and lose that
low-poly vibe.

After much searching for a better way to do this, I found this article from 2017. It’s
a python approach, but I figured I knew enough Julia and Python now that I could try to make
a 1:1 translation.

The first step is to get the random sampling working, because it allows me to start testing
the triangulation parts quickly. Generating those is pretty clean

function generate_uniform_random_points(image::Matrix{RGB{N0f8}}, n_points::Integer=100)
    ymax, xmax = size(image)[1:2]
    rand(n_points, 2) .* [xmax ymax]
end

The triangulation itself is handled by DelaunayTriangulation::triangulate()
for once I’m happy that there’s so much scientific/statistical support in Julia

rng = StableRNG(2)
tri = triangulate(all_points; rng)

Slightly trickier is figuring out which points are in which triangle. For that, I am
thankful for PolygonOps::inpolygon(). With the pixels for each triangle identified,
it was only a matter of averaging the R, G, and B channels to get the median colour.

I got that working, but with the results above – far from pleasant. The next, much harder step,
was to weight the points towards the ’edges’ of the image. I couldn’t find an easy
way to translate the python code for locally sampling the entropy (via skimage)

filters.rank.entropy(im2, morphology.disk(entropy_width))

so I tried to build something of my own. I tried edge-detection algorithms but
I was clearly doing something wrong with it. Partly, I suspect, not doing the down-weighting
that the python version includes.

Since the pixels we want to up-weight are all along lines, choosing these at random can
end up with several right next to each other, which we don’t want. The python version does
something a little clever – it selects one point, then reduces the weighting of the entire image
with a Gaussian around that point, so that nearby points are unlikely to also be selected.

In the end, I failed to find a good Julia alternative, but calling python code is (almost) as
simple as using PyCall; @pyimport skimage as skimage (with slight modifications to use in a
package, as I would later discover).

With that in place, I was able to successfully weight towards high-entropy regions; regions
where a larger number of bytes are required to encode a histogram of the grayscale pixels, i.e.
where there’s a lot going on. The results are much more pleasing

Entropy-weighted Delaunay Triangulation

Along the way I added some debug features, such as plotting the vertices and edges of the
triangulation on top of the image

Debug mode

With the workflow more or less working, I ran some profiling to see if I could
speed it up. Unsurprisingly, generating the weighted points was one area where a
lot of time was spent, though it’s not yet clear if that’s because it’s python
code or because that’s genuinely one of the most complex parts of the code – my
best Julia alternative was to write my own short Shannon entropy function and
make it search locally with ImageFiltering::mapwindow

function shannon(w::AbstractMatrix)
   cm = collect(values(countmap(w))) / sum(size(w)) / 256
   sum([-x * log2(x) for x in cm])
end

mapwindow(shannon, image, (19, 19))

though, this creates a square subsampling, whereas the python version uses a nicer disk.

The profiling shows a lot of blank areas, and I’m not sure how to interpret those

Profile&rsquo;s @profview output in VS Code

I realised at this point that I actually didn’t know how long the python version takes
to run. I grabbed the original source code
and tried running it (after installing the relevant python packages) but it failed –
some packages had changed their arguments and signatures since this was written. A couple
of small updates later, my fork
now runs the code. It doesn’t take terribly long to run – it doesn’t display the image,
it saves it, and I’m not sure if that’s a factor. I (naively?) expected that the Julia
version would be a lot faster, and I’m hopeful that there’s performance I’ve left on the
table.

If anyone is interested in playing with a small-ish Julia package, feel free to poke at
this – it’s definitely not being used for anything critical.

For now, I’m enjoying throwing images at this and getting some nice looking results

If you’re interested in having a play with this package or helping to improve it,
it’s on GitHub – I’m not planning
to publish it to the registry any time soon, but that’s perhaps something to look
forward to in the future. For now, the main issues I see with this package are:

  • The white border around the produced image remains – I have tried setting
    margin=0mm but that doesn’t appear to help

  • Performance is not as good as it can be, I suspect; the entropy calculation
    (calling python) is definitely a bottleneck.

  • To speed up the processing, only every 10th pixel is used to determine the
    average colour of the triangle – this may fail to identify an entire triangle.

  • CI – I generated this package in VSCode using PkgTemplates and it is the
    first Julia package I’ve built. CI failed immediately, so I’ve probably done
    something wrong.

  • I am still somewhat of a beginner in Julia, so there are probably many places
    in which improvements can be made – feel free to suggest them!

As always, I can be found on Mastodon and
the comment section below.

devtools::session_info()

“`{r sessionInfo, echo = FALSE}
devtools::session_info()
“`